ArgusEncode32() accepts little endian?

Carter Bullard carter at qosient.com
Thu Jul 18 11:55:04 EDT 2013


Hey Matt,
Comments inline.  Not trying to beat you up too bad.
Carter



On Jul 18, 2013, at 11:38 AM, Matt Brown <matthewbrown at gmail.com> wrote:

> Carter,
> 
>> So what else is nDPI looking at that isn't content?
> As with libprotoident... packet contents, src/dst ports, flow tracking, packet lengths, more...
> 
>> Is it not Deep Packet Inspection?
> I think what raservices() is missing, compared with nDPI, is nDPI's use of flow tracking and its robust understanding of "applications" ("What is robust understanding?" See bittorrent below).  Clearly the algo that seeks patterns (facilitated by rauserdata()) is a very viable method of application detection.  Simply put, compared with nDPI (163 definitions) or libprotoident (223 definitions), std.sig has fewer signatures (84).
> 

Argus does do flow tracking.  At least I think it does.  It provides direction,
packet size information, and content.  Not sure that it needs anything else?

I don't think devolving argus so that it can generate packet data for other
applications is a good idea.  223 protocols is really chicken feed compared
to other tools, and rauserdata() can characterize any protocol; you just
have to run it against argus data collected from your own applications/protocols.  I
have had signature files with 3K+ applications.  Easy to do with the tools.

> 
> 
> I began looking at the classes alphabetically, and after reading bittorrent.c and dns.c, I concluded that converting nDPI classes into an raservices().conf is (nearly) futile, _since I was simply trying to generate byte patterns from given byte patterns_.
> 
> 
> nDPI utilizes the following features for protocol/application detection:
> - packet contents
> - considers flow/session tracking (including directional tracking)
> - considers size of packets
> - considers packet order in flow
> 
> For instance, consider dns.c (https://svn.ntop.org/svn/ntop/trunk/nDPI/src/lib/protocols/dns.c).
> 
> ndpi_search_dns():
> 
> 1) Is packet UDP or TCP?
> 
> 2) Source port or dest port is 53, and the packet payload length is that of a DNS header.
> 3) Pull out the following protocol stuff:
> - flags
> - transaction ID
> - queries
> - answer resource records
> - authority resource records
> - additional resource records
> - whether it is a query (determined from the flags)
> - return code
> 
> 4) Check a lot of the protocol stuff (including additional parsing of the packet byte structure) to decide whether we're dealing with a question ("request") or an answer ("response"); if either, it is DNS.

So you don't need a response to know that it's DNS.  You shouldn't need to parse
the DNS transaction to know that it's DNS, especially since libprotoident claims
it can identify applications with only 4 bytes of payload.

So you're describing something that may be interesting, but it's not high-performance
protocol characterization and identification.

> 
> 5) This class is also used to help identify "subprotocols" (like "Google", "Facebook", etc).
> 
> 
> The decision for Bittorrent is even more complex (https://svn.ntop.org/svn/ntop/trunk/nDPI/src/lib/protocols/bittorrent.c):
> - tracks the flow of events (conditional checks on the "second packet" of a flow)
> - considers packet length (is this possible with raservices()?)
> - considers ASCII content and performs string matches
> 
> 
> I suppose that all of these things can be matched as patterns, but not ones derived directly from the nDPI classes themselves, which is what I was hoping to do (literally, direct static generation of data patterns).
> 
> If I search for the string "hton" or "ntoh", I get some hits in a variety of classes, but very few classes are as clear-cut to turn into raservices().conf lines as my original example, afp.c.

So, as I mentioned earlier, hton() and ntoh() should not be used for this
type of analysis, as those are only used for 16-, 32-, 64- and 128-bit constructs.
How do you know that the next 32 bits are an int?  If you knew that, then you
would already know what the protocol is.
> 
> 
> Since this is the case, I do believe it would be better to do one of two things:
> 
> 1) build an ra client that fires the packet payloads into the nDPI engine.  Would this provide for flow tracking, etc.?  Why would I do this over just using ntop with nDPI alongside an argus probe, since I would basically have to... run ntop with nDPI alongside an argus probe? :D
> 2) run nDPI alongside an argus probe, and do the necessary grunt work of application-to-byte-pattern correlation to pull out the byte patterns for use within an raservices().conf. This would increase the number of protocol sigs that can be distributed with the argus-client package.
> 
> 
> What kinds of things can raservices() handle in this area?
> - ASCII pattern to byte pattern conversion = okay
> - payload length = not length testing exactly, but as long as we can match a byte pattern, it'd be okay
> - flow tracking? (I can't really define what this is, but I would consider some of the bittorrent detection to be this)
> 
> 
> So I'm just not sure if raservices() is robust enough to handle a "straight" type conversion (what I was attempting); but we can use the application detection tools (nDPI and libprotoident for instance) to detect the applications, then determine the common byte patterns for the applications (with rauserdata()) for use with raservices().  That's a whole lotta work!
> 
> What do you think of this last strategy?

Have you used rauserdata()?

> 
> 
> Thanks again,
> 
> Matt
> 
> 
> 
> On Jul 17, 2013, at 4:07 PM, Carter Bullard <carter at qosient.com> wrote:
> 
>> So what else is nDPI looking at that isn't content?
>> Is it not Deep Packet Inspection?
>> 
>> Carter
>> 
>> 
>> 
>> On Jul 17, 2013, at 3:12 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>> 
>>> Thanks, Carter.  I was about to write that the deeper I got into the nDPI classes, the clearer it became that a large majority of the "protocol/application" identifiers consider many more aspects of the conversations than simply a byte pattern that would be at all useful with raservices().
>>> 
>>> The best thing for me to do at this point would be to run nDPI and libprotoident alongside the argus probe, shuffle through the flows to determine exactly which "protocols/applications" were tagged by nDPI and libprotoident, then run rauserdata() targeting these flows.
>>> 
>>> Anyone want to assist? :)
>>> 
>>> The effort seems redundant at that point, but it's well worth it if you don't want to maintain those flow engines.
>>> 
>>> Ironically, raservices() is currently segfaulting on me, but that's for another thread.
>>> 
>>> 
>>> Thanks for your efforts and thorough explanations,
>>> 
>>> Matt
>>> 
>>> 
>>> On Jul 17, 2013, at 2:59 PM, Carter Bullard <carter at qosient.com> wrote:
>>> 
>>>> Hey Matt,
>>>> So, I have a lot of AFP, assuming that AFP is apple file protocol, over tcp
>>>> or over udp,…, port 548.  Well, not a huge amount but some, and I have
>>>> a lot of argus records that have these captured AFP sessions.  So let's check
>>>> your raservices() signature.
>>>> 
>>>> Here are the commands I used against the argus repository I have on my
>>>> primary apple client at QoSient World Headquarters.  Grab all the afp over tcp
>>>> status records.  Aggregate the records, using racluster() to get a single
>>>> record per afp session, and then process the user data for flows that
>>>> are complete (we saw the syn and synack to get the ports right).
>>>> 
>>>>   % ra -R /Archive/QoSient/192.168.0.68/2013 -w /tmp/argus.afp.out - tcp and port 548 and ipv4
>>>>   % racluster -r /tmp/argus.afp.out -w - | rauserdata -M printer=encode32 -M dsrs="-agr" - tcp and syn or synack
>>>> 
>>>> Total Records 365 SrcThreshold 10 Dst Threshold 10 
>>>> Service: afpovertcp        tcp port 548   n =   147 src = "0004    000000000000000600000000"  dst = "0104    000000000000000C00000000"  
>>>> Service: afpovertcp        tcp port 548   n =   108 src = "00030001000000000000000200000000"  dst = "0103000100000000000001  00000000"   
>>>> 
>>>> 
>>>> If we didn't aggregate them together, and just looked at each
>>>> status record for a pattern, we get (after a little hand pruning):
>>>> 
>>>>   % rauserdata -M printer=encode32 -r /tmp/argus.afp.out - tcp and syn or synack
>>>> Total Records 11136 SrcThreshold 10 Dst Threshold 10 
>>>> Service: afpovertcp        tcp port 548   n =  9513 src = "0108    000000000000000000000000"  dst = "00      00000000000000  00000000"  
>>>> Service: afpovertcp        tcp port 548   n =   809 src = "00      000000000000    00000000"  dst = "01      0000000000      00000000"
>>>> Service: afpovertcp        tcp port 548   n =   771 src = "0002    00000000000000  00000000"  dst = "                                "
>>>> Service: afpovertcp        tcp port 548   n =    50 src = "00      00000000000000  00000000"  dst = "00        00    000000  000000  "  
>>>> 
>>>> 
>>>> Looks like rule #2 in the second run matches both rules #1 and #2 in the first run.
>>>> Unfortunately, this doesn't necessarily match your signature, but yours could 
>>>> be used to form this rule (merge #1 from run 1 with your signature, - 2 bytes).
>>>> 
>>>> Service: afpovertcp        tcp port 548   n =     1 src = "0004000100    0000    000000"  dst = "                                "  
>>>> 
>>>> So I would say that this is a decent test of rauserdata(), as it does seem to be
>>>> in the ball park of your efforts.
>>>> 
>>>> Carter
>>>> 
>>>> 
>>>> On Jul 17, 2013, at 10:43 AM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>> 
>>>>> Thanks for your reply again.
>>>>> 
>>>>> If the afp.c definition for "AFP: DSI OpenSession detected." is as noted previously, then the full ArgusEncode32() "output" string would be derived as:
>>>>> where data are assigned:
>>>>> byte offset:    00 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11
>>>>> data:         00 04 00 01 00       00 00             00 00 00 00 04 01
>>>>> 
>>>>> 
>>>>> I think I assumed the opposite placement (from the right; subnetting got me), but I assume this was just a matter of not understanding the order in which ArgusEncode32() generates the string.
>>>>> 
>>>>> Does the above look good?
>>>>> 
>>>>> 
>>>>> Can you also describe how the string "encrypted" is used and when to use it?
>>>>> 
>>>>> Also, what would be a reasonable ("n") weight to give definitions from nDPI or other protocol identification classes?  Or, let me ask that differently... how is the ("n") weight used?  Does raservices() simply consider the weight relative to other lines of the same "service" in the .conf, or is weight considered via some threshold that considers the algorithm used by rauserdata()?
>>>>> I'm "randomly" guessing, if raservices() can say that a byte pattern that is pulled from nDPI is matched, then with about _90% certainty_ it is this service.  How can I express _90% certainty_ with a given "n" value?
>>>>> 
>>>>> Including derived definitions of byte patterns from protocol identification classes plus the machine learning algo of rauserdata (and the user tweaking) will make raservices() much more useful, in my opinion.
>>>>> 
>>>>> 
>>>>> Looking forward to getting cracking on this.
>>>>> May also look into seeing if I can generate an raservices().conf from libprotoident (particularly after reading this study http://vbn.aau.dk/files/78068418/report.pdf).
>>>>> 
>>>>> 
>>>>> Thanks Carter,
>>>>> 
>>>>> Matt
>>>>> 
>>>>> On Jul 16, 2013, at 7:01 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>> 
>>>>>> Hey Matt,
>>>>>> So, understand that your efforts can be described as trying to add
>>>>>> to the /usr/local/argus/std.sig file that we provide in the clients distribution.
>>>>>> 
>>>>>> The std.sig file has real signatures for a number of common protocols.
>>>>>> To build your own signatures by hand, you need to understand what
>>>>>> the patterns actually mean.  Let's use one of the signatures for " imap "
>>>>>> as an example:
>>>>>> 
>>>>>> Service: imap   tcp port 143   n = 48745 src = "444F4E450D0A                    " 
>>>>>>                                          dst = "        204F4B2049444C4520636F6D"
>>>>>> 
>>>>>> So the signature provides the service label, in this case "imap".
>>>>>> We expect the signature to be seen in tcp traffic going to port 143.
>>>>>> We processed 48745 imap connections, and we analyzed the first
>>>>>> 16 bytes of the user buffers and found that the source presented a
>>>>>> bit pattern of 0x444F4E450D0A as the first bits in the sampled payload
>>>>>> of the connection; this pattern is ASCII "DONE\r\n".  Bytes 7-16 were
>>>>>> variable, and so are not in the signature.
>>>>>> 
>>>>>> The destination in this case had sent payloads where the first 4 bytes
>>>>>> were variable, and bytes 5-16 were " OK IDLE com" (note the leading space, 0x20).
>>>>>> 
>>>>>> This is the most frequent pattern in the 32 imap payload signatures
>>>>>> that we have, representing about 60% of all user buffers captured for
>>>>>> imap traffic.
>>>>>> 
>>>>>> These signatures are normally generated from argus data of the
>>>>>> service streams of interest, using the program rauserdata().  So one
>>>>>> of the best strategies is to run rauserdata() against your argus logs,
>>>>>> so that it can generate a starter signature file, and then by hand,
>>>>>> improve the signatures until you're happy.
>>>>>> 
>>>>>> The signatures that raservices() uses are rather special patterns that 
>>>>>> represent the persistent bits seen in the user payload samples that argus
>>>>>> captures.  The best results are seen from signatures built from the first
>>>>>> 16-32 bytes of the entire flow, but there is a great deal of benefit from
>>>>>> analyzing and comparing the samples of payload data that are captured
>>>>>> in the status records.
>>>>>> 
>>>>>> Remember, all data on the wire should be in network order, unless it's
>>>>>> unstructured, and then you should treat it as a bit stream, so there isn't
>>>>>> any endian-ness.
>>>>>> 
>>>>>> So for your example I would start with something like this:
>>>>>>    src = "0004000100              "
>>>>>> 
>>>>>> 
>>>>>> Carter
>>>>>> 
>>>>>> 
>>>>>> On Jul 16, 2013, at 10:17 AM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>>>> 
>>>>>>> Thanks for the reply, Carter.
>>>>>>> 
>>>>>>> Can you provide any assistance in relation to "translating" the values given in nDPI classes to the character based hex strings needed for "src =" and "dst ="?
>>>>>>> 
>>>>>>> 
>>>>>>> For instance, if I take an example from afp.c (https://svn.ntop.org/svn/ntop/trunk/nDPI/src/lib/protocols/afp.c), the following qualifies "AFP: DSI OpenSession detected."
>>>>>>> 
>>>>>>> // from ndpi_protocols.h (https://svn.ntop.org/svn/ntop/trunk/nDPI/src/include/ndpi_protocols.h)
>>>>>>> #define get_u_int16_t(X,O)  (*(u_int16_t *)(((u_int8_t *)X) + O))
>>>>>>> #define get_u_int32_t(X,O)  (*(u_int32_t *)(((u_int8_t *)X) + O))
>>>>>>> 
>>>>>>> if (get_u_int16_t(packet->payload, 0) == htons(0x0004) &&  //if the 16 bits starting at byte-offset 0 (meaning, bits 0 through 15) of the payload equals the 16 bit little endian "0x0004" and...
>>>>>>> get_u_int16_t(packet->payload, 2) == htons(0x0001) &&  //if the 16 bits starting at byte-offset 2 (meaning, bits 16 through 31) of the payload equals the 16 bit little endian "0x0001" and...
>>>>>>> get_u_int32_t(packet->payload, 4) == 0 && //if the 32 bits starting at byte-offset 4 (meaning bits 32-63) of the payload equals 0 and...
>>>>>>> get_u_int32_t(packet->payload, 8) == htonl(packet->payload_packet_len - 16) && //if the 32 bits at byte-offset 8 (meaning, bits 64-95) are the same as a 32-bit little endian value equal to the size of the packet minus 16 [must be a check of sorts] and...
>>>>>>> get_u_int32_t(packet->payload, 12) == 0 && //if the 32 bits at byte-offset 12 (bits 96-127) equals 0 and...
>>>>>>> get_u_int16_t(packet->payload, 16) == htons(0x0104)) //if the 16 bits at byte-offset 16 (bits 128 through 143) of the payload equal "0x0104", then "AFP: DSI OpenSession detected."
>>>>>>> 
>>>>>>> 
>>>>>>> I've commented what I can see as the byte offsets of the given data.
>>>>>>> 
>>>>>>> So, I'd simply like to generate the "src = " and "dst = " from this conditional.
>>>>>>> 
>>>>>>> 
>>>>>>> I had some assistance reviewing ArgusEncode32() and it was explained that it looks at a ptr for binary data and "outputs" that data in a string of hex.
>>>>>>> Knowing that 0x0004, as it is expressed in the nDPI class, is little endian...
>>>>>>> - I believe that if I were to execute ArgusEncode32() with a pointer to data that can be expressed as hex 0x0004, it would output the string "00000004".
>>>>>>> - I could then use this to build an effective "src = " line for an raservices.conf file.
>>>>>>> Are these two assumptions correct?
>>>>>>> 
>>>>>>> With this technique, do you think it's reasonable to generate an raservices.conf from all the conditionals included in the nDPI classes?
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Matt
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Jul 15, 2013, at 7:18 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>>>> 
>>>>>>>> The ArgusEncode32() printer works on a character basis, so there
>>>>>>>> isn't any notion of big endian or little endian.
>>>>>>>> 
>>>>>>>> The "n=" is how many samples were used to generate the signature.
>>>>>>>> We rank them by "n", as a weight for the probability of encountering
>>>>>>>> that particular pattern.
>>>>>>>> 
>>>>>>>> Carter
>>>>>>>> 
>>>>>>>> On Jul 15, 2013, at 1:15 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Carter,
>>>>>>>>> 
>>>>>>>>> Hope all is well.  Last Thursday I started to look into reversing the
>>>>>>>>> nDPI classes and creating an raservices() conf file from the byte
>>>>>>>>> pattern classification definitions therein.
>>>>>>>>> 
>>>>>>>>> I struggled to understand the C notation, etc., but have arrived at the
>>>>>>>>> question of whether or not ArgusEncode32() takes a little endian data
>>>>>>>>> value as input and "outputs" this data expressed as a string made up
>>>>>>>>> of its value in hex.
>>>>>>>>> 
>>>>>>>>> For instance, if I take a value from afp.c (within nDPI) and see
>>>>>>>>> htons(0x0004), I can assume that when converted with ArgusEncode32(),
>>>>>>>>> the "output" will be "00000004".
>>>>>>>>> 
>>>>>>>>> Out of this, I can then generate the "src=" or "dst=" portions of a
>>>>>>>>> line for an raservices() conf file.
>>>>>>>>> 
>>>>>>>>> Is this correct?
>>>>>>>>> 
>>>>>>>>> Additionally, as for the syntax of the raservices() conf file, what
>>>>>>>>> does the "n=" value mean?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Matt
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> 
