ArgusEncode32() accepts little endian?

Carter Bullard carter at qosient.com
Wed Jul 17 16:07:26 EDT 2013


So what else is nDPI looking at that isn't content?
Is it not Deep Packet Inspection?

Carter



On Jul 17, 2013, at 3:12 PM, Matt Brown <matthewbrown at gmail.com> wrote:

> Thanks Carter.  I was about to write that the deeper I got into the nDPI classes, the clearer it became that a large majority of the "protocol/application" identifiers consider many more aspects of the conversation than a simple byte pattern of the kind that would be useful with raservices().
> 
> The best thing for me to do at this point would be to run nDPI and libprotoident alongside the argus probe, shuffle through the flows to determine exactly which "protocol/application" labels were assigned by nDPI and libprotoident, then run rauserdata() targeting these flows.
> 
> Anyone want to assist? :)
> 
> Effort seems redundant at that point, but well worth it if you don't want to maintain those flow engines.
> 
> Ironically, raservices() is currently segfaulting on me, but that's for another thread.
> 
> 
> Thanks for your efforts and thorough explanations,
> 
> Matt
> 
> 
> On Jul 17, 2013, at 2:59 PM, Carter Bullard <carter at qosient.com> wrote:
> 
>> Hey Matt,
>> So, I have a lot of AFP, assuming that AFP is apple file protocol, over tcp
>> or over udp,…, port 548.  Well, not a huge amount but some, and I have
>> a lot of argus records that have these captured AFP sessions.  So let's check
>> your raservices() signature.
>> 
>> Here are the commands I used against the argus repository I have on my
>> primary apple client at QoSient World Headquarters.  Grab all the afp over tcp
>> status records.  Aggregate the records, using racluster() to get a single
>> record per afp session, and then process the user data for flows that
>> are complete (we saw the syn and synack to get the ports right).
>> 
>>   % ra -R /Archive/QoSient/192.168.0.68/2013 -w /tmp/argus.afp.out - tcp and port 548 and ipv4
>>   % racluster -r /tmp/argus.afp.out -w - | rauserdata -M printer=encode32 -M dsrs="-agr" - tcp and syn or synack
>> 
>> Total Records 365 SrcThreshold 10 Dst Threshold 10 
>> Service: afpovertcp        tcp port 548   n =   147 src = "0004    000000000000000600000000"  dst = "0104    000000000000000C00000000"  
>> Service: afpovertcp        tcp port 548   n =   108 src = "00030001000000000000000200000000"  dst = "0103000100000000000001  00000000"   
>> 
>> 
>> If we didn't aggregate them together, and just looked at each
>> status record for a pattern, we get (after a little hand pruning):
>> 
>>   % rauserdata -M printer=encode32 -r /tmp/argus.afp.out - tcp and syn or synack
>> Total Records 11136 SrcThreshold 10 Dst Threshold 10 
>> Service: afpovertcp        tcp port 548   n =  9513 src = "0108    000000000000000000000000"  dst = "00      00000000000000  00000000"  
>> Service: afpovertcp        tcp port 548   n =   809 src = "00      000000000000    00000000"  dst = "01      0000000000      00000000"
>> Service: afpovertcp        tcp port 548   n =   771 src = "0002    00000000000000  00000000"  dst = "                                "
>> Service: afpovertcp        tcp port 548   n =    50 src = "00      00000000000000  00000000"  dst = "00        00    000000  000000  "  
>> 
>> 
>> Looks like rule #2 in the second run matches both rule #1 and #2 in the first run.
>> Unfortunately, this doesn't necessarily match your signature, but yours could 
>> be used to form this rule (merge #1 from run 1 with your signature, - 2 bytes).
>> 
>> Service: afpovertcp        tcp port 548   n =     1 src = "0004000100    0000    000000"  dst = "                                "  
>> 
>> So I would say that this is a decent test of rauserdata(), as it does seem to be
>> in the ball park of your efforts.
>> 
>> Carter
>> 
>> 
>> On Jul 17, 2013, at 10:43 AM, Matt Brown <matthewbrown at gmail.com> wrote:
>> 
>>> Thanks for your reply again.
>>> 
>>> If the afp.c definition for "AFP: DSI OpenSession detected." is as noted previously, then the full ArgusEncode32() "output" string would be derived as:
>>> where data are assigned:
>>> byte offset:    00 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11
>>> data:         00 04 00 01 00       00 00             00 00 00 00 04 01
>>> 
>>> 
>>> I think I assumed it to be the opposite placement (from the right, subnetting got me), but this was just a matter of not understanding the order in which ArgusEncode32() generates the string, I assume.
>>> 
>>> Does the above look good?
>>> 
>>> 
>>> Can you also describe how the string "encrypted" is used and when to use it?
>>> 
>>> Also, what would be a reasonable ("n") weight to give definitions from nDPI or other protocol identification classes?  Or, let me ask that differently... how is the ("n") weight used?  Does raservices() simply consider the weight relative to other lines of the same "service" in the .conf, or is weight considered via some threshold that considers the algorithm used by rauserdata()?
>>> I'm "randomly" guessing, if raservices() can say that a byte pattern that is pulled from nDPI is matched, then with about _90% certainty_ it is this service.  How can I express _90% certainty_ with a given "n" value?
>>> 
>>> Including derived definitions of byte patterns from protocol identification classes plus the machine learning algo of rauserdata (and the user tweaking) will make raservices() much more useful, in my opinion.
>>> 
>>> 
>>> Looking forward to getting cracking on this.
>>> May also look into seeing if I can generate an raservices().conf from libprotoident (particularly after reading this study http://vbn.aau.dk/files/78068418/report.pdf).
>>> 
>>> 
>>> Thanks Carter,
>>> 
>>> Matt
>>> 
>>> On Jul 16, 2013, at 7:01 PM, Carter Bullard <carter at qosient.com> wrote:
>>> 
>>>> Hey Matt,
>>>> So, understand that your efforts can be described as trying to add
>>>> to the /usr/local/argus/std.sig file that we provide in the clients distribution.
>>>> 
>>>> The std.sig file has real signatures for a number of common protocols.
>>>> To build your own signatures by hand, you need to understand what
>>>> the patterns actually mean.  Let's use one of the signatures for " imap "
>>>> as an example:
>>>> 
>>>> Service: imap   tcp port 143   n = 48745 src = "444F4E450D0A                    " 
>>>>                                          dst = "        204F4B2049444C4520636F6D"
>>>> 
>>>> So the signature provides the service label, in this case "imap".
>>>> We expect the signature to be seen in tcp traffic going to port 143.
>>>> We processed 48745 imap connections, and we analyzed the first
>>>> 16 bytes of the user buffers and found that the source presented a
>>>> bit pattern of 0x444F4E450D0A as the first bits in the sampled payload
>>>> of the connection; this pattern is ASCII "DONE\r\n". Bytes 7-16 were
>>>> variable, and so are not in the signature.
>>>> 
>>>> The destination in this case had sent payloads where the first 4 bytes
>>>> were variable, and bytes 5-16 were " OK IDLE com" (with a leading space).
>>>> 
>>>> This is the most frequent pattern in the 32 imap payload signatures
>>>> that we have, representing about 60% of all user buffers captured for
>>>> imap traffic.
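
A minimal Python sketch of how a signature string like the imap one above can be read, assuming each pair of characters is one payload byte in hex and a blank pair marks a variable (wildcard) byte. The helper names (parse_signature, render, matches) are hypothetical and are not part of the argus clients; this only illustrates the pattern layout described above, not how raservices() itself matches.

```python
def parse_signature(pattern):
    """Split a signature string into per-byte values; None marks a wildcard byte."""
    if len(pattern) % 2:
        raise ValueError("signature must contain an even number of characters")
    cells = []
    for i in range(0, len(pattern), 2):
        pair = pattern[i:i + 2]
        # Assumption: any pair containing a blank is treated as "variable".
        cells.append(None if " " in pair else int(pair, 16))
    return cells


def render(pattern):
    """Show the fixed bytes as ASCII, '.' for non-printables, '?' for wildcards."""
    out = []
    for value in parse_signature(pattern):
        if value is None:
            out.append("?")
        elif 0x20 <= value < 0x7F:
            out.append(chr(value))
        else:
            out.append(".")
    return "".join(out)


def matches(pattern, payload):
    """True if every fixed byte of the pattern equals the payload byte at that offset."""
    for offset, value in enumerate(parse_signature(pattern)):
        if value is None:
            continue
        if offset >= len(payload) or payload[offset] != value:
            return False
    return True


# The imap signature quoted above (src and dst halves).
IMAP_SRC = "444F4E450D0A" + " " * 20
IMAP_DST = " " * 8 + "204F4B2049444C4520636F6D"

print(render(IMAP_SRC))
print(render(IMAP_DST))
print(matches(IMAP_SRC, b"DONE\r\n"))
```

The ASCII view makes it easy to sanity-check a hand-written pattern before adding it to a signature file.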
>>>> 
>>>> These signatures are normally generated from argus data of the
>>>> service streams of interest, using the program rauserdata().  So one
>>>> of the best strategies is to run rauserdata() against your argus logs,
>>>> so that it can generate a starter signature file, and then by hand,
>>>> improve the signatures until you're happy.
>>>> 
>>>> The signatures that raservices() uses are rather special patterns that
>>>> represent the persistent bits seen in the user payload samples that argus
>>>> captures.  The best results are seen from signatures built from the first
>>>> 16-32 bytes of the entire flow, but there is a great deal of benefit from
>>>> analyzing and comparing the samples of payload data that are captured
>>>> in the status records.
>>>> 
>>>> Remember, all data on the wire should be in network order, unless it's
>>>> unstructured, and then you should treat it as a bit stream, so there isn't
>>>> any endian-ness.
>>>> 
>>>> So for your example I would start with something like this:
>>>>    src = "0004000100              "
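
A hedged sketch of one way to turn field checks like the nDPI conditional quoted further down into a pattern string. It assumes a 16-byte (32-character) pattern, packs each fixed field in network byte order (which is what htons()/htonl() produce on the wire), and leaves unchecked bytes blank. build_pattern and the field table are hypothetical names for illustration. The result is a literal translation of the nDPI checks, so it fixes more bytes than the shorter starter pattern suggested above.

```python
import struct

SIG_BYTES = 16  # assumption: patterns cover the first 16 payload bytes


def build_pattern(fields, size=SIG_BYTES):
    """Build a signature string from (offset, struct_format, value) tuples.

    Each value is packed in network (big-endian) byte order; bytes that no
    field covers stay blank, meaning "variable".
    """
    cells = ["  "] * size
    for offset, fmt, value in fields:
        raw = struct.pack("!" + fmt, value)
        for i, byte in enumerate(raw):
            if offset + i < size:  # drop anything past the pattern window
                cells[offset + i] = f"{byte:02X}"
    return "".join(cells)


# Fixed fields from the afp.c conditional; the length field at offset 8
# depends on the packet, so it is left out and becomes a wildcard.
AFP_OPEN_SESSION = [
    (0, "H", 0x0004),
    (2, "H", 0x0001),
    (4, "I", 0),
    (12, "I", 0),
    (16, "H", 0x0104),  # falls outside a 16-byte window and is dropped
]

pattern = build_pattern(AFP_OPEN_SESSION)
print(f'src = "{pattern}"')
```

Whether a pattern this strict is desirable is a judgment call; blanking additional bytes by hand makes it match more sessions.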
>>>> 
>>>> 
>>>> Carter
>>>> 
>>>> 
>>>> On Jul 16, 2013, at 10:17 AM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>> 
>>>>> Thanks for the reply, Carter.
>>>>> 
>>>>> Can you provide any assistance in relation to "translating" the values given in nDPI classes to the character based hex strings needed for "src =" and "dst ="?
>>>>> 
>>>>> 
>>>>> For instance, if I take an example from afp.c (https://svn.ntop.org/svn/ntop/trunk/nDPI/src/lib/protocols/afp.c), the following qualifies "AFP: DSI OpenSession detected."
>>>>> 
>>>>> //from ndpi_protocols.h https://svn.ntop.org/svn/ntop/trunk/nDPI/src/include/ndpi_protocols.h
>>>>> #define get_u_int16_t(X,O)  (*(u_int16_t *)(((u_int8_t *)X) + O))
>>>>> #define get_u_int32_t(X,O)  (*(u_int32_t *)(((u_int8_t *)X) + O))
>>>>> 
>>>>> get_u_int16_t(packet->payload, 0) == htons(0x0004) &&  //if the 16 bits starting at byte-offset 0 (meaning, bits 0 through 15) of the payload equals the 16 bit little endian "0x0004" and...
>>>>> get_u_int16_t(packet->payload, 2) == htons(0x0001) &&  //if the 16 bits starting at byte-offset 2 (meaning, bits 16 through 31) of the payload equals the 16 bit little endian "0x0001" and...
>>>>> get_u_int32_t(packet->payload, 4) == 0 && //if the 32 bits starting at byte-offset 4 (meaning bits 32-63) of the payload equals 0 and...
>>>>> get_u_int32_t(packet->payload, 8) == htonl(packet->payload_packet_len - 16) && //if the 32 bits at byte-offset 8 (meaning, bits 64-95) are the same as a 32-bit little endian value equal to the size of the packet minus 16 [must be a check of sorts] and...
>>>>> get_u_int32_t(packet->payload, 12) == 0 && //if the 32 bits at byte-offset 12 (bits 96-127) equals 0 and...
>>>>> get_u_int16_t(packet->payload, 16) == htons(0x0104)) //if the 16 bits at byte-offset 16 (bits 128-143)
>>>>> 
>>>>> 
>>>>> I've commented what I can see as the byte offsets of the given data.
>>>>> 
>>>>> So, I'd simply like to generate the "src = " and "dst = " from this conditional.
>>>>> 
>>>>> 
>>>>> I had some assistance reviewing ArgusEncode32() and it was explained that it looks at a ptr for binary data and "outputs" that data in a string of hex.
>>>>> Knowing that 0x0004, as it is expressed in the nDPI class, is little endian...
>>>>> - I believe that if I were to execute ArgusEncode32() with a pointer to data that can be expressed as hex 0x0004, it would output the string "00000004".
>>>>> - I could then use this to build an effective "src = " line for an raservices.conf file.
>>>>> Are these two assumptions correct?
>>>>> 
>>>>> With this technique, do you think it's reasonable to generate an raservices.conf from all the conditionals included in the nDPI classes?
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Matt
>>>>> 
>>>>> 
>>>>> 
>>>>> On Jul 15, 2013, at 7:18 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>> 
>>>>>> The ArgusEncode32() printer works on a character basis, so there
>>>>>> isn't any notion of big endian or little endian.
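
To illustrate what "works on a character basis" implies, here is a small Python sketch. It is not the ArgusEncode32() source, just a byte-by-byte hex encoder written under the assumption that each input byte becomes two hex characters in the order the bytes appear in the buffer. encode_hex is a hypothetical helper name.

```python
def encode_hex(data):
    """Emit two uppercase hex characters per byte, in buffer order."""
    return "".join(f"{byte:02X}" for byte in data)


# The 16-bit value 0x0004 as it appears on the wire (network byte order),
# and as a little-endian host would hold it in memory.
wire_bytes = (0x0004).to_bytes(2, "big")
host_le_bytes = (0x0004).to_bytes(2, "little")

print(encode_hex(wire_bytes))     # 0004
print(encode_hex(host_le_bytes))  # 0400
```

The encoder never reorders anything, so any byte-order difference comes from how the buffer was filled, and a two-byte field yields four hex characters rather than eight.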
>>>>>> 
>>>>>> The "n=" is how many samples were used to generate the signature.
>>>>>> We rank them by "n", as a weight for the probability of encountering
>>>>>> that particular pattern.
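
A rough Python sketch of ranking signatures by "n". It parses "Service:" lines in the format rauserdata() prints elsewhere in this thread and computes each pattern's share of the samples for its service. The share calculation is an assumption made for illustration; the thread only states that signatures are ranked by "n", and does not describe any normalization or threshold inside raservices(). The regular expression and function names are hypothetical.

```python
import re

SERVICE_LINE = re.compile(
    r'Service:\s+(?P<name>\S+)\s+(?P<proto>\S+)\s+port\s+(?P<port>\d+)\s+'
    r'n\s*=\s*(?P<n>\d+)\s+'
    r'src\s*=\s*"(?P<src>[^"]*)"\s+dst\s*=\s*"(?P<dst>[^"]*)"'
)


def parse_services(lines):
    """Extract one record per 'Service:' line; other lines are ignored."""
    records = []
    for line in lines:
        match = SERVICE_LINE.search(line)
        if match:
            record = match.groupdict()
            record["n"] = int(record["n"])
            record["port"] = int(record["port"])
            records.append(record)
    return records


def rank_by_n(records):
    """Sort by sample count, pairing each record with its share of the total."""
    total = sum(record["n"] for record in records)
    ordered = sorted(records, key=lambda record: record["n"], reverse=True)
    return [(record["n"] / total if total else 0.0, record) for record in ordered]


# Two signature lines copied from the aggregated AFP run in this thread.
SAMPLE = [
    'Service: afpovertcp        tcp port 548   n =   147 '
    'src = "0004    000000000000000600000000"  dst = "0104    000000000000000C00000000"',
    'Service: afpovertcp        tcp port 548   n =   108 '
    'src = "00030001000000000000000200000000"  dst = "0103000100000000000001  00000000"',
]

for share, record in rank_by_n(parse_services(SAMPLE)):
    print(f'{record["name"]} n={record["n"]} share={share:.2f} src="{record["src"]}"')
```

A share computed this way only compares patterns against each other within one file; it is not a calibrated confidence that a matching flow belongs to the service.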
>>>>>> 
>>>>>> Carter
>>>>>> 
>>>>>> On Jul 15, 2013, at 1:15 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>>>> 
>>>>>>> Carter,
>>>>>>> 
>>>>>>> Hope all is well.  Last Thursday I started to look into reversing the
>>>>>>> nDPI classes and creating an raservices() conf file from the byte
>>>>>>> pattern classification definitions therein.
>>>>>>> 
>>>>>>> I struggled to understand the C notation, etc., but have arrived at the
>>>>>>> question of whether or not ArgusEncode32() takes a little endian data
>>>>>>> value as input and "outputs" this data expressed as a string made up
>>>>>>> of its value in hex.
>>>>>>> 
>>>>>>> For instance, if I take a value from afp.c (within nDPI) and see
>>>>>>> htons(0x0004), I can assume that when converted with ArgusEncode32(),
>>>>>>> the "output" will be "00000004".
>>>>>>> 
>>>>>>> Out of this, I can then generate the "src=" or "dst=" portions of a
>>>>>>> line for an raservices() conf file.
>>>>>>> 
>>>>>>> Is this correct?
>>>>>>> 
>>>>>>> Additionally, as for the syntax of the raservices() conf file, what
>>>>>>> does the "n=" value mean?
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Matt
>>>>>>> 
>>>>>> 
>>>> 
>> 
