ArgusEncode32() accepts little endian?

Matt Brown matthewbrown at gmail.com
Thu Jul 18 13:03:16 EDT 2013


No problem.


Thanks for the info.

So, out of this way-too-huge thread, it's quite clear that the proper way
to generate data for an raservices().conf is to run rauserdata(), then
modify the output as needed... and that little can be derived directly
from nDPI and/or libprotoident(?)


If I understand the previous threads about rauserdata() correctly:

1) `rauserdata -M printer="encode32" -r /var/opt/argus/2013-07-15/* > ~/temp.sig`
2) Review ~/temp.sig and modify it to make it as efficient as possible.
3) `raservices -f ~/temp.sig -r /var/opt/argus/2013-07-15/argus_17\:00\:00.gz - tcp`

I receive a segmentation fault with the 3.0.7.11 clients (`#define
ARGUSMAXSIGFILE 0x80000` is present in ./include/argus_client.h) and argus
3.0.6.1.


So, you are implying that if I'm going to dedicate some time, the time
would best be spent generating traffic for argus to eat and spit out to a
file, to be analyzed by rauserdata(), whose output I would then review?
What is your lower "n" tolerance for eliminating lines?


Thanks,

Matt

On Jul 18, 2013, at 11:55 AM, Carter Bullard <carter at qosient.com> wrote:

Hey  Matt,
Comments inline.  Not trying to beat you up too bad.
Carter



On Jul 18, 2013, at 11:38 AM, Matt Brown <matthewbrown at gmail.com> wrote:

Carter,

So what else is nDPI looking at that isn't content?

As with libprotoident... packet contents, src/dst ports, flow tracking,
packet lengths, more...

Is it not Deep Packet Inspection?

I think the missing piece of raservices() versus nDPI is nDPI's use of flow
tracking and its robust understanding of "applications" ("What is robust
understanding?" see bittorrent below).  Clearly the algorithm that seeks
patterns (facilitated by rauserdata()) is a very viable method of
application detection.  By simple count, though, std.sig has fewer sigs
(84) than nDPI (163 definitions) or libprotoident (223 definitions).


Argus does do flow tracking.  At least I think it does. Provides direction,
packet size information, and content.  Not sure that it needs anything else?

I don't think devo'ing argus so that it can generate packet data for other
applications is a good idea.  223 protocols is really chicken feed, compared
to other tools, and rauserdata() can characterize any protocol, you just
have to run it against your argus data, of your applications/protocols.  I
have had signature files with 3K+ applications.  Easy to do with the tools.



I began looking at the classes alphabetically, and after reading
bittorrent.c and dns.c, I concluded that converting nDPI classes to an
raservices().conf was a (nearly) futile strategy, as I was simply trying
to generate byte patterns from given byte patterns.


nDPI utilizes the following features for protocol/application detection:
- packet contents
- considers flow/session tracking (including directional tracking)
- considers size of packets
- considers packet order in flow

For instance, consider dns.c
(https://svn.ntop.org/svn/ntop/trunk/nDPI/src/lib/protocols/dns.c):

ndpi_search_dns():

1) Is packet UDP or TCP?

2) Source port or dest port is 53, and the packet payload length is that of
a DNS header.
3) Pull out the following protocol stuff:
- flags
- transaction ID
- queries
- answer resource records
- authority resource records
- any more resource records
- is it a query (checks based on if the flags are set to be a query)
- return code

4) Check much of the protocol information (including additional parsing of
the packet byte structure) to decide whether we're dealing with a question
("request") or an answer ("response"); if so, it is DNS.


So you don't need a response to know that it's DNS.  You shouldn't need to
parse the DNS transaction to know that it's DNS, especially since
libprotoident claims it can do tracking with only 4 bytes of payload.

So you're describing something that may be interesting, but it's not high
performance protocol characterization and identification.


5) This class is also used to help identify "subprotocols" (like "Google",
"Facebook", etc).


The decision for Bittorrent is even more complex (
https://svn.ntop.org/svn/ntop/trunk/nDPI/src/lib/protocols/bittorrent.c):
- tracks the flow of events (conditional checks "second packet" of flow)
- considers packet length (is this possible with raservices()?)
- considers ASCII content and performs string matches
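For comparison with the ndpi_search_dns() walk-through above: the fields its step 3 pulls out all sit at fixed offsets in the 12-byte DNS header defined in RFC 1035, and each is stored most-significant byte first (network order). The sketch below extracts them from a raw payload; it is written from the RFC rather than copied from nDPI's dns.c, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fixed 12-byte DNS header (RFC 1035, 4.1.1). */
struct dns_header {
    uint16_t id;       /* transaction ID */
    uint16_t flags;    /* QR, Opcode, AA, TC, RD, RA, Z, RCODE */
    uint16_t qdcount;  /* questions */
    uint16_t ancount;  /* answer resource records */
    uint16_t nscount;  /* authority resource records */
    uint16_t arcount;  /* additional resource records */
};

/* Read a big-endian (network order) 16-bit field byte by byte, so the
   result does not depend on host byte order or alignment. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the payload is shorter than a DNS header. */
int parse_dns_header(const uint8_t *payload, size_t len, struct dns_header *h)
{
    assert(payload != NULL && h != NULL);
    if (len < 12)
        return -1;
    h->id      = read_be16(payload + 0);
    h->flags   = read_be16(payload + 2);
    h->qdcount = read_be16(payload + 4);
    h->ancount = read_be16(payload + 6);
    h->nscount = read_be16(payload + 8);
    h->arcount = read_be16(payload + 10);
    return 0;
}

/* QR is the top bit of the flags word: 0 = query ("request"), 1 = response. */
int dns_is_response(const struct dns_header *h) { return (h->flags >> 15) & 1; }

/* RCODE is the low 4 bits of the flags word (0 = no error). */
int dns_rcode(const struct dns_header *h) { return h->flags & 0x0F; }
```

Note how little of this is needed just to say "DNS": everything past the first few header bytes is about understanding the transaction, not identifying the protocol.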


I suppose that all of these things can match patterns, but not directly
derived from the nDPI classes themselves, which was what I was hoping I
would be able to do (literally static direct data pattern generation).

If I search for the string "hton" or "ntoh", I get some hits in a variety
of classes, but very few classes are as clear-cut for building
raservices().conf lines as the original example, afp.c.


So, as I mentioned earlier, hton() and ntoh() should not be used for this
type of analysis, as those are only used for 16, 32, 64 and 128 bit
constructs.  How do you know that the next 32 bits are an int?  If you knew
that, then you would already know what the protocol is.



Since this is the case, I do believe it would be better to do one of two
things:

1) Build an ra client that fires the packet payloads into the nDPI engine.
Would this provide flow tracking, etc.?  Why would I do this over just
using ntop with nDPI alongside an argus probe, since I would basically have
to... run ntop with nDPI alongside an argus probe? :D
2) Run nDPI alongside argus, and perform the necessary grunt work of
application-to-byte-pattern correlation to pull out the byte patterns for
use within an raservices().conf.  This would increase the number of
protocol sigs that can be distributed with the argus-client package.


What types of things can raservices() handle in this area?
- ASCII pattern to byte pattern conversion = okay
- payload length = not length testing exactly, but as long as we can match
a byte pattern, it'd be okay
- flow tracking? (I can't really define what this is, but I would consider
some of the bittorrent detection to be this)


So I'm just not sure if raservices() is robust enough to handle a
"straight" type conversion (what I was attempting); but we can use the
application detection tools (nDPI and libprotoident for instance) to detect
the applications, then determine the common byte patterns for the
applications (with rauserdata()) for use with raservices().  That's a whole
lotta work!

What do you think of this last strategy?


Have you used rauserdata()?



Thanks again,

Matt



On Jul 17, 2013, at 4:07 PM, Carter Bullard <carter at qosient.com> wrote:

So what else is nDPI looking at that isn't content?
Is it not Deep Packet Inspection?

Carter



On Jul 17, 2013, at 3:12 PM, Matt Brown <matthewbrown at gmail.com> wrote:

Thanks, Carter.  I was about to write that the deeper I got into the nDPI
classes, the clearer it became that a large majority of the
"protocol/application" identifiers consider many more aspects of the
conversation than a simple byte pattern that would be at all useful with
raservices().

The best thing for me to do at this point would be to run nDPI and
libprotoident alongside the argus probe, shuffle through the flows to
determine exactly which "protocol/application" was tagged by nDPI and
libprotoident, then run rauserdata() targeting these flows.

Anyone want to assist? :)

The effort seems redundant at that point, but well worth it if you don't
want to maintain those flow engines.

Ironically, raservices() is currently segfaulting on me, but that's for
another thread.


Thanks for your efforts and thorough explanations,

Matt


On Jul 17, 2013, at 2:59 PM, Carter Bullard <carter at qosient.com> wrote:

Hey Matt,
So, I have a lot of AFP, assuming that AFP is Apple Filing Protocol, over
tcp or over udp, …, port 548.  Well, not a huge amount but some, and I have
a lot of argus records that have these captured AFP sessions.  So let's
check your raservices() signature.

Here are the commands I used against the argus repository I have on my
primary apple client at QoSient World Headquarters.  Grab all the afp over
tcp status records.  Aggregate the records, using racluster() to get a
single record per afp session, and then process the user data for flows
that are complete (we saw the syn and synack to get the ports right).

  % ra -R /Archive/QoSient/192.168.0.68/2013 -w /tmp/argus.afp.out - tcp and port 548 and ipv4
  % racluster -r /tmp/argus.afp.out -w - | rauserdata -M printer=encode32 -M dsrs="-agr" - tcp and syn or synack

Total Records 365 SrcThreshold 10 Dst Threshold 10
Service: afpovertcp        tcp port 548   n =   147 src = "0004    000000000000000600000000"  dst = "0104    000000000000000C00000000"
Service: afpovertcp        tcp port 548   n =   108 src = "00030001000000000000000200000000"  dst = "0103000100000000000001  00000000"


If we didn't aggregate them together, and just looked at each
status record for a pattern, we get (after a little hand pruning):

  % rauserdata -M printer=encode32 -r /tmp/argus.afp.out - tcp and syn or synack
Total Records 11136 SrcThreshold 10 Dst Threshold 10
Service: afpovertcp        tcp port 548   n =  9513 src = "0108    000000000000000000000000"  dst = "00      00000000000000  00000000"
Service: afpovertcp        tcp port 548   n =   809 src = "00      000000000000    00000000"  dst = "01      0000000000      00000000"
Service: afpovertcp        tcp port 548   n =   771 src = "0002    00000000000000  00000000"  dst = "                                "
Service: afpovertcp        tcp port 548   n =    50 src = "00      00000000000000  00000000"  dst = "00        00    000000  000000  "


Looks like rule #2 in the second run matches both rule #1 and rule #2 in
the first run.  Unfortunately, this doesn't necessarily match your
signature, but yours could be used to form this rule (merge #1 from run 1
with your signature, minus 2 bytes):

Service: afpovertcp        tcp port 548   n =     1 src = "0004000100        0000    000000"  dst = "                                "

So I would say that this is a decent test of rauserdata(), as it does seem
to be in the ball park of your efforts.
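Carter's observation that rule #2 of the second run covers both rules of the first run can be checked mechanically. The sketch assumes each src/dst pattern is 32 hex characters (16 bytes) with a space marking a variable nibble, as in the rauserdata() output here; because the mail archive wrapped those lines, the exact wildcard spacing in the test strings is an assumption. `sig_covers()` is a hypothetical helper, not part of the argus clients, and it compares signatures with each other rather than matching live payloads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIG_LEN 32  /* 16 payload bytes, two hex characters each */

/* Returns 1 if every fixed (non-space) character in `general` is also fixed,
   with the same value, in `specific`; i.e., anything that matches `specific`
   also matches `general`. A space is treated as a wildcard nibble. */
int sig_covers(const char *general, const char *specific)
{
    assert(strlen(general) == SIG_LEN && strlen(specific) == SIG_LEN);
    for (size_t i = 0; i < SIG_LEN; i++) {
        if (general[i] == ' ')
            continue;               /* wildcard in the general rule */
        if (specific[i] != general[i])
            return 0;               /* fixed nibble differs, or is variable */
    }
    return 1;
}
```

With the src patterns padded to 32 characters, `sig_covers("00      000000000000    00000000", "00030001000000000000000200000000")` returns 1, while swapping the arguments returns 0; the dst patterns behave the same way.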

Carter


On Jul 17, 2013, at 10:43 AM, Matt Brown <matthewbrown at gmail.com> wrote:

Thanks for your reply again.

If the afp.c definition for "AFP: DSI OpenSession detected." is as noted
previously, then the full ArgusEncode32() "output" string would be derived
as follows, where the data are assigned:
byte offset:    00 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11
data:         00 04 00 01 00       00 00             00 00 00 00 04 01


I think I assumed the opposite placement (from the right; subnetting got
me), but I assume this was just a matter of not understanding the order in
which ArgusEncode32() generates the string.

Does the above look good?
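One way to check a hand-derived layout like this is to write the fixed fields of the afp.c conditional into a 16-byte buffer in network (big-endian) order, mark the length field at bytes 8-11 as variable, and print it the way the signatures in this thread are written: two hex characters per byte and spaces for variable bytes. `make_signature()` and `afp_open_session_signature()` are hypothetical helpers, not ArgusEncode32(); note also that the final `htons(0x0104)` test at bytes 16-17 falls outside a 16-byte (32-character) signature.

```c
#include <arpa/inet.h>  /* htons() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SIG_BYTES 16  /* a 32-character signature covers 16 payload bytes */

/* Print `fixed` as two uppercase hex characters per byte; bytes whose mask
   entry is 0 are variable and become two spaces. */
void make_signature(const uint8_t fixed[SIG_BYTES], const uint8_t mask[SIG_BYTES],
                    char out[2 * SIG_BYTES + 1])
{
    assert(fixed != NULL && mask != NULL && out != NULL);
    for (int i = 0; i < SIG_BYTES; i++) {
        if (mask[i])
            snprintf(out + 2 * i, 3, "%02X", fixed[i]);
        else
            memcpy(out + 2 * i, "  ", 2);
    }
    out[2 * SIG_BYTES] = '\0';
}

/* Fixed fields of the afp.c DSI OpenSession test, in network byte order. */
void afp_open_session_signature(char out[2 * SIG_BYTES + 1])
{
    uint8_t fixed[SIG_BYTES] = {0};   /* bytes 4-7 and 12-15 must be zero */
    uint8_t mask[SIG_BYTES];
    uint16_t w0 = htons(0x0004), w1 = htons(0x0001);

    memcpy(fixed + 0, &w0, sizeof w0);  /* bytes 0-1 */
    memcpy(fixed + 2, &w1, sizeof w1);  /* bytes 2-3 */
    memset(mask, 1, sizeof mask);
    memset(mask + 8, 0, 4);             /* bytes 8-11: htonl(len - 16), varies */
    make_signature(fixed, mask, out);
}
```

Under these assumptions the result is `"0004000100000000        00000000"`, which begins with the same `0004000100` prefix as Carter's suggested starter pattern.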


Can you also describe how the string "encrypted" is used and when to use it?

Also, what would be a reasonable ("n") weight to give definitions from nDPI
or other protocol identification classes?  Or, let me ask that
differently: how is the ("n") weight used?  Does raservices() simply
consider the weight relative to other lines of the same "service" in the
.conf, or is the weight compared against some threshold tied to the
algorithm used by rauserdata()?
I'm "randomly" guessing that if raservices() matches a byte pattern that
was pulled from nDPI, then with about _90% certainty_ it is this
service.  How can I express _90% certainty_ with a given "n" value?

Including derived definitions of byte patterns from protocol identification
classes plus the machine learning algo of rauserdata (and the user
tweaking) will make raservices() much more useful, in my opinion.


Looking forward to getting cracking on this.
I may also look into whether I can generate an raservices().conf from
libprotoident (particularly after reading this study:
http://vbn.aau.dk/files/78068418/report.pdf).


Thanks Carter,

Matt

On Jul 16, 2013, at 7:01 PM, Carter Bullard <carter at qosient.com> wrote:

Hey Matt,
So, understand that your efforts can be described as trying to add
to the /usr/local/argus/std.sig file that we provide in the clients
distribution.

The std.sig file has real signatures for a number of common protocols.
To build your own signatures by hand, you need to understand what
the patterns actually mean.  Let's use one of the signatures for "imap"
as an example:

Service: imap   tcp port 143   n = 48745 src = "444F4E450D0A                    "  dst = "        204F4B2049444C4520636F6D"

So the signature provides the service label, in this case "imap".
We expect the signature to be seen in tcp traffic going to port 143.
We processed 48745 imap connections, analyzed the first 16 bytes of the
user buffers, and found that the source presented a bit pattern of
0x444F4E450D0A as the first bits in the sampled payload of the
connection; this pattern is ascii "DONE\r\n".  Bytes 7-16 were variable,
and so are not in the signature.

The destination in this case had sent payloads where the first 4 bytes
were variable, and bytes 5-16 were " OK IDLE com".

This is the most frequent pattern among the 32 imap payload signatures
that we have, representing about 60% of all user buffers captured for
imap traffic.
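Reading a signature back as bytes is a quick sanity check on this kind of explanation. `sig_to_bytes()` below is a hypothetical helper, not part of the argus clients: it decodes each pair of hex characters and turns a fully wildcarded pair into `.`. Applied to the imap patterns, it yields `DONE` followed by CR LF for the source and `.... OK IDLE com` for the destination.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value; returns -1 for anything else. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char)toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decode a signature (two characters per byte) into `out`. A byte whose two
   characters are both spaces is variable and becomes '.'. Returns the number
   of bytes written, or -1 on malformed input. `out` needs strlen(sig)/2 + 1
   bytes. */
int sig_to_bytes(const char *sig, char *out)
{
    size_t n = strlen(sig);
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        if (sig[i] == ' ' && sig[i + 1] == ' ') {
            out[i / 2] = '.';
            continue;
        }
        int hi = hexval(sig[i]), lo = hexval(sig[i + 1]);
        if (hi < 0 || lo < 0)
            return -1;  /* half-wildcard bytes are not handled in this sketch */
        out[i / 2] = (char)(hi << 4 | lo);
    }
    out[n / 2] = '\0';
    return (int)(n / 2);
}
```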

These signatures are normally generated from argus data of the
service streams of interest, using the program rauserdata().  So one
of the best strategies is to run rauserdata() against your argus logs,
so that it can generate a starter signature file, and then by hand,
improve the signatures until you're happy.

The signatures that raservices() uses are rather special patterns that
represent the persistent bits seen in the user payload samples that argus
captures.  The best results are seen from signatures built from the first
16-32 bytes of the entire flow, but there is a great deal of benefit from
analyzing and comparing the samples of payload data that are captured
in the status records.

Remember, all data on the wire should be in network order, unless it's
unstructured, and then you should treat it as a bit stream, so there isn't
any endian-ness.
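The "no endian-ness on the wire" point can be made concrete: the bytes 00 04 stay 00 04 in the payload, and only the code that turns them into an integer decides whether they mean 4 (network order, via ntohs()) or 1024 (read as little endian). A signature that records the bytes as hex characters never makes that choice. A minimal sketch, assuming a POSIX system for `<arpa/inet.h>`; the function names are hypothetical:

```c
#include <arpa/inet.h>  /* ntohs() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Interpret two wire bytes as a network-order (big-endian) 16-bit integer. */
uint16_t wire_to_host16(const uint8_t wire[2])
{
    uint16_t net;
    assert(wire != NULL);
    memcpy(&net, wire, sizeof net);  /* raw bytes, no conversion yet */
    return ntohs(net);               /* correct on big- and little-endian hosts */
}

/* The same two bytes read as a little-endian integer instead. */
uint16_t wire_as_le16(const uint8_t wire[2])
{
    assert(wire != NULL);
    return (uint16_t)(wire[0] | wire[1] << 8);
}
```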

So for your example I would start with something like this:
   src = "0004000100              "


Carter


On Jul 16, 2013, at 10:17 AM, Matt Brown <matthewbrown at gmail.com> wrote:

Thanks for the reply, Carter.

Can you provide any assistance in relation to "translating" the values
given in nDPI classes to the character based hex strings needed for "src ="
and "dst ="?


For instance, if I take an example from afp.c (
https://svn.ntop.org/svn/ntop/trunk/nDPI/src/lib/protocols/afp.c), the
following qualifies "AFP: DSI OpenSession detected."

//from ndpi_protocols.h
//https://svn.ntop.org/svn/ntop/trunk/nDPI/src/include/ndpi_protocols.h
#define get_u_int16_t(X,O)  (*(u_int16_t *)(((u_int8_t *)X) + O))
#define get_u_int32_t(X,O)  (*(u_int32_t *)(((u_int8_t *)X) + O))

//if the 16 bits starting at byte-offset 0 (bits 0 through 15) of the
//payload equal the 16 bit little endian "0x0004" and...
get_u_int16_t(packet->payload, 0) == htons(0x0004) &&
//if the 16 bits starting at byte-offset 2 (bits 16 through 31) of the
//payload equal the 16 bit little endian "0x0001" and...
get_u_int16_t(packet->payload, 2) == htons(0x0001) &&
//if the 32 bits starting at byte-offset 4 (bits 32-63) of the payload
//equal 0 and...
get_u_int32_t(packet->payload, 4) == 0 &&
//if the 32 bits at byte-offset 8 (bits 64-95) are the same as a 32-bit
//little endian value equal to the size of the packet minus 16 [must be a
//check of sorts] and...
get_u_int32_t(packet->payload, 8) == htonl(packet->payload_packet_len - 16) &&
//if the 32 bits at byte-offset 12 (bits 96-127) equal 0 and...
get_u_int32_t(packet->payload, 12) == 0 &&
//if the 16 bits at byte-offset 16 (bits 128-143) equal htons(0x0104)
get_u_int16_t(packet->payload, 16) == htons(0x0104)


I've commented what I can see as the byte offsets of the given data.

So, I'd simply like to generate the "src = " and "dst = " from this
conditional.
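The conditional above can be exercised outside nDPI. The sketch below is an illustrative rewrite, not nDPI's code: it reads each field with memcpy() instead of the pointer-cast macros (which assume unaligned loads are tolerated), compares it with the same htons()/htonl() values afp.c uses, and adds a length guard of its own. The function names are hypothetical.

```c
#include <arpa/inet.h>  /* htons(), htonl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alignment-safe stand-ins for get_u_int16_t()/get_u_int32_t(): copy the raw
   payload bytes into a host integer without any byte-order conversion. */
static uint16_t raw16(const uint8_t *p, size_t off)
{
    uint16_t v;
    memcpy(&v, p + off, sizeof v);
    return v;
}

static uint32_t raw32(const uint8_t *p, size_t off)
{
    uint32_t v;
    memcpy(&v, p + off, sizeof v);
    return v;
}

/* Same field tests as the afp.c DSI OpenSession conditional; the len < 18
   guard is added here because the tests read bytes 0-17. */
int looks_like_afp_open_session(const uint8_t *payload, size_t len)
{
    assert(payload != NULL);
    if (len < 18)
        return 0;
    return raw16(payload, 0) == htons(0x0004) &&
           raw16(payload, 2) == htons(0x0001) &&
           raw32(payload, 4) == 0 &&
           raw32(payload, 8) == htonl((uint32_t)(len - 16)) &&
           raw32(payload, 12) == 0 &&
           raw16(payload, 16) == htons(0x0104);
}
```

An 18-byte payload `00 04 00 01 00 00 00 00 00 00 00 02 00 00 00 00 01 04` passes the check; changing the last two bytes to `04 01` makes it fail, which shows that `htons(0x0104)` corresponds to the byte sequence 01 04 on the wire.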


I had some assistance reviewing ArgusEncode32(), and it was explained that
it takes a pointer to binary data and "outputs" that data as a string of
hex.
Knowing that 0x0004, as it is expressed in the nDPI class, is little
endian...
- I believe that if I were to execute ArgusEncode32() with a pointer to
data that can be expressed as hex 0x0004, it would output the string
"00000004".
- I could then use this to build an effective "src = " line for an
raservices.conf file.
Are these two assumptions correct?
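Assuming ArgusEncode32() renders each byte as two hex characters in the order the bytes appear in the buffer (consistent with the 32-character, 16-byte signatures elsewhere in this thread), the expected output for `htons(0x0004)` can be worked out with a stand-in encoder. `hex_encode()` and `encode_net16()` are hypothetical, not the argus source.

```c
#include <arpa/inet.h>  /* htons() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-wise hex encoder: `out` must hold 2 * len + 1 chars. */
void hex_encode(const uint8_t *buf, size_t len, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    assert(buf != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[buf[i] >> 4];    /* high nibble first */
        out[2 * i + 1] = digits[buf[i] & 0x0F];  /* then low nibble */
    }
    out[2 * len] = '\0';
}

/* Encode a 16-bit value as it would appear on the wire after htons(). */
void encode_net16(uint16_t value, char out[5])
{
    uint16_t net = htons(value);
    uint8_t bytes[2];
    memcpy(bytes, &net, sizeof bytes);
    hex_encode(bytes, sizeof bytes, out);
}
```

Under that assumption, `encode_net16(0x0004, s)` produces `"0004"`: two bytes become four characters, not the eight in `"00000004"`, and the characters follow the byte order on the wire on any host.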

With this technique, do you think it's reasonable to generate an
raservices.conf from all the conditionals included in the nDPI classes?


Thanks,

Matt



On Jul 15, 2013, at 7:18 PM, Carter Bullard <carter at qosient.com> wrote:

The ArgusEncode32() printer works on a character basis, so there
isn't any notion of big endian or little endian.

The "n=" is how many samples were used to generate the signature.
We rank them by "n", as a weight for the probability of encountering
that particular pattern.
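To make the ranking concrete, here is a sketch that sorts signatures by n and reports each one's share of the samples. It uses the two aggregated afpovertcp rules from Carter's racluster run (n = 147 and n = 108); treating n divided by the sum of n as an estimated probability is an assumption for illustration, not necessarily how raservices() weighs matches, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory form of one signature line. */
struct sig {
    const char *service;
    unsigned long n;   /* number of samples behind this pattern */
};

static int by_n_desc(const void *a, const void *b)
{
    unsigned long na = ((const struct sig *)a)->n;
    unsigned long nb = ((const struct sig *)b)->n;
    return (na < nb) - (na > nb);   /* larger n sorts first */
}

/* Sort signatures by n (descending) and write each one's share of the total. */
void rank_by_n(struct sig *sigs, size_t count, double *share)
{
    unsigned long total = 0;
    assert(sigs != NULL && share != NULL);
    qsort(sigs, count, sizeof *sigs, by_n_desc);
    for (size_t i = 0; i < count; i++)
        total += sigs[i].n;
    for (size_t i = 0; i < count; i++)
        share[i] = total ? (double)sigs[i].n / (double)total : 0.0;
}
```

With those two rules, the first accounts for roughly 58% of the 255 samples and the second for roughly 42%.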

Carter

On Jul 15, 2013, at 1:15 PM, Matt Brown <matthewbrown at gmail.com> wrote:

Carter,


Hope all is well.  Last Thursday I started to look into reversing the
nDPI classes and creating an raservices() conf file from the byte
pattern classification definitions therein.

I struggled to understand the C notation, etc., but have arrived at the
question of whether or not ArgusEncode32() takes a little endian data
value as input and "outputs" this data expressed as a string made up
of its value in hex.

For instance, if I take a value from afp.c (within nDPI) and see
htons(0x0004), I can assume that when converted with ArgusEncode32(),
the "output" will be "00000004".

Out of this, I can then generate the "src=" or "dst=" portions of a
line for an raservices() conf file.

Is this correct?

Additionally, as for the syntax of the raservices() conf file, what
does the "n=" value mean?

Thanks,

Matt