rahisto dialog

Tue Nov 21 17:43:00 EST 2006

Gentle people,
I added rahisto() to the distribution, and it is an important program
for doing so many things that it deserves a description and, hopefully,
some interest from the list.

rahisto() generates histograms, or more formally, frequency distribution
tables, that can be used for behavioral baselining, trend analysis,
anomaly detection, and QoS analysis, just to name a few uses.

The basic concept is that any metric that argus contains, or can derive,
is a candidate for frequency distribution analysis.  The duration of
email sessions, the average round trip time of TCP connection
establishment, the frequency of connections, pkts, or bytes against,
say, TTL or TOS byte values, and of course the frequency of connections
to TCP or UDP port numbers are just some of the uses of rahisto().  In
conjunction with racluster() or rabins(), you can get the frequency of
interesting aggregates, such as active hosts/sessions per
min/hour/day/month, etc.

The reason you want the frequency distribution, rather than just, say,
the mean and standard deviation, is that many Internet metrics are
not normally distributed.  They are bimodal or multimodal, like the
multiple peaks in a daily graph of ARP traffic.  When they are, the
frequency distribution helps you recognize it pretty quickly.
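
As a minimal sketch of that point (in Python, with invented sample
values rather than real argus data), here is a two-peaked set of
response times where the mean lands in a bin that holds no samples at
all.  The histogram() helper is hypothetical and is not part of argus.

```python
from statistics import mean, stdev

# Hypothetical response times in seconds: one fast cluster, one slow cluster.
samples = [0.001, 0.002, 0.001, 0.002, 0.001,
           0.120, 0.118, 0.122, 0.121, 0.119]

def histogram(values, start, stop, bins):
    """Count values per equal-width bin over the range [start, stop)."""
    width = (stop - start) / bins
    counts = [0] * bins
    for v in values:
        if start <= v < stop:
            counts[int((v - start) / width)] += 1
    return counts

counts = histogram(samples, 0.0, 0.125, 5)

# The summary statistics alone say nothing about the two peaks.
print("mean = %.4f  stddev = %.4f" % (mean(samples), stdev(samples)))

# The frequency table shows them immediately.
for i, c in enumerate(counts):
    print("%d  %s" % (i, "#" * c))
```

The mean falls between the two clusters, in a bin with a count of zero,
which is exactly the situation a mean and standard deviation hide.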

DNS server response is a great example.  Local vs non-local DNS
lookups: now that's something we all expect to be at least bimodally
distributed.  It is clear that responses that are cached in memory are
HUGELY faster than those that are resolved recursively by a server.  On
my network, there are 4 fundamental DNS response times, or peaks, and
rahisto() makes it pretty easy to see this.  Statistically, testing
whether a distribution is different is not hard to do.
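
One common way to test whether two distributions differ is the
two-sample Kolmogorov-Smirnov statistic, sketched below in Python.  This
is only one possible choice and not necessarily the test intended here.
The sample values are invented, and the function names are hypothetical.
The threshold uses the usual large-sample approximation, so it is rough
for small samples like these.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect_right(a, x) / len(a)   # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)   # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d

def ks_threshold(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS statistic."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))

# Hypothetical DNS response times in seconds for two periods.
baseline = [0.001, 0.002, 0.001, 0.030, 0.031, 0.002, 0.029, 0.001]
today = [0.030, 0.120, 0.118, 0.031, 0.122, 0.119, 0.029, 0.121]

d = ks_statistic(baseline, today)
limit = ks_threshold(len(baseline), len(today))
print("D = %.3f  threshold = %.3f  differs = %s" % (d, limit, d > limit))
```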

Now, what rahisto() does is pretty interesting.  You specify the value
you want to tally against, the number of bins, and the size of each
bin, and rahisto() merges the records that match a particular bin
together, just like racluster() does.  This is cool, in that you get
argus records as an end product, and the aggregated objects carry a lot
of potential information.  As an example, IP addresses, when they are
merged together, preserve as much of the prefix as matches (longest
prefix match).  So, if you're lucky, rahisto() may show that the records
in a particular peak of a complex frequency distribution are all from/to
the same subnet.  Because merging is field preserving, you may discover
that all the records in a bin have the same ttl, TCP dst port number,
mac address, whatever.
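
The merging idea can be sketched in Python as follows.  This is an
illustration of the behavior described above, not the argus
implementation: the record layout (dicts with dur, daddr and dttl keys),
the function names, and the sample records are all assumptions.

```python
import ipaddress

def merge_addr(a, b):
    """Keep the longest common prefix of two IPv4 addresses, zero the rest."""
    x = int(ipaddress.IPv4Address(a))
    y = int(ipaddress.IPv4Address(b))
    prefix_len = 32 - (x ^ y).bit_length()      # number of leading bits that agree
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(x & mask))

def merge_field(a, b):
    """Preserve a field only when both records agree, otherwise zero it."""
    return a if a == b else 0

def histogram_merge(records, start, stop, bins):
    """Tally records by duration and merge the records that share a bin."""
    width = (stop - start) / bins
    merged = {}
    for rec in records:
        if not (start <= rec["dur"] < stop):
            continue
        i = int((rec["dur"] - start) / width)
        if i not in merged:
            merged[i] = {"count": 1, "daddr": rec["daddr"], "dttl": rec["dttl"]}
        else:
            m = merged[i]
            m["count"] += 1
            m["daddr"] = merge_addr(m["daddr"], rec["daddr"])
            m["dttl"] = merge_field(m["dttl"], rec["dttl"])
    return merged

# Invented records: durations in seconds, 10 bins over 200-250 mSec.
records = [
    {"dur": 0.216, "daddr": "10.95.192.1", "dttl": 238},
    {"dur": 0.218, "daddr": "10.95.1.83", "dttl": 238},
    {"dur": 0.221, "daddr": "10.95.192.1", "dttl": 238},
    {"dur": 0.223, "daddr": "10.95.1.83", "dttl": 240},
]
result = histogram_merge(records, 0.200, 0.250, 10)
for i in sorted(result):
    print(i, result[i])
```

In this sketch, two hosts that share only a /16 collapse to 10.95.0.0,
and a bin whose records disagree on the ttl ends up with a ttl of 0.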

So here are some examples.  A pretty simple analysis is ping response
time.  Argus generates bi-directional records for ping request/response
volleys, and so the duration is the RTT for the ping.  I'll use a file
that was generated by running ra() against all the QoSient data for
October that got to the outside world:

    ra -R external_probe/20006/10 -w /tmp/ra.echo.out - echo

This gets all the pings: those that did and didn't get answers, those
that came into QoSient, and those that were initiated from within
QoSient.

OK, so as an example, I'm interested in a specific range of pings, so
let's check out the pings that fell in the range 200 mSec - 250 mSec, to
see what's up.  This is really arbitrary, just for the purposes of the
example:

rahisto -H dur 10:200-250m -r /tmp/ra.echo.out -s daddr dttl
N = 21  mean = 0.225263  stddev = 0.008700  max = 0.247605  min 0.213547
  Class            Interval               Freq    Rel.Freq     Cum.Freq            DstAddr dTtl
      0   2.000000e-01-2.050000e-01        0     0.0000%      0.0000%
      1   2.050000e-01-2.100000e-01        0     0.0000%      0.0000%
      2   2.100000e-01-2.150000e-01        3    14.2857%     14.2857%        10.95.192.1  238
      3   2.150000e-01-2.200000e-01        6    28.5714%     42.8571%          10.95.0.0  238
      4   2.200000e-01-2.250000e-01        4    19.0476%     61.9048%          10.95.0.0    0
      5   2.250000e-01-2.300000e-01        1     4.7619%     66.6667%         10.95.1.83  238
      6   2.300000e-01-2.350000e-01        0     0.0000%     66.6667%
      7   2.350000e-01-2.400000e-01        5    23.8095%     90.4762%        10.72.80.36  242
      8   2.400000e-01-2.450000e-01        1     4.7619%     95.2381%        10.72.80.36  242
      9   2.450000e-01-2.500000e-01        1     4.7619%    100.0000%        10.72.80.36  241
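
To make the columns concrete, here is a small Python sketch that
rebuilds the Interval, Rel.Freq and Cum.Freq columns from nothing but
the Freq counts printed above and the 10-bin, 200-250 mSec range given
on the command line.  It is a check on the arithmetic, not a
description of how rahisto() computes them internally.

```python
# Freq column from the rahisto output above, one count per class 0..9.
freqs = [0, 0, 3, 6, 4, 1, 0, 5, 1, 1]

# 10 bins over 0.200-0.250 seconds, as requested with "-H dur 10:200-250m".
start, stop, bins = 0.200, 0.250, 10
width = (stop - start) / bins

total = sum(freqs)
cum = 0
rows = []
for i, f in enumerate(freqs):
    cum += f
    lo = start + i * width
    rel_pct = 100.0 * f / total      # share of all records in this bin
    cum_pct = 100.0 * cum / total    # share of records in this bin or below
    rows.append((i, lo, lo + width, f, rel_pct, cum_pct))

print("N = %d  bin width = %.3f sec" % (total, width))
for row in rows:
    print("%2d  %.3f-%.3f  %3d  %8.4f%%  %8.4f%%" % row)
```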

OK, so I had rahisto() print the dst host address and the dst ttl of
the merged records, and of course we get some interesting results
(I modified the addresses slightly to protect whatever/whoever).  So the
clustered data does show trends.  The ttls are the ttls from the
packets, not the actual distance; just subtract this number from 255
and you'll get the hop count, assuming the sender started with a ttl of
255.  You can also see where rahisto()'s aggregation strategies
preserve what data they can.  I suspect that the 10.95.0.0 entries are
the result of pings going to 10.95.192.1 and 10.95.1.83 overlapping in
the same bin, and so we get the longest prefix match as a result.
Where the dTtl is 0, the ttl in two records differed, so we zero'd out
the field.
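
The hop-count arithmetic is simple enough to sketch in Python.  The
hop_count() helper is hypothetical, and the default of 255 for the
initial ttl is the assumption used in this post; hosts that start from
a different initial ttl would need a different value.

```python
def hop_count(observed_ttl, initial_ttl=255):
    """Estimate hops travelled from the ttl seen in the received packet.

    A ttl of 0 is treated as "no value", since the merge zeroes the
    field when the records in a bin disagree.
    """
    if observed_ttl == 0:
        return None
    return initial_ttl - observed_ttl

# dTtl values taken from the rahisto output above.
for ttl in (238, 242, 241, 0):
    print(ttl, hop_count(ttl))
```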

Pretty cool?  I hope you think so.
OK, comments?

Carter