rahisto dialog
Carter Bullard
carter at qosient.com
Tue Nov 21 17:43:00 EST 2006
Gentle people,
I added rahisto() to the distribution, and it is useful for so many
things that it deserves a description, and hopefully some interest
from the list.
rahisto() generates histograms, or more formally, frequency distribution
tables, that can be used for behavioral baselining, trend analysis,
anomaly detection, and QoS analysis, just to name a few uses.
The basic concept is that any metric that argus contains, or can
derive, is a candidate for frequency distribution analysis. The
duration of email sessions, the average round trip time of TCP
connection establishment, the frequency of connections, pkts, or bytes
against, say, TTL or TOS byte values, and of course the frequency of
connections to TCP or UDP port numbers are just some of the uses of
rahisto(). In conjunction with racluster() or rabins(), you can get
the frequency of interesting aggregates, such as active hosts or
sessions per minute, hour, day, or month.
The reason you want the frequency distribution, rather than just, say,
the mean and standard deviation, is that many Internet measurements
are not normally distributed. They are bimodal or multimodal, like
the multiple peaks in the graph of the day for ARP. When they are,
the frequency distribution helps you see that pretty quickly.
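To make that point concrete, here is a small Python sketch (not part of
argus; the sample values and the histogram() helper are made up for
illustration). It shows a two-peak data set whose mean lands in a bin
that contains no samples at all:

```python
import statistics

# Hypothetical response times in milliseconds: a fast cluster and a
# slow cluster. The values are invented for this illustration.
samples_ms = [1, 1, 2, 1, 2, 81, 82, 83, 84, 85]

mean_ms = statistics.mean(samples_ms)
stddev_ms = statistics.pstdev(samples_ms)

def histogram(values, width, bins):
    """Count integer values into `bins` equal-width bins starting at 0."""
    counts = [0] * bins
    for v in values:
        i = v // width
        if 0 <= i < bins:
            counts[i] += 1
    return counts

counts = histogram(samples_ms, width=10, bins=10)

print(f"mean = {mean_ms:.1f} ms  stddev = {stddev_ms:.1f} ms")
for i, c in enumerate(counts):
    print(f"{i:2d}  {i * 10:3d}-{(i + 1) * 10:3d} ms  {'#' * c}")
```

The mean and standard deviation describe a "typical" value that never
occurs, while the histogram shows the two peaks directly.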
DNS server response time is a great example. Local vs. non-local DNS
lookups: now that's something we all expect to be at least bimodally
distributed. Responses that are cached in memory are HUGELY faster
than those that the server resolves recursively. On my network, there
are 4 fundamental DNS response times, or peaks, and rahisto() makes it
pretty easy to see this. Testing statistically whether a distribution
has changed is not hard to do.
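One common way to test whether a binned distribution differs from a
baseline is Pearson's chi-square statistic. The sketch below is only an
illustration of that idea, not something rahisto() does for you; the
bin counts and the chi_square() helper are hypothetical:

```python
def chi_square(observed, baseline):
    """Pearson chi-square statistic of observed bin counts against the
    proportions implied by the baseline bin counts."""
    if len(observed) != len(baseline):
        raise ValueError("both histograms need the same number of bins")
    n_obs = sum(observed)
    n_base = sum(baseline)
    stat = 0.0
    for o, b in zip(observed, baseline):
        expected = n_obs * b / n_base
        if expected == 0:
            # The statistic is undefined for bins the baseline never
            # populated; merge such bins before calling this.
            raise ValueError("baseline bin with zero count")
        stat += (o - expected) ** 2 / expected
    return stat

# Hypothetical 4-bin histograms (counts per bin).
baseline = [40, 10, 30, 20]
same_shape = [20, 5, 15, 10]   # same proportions, half the samples
shifted = [5, 20, 10, 15]      # mass moved between bins

stat_same = chi_square(same_shape, baseline)
stat_shifted = chi_square(shifted, baseline)

# 7.815 is the 5% critical value for 3 degrees of freedom (4 bins - 1).
CRITICAL_5PCT_DF3 = 7.815
print(stat_same, stat_same > CRITICAL_5PCT_DF3)
print(stat_shifted, stat_shifted > CRITICAL_5PCT_DF3)
```

A statistic above the critical value suggests the new histogram does
not follow the baseline's shape. With small bin counts the chi-square
approximation is weak, so treat the result as a hint.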
Now, what rahisto() does is pretty interesting. You specify the value
you want to tally against, the number of bins, and the size of each
bin, and rahisto() merges the records that fall into a particular bin
together, just like racluster() does. This is cool, in that you get
argus records as an end product, and the aggregated objects carry a
lot of potential information. As an example, when IP addresses are
merged together, they preserve as much of the prefix as matches
(longest prefix match). So, if you're lucky, rahisto() may show that a
particular peak in a complex frequency distribution is all from/to the
same subnet. Because merging is field preserving, you may discover
that all the records in a bin have the same ttl, or TCP dst port
number, mac address, whatever.
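The binning and merging described above can be sketched roughly in
Python. This is not the argus implementation; the record layout (dicts
with dur, daddr, dttl) and the helper names are assumptions made up for
the illustration, and a real merge would also have to track the prefix
length:

```python
import ipaddress

def bin_index(value, start, end, bins):
    """Bin number for value, or None when it is outside [start, end)."""
    if not (start <= value < end):
        return None
    return int((value - start) * bins / (end - start))

def merge_addr(a, b):
    """Keep the longest common prefix of two IPv4 addresses, zero the rest."""
    x = int(ipaddress.IPv4Address(a))
    y = int(ipaddress.IPv4Address(b))
    keep = 32 - (x ^ y).bit_length()          # number of leading bits that agree
    mask = ((1 << keep) - 1) << (32 - keep) if keep else 0
    return str(ipaddress.IPv4Address(x & mask))

def merge_field(a, b):
    """Preserve a field only when both records agree, otherwise zero it."""
    return a if a == b else 0

# Hypothetical records: duration in seconds, dst address, dst ttl.
records = [
    {"dur": 0.2171, "daddr": "10.95.192.1", "dttl": 238},
    {"dur": 0.2189, "daddr": "10.95.1.83",  "dttl": 238},
    {"dur": 0.2213, "daddr": "10.95.192.1", "dttl": 238},
    {"dur": 0.2231, "daddr": "10.95.1.83",  "dttl": 237},
    {"dur": 0.2380, "daddr": "10.72.80.36", "dttl": 242},
]

merged = {}  # bin number -> merged record
for rec in records:
    i = bin_index(rec["dur"], 0.200, 0.250, 10)
    if i is None:
        continue
    if i not in merged:
        merged[i] = {"count": 1, "daddr": rec["daddr"], "dttl": rec["dttl"]}
    else:
        m = merged[i]
        m["count"] += 1
        m["daddr"] = merge_addr(m["daddr"], rec["daddr"])
        m["dttl"] = merge_field(m["dttl"], rec["dttl"])

for i in sorted(merged):
    print(i, merged[i])
```

In this sketch, two different hosts in the same bin collapse to their
shared prefix, and a ttl that differs within a bin becomes 0.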
So here are some examples. A pretty simple analysis is ping response
time. Argus generates bi-directional records for ping request/response
volleys, so the duration is the RTT for the ping. I'll use a file
that was generated by running ra() against all the QoSient data for
October that got to the outside world:
ra -R external_probe/20006/10 -w /tmp/ra.echo.out - echo
This gets all the pings: those that did and didn't get answers, those
that came into QoSient, and those that were initiated from within
QoSient.
OK, so as an example, I'm interested in a specific range of pings, so
let's check out the pings that fell in the range 200mSec - 250mSec, to
see what's up. The range is arbitrary, just for the purposes of the
example:
rahisto -H dur 10:200-250m -r /tmp/ra.echo.out -s daddr dttl
N = 21  mean = 0.225263  stddev = 0.008700  max = 0.247605  min 0.213547
Class  Interval                   Freq  Rel.Freq   Cum.Freq  DstAddr      dTtl
    0  2.000000e-01-2.050000e-01     0   0.0000%    0.0000%
    1  2.050000e-01-2.100000e-01     0   0.0000%    0.0000%
    2  2.100000e-01-2.150000e-01     3  14.2857%   14.2857%  10.95.192.1   238
    3  2.150000e-01-2.200000e-01     6  28.5714%   42.8571%  10.95.0.0     238
    4  2.200000e-01-2.250000e-01     4  19.0476%   61.9048%  10.95.0.0       0
    5  2.250000e-01-2.300000e-01     1   4.7619%   66.6667%  10.95.1.83    238
    6  2.300000e-01-2.350000e-01     0   0.0000%   66.6667%
    7  2.350000e-01-2.400000e-01     5  23.8095%   90.4762%  10.72.80.36   242
    8  2.400000e-01-2.450000e-01     1   4.7619%   95.2381%  10.72.80.36   242
    9  2.450000e-01-2.500000e-01     1   4.7619%  100.0000%  10.72.80.36   241
OK, so I had rahisto() print the dst host address and the dst ttl of
the merged records, and of course we get some interesting results (I
modified the addresses slightly to protect whoever is involved). The
clustered data does show trends. The ttls are the values taken from
the packets, not the actual distance; subtract the number from 255 and
you'll get the hop count, assuming the sender started at 255. You can
see how rahisto()'s aggregation strategies preserve what data they
can. I suspect that the 10.95.0.0 entries are the result of pings to
10.95.192.1 and 10.95.1.83 landing in the same bin, so we get the
longest prefix match as a result. Where the dTtl is 0, the ttl in two
records differed, so we zero'd out the field.
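For anyone who wants to check the arithmetic, the Rel.Freq and Cum.Freq
columns and the hop counts can be reproduced from the Freq and dTtl
columns of the output above. This is only a sketch; the hop estimate
assumes the remote end sent its packets with an initial TTL of 255,
which is not true of every stack:

```python
# Freq column from the rahisto() output above, bins 0 through 9.
freqs = [0, 0, 3, 6, 4, 1, 0, 5, 1, 1]
total = sum(freqs)  # N = 21

rows = []
running = 0
for cls, freq in enumerate(freqs):
    running += freq
    rel = 100.0 * freq / total      # share of all records in this bin
    cum = 100.0 * running / total   # share of records in this bin or below
    rows.append((cls, freq, rel, cum))

for cls, freq, rel, cum in rows:
    print(f"{cls:5d} {freq:5d} {rel:9.4f}% {cum:9.4f}%")

def hops_from_ttl(ttl, initial_ttl=255):
    """Estimated hop count; initial_ttl=255 is an assumption, not a fact
    recorded in the flow data."""
    return initial_ttl - ttl

for ttl in (238, 242, 241):
    print(f"dTtl {ttl} -> about {hops_from_ttl(ttl)} hops")
```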
Pretty cool? I hope you think so.
OK, comments?
Carter