Wikified! (but rabins question remains) [Re: Counting flows by time interval in argus]

Carter Bullard carter at qosient.com
Wed Apr 9 09:41:17 EDT 2008


Hey Stephane,
racount() just counts records, and there is always one management
record at the beginning of any stream.  In a well-formed stream,
there is also a closing management record that indicates why the
stream/file was done.  Sometimes you need to add the filter "not man"
to make sure that racount() doesn't give you a +1 or sometimes a +2
value.
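To make the +1/+2 arithmetic concrete, here is a toy model. It is a minimal sketch, not argus code: it assumes a hypothetical text stream with one record type per line, where `man` marks a management record, and the `count_records` helper is invented for illustration. With the real tools the filter would presumably go after the `-` separator, as in `racount -r $file - not man`.

```shell
#!/bin/sh
# Toy record stream, NOT real argus output: one record type per line.
# A well-formed stream opens and closes with a management ("man") record.
count_records() {
  # $1 is "all" or "not_man"; records are read from stdin
  awk -v mode="$1" '
    mode == "not_man" && $1 == "man" { next }  # drop management records
    { n++ }                                    # count everything else
    END { print n + 0 }'
}

stream='man
flow
flow
flow
man'

printf '%s\n' "$stream" | count_records all      # 5: 3 flows + 2 management records
printf '%s\n' "$stream" | count_records not_man  # 3: flows only
```

In the sample file later in this thread, racount reports 7334 records while `ra ... - ip` lists 7333 transactions, which is consistent with a single opening management record.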

The increase in the flow count from rabins() can be correct for this
type of analysis.

In Nick's example, rabins() is reporting the number of flows during a
specific 10m period.  Flows that cross hard time boundaries do exist
in multiple time periods.  So if the question is:

    "what is the flow count in any given time period"

splitting flows across time boundaries is the correct thing to do.
There are lots of reasons to ask this question.  How many
concurrent flows are on the wire, and how that changes
over time, is a very sensitive test for lots of network phenomena.
I like to track the natural beacons on the wire (network management
stations pinging around, switches doing end system inventory
by arping every IP address in the subnet every 300 secs), and it's
easy using this type of stat, as computers are pretty good clocks.

The difference here is 10 (7343 vs. 7333), so I suspect that either
10 flows cross the 10-minute boundaries in this run, or there is a
single flow that spans 10 time boundaries, or something in between.
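The splitting behaviour can be mimicked with a toy model. This is a minimal sketch, not the rabins implementation: it assumes hypothetical flows given as integer start/end seconds, one per line, and the `count_split` helper is invented for illustration. Each flow is counted in every bin it overlaps, so the total over all bins is the number of flows plus the number of boundaries crossed.

```shell
#!/bin/sh
# Toy flows, NOT real argus data: "start end" in integer seconds per line.
# Count a flow in every bin of width $1 seconds that it overlaps, which is
# what splitting records at bin boundaries amounts to for a flow count.
count_split() {
  awk -v B="$1" '
    {
      first = int($1 / B)
      last  = int($2 / B)
      if ($2 > $1 && $2 % B == 0) last--   # ending exactly on a boundary
      for (b = first; b <= last; b++) count[b]++
    }
    END { for (b in count) print b * B, count[b] }' | sort -n
}

# Three flows, 600 s (10 min) bins: the second crosses one boundary,
# the third crosses two.
printf '%s\n' '10 50' '590 610' '1150 1900' | count_split 600
# prints (bin start, count):
# 0 2
# 600 2
# 1200 1
# 1800 1
```

Three flows produce six bin entries because three boundary crossings occur; by the same arithmetic, 7333 flows would become 7343 bin entries if 10 crossings occurred.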

However, if the question is instead:

    "what is the flow initiation count in any given time period"

Well, then you want each flow counted only once, in the time bin
when it started.  To do this you don't want to modify/split the flow
records, so use this option:

    rabins -M nomodify

And it will only tally the flows that start in that time period.
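Counting each flow only in its start bin can be sketched with a toy model. This is not what `-M nomodify` does internally: the input is hypothetical integer start/end seconds, one flow per line, and `count_starts` is an invented helper. Only the start time matters, so the tallies always sum to the number of flows.

```shell
#!/bin/sh
# Toy flows, NOT real argus data: "start end" in integer seconds per line.
# Count each flow once, in the bin of width $1 seconds where it starts,
# mirroring the idea behind "rabins -M nomodify" (records are not split).
count_starts() {
  awk -v B="$1" '{ count[int($1 / B)]++ }
    END { for (b in count) print b * B, count[b] }' | sort -n
}

# Three toy flows, 600 s bins: three flows, three tallies in total.
printf '%s\n' '10 50' '590 610' '1150 1900' | count_starts 600
# prints (bin start, count):
# 0 2
# 600 1
```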


A statistical note:

The time boundary, relative to the actual real time of the flows,
can be random.  Flows, for the most part, don't really care that
it's 2pm somewhere, so the time "bins" that we create start at
a random time relative to the flow records.  But this is not always
the case.  Actually, if a human is not behind the generation of
flows, then the flows are being created by a clock, because
computers are really just digital alarm clocks in an abstract way.

This is called "flow clocking".

This is important in understanding the phenomenon that you
point out.  If you lengthen the bins from 10m to, say, 1h,
the probability of a flow crossing a time boundary will go down
gradually if flows start randomly.  But if your flows are
synchronized by a clock of some kind (like cron launching a job),
then you will find that as the bins get longer, the count
suddenly goes from ( > N ) to ( == N ), where N is the actual
number of flows.
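The effect of bin width described above can be made quantitative. This is a sketch under simplifying assumptions that are not in the original message: every flow i is an interval of duration d_i, bins have a fixed width B, and a "random" start means the offset of the start within its bin is uniformly distributed.

```latex
% Bin width B; flow i has start offset u_i = t_i mod B and duration d_i.
% K_i = number of bin boundaries flow i crosses, i.e. extra bins it lands in.
K_i = \left\lfloor \frac{u_i + d_i}{B} \right\rfloor ,
\qquad
\text{split count} = \sum_{i=1}^{N} (1 + K_i) = N + \sum_{i=1}^{N} K_i

% Random starts: u_i \sim \mathrm{U}[0, B).  Write d_i = m_i B + r_i, 0 \le r_i < B.
\mathbb{E}[K_i] = m_i + \Pr(u_i \ge B - r_i) = m_i + \frac{r_i}{B} = \frac{d_i}{B},
\qquad
\Pr(K_i \ge 1) = \min\!\left(1, \frac{d_i}{B}\right)

% Clock-driven starts: if u_i = u_0 and u_0 + d_i < B for every i,
% then K_i = 0 for all i and the split count equals N exactly.
```

For randomly started flows the expected excess is the sum of d_i / B, so going from 10m to 1h bins cuts it by a factor of six on average, while clock-driven flows that fit inside one bin give an excess of exactly zero.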

Carter


On Apr 9, 2008, at 7:12 AM, Stéphane Peters wrote:

> Hello Nick,
>
> this is a nice example, clean, direct to the point!
>
> It has been put on the wiki:
>     http://www.vorant.com/nsmwiki/Argus#Examples
> Feel free to add more stuff to the wiki.
>
> I still find a difference in the counts; perhaps it comes from
> rabins?
> More investigation to come ...
>
> Here are some small comparisons on a sample of records: rabins has
> added 10 trans to 7333 records.
>> file=stripped.ra
>> % racount -r $file
>> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>     sum     7334      107877     54241     53636     38921003    6407386   32513617
>> % ra -nr $file  -s trans - ip | uniq -c
>>       1  Trans
>>    7333      1
>> % ra -nr $file -s trans pkts spkts dpkts bytes sbytes dbytes - ip | tot
>> 7,333k  107,877k  54,241k  53,636k  38,921m  6,40739m  32,5136m
>> % rabins -r $file -M soft time 10m -m srcid -s stime trans | tot
>> 259     311     7,343k
> I don't know why rabins has counted 7343 trans instead of 7333,
> nor why racount reports 7334 instead of 7333.
>
>
>
>
> Nick Diel wrote:
>>
>> I wanted to add a solution Carter just showed me, so that it is part
>> of this thread for anyone searching.
>>
>> This example assumes you have already merged status flow records,
>> so records = flows; if not, add another pipe through racluster.
>>
>> rastrip -r $file -M -agr -w - | \
>>     rabins -M soft time 10m -m srcid -s stime trans -c , -F raTime.conf > flowcounts.csv
>>
>> raTime.conf contents (you could also add this to your rarc file):
>> RA_TIME_FORMAT="%H:%M"
>>
>> If you have multiple collectors, you can have rabins merge on  
>> something else such as proto if you are filtering on tcp.
>>
>> Nick
>>
>>
>> On Wed, Mar 26, 2008 at 1:04 PM, Stéphane Peters <stephane.peters at forem.be> wrote:
>> Hello,
>>
>> Here is an example of counting flows that I have just used
>> to compare print flows seen by argus (filtered on port 9100)
>> with print requests seen by our batch server (found in a csv file).
>> Both lists have been fed into a spreadsheet to make a nice graphical
>> comparison.
>>
>> If someone sees a better way to do this within the ra* clients, without
>> the Unix filters, I will be happy to see how to do it.
>>
>> Example saved on the wiki:
>> Count flows by groups of 10 minutes: show only the flow start
>> times, truncate after the tens digit of the minutes, add a trailing
>> zero and delete leading spaces to show a clean HH:MM line, count them,
>> swap the columns, and insert a delimiter.  Ready to be fed into your
>> favorite spreadsheet.
>>  ra -s stime -p 0 -nr $file |\
>>    cut -c -7 |\
>>    uniq -c | \
>>    sed -e 's/$/0/' \
>>        -e 's/^ *//' \
>>        -e 's/^\([0-9][0-9]*\) *\(.*\)$/\2,\1/' > flowcounts.csv
>>
> Regards,
> -- 
> Stephane.Peters at forem.be, Postmaster at forem.be



More information about the argus mailing list