Wikified! (but rabins question remains) [Re: Counting flows by time interval in argus]
Stéphane Peters
stephane.peters at forem.be
Wed Apr 9 12:39:53 EDT 2008
Hello Carter,
you are right!
As is often the case, it comes down to understanding how the parameters work.
I suspected (and hoped) this was not a bug in rabins, and you have
confirmed it.
Here are some variations I have just run to show that:
> % rabins -r stripped.ra -M soft time 1d -m srcid -s stime trans | tot | tail -2
> 7 - *7,333k*
> % rabins -r stripped.ra -M soft time 1h -m srcid -s stime trans | tot | tail -2
> 49 56 *7,340k*
> % rabins -r stripped.ra -M soft time 10m -m srcid -s stime trans | tot | tail -2
> 259 311 *7,343k*
> % rabins -r stripped.ra -M nomodify time 10m -m srcid -s stime trans | tot | tail -2
> 252 300 *7,333k*
> % ra -nr anostripped2.ra -p 0 -s "-flgs -state +dur" - dur gt 100
> 06:00:05.  tcp   100.0.1.1.21830  <?>  1.0.2.1.22        14781  8556654  11809
> 06:00:27.  icmp  100.0.2.1        <->  1.0.2.1             720    53280  21541
> 06:10:38.  udp   1.0.2.1.123      <->  100.0.3.1.123        54     4860  20399
> 09:15:34.  tcp   1.0.2.1.35209     ->  197.0.1.1.50522      36    23576    164
> 09:19:32.  tcp   100.0.1.1.36248   ->  1.0.2.1.22         7295  3727610   9601
> 10:29:20.  tcp   1.0.2.1.45452     ->  197.0.1.4.80         35    15162    293
> 11:38:06.  tcp   100.0.4.1.43875   ->  1.0.2.1.22           67     9237    306
> 11:41:28.  tcp   100.0.4.1.43906   ->  1.0.2.1.50523        77    21193    243
> 11:45:49.  tcp   100.0.4.1.43979   ->  1.0.2.1.22         1250   217985    769
So, by choosing a single one-day bin, or by using '-M nomodify', you get
the same number of flow starts (7,333k in both runs).
The ra listing (dur gt 100) then shows that, as you said, there were some
long-running flows, and they certainly span several bins.
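
To make the two counting modes concrete, here is a minimal Python sketch.
It is not argus code: the (start, duration) tuples, the function names and
the handling of a flow that ends exactly on a boundary are all assumptions
made for illustration. It counts each flow either in every bin it overlaps
(roughly what splitting records at bin boundaries does) or only in the bin
where it starts (what Carter describes for '-M nomodify').

```python
from collections import Counter

def bins_touched(start, duration, bin_size):
    """Return the indices of every bin a flow overlaps."""
    first = int(start // bin_size)
    last = int((start + duration) // bin_size)
    return range(first, last + 1)

def count_split(flows, bin_size):
    """Count a flow once in every bin it overlaps (records split at boundaries)."""
    counts = Counter()
    for start, duration in flows:
        for b in bins_touched(start, duration, bin_size):
            counts[b] += 1
    return counts

def count_starts(flows, bin_size):
    """Count a flow exactly once, in the bin where it starts (no splitting)."""
    counts = Counter()
    for start, _duration in flows:
        counts[int(start // bin_size)] += 1
    return counts

# Hypothetical flows as (start_seconds, duration_seconds), not real argus data.
flows = [(5, 10), (590, 30), (100, 1300), (1250, 5)]

TEN_MINUTES = 600
ONE_DAY = 86400

split_10m = sum(count_split(flows, TEN_MINUTES).values())
starts_10m = sum(count_starts(flows, TEN_MINUTES).values())
split_1d = sum(count_split(flows, ONE_DAY).values())

print(split_10m, starts_10m, split_1d)
```

With 10-minute bins the split total is 7 for 4 flows, while the start count
stays at 4; with a single one-day bin the split total is also 4. That is the
same pattern as in the rabins runs above.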
I have started a new section on the NSM Wiki with your explanations;
this kind of in-depth explanation is useful to have on hand.
It still needs cleaning up, but what do you think of it?
8 Some discussions ( http://www.vorant.com/nsmwiki/Argus#Some_discussions )
8.1 rabins -M nomodify ( http://www.vorant.com/nsmwiki/Argus#rabins_-M_nomodify )
As for the man record, it still seems strange to me: I cannot get rid of
it, even with the "not man" filter:
> % racount -nr stripped.ra - not man
> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
> sum         7334      107877     54241     53636     38921003    6407386   32513617
> % racount -nr stripped.ra - man
> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
> sum            1           0         0         0            0          0          0
> % racount -nr stripped.ra - not ip
> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
> sum            1           0         0         0            0          0          0
>
Carter Bullard wrote:
> Hey Stephane,
> racount() just counts records, and there is always one management
> record at the beginning of any stream. In a well formed stream,
> there is also a closing management record that indicates why the
> stream/file was done. Sometimes you need to add the filter "not man"
> to make sure that racount() doesn't give you a +1 or sometimes a +2 value.
>
> Rabins() increasing the flow count can be correct for this type of
> analysis.
>
> In Nick's example, rabins() is reporting the number of flows during a
> specific 10m period. Flows that cross hard time boundaries exist
> in multiple time periods. So if the question is:
>
> "what is the flow count in any given time period"
>
> splitting flows across time boundaries is the correct thing to do.
> There are lots of reasons to ask this question. How many
> concurrent flows are on the wire and how does that change
> over time is a very sensitive test for lots of network phenomena.
> I like to track the natural beacons on the wire (network management
> stations pinging around, switches doing end system inventory
> by arping every IP address in the subnet every 300 secs), and it's
> easy using this type of stat, as computers are pretty good clocks.
>
> I suspect that either 10 flows cross the 10 minute boundaries, in
> this run, or there is a single flow that spans 10 time boundaries, or
> something in between.
>
> However, if the question is instead:
>
> "what is the flow initiation count in any given time period"
>
> Well, then you only want a single flow counted once, in the time bin
> when it started. To do this you don't want to modify/split the flow
> records, so use this option:
>
> rabins -M nomodify
>
> And it will only tally the flows that start in that time period.
>
>
> A statistical note:
>
> The time boundary, relative to the actual real time of the flows
> can be random. Flows, for the most part, don't really care that
> it's 2pm somewhere, so the time "bins" that we create start at
> a random time relative to flow records. But this is not always
> the case. Actually if a human is not behind the generation of
> flows, then the flows are being created by a clock, because
> computers are really just digital alarm clocks in an abstract way.
>
> This is called "flow clocking".
>
> This is important in understanding the phenomenon that you
> point out. If you lengthen the time size of the bins from 10m
> to say 1h, the probability of a flow crossing a time boundary
> will go down in a probabilistic manner if flows start randomly.
> But if your flows are sync'd by a clock of some kind (like
> cron launching a job), then you will find that as the bin gets longer,
> all of a sudden, it goes from ( > N ) to ( == N).
>
> Carter
>
>
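
Carter's statistical note can be illustrated with a small Python sketch.
Again this is an illustration under stated assumptions, not argus behaviour:
the function names are made up, and the "clocked" flows are a hypothetical
30-second job launched once an hour at hh:09:50. For a flow whose start is
uniformly random relative to the bins, the chance of crossing a boundary is
duration / bin size (capped at 1), so it shrinks smoothly as bins grow. For
clocked flows the extra records can vanish all at once.

```python
def boundaries_crossed(start, duration, bin_size):
    """Number of bin boundaries that fall inside (start, start + duration]."""
    return int((start + duration) // bin_size) - int(start // bin_size)

def crossing_probability(duration, bin_size):
    """Chance that a flow with a uniformly random start crosses a boundary."""
    return min(1.0, duration / bin_size)

# Hypothetical clocked flows: one 30 s job per hour, started at hh:09:50.
clocked = [(hour * 3600 + 590, 30) for hour in range(24)]

def extra_records(flows, bin_size):
    """Extra records created if every boundary crossing splits a flow."""
    return sum(boundaries_crossed(s, d, bin_size) for s, d in flows)

for bin_size in (600, 1200, 3600):
    print(bin_size,
          extra_records(clocked, bin_size),
          crossing_probability(30, bin_size))
```

In this made-up case every one of the 24 flows straddles a 10-minute
boundary, so 10m bins give N + 24 records, while 20m and 1h bins give
exactly N: the jump from ( > N ) to ( == N ) that Carter describes.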
> On Apr 9, 2008, at 7:12 AM, Stéphane Peters wrote:
>> Hello Nick,
>>
>> this is a nice example, clean, direct to the point!
>>
>> It has been put on the wiki :
>> http://www.vorant.com/nsmwiki/Argus#Examples
>> Feel free to add more stuff to the wiki.
>>
>> I still find a difference in the counts; perhaps it comes from
>> rabins?
>> More investigation to come ...
>>
>> Here are some small comparisons on a sample of records: rabins has
>> added 10 trans to the 7333 records
>>> file=stripped.ra
>>> % racount -r $file
>>> racount records total_pkts src_pkts dst_pkts total_bytes src_bytes dst_bytes
>>> sum *7334* 107877 54241 53636 38921003 6407386 32513617
>>> % ra -nr $file -s trans - ip | uniq -c
>>> 1 Trans
>>> *7333* 1
>>> % ra -nr $file -s trans pkts spkts dpkts bytes sbytes dbytes - ip | tot
>>> *7,333*k 107,877k 54,241k 53,636k 38,921m 6,40739m 32,5136m
>>> % rabins -r $file -M soft time 10m -m srcid -s stime trans | tot
>>> 259 311 *7,343*k
>> I don't know why rabins counted 7343 trans instead of 7333,
>> or why racount reports 7334 instead of 7333.
>>
>>
>>
>>
>> Nick Diel wrote:
>>> I wanted to add in a solution Carter just showed me so it was part
>>> of this thread if anyone was searching.
>>>
>>> This example assumes you have already merged status flow records, so
>>> records = flows, if not add another pipe of racluster.
>>>
>>> rastrip -r $file -M -agr -w - | \
>>> rabins -M soft time 10m -m srcid -s stime trans -c , -F raTime.conf > flowcounts.csv
>>>
>>> raTime.conf contents (you could also add this to your rarc file):
>>> RA_TIME_FORMAT="%H:%M"
>>>
>>> If you have multiple collectors, you can have rabins merge on
>>> something else such as proto if you are filtering on tcp.
>>>
>>> Nick
>>>
>>>
>>> On Wed, Mar 26, 2008 at 1:04 PM, Stéphane Peters
>>> <stephane.peters at forem.be> wrote:
>>>
>>> Hello,
>>>
>>> Here is an example of counting flows I have just used
>>> to compare print flows seen by argus (filtered on port 9100)
>>> with print requests seen by our batch server (found in a csv file).
>>> Both lists were fed into a spreadsheet to make a nice
>>> graphical comparison.
>>>
>>> If someone sees a better way to do this within the ra* clients,
>>> without the Unix filters,
>>> I will be happy to see how to do it.
>>>
>>> Example saved on the wiki:
>>>
>>> Count flows by groups of 10 minutes: show only the flow
>>> start times, cut after the tens-of-minutes digit, add a trailing
>>> zero and delete leading spaces to show a nice HH:MM line,
>>> count them, swap the columns, insert a delimiter. Ready to be
>>> fed into your favorite spreadsheet.
>>> ra -s stime -p 0 -nr $file |\
>>> cut -c -7 |\
>>> uniq -c | \
>>> sed -e 's/$/0/' \
>>> -e 's/^ *//' \
>>> -e 's/\([0-9]*\)  *\(.*\)/\2,\1/' > flowcounts.csv
>>>
>>>
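
For anyone who prefers to check the 10-minute bucketing outside the shell,
the same idea can be sketched in Python. This is only an illustration of the
cut/uniq/sed logic above, not a ra* client feature: the function names are
made up, and the input is assumed to be a list of start times in the
HH:MM:SS form that 'ra -s stime -p 0' prints in the listing earlier in this
thread.

```python
from collections import Counter

def ten_minute_bucket(stime):
    """Map 'HH:MM:SS' to the start of its 10-minute bucket ('06:17:42' -> '06:10')."""
    hh, mm = stime.strip().split(":")[:2]
    return f"{hh}:{mm[0]}0"

def flow_counts_csv(stimes):
    """Return 'HH:M0,count' lines, one per 10-minute bucket, in time order."""
    counts = Counter(ten_minute_bucket(t) for t in stimes)
    return [f"{bucket},{n}" for bucket, n in sorted(counts.items())]

# Start times taken from the first records of the ra listing in this thread.
sample = ["06:00:05.", "06:00:27.", "06:10:38.", "09:15:34.", "09:19:32."]

for line in flow_counts_csv(sample):
    print(line)
```

Unlike 'uniq -c', the Counter does not need the input to be sorted by time,
which may help if records from several files are mixed.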
Regards,
--
Stephane.Peters at forem.be, Postmaster at forem.be