Argus/racluster fundamentals

Nick Diel ndiel at engr.colostate.edu
Fri Mar 21 13:33:16 EDT 2008


Carter,

Great info.  It appears filters are quite critical when merging on 
things other than flow key.  Of course now I have more questions.

Primarily, how does Argus choose which TCP DSR to use?  I know you 
can specify options with Argus to tell it to generate additional 
records, and I am sure Argus uses different DSRs for singleton and 
non-singleton records.  I am just curious what exact methods Argus 
uses.

I am mostly thinking out loud here, but at least for TCP, why does Argus 
look for packet loss by watching sequence numbers?  Shouldn't the only 
time this method shows a loss (over multiple status flows) be when the 
connection is going to close abnormally very soon (TCP out of sync, one 
end down)?  It seems to me that retransmitted packets should be the 
only indication of loss.  I am sure you have thought about this a whole 
lot; I am just having a hard time understanding where this method would 
be beneficial.  Maybe this method accounts for retransmitted packets too?

Nick


Carter Bullard wrote:
> Hey Nick,
> When you merge two records together, the aggregation engine goes through
> each DSR (data specific record) in the two argus records, and compares
> them for applicability, consistency, etc.  If the two corresponding DSRs
> are incompatible, the aggregation engine will simply throw that DSR away.
>
> All the TCP information (base sequence numbers, acks, roundtrip times,
> window sizes, retransmissions, etc.) is contained in the
> ARGUS_NETWORK_DSR, which holds protocol specific information.
> If you merge an ICMP flow with a TCP flow, the aggregator just tosses
> the ARGUS_NETWORK_DSR away, because the two DSRs have different
> meanings and are not compatible, and you lose the TCP specific
> information.  This can happen when the flow key is just "-m saddr daddr",
> and so flows between A and B, regardless of protocol, get merged together.
>
> If one argus record has, say, ethernet addresses in the ARGUS_MAC_DSR,
> but the other record to be merged doesn't have an ARGUS_MAC_DSR, for
> whatever reason, we'll toss the ARGUS_MAC_DSR when generating the
> resultant merged record.  Now, there are conditions where we are
> "preserving" and we keep the DSR, rather than throw it away.  This
> happens with ARGUS_AGR_DSRs.  Each DSR has its own set of rules for
> what to do.
>
> However, when the DSRs are compatible, but different, you can get
> some interesting results.  There are 3 types of ARGUS_NETWORK_DSRs for
> TCP data:  ArgusTcpInit, ArgusTcpStatus, and ArgusTcpPerf, and only one
> has loss statistics.  If you merge an ArgusTcpInit DSR (which only has
> base sequence numbers, flags and roundtrip times) with an ArgusTcpPerf
> DSR (which has everything), you are supposed to get an ArgusTcpPerf DSR,
> with some slight mods to the fields (state values get or'd, flags get
> or'd, base sequence numbers are checked to make sure they are the same,
> and if not the result is adjusted, total bytes transmitted are summed,
> etc.).  The source code is pretty dense in this area, so there is a lot
> to talk about.
>
> With loss, there is such a thing as negative loss.  We see this with
> protocols like ESP and RTP quite often, when packets get out of order.
> Argus sees sequence number 23, then 25, and we need to report the
> flow, and so we report a loss of 1 packet.  Well, the next packets
> that argus sees after sending the status report are number 24, and
> then 26 and then 27.  We need to report that 24 showed up, and so
> when we generate the next flow status record, we report a loss of -1.
> Later, when you merge the two status flow records together, the loss
> becomes zero.
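A minimal sketch of the negative-loss arithmetic described above. This is a toy model, not Argus's actual algorithm: it assumes the loss reported for a status interval is the advance of the highest sequence number seen, minus the number of packets seen in that interval. The function names (interval_loss, sum_loss) and the "interval seq" input format are made up for illustration.

```shell
#!/bin/sh
# Toy model only (an assumption, not Argus source): loss for a status
# interval = (advance of highest sequence number) - (packets seen).
# stdin:  lines of "<interval> <seq>", grouped by interval
# stdout: one "<interval> <loss>" line per interval
interval_loss() {
    awk '
        NR == 1   { cur = $1; base = $2 - 1; high = base; n = 0 }
        $1 != cur { print cur, (high - base) - n; base = high; n = 0; cur = $1 }
                  { n++; if ($2 > high) high = $2 }
        END       { if (NR > 0) print cur, (high - base) - n }
    '
}

# Modelled merge of status records for one flow: sum their loss fields.
sum_loss() {
    awk '{ total += $2 } END { print total + 0 }'
}

# Carter's example: 23, 25 arrive before the first report; 24, 26, 27 after.
printf '1 23\n1 25\n2 24\n2 26\n2 27\n' | interval_loss
printf '1 23\n1 25\n2 24\n2 26\n2 27\n' | interval_loss | sum_loss
```

With the example from the quoted message, the two intervals come out as 1 and -1, and the summed loss is 0.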
>
> You won't see that too often with TCP, but you can get that kind of
> behavior, especially when the Far Status Interval is below 1 second.
>
> I'm thinking that this situation is caused by a bug, where we merge an
> ArgusTcpInit and an ArgusTcpPerf DSR together, and fail to redefine
> the DSR to ArgusTcpPerf, but leave it as ArgusTcpInit, which of course
> doesn't/can't have any retransmission stats.  The newest client code
> that is on the server (refreshed yesterday) does have some additional
> logic to make it less likely to have this problem, but I have to
> double/triple check to see what is actually going on.  Having data that
> generates the problem makes that much easier.
>
> This is a long topic, so keep sending questions, and we'll get something
> written down that may make some sense.
>
> Carter
>
>
> On Mar 18, 2008, at 12:30 PM, Nick Diel wrote:
>
>> Carter,
>>
>> First, thanks for everything you have done.  Second, thanks for all
>> this great info; it has been extremely helpful as I learn Argus.  We
>> will need a wiki page just for all the info you have given so far.
>>
>> Hopefully Stew can anonymize the data, so you can shed some light on 
>> what is going on.
>>
>> Can you tell me/the rest of the list a little bit more about how
>> racluster handles IP attributes and TCP attributes?  For instance, if
>> racluster is merging based on flow keys, will it attempt to find
>> additional retransmitted packets?  For example, if a singleton is
>> actually a retransmitted packet for another non-singleton, would
>> racluster detect that and increase the loss count after they are
>> merged together?
>>
>> Nick
>>
>> Carter Bullard wrote:
>>> Gentlemen,
>>> Well, racluster() does modify the IP attributes and TCP attributes based
>>> on the records that are being merged together.  Because you are
>>> modifying the flow key, and then merging data together, some data may
>>> be ignored.
>>>
>>> As an example, if you merge a record that is a singleton with a
>>> non-singleton, your merged result may/could retain some singleton
>>> properties.  A singleton is a flow with only one packet.  One of the
>>> properties of a singleton is that it doesn't have any duration, and it
>>> also doesn't have any loss.  Now, if you merge a singleton with a
>>> non-singleton you get a non-singleton as the result, so losing things
>>> like loss would, of course, be a bug.
>>>
>>> The best solution is to see if you can ranonymize() the data, and
>>> get the same graph.  Could you then share that "primitive" data?
>>>
>>> Primitive data is the set of original flow records directly from
>>> argus().
>>>
>>> What do you think?
>>>
>>> Carter
>>>
>>>
>>>
>>> On Mar 18, 2008, at 12:02 AM, Stewart Gray wrote:
>>>
>>>> That's right, I'll show the example I'm working with:
>>>>  
>>>> ra -m proto -s loss -r packet-dump-2008-03-18_08\:28.arg - tcp | 
>>>> awk '{total=total+$1;} END {print total;}'
>>>> 33244
>>>>  
>>>> racluster -m proto -s loss -r packet-dump-2008-03-18_08\:28.arg - tcp
>>>> 0
>>>>  
>>>> Unfortunately I'm not able to distribute the data I'm working with 
>>>> - it's customers' flow logs. I'll see if I can replicate the issues 
>>>> @ home so I can provide something to work with.
>>>>  
>>>> Cheers,
>>>>  
>>>> Stew
>>>> ------------------------------------------------------------------------
>>>> *From:* Nick Diel [mailto:ndiel at engr.colostate.edu]
>>>> *Sent:* Tuesday, 18 March 2008 4:53 p.m.
>>>> *To:* Carter Bullard
>>>> *Cc:* Stewart Gray; Argus
>>>> *Subject:* Re: [ARGUS] [Argus] Re: Packet Loss with racluster
>>>>
>>>> Carter,
>>>>
>>>> What you are saying makes sense (I think), but I think there is 
>>>> something else going on here.
>>>>
>>>> Stew had a 2 minute file.  If he used ra to look at just this file 
>>>> he would see individual records that had positive values for loss 
>>>> packet count.  Then he used racluster to merge all status flow 
>>>> records and it reported 0 loss packets.  I think Stew was doing 
>>>> this one file at a time.
>>>>
>>>> Basically if a single file (regardless how it was created) has any 
>>>> status flows with a positive packet loss count, shouldn't racluster 
>>>> be able to report this total for this file?
>>>>
>>>> ra -s loss -r argus.arg - tcp | awk '{total=total+$1;} END {print 
>>>> total;}'  >0
>>>> racluster -m proto -s loss -r argus.arg - tcp  = 0
>>>>
>>>> I may be missing something, but this was how I interpreted Stew's 
>>>> problem.
>>>>
>>>> Nick
>>>>
>>>> Carter Bullard wrote:
>>>>> Hey Guys,
>>>>> There are a lot of things going on that can affect the
>>>>> "distribution" of numbers on a time series graph, when using flow
>>>>> data.  Flows are not fixed length samples of network activity, and
>>>>> so you have to do some statistical mods to make the data generally
>>>>> useful.  Programs like rasplit() and rabins() are critical to
>>>>> distributing load, rate, packet numbers, loss numbers, jitter,
>>>>> interpacket arrival times, etc. correctly into timed bins.  Without
>>>>> the use of either rasplit() or rabins(), which are split/aggregate
>>>>> tools, you can end up with flows that are longer than the time
>>>>> interval they are supposed to represent, which skews the data in
>>>>> weird ways, and can generate bins with no data in them.
>>>>>
>>>>> Loss doesn't have to be constant, and so the drop outs may
>>>>> actually be real.  And there are no guarantees that there are
>>>>> actually tcp connections during those intervals (no TCP, no loss),
>>>>> so we have to look at the data to see if there is anything wrong.
>>>>>
>>>>> Remember, flows from argus() can be as long as the
>>>>> ARGUS_FAR_STATUS_INTERVAL.  A flow that starts at 1:59:59.999999
>>>>> will be tallied in the 1:58:00 - 2:00:00 bin, even though its
>>>>> duration could extend well into the 2:00:00 - 2:02:00 interval.
>>>>>
>>>>> The trick is to split the data into strict time slots, and then
>>>>> aggregate those slots.  rabins() is very good at this, which is why
>>>>> it's at the heart of ragraph().
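A rough sketch of the binning arithmetic behind the 1:59:59 example above. It only shows which fixed-width bins a flow touches; rabins() does the real splitting and aggregation. The helper names (bin_start, bins_overlapped), the whole-second timestamps and the 30 second flow duration are assumptions made for illustration.

```shell
#!/bin/sh
# Arithmetic sketch only; rabins() does the real splitting and aggregation.
# Times are whole seconds since midnight (fractions dropped for simplicity).

# bin_start T WIDTH: start of the fixed-width bin that contains time T
bin_start() {
    echo $(( $1 - $1 % $2 ))
}

# bins_overlapped START END WIDTH: start of every bin the flow touches
bins_overlapped() {
    b=$(bin_start "$1" "$3")
    while [ "$b" -le "$2" ]; do
        echo "$b"
        b=$(( b + $3 ))
    done
}

# A flow starting at 1:59:59 (7199 s) with 2-minute bins (120 s)
bin_start 7199 120            # 7080, i.e. 1:58:00
# If that flow lasted 30 s (hypothetical), it also reaches the 2:00:00 bin (7200)
bins_overlapped 7199 7229 120
```

Tallying the whole flow under its start bin puts everything at 7080, even though part of the flow belongs to the bin starting at 7200.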
>>>>>
>>>>> If I can get some of the data used to generate the graph in the 
>>>>> email, I can
>>>>> see if using rabins() would remove the drop outs.
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>>
>>>>> On Mar 17, 2008, at 8:40 PM, Stewart Gray wrote:
>>>>>
>>>>>> I just feed the values into cacti, it's a base metric I can use 
>>>>>> for spotting anomalies. Even if it's not 100% accurate, the 
>>>>>> accuracy should be pretty consistent even if argus 
>>>>>> inflates/deflates the figure slightly on files which have been 
>>>>>> sliced up.
>>>>>>  
>>>>>> I'm running this argus instance on a busy section of our network 
>>>>>> and there is a constant flow of between 80-140mb/s. I ran the 
>>>>>> rate/load/loss command and got:
>>>>>>  
>>>>>> 17949.785637 94528448 0
>>>>>>  
>>>>>> You can see the blips this morning. The file is actually split 
>>>>>> every 2mins on this particular box.
>>>>>>  
>>>>>> <Outlook.jpg>
>>>>>>  
>>>>>> It's a bit unusual, if I run 'ra -m proto -s loss -r argus.arg - 
>>>>>> tcp' there are quite a number of losses/retransmits. Might be an 
>>>>>> issue with how racluster is aggregating these?
>>>>>>  
>>>>>> Stew
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* Nick Diel [mailto:ndiel at engr.colostate.edu]
>>>>>> *Sent:* Tuesday, 18 March 2008 12:10 p.m.
>>>>>> *To:* Stewart Gray
>>>>>> *Cc:* Argus
>>>>>> *Subject:* [Argus] Re: Packet Loss with racluster
>>>>>>
>>>>>> Stew,
>>>>>>
>>>>>> I think the first question is what are you using this number 
>>>>>> for.  If you are just using it as an indicator of congestion or 
>>>>>> other network problems then the 5 minute boundary will most 
>>>>>> likely not be a problem.
>>>>>>
>>>>>> I believe Argus just counts the number of retransmitted packets 
>>>>>> to get a loss/drop count; I don't think it is doing any triple 
>>>>>> duplicate ack or tcp timeout checks (if I am wrong, someone 
>>>>>> please say so).  Since retransmissions will occur in a time 
>>>>>> window of a few seconds, you should capture most retransmitted 
>>>>>> packets in your 5 minute boundaries.  So even if a flow crosses 
>>>>>> that boundary, you still have a good chance of counting 
>>>>>> retransmitted packets correctly.
>>>>>>
>>>>>> For cases where you are receiving a count of 0, I would look at
>>>>>> packet rate and bit rate; it is possible the link just doesn't have
>>>>>> much traffic on it at that time:
>>>>>>     racluster -m proto -s rate load loss -r argus.arg - tcp
>>>>>>
>>>>>> Though I did notice something unusual on my end.  The command I
>>>>>> gave you should be a strong estimate, but doesn't account for
>>>>>> retransmitted packets over status flow boundaries within the file
>>>>>> (though the same argument above applies).  So to get an exact count
>>>>>> on the file (assuming racluster reanalyzes the status flow
>>>>>> records for retransmissions) you would need something like:
>>>>>>     racluster -r argus.arg -w - - tcp | racluster -m proto -s loss -r -
>>>>>> (first merge status flow records, then count retransmitted
>>>>>> packets).  Though this is the output I get:
>>>>>>
>>>>>> racluster -m proto -s loss -r argus.out - tcp
>>>>>>      62521
>>>>>> racluster -r argus.out -w - - tcp | racluster -m proto -s loss -r -
>>>>>>      60047
>>>>>>
>>>>>> At a minimum I would expect the numbers to stay the same (either no
>>>>>> retransmitted packets crossed any status flows, or racluster
>>>>>> doesn't try to find any new retransmitted packets).  The number
>>>>>> going down doesn't make any sense to me.  Maybe someone can
>>>>>> explain to me what is going on.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Stewart Gray wrote:
>>>>>>> Hey Guys,
>>>>>>>  
>>>>>>> How does racluster handle argus files which have been 
>>>>>>> periodically split, when producing packet loss statistics? My 
>>>>>>> monitoring machine rotates the argus file every 5 minutes. When 
>>>>>>> using the following command, how skewed are the figures going to 
>>>>>>> be as a result of having an incomplete argus file (i.e. 
>>>>>>> connections that were current when the log file was rotated)?
>>>>>>>  
>>>>>>> I also note that sometimes the resulting figure is 0. It only 
>>>>>>> seems to do this in about 1 in 10 of the argus files I run the 
>>>>>>> command on.
>>>>>>>  
>>>>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>>>>> 0
>>>>>>>  
>>>>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>>>>> 33036
>>>>>>> Any ideas?
>>>>>>>  
>>>>>>> Cheers,
>>>>>>>  
>>>>>>> Stew
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>> *From:* Nick Diel [mailto:ndiel at engr.colostate.edu]
>>>>>>> *Sent:* Wednesday, 12 March 2008 10:24 a.m.
>>>>>>> *To:* Stewart Gray
>>>>>>> *Cc:* Argus
>>>>>>> *Subject:* Re: [ARGUS] Cheat sheet premiere
>>>>>>>
>>>>>>> How about:
>>>>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>>>>>
>>>>>>> This should merge all records based on protocol (in this case 
>>>>>>> only tcp because of the filter) and then print the loss column 
>>>>>>> of all merged records.
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> Stewart Gray wrote:
>>>>>>>> awesome, That's a really good start. I've already been playing 
>>>>>>>> with a few of the options I hadn't toyed with before :)
>>>>>>>>  
>>>>>>>> Is there an easy way to generate a raw count of packets 
>>>>>>>> lost/retransmitted rather than having it graphed?
>>>>>>>>  
>>>>>>>> I figure we start with:
>>>>>>>>  
>>>>>>>> racluster -s loss -r argus.arg -w -
>>>>>>>>  
>>>>>>>> How are the figures totaled? Do we pipe it to rasort or ra?
>>>>>>>>  
>>>>>>>> Thanks,
>>>>>>>>  
>>>>>>>> Stewart
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> *From:* Stéphane Peters [mailto:stephane.peters at forem.be]
>>>>>>>> *Sent:* Saturday, 8 March 2008 11:06 a.m.
>>>>>>>> *To:* Carter Bullard
>>>>>>>> *Cc:* Stewart Gray; Argus
>>>>>>>> *Subject:* Re: Re: [ARGUS] Cheat sheet premiere
>>>>>>>>
>>>>>>>> Hi Carter,
>>>>>>>>
>>>>>>>> I would love to see such a sheet in the distribution,
>>>>>>>> and I also was hoping that you could check,
>>>>>>>> if those examples made sense or were appropriate.
>>>>>>>> So please go on !
>>>>>>>>
>>>>>>>>
>>>>>>>> Some cosmetic work could be done too;
>>>>>>>> for example, use some "standard" parameters everywhere, like 
>>>>>>>> this one:
>>>>>>>>     file=argus-eth1.out
>>>>>>>>     ra -r $file
>>>>>>>> so it is easy to paste the line "as is",
>>>>>>>> without forgetting the shell escapes (\$srcid), as in:
>>>>>>>>     rasplit -S $argushost -M 1d -w /path/argus-\$srcid.%Y.%m.%d.log
>>>>>>>>
>>>>>>>> By the way, as another example for the list, here are 3 
>>>>>>>> scripts I use.
>>>>>>>> Setting PATH inline for each command gives nicer ps(1) output.
>>>>>>>>
>>>>>>>> start-argus
>>>>>>>>> #!/bin/sh
>>>>>>>>> interf=eth1
>>>>>>>>> PATH=/sbin ifconfig $interf | grep UP || PATH=/sbin ifconfig $interf up
>>>>>>>>> PATH=/usr/local/sbin argus -d -i $interf -e `hostname` -P 561 -U128 -mRS 30 -w argus-eth1.out
>>>>>>>>
>>>>>>>> rotate:
>>>>>>>>> #!/bin/sh
>>>>>>>>>
>>>>>>>>> # Rotates server log files, without affecting users who may be
>>>>>>>>> # connected to the server.
>>>>>>>>>
>>>>>>>>> # This can be run as a cron script
>>>>>>>>>
>>>>>>>>> DATE=`date +%Y-%m%d-%H%M`
>>>>>>>>> LOGS='argus-eth1.out'
>>>>>>>>>
>>>>>>>>>  for i in $LOGS; do
>>>>>>>>>    if [ -f $i ]; then
>>>>>>>>>      mv $i $i.$DATE
>>>>>>>>>      gzip -9 $i.$DATE
>>>>>>>>>    fi
>>>>>>>>>  done
>>>>>>>>
>>>>>>>> rotate-daily
>>>>>>>>> #!/bin/sh
>>>>>>>>> ./rotate
>>>>>>>>> sleep 60 # sometimes the preceding command finishes too early
>>>>>>>>> echo ./rotate-daily | at 0000 > /tmp/rotate-daily.log
>>>>>>>>
>>>>>>>> I use at(1) instead of cron(8) to cut the files closer to 
>>>>>>>> midnight, but rastream(1)'s extended "-w" option seems promising.
>>>>>>>> A better solution could be to use argus(8) to preprocess the 
>>>>>>>> flows, and rastream(1) to write, "rotate" and compress the files.
>>>>>>>> Another thread, perhaps.
>>>>>>>>
>>>>>>>>
>>>>>>>> Carter Bullard wrote :
>>>>>>>>> Hey Stephane,
>>>>>>>>> This is great!!!!  I'll put this in the distribution, if you 
>>>>>>>>> don't mind!!!!
>>>>>>>>> And I'll also go through it to make sure that any changes in the
>>>>>>>>> code actually don't break this, and I can add some of the ones
>>>>>>>>> that I do.
>>>>>>>>>
>>>>>>>>> So Russell is asking for a wiki, and we already have one at:
>>>>>>>>>
>>>>>>>>> http://www.vorant.com/nsmwiki/index.php?title=Argus
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Carter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mar 7, 2008, at 2:24 PM, Stéphane Peters wrote:
>>>>>>>>>
>>>>>>>>>> Hi Stewart,
>>>>>>>>>>
>>>>>>>>>> I also think that a cheat sheet would be nice !
>>>>>>>>>> Here is a good occasion to show mine...
>>>>>>>>>>
>>>>>>>>>> Please note, most of the stuff has been collected right from 
>>>>>>>>>> this argus list,
>>>>>>>>>> so hopefully you won't have to browse all the (numerous) past 
>>>>>>>>>> messages.
>>>>>>>>>>
>>>>>>>>>> Any suggestions ?
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> flow filtering on certain port range:
>>>>>>>>>>    ra -r file - dst port \( gt 1024 and lt 2048 \)
>>>>>>>>>> (...)
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Stewart Gray wrote:
>>>>>>>>>>> awesome, that's more like what I was after :) Thanks for your help
>>>>>>>>>>> again. 
>>>>>>>>>>>
>>>>>>>>>>> As I mentioned earlier, I reckon it'd be neat to have some sort of cheat
>>>>>>>>>>> sheet for doing common tasks. I bet there's lots of stuff you know that
>>>>>>>>>>> others don't, having written the application yourself. I don't know what
>>>>>>>>>>> I don't know!
>>>>>>>>>>>   
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> -- 
>>>>>>>>>> Stephane.Peters at forem.be, Postmaster at forem.be
>>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> -- 
>>>>>>>> Stephane.Peters at forem.be
>>>>>>>> #####################################################################################
>>>>>>>> Important: This electronic message and attachments (if any) are 
>>>>>>>> confidential and may be legally privileged. If you are not the 
>>>>>>>> intended recipient do not copy, disclose or use the contents in 
>>>>>>>> any way. Please let us know by return e-mail immediately and 
>>>>>>>> then destroy this message.
>>>>>>>> #####################################################################################
>>>>>>>
>>>>>>
>>>>>
>>>>> Carter Bullard
>>>>> CEO/President
>>>>> QoSient, LLC
>>>>> 150 E. 57th Street Suite 12D
>>>>> New York, New York 10022
>>>>>
>>>>> +1 212 588-9133 Phone
>>>>> +1 212 588-9134 Fax
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
