[Argus] Re: Packet Loss with racluster

Nick Diel ndiel at engr.colostate.edu
Mon Mar 17 23:52:33 EDT 2008


Carter,

What you are saying makes sense (I think), but I think there is 
something else going on here.

Stew had a 2-minute file.  If he used ra to look at just this file, he 
would see individual records with positive values for the lost-packet 
count.  Then he used racluster to merge all status flow records, and it 
reported 0 lost packets.  I think Stew was doing this one file at a time.

Basically, if a single file (regardless of how it was created) has any 
status flows with a positive packet loss count, shouldn't racluster be 
able to report this total for the file?

ra -s loss -r argus.arg - tcp | awk '{total=total+$1;} END {print total;}'
    (reports a total greater than 0)
racluster -m proto -s loss -r argus.arg - tcp
    (reports 0)
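
As a rough sketch of the awk side of that comparison: assuming ra prints 
one integer loss value per line, possibly preceded by a column header, a 
small helper can skip the non-numeric lines before summing.  The name 
sum_loss and the sample values are illustrative only and are not part of 
argus.

```shell
#!/bin/sh
# Sum the first column of stdin, one value per line.
# Assumption: loss values are non-negative integers; header lines,
# blank lines and anything else non-numeric are skipped.
sum_loss() {
    awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Made-up values standing in for "ra -s loss" output (header, then counts).
printf '%s\n' 'Loss' '0' '3' '' '2' | sum_loss   # prints 5
```

Against a real file this would be run as 
"ra -s loss -r argus.arg - tcp | sum_loss", so that a header line or an 
empty field cannot affect the total being compared with racluster.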

I may be missing something, but this was how I interpreted Stew's problem.

Nick

Carter Bullard wrote:
> Hey Guys,
> There are a lot of things going on that can affect the "distribution"
> of numbers on a time series graph when using flow data.  Flows are not
> fixed-length samples of network activity, so you have to do some
> statistical mods to make the data generally useful.  Programs like
> rasplit() and rabins() are critical to distributing load, rate, packet
> numbers, loss numbers, jitter, interpacket arrival times, etc.
> correctly into timed bins.  Without the use of either rasplit() or
> rabins(), which are split/aggregate tools, you can end up with flows
> that are longer than the time interval they are supposed to represent,
> which skews the data in weird ways and can generate bins with no data
> in them.
>
> Loss doesn't have to be constant, so the dropouts may actually be real.
> And there are no guarantees that there are actually TCP connections
> during those intervals (no TCP, no loss), so we have to look at the
> data to see if anything is wrong.
>
> Remember, flows from argus() can be as long as the
> ARGUS_FAR_STATUS_INTERVAL.  A flow that starts at 1:59:59.999999 will
> be tallied in the 1:58:00 - 2:00:00 bin, even though its duration could
> extend well into the 2:00:00 - 2:02:00 interval.
>
> The trick is to split the data into strict time slots, and then 
> aggregate those slots.
> rabins() is very good at this, which is why it's at the heart of ragraph().
>
> If I can get some of the data used to generate the graph in the email,
> I can see if using rabins() would remove the dropouts.
>
> Carter
>
>
>
> On Mar 17, 2008, at 8:40 PM, Stewart Gray wrote:
>
>> I just feed the values into cacti, it's a base metric I can use for 
>> spotting anomalies. Even if it's not 100% accurate, the accuracy 
>> should be pretty consistent even if argus inflates/deflates the 
>> figure slightly on files which have been sliced up.
>>  
>> I'm running this argus instance on a busy section of our network and 
>> there is a constant flow of between 80 and 140mb/s. I ran the 
>> rate/load/loss command and got:
>>  
>> 17949.785637 94528448 0
>>  
>> You can see the blips this morning. The file is actually split every 
>> 2 minutes on this particular box.
>>  
>> <Outlook.jpg>
>>  
>> It's a bit unusual: if I run 'ra -m proto -s loss -r argus.arg - tcp', 
>> there are quite a number of losses/retransmits. Might there be an issue 
>> with how racluster is aggregating these?
>>  
>> Stew
>>
>> ------------------------------------------------------------------------
>> *From:* Nick Diel [mailto:ndiel at engr.colostate.edu]
>> *Sent:* Tuesday, 18 March 2008 12:10 p.m.
>> *To:* Stewart Gray
>> *Cc:* Argus
>> *Subject:* [Argus] Re: Packet Loss with racluster
>>
>> Stew,
>>
>> I think the first question is what you are using this number for.  If 
>> you are just using it as an indicator of congestion or other network 
>> problems, then the 5 minute boundary will most likely not be a problem.
>>
>> I believe Argus just counts the number of retransmitted packets to 
>> get a loss/drop count; I don't think it is doing any triple duplicate 
>> ACK or TCP timeout checks (if I am wrong, someone please say so).  
>> Since retransmissions will occur in a time window of a few seconds, 
>> you should capture most retransmitted packets in your 5 minute 
>> boundaries.  So even if a flow crosses that boundary, you still have a 
>> good chance of counting retransmitted packets correctly.
>>
>> For cases where you receive a count of 0, I would look at packet rate 
>> and bit rate; it is possible the link just doesn't have much traffic 
>> on it at that time:
>>
>>     racluster -m proto -s rate load loss -r argus.arg - tcp
>>
>> Though I did notice something unusual on my end.  The command I gave 
>> you should be a strong estimate, but it doesn't account for packets 
>> retransmitted across status flow boundaries within the file (though 
>> the same argument as above applies).  So to get an exact count for 
>> the file (assuming racluster reanalyzes the status flow records for 
>> retransmissions), you would need to first merge the status flow 
>> records and then count retransmitted packets, with something like:
>>
>>     racluster -r argus.arg -w - - tcp | racluster -m proto -s loss -r -
>>
>> This is the output I get, though:
>>
>> racluster -m proto -s loss -r argus.out - tcp
>>      62521
>> racluster -r argus.out -w - - tcp | racluster -m proto -s loss -r -
>>      60047
>>
>> At a minimum I would expect the numbers to stay the same, which would 
>> mean either that no retransmitted packets crossed any status flow 
>> boundaries or that racluster doesn't try to find any new retransmitted 
>> packets.  The number going down doesn't make any sense to me.  Maybe 
>> someone can explain to me what is going on.
>>
>> Nick
>>
>>
>>
>>
>> Stewart Gray wrote:
>>> Hey Guys,
>>>  
>>> How does racluster handle argus files which have been periodically 
>>> split when producing packet loss statistics? My monitoring machine 
>>> rotates the argus file every 5 minutes. When using the following 
>>> command, how skewed are the figures going to be as a result of 
>>> having an incomplete argus file (i.e. connections that were current 
>>> when the log file was rotated)?
>>>  
>>> I've also noticed that sometimes the resulting figure is 0. It only 
>>> seems to do this in about 1 in 10 of the argus files I run the command on.
>>>  
>>> racluster -m proto -s loss -r argus.arg - tcp
>>> 0
>>>  
>>> racluster -m proto -s loss -r argus.arg - tcp
>>> 33036
>>> Any ideas?
>>>  
>>> Cheers,
>>>  
>>> Stew
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Nick Diel [mailto:ndiel at engr.colostate.edu]
>>> *Sent:* Wednesday, 12 March 2008 10:24 a.m.
>>> *To:* Stewart Gray
>>> *Cc:* Argus
>>> *Subject:* Re: [ARGUS] Cheat sheet premiere
>>>
>>> How about:
>>> racluster -m proto -s loss -r argus.arg - tcp
>>>
>>> This should merge all records based on protocol (in this case only 
>>> tcp because of the filter) and then print the loss column of all 
>>> merged records.
>>>
>>> Nick
>>>
>>> Stewart Gray wrote:
>>>> Awesome, that's a really good start. I've already been playing with 
>>>> a few of the options I hadn't toyed with before :)
>>>>  
>>>> Is there an easy way to generate a raw count of packets 
>>>> lost/retransmitted rather than having it graphed?
>>>>  
>>>> I figure we start with:
>>>>  
>>>> racluster -s loss -r argus.arg -w -
>>>>  
>>>> How are the figures totaled? Do we pipe it to rasort or ra?
>>>>  
>>>> Thanks,
>>>>  
>>>> Stewart
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* Stéphane Peters [mailto:stephane.peters at forem.be]
>>>> *Sent:* Saturday, 8 March 2008 11:06 a.m.
>>>> *To:* Carter Bullard
>>>> *Cc:* Stewart Gray; Argus
>>>> *Subject:* Re: Re: [ARGUS] Cheat sheet premiere
>>>>
>>>> Hi Carter,
>>>>
>>>> I would love to see such a sheet in the distribution,
>>>> and I also was hoping that you could check,
>>>> if those examples made sense or were appropriate.
>>>> So please go on !
>>>>
>>>>
>>>> Some cosmetic work could be done too,
>>>> for example using some "standard" parameters everywhere, like this:
>>>>     file=argus-eth1.out
>>>>     ra -r $file
>>>> so it is easy to paste the line "as is",
>>>> without forgetting the shell escapes (\$srcid), as in:
>>>>     rasplit -S $argushost  -M 1d -w /path/argus-\$srcid.%Y.%m.%d.log
>>>>
>>>> By the way, as another example for the list, here are 3 
>>>> scripts I use.
>>>> The PATH vars give a nicer ps(1) output.
>>>>
>>>> start-argus
>>>>> #!/bin/sh
>>>>> interf=eth1
>>>>> PATH=/sbin ifconfig $interf | grep UP || PATH=/sbin ifconfig $interf up
>>>>> PATH=/usr/local/sbin argus -d -i $interf -e `hostname` -P 561 -U128 -mRS 30 -w argus-eth1.out
>>>>
>>>> rotate:
>>>>> #!/bin/sh
>>>>>
>>>>> # Rotates server log files, without affecting users who may be
>>>>> # connected to the server.
>>>>>
>>>>> # This can be run as a cron script
>>>>>
>>>>> DATE=`date +%Y-%m%d-%H%M`
>>>>> LOGS='argus-eth1.out'
>>>>>
>>>>>  for i in $LOGS; do
>>>>>    if [ -f $i ]; then
>>>>>      mv $i $i.$DATE
>>>>>      gzip -9 $i.$DATE
>>>>>    fi
>>>>>  done
>>>>
>>>> rotate-daily
>>>>> #!/bin/sh
>>>>> ./rotate
>>>>> sleep 60 # sometimes the preceding command finishes too early
>>>>> echo ./rotate-daily | at 0000 > /tmp/rotate-daily.log
>>>>
>>>> I use at(1) instead of cron(8) to cut the files closer to midnight,
>>>> but rastream(1)'s extended "-w" option seems promising.
>>>> A better solution could be to use argus(8) to preprocess the flows,
>>>> and rastream(1) to write, "rotate" and compress the files.
>>>> Another thread, perhaps.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>    
>>>>
>>>>
>>>> Carter Bullard wrote :
>>>>> Hey Stephane,
>>>>> This is great!!!!  I'll put this in the distribution, if you don't 
>>>>> mind!!!!
>>>>> And I'll also go through it to make sure that any changes in the
>>>>> code actually don't break this, and I can add some of the ones
>>>>> that I do.
>>>>>
>>>>> So Russell is asking for a wiki, and we already have one at:
>>>>>
>>>>> http://www.vorant.com/nsmwiki/index.php?title=Argus
>>>>>
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mar 7, 2008, at 2:24 PM, Stéphane Peters wrote:
>>>>>
>>>>>> Hi Stewart,
>>>>>>
>>>>>> I also think that a cheat sheet would be nice !
>>>>>> Here is a good occasion to show mine...
>>>>>>
>>>>>> Please note, most of the stuff has been collected right from this 
>>>>>> argus list,
>>>>>> so hopefully you won't need to browse all the (numerous) past messages.
>>>>>>
>>>>>> Any suggestions ?
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> flow filtering on certain port range:
>>>>>>    ra -r file - dst port \( gt 1024 and lt 2048 \)
>>>>>> (...)
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> Stewart Gray wrote:
>>>>>>> Awesome, that's more like what I was after :) Thanks for your help
>>>>>>> again. 
>>>>>>>
>>>>>>> As I mentioned earlier, I reckon it'd be neat to have some sort of cheat
>>>>>>> sheet for doing common tasks. I bet there's lots of stuff you know that
>>>>>>> others don't, having written the application yourself. I don't know what
>>>>>>> I don't know!
>>>>>>>   
>>>>>>
>>>>>> Regards,
>>>>>> -- 
>>>>>> Stephane.Peters at forem.be, Postmaster at forem.be
>>>>>
>>>>
>>>> Regards,
>>>> -- 
>>>> Stephane.Peters at forem.be
>>>> #####################################################################################
>>>> Important: This electronic message and attachments (if any) are 
>>>> confidential and may be legally privileged. If you are not the 
>>>> intended recipient do not copy, disclose or use the contents in any 
>>>> way. Please let us know by return e-mail immediately and then 
>>>> destroy this message.
>>>> #####################################################################################
>>>
>>
>
> Carter Bullard
> CEO/President
> QoSient, LLC
> 150 E. 57th Street Suite 12D
> New York, New York 10022
>
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
>
>
>
