[Argus] Re: Packet Loss with racluster

Carter Bullard carter at qosient.com
Tue Mar 18 09:40:47 EDT 2008


Gentlemen,
Well, racluster() does modify the IP attributes and TCP attributes based
on the records that are being merged together.  Because you are modifying
the flow key, and then merging data together, some data may be ignored.

As an example, if you merge a record that is a singleton with a
non-singleton, your merged result could retain some singleton properties.
A singleton is a flow with only one packet.  One of the properties of a
singleton is that it doesn't have any duration, and it also doesn't have
any loss.  Now, if you merge a singleton with a non-singleton you get a
non-singleton as the result, so losing things like loss would, of course,
be a bug.
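
A rough way to test for this, sketched under the assumption that "-s loss"
prints one integer per record (plus possibly a column header), is to total
the per-record values yourself and compare that with what racluster() prints
for the merged record.  The sum_loss helper below is hypothetical, not part
of the argus clients:

```shell
#!/bin/sh
# sum_loss: hypothetical helper, not part of the argus client tools.
# Reads text on stdin and totals the first column, skipping any line
# whose first field is not a plain non-negative integer (headers, blanks).
sum_loss() {
  awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Self-contained demonstration with made-up values, not real argus output:
printf '  Loss\n     3\n     0\n\n     7\n' | sum_loss
```

With real data the input would come from something like
"ra -s loss -r argus.arg - tcp | sum_loss".  If that total is non-zero while
"racluster -m proto -s loss -r argus.arg - tcp" prints 0, the loss is being
dropped in the merge.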

The best solution is to see if you can ranonymize() the data, and still get
the same graph.  Could you then share that "primitive" data?

Primitive data is the set of original flow records directly from argus().

What do you think?

Carter



On Mar 18, 2008, at 12:02 AM, Stewart Gray wrote:

> That's right, I'll show the example I'm working with:
>
> ra -m proto -s loss -r packet-dump-2008-03-18_08\:28.arg - tcp | awk  
> '{total=total+$1;} END {print total;}'
> 33244
>
> racluster -m proto -s loss -r packet-dump-2008-03-18_08\:28.arg - tcp
> 0
>
> Unfortunately I'm not able to distribute the data I'm working with -
> it's customers' flow logs. I'll see if I can replicate the issues at
> home so I can provide something to work with.
>
> Cheers,
>
> Stew
> From: Nick Diel [mailto:ndiel at engr.colostate.edu]
> Sent: Tuesday, 18 March 2008 4:53 p.m.
> To: Carter Bullard
> Cc: Stewart Gray; Argus
> Subject: Re: [ARGUS] [Argus] Re: Packet Loss with racluster
>
> Carter,
>
> What you are saying makes sense (I think), but I think there is  
> something else going on here.
>
> Stew had a 2 minute file.  If he used ra to look at just this file  
> he would see individual records that had positive values for loss  
> packet count.  Then he used racluster to merge all status flow  
> records and it reported 0 loss packets.  I think Stew was doing this  
> one file at a time.
>
> Basically if a single file (regardless how it was created) has any  
> status flows with a positive packet loss count, shouldn't racluster  
> be able to report this total for this file?
>
> ra -s loss -r argus.arg - tcp | awk '{total=total+$1;} END {print  
> total;}'  >0
> racluster -m proto -s loss -r argus.arg - tcp  = 0
>
> I may be missing something, but this was how I interpreted Stew's  
> problem.
>
> Nick
>
> Carter Bullard wrote:
>>
>> Hey Guys,
>> There are a lot of things going on that can affect the "distribution" of
>> numbers on a time series graph, when using flow data.  Flows are not fixed
>> length samples of network activity, and so you have to do some statistical
>> mods to make the data generally useful.  Programs like rasplit() and
>> rabins() are critical to distributing load, rate, packet numbers, loss
>> numbers, jitter, interpacket arrival times, etc... correctly into timed
>> bins.  Without the use of either rasplit() or rabins(), which are
>> split/aggregate tools, you can end up with flows that are longer than the
>> time interval they are supposed to represent, which skews the data in
>> weird ways, and can generate bins with no data in them.
>>
>> Loss doesn't have to be constant, and so the drop outs may actually
>> be real.  And there are no guarantees that there are actually tcp
>> connections during those intervals (no TCP, no loss), so we have to
>> look at the data to see if there is anything wrong.
>>
>> Remember, flows from argus() can be as long as the
>> ARGUS_FAR_STATUS_INTERVAL.
>> A flow that starts at 1:59:59.999999 will be tallied in the
>> 1:58:00-2:00:00 bin, even though its duration could extend well
>> into the 2:00:00-2:02:00 interval.
>>
>> The trick is to split the data into strict time slots, and then
>> aggregate those slots.  rabins() is very good at this, which is why
>> it's at the heart of ragraph().
>>
>> If I can get some of the data used to generate the graph in the  
>> email, I can
>> see if using rabins() would remove the drop outs.
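
To illustrate the idea of splitting into strict time slots, here is a
minimal sketch in awk.  It is only a model of the concept, not how rabins()
is implemented: it assumes a flow's count is spread evenly over its
duration, and the function name split_flow and its arguments (start and
duration in seconds, a count, and a bin size in seconds) are made up for
the example.

```shell
#!/bin/sh
# split_flow START DUR COUNT BIN  (hypothetical helper, for illustration only)
# Spreads COUNT evenly over [START, START+DUR) and prints one line per
# BIN-second slot the flow overlaps: "<slot start> <share of COUNT>".
split_flow() {
  awk -v start="$1" -v dur="$2" -v count="$3" -v bin="$4" 'BEGIN {
    first = int(start / bin) * bin          # slot containing the flow start
    if (dur <= 0) {                         # singleton: no duration to split
      printf "%d %.2f\n", first, count
      exit
    }
    end = start + dur
    for (b = first; b < end; b += bin) {
      lo = (start > b) ? start : b          # overlap of the flow with this slot
      hi = (end < b + bin) ? end : b + bin
      printf "%d %.2f\n", b, count * (hi - lo) / dur
    }
  }'
}

# A 20 second flow starting 10 seconds before a 120 second boundary:
split_flow 110 20 100 120
```

This prints "0 50.00" and "120 50.00".  Tallying by start time alone would
put all 100 in the first slot, while the split puts half in each.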
>>
>> Carter
>>
>>
>>
>> On Mar 17, 2008, at 8:40 PM, Stewart Gray wrote:
>>
>>> I just feed the values into cacti, it's a base metric I can use
>>> for spotting anomalies. Even if it's not 100% accurate, the
>>> accuracy should be pretty consistent even if argus inflates/deflates
>>> the figure slightly on files which have been sliced up.
>>>
>>> I'm running this argus instance on a busy section of our network
>>> and there is a constant flow of between 80-140mb/s. I ran the
>>> rate/load/loss command and got:
>>>
>>> 17949.785637 94528448 0
>>>
>>> You can see the blips this morning. The file is actually split  
>>> every 2mins on this particular box.
>>>
>>> <Outlook.jpg>
>>>
>>> It's a bit unusual, if I run 'ra -m proto -s loss -r argus.arg -  
>>> tcp' there are quite a number of losses/retransmits. Might be an  
>>> issue with how racluster is aggregating these?
>>>
>>> Stew
>>>
>>> From: Nick Diel [mailto:ndiel at engr.colostate.edu]
>>> Sent: Tuesday, 18 March 2008 12:10 p.m.
>>> To: Stewart Gray
>>> Cc: Argus
>>> Subject: [Argus] Re: Packet Loss with racluster
>>>
>>> Stew,
>>>
>>> I think the first question is what are you using this number for.   
>>> If you are just using it as an indicator of congestion or other  
>>> network problems then the 5 minute boundary will most likely not  
>>> be a problem.
>>>
>>> I believe Argus just counts the number of retransmitted packets to
>>> get a loss/drop count; I don't think it is doing any triple
>>> duplicate ack or tcp timeout checks (if I am wrong, someone please
>>> say so).  Since retransmissions will occur in a time window of a
>>> few seconds, you should capture most retransmitted packets in your
>>> 5 minute boundaries.  So even if a flow crosses that boundary, you
>>> still have a good chance of counting retransmitted packets
>>> correctly.
>>>
>>> For cases where you are receiving a count of 0, I would look at packet
>>> rate and bit rate; it is possible the link just doesn't have much
>>> traffic on it at that time:
>>>
>>>     racluster -m proto -s rate load loss -r argus.arg - tcp
>>>
>>> Though I did notice something unusual on my end.  The command I
>>> gave you should be a strong estimate, but doesn't account for
>>> retransmitted packets over status flow boundaries within the file
>>> (though the same argument as above applies).  So to get an exact
>>> count on the file (assuming racluster reanalyzes the status flow
>>> records for retransmissions) you would need to first merge the
>>> status flow records, then count retransmitted packets, with
>>> something like:
>>>
>>>     racluster -r argus.arg -w - - tcp | racluster -m proto -s loss -r -
>>>
>>> Though this is the output I get:
>>>
>>> racluster -m proto -s loss -r argus.out - tcp
>>>      62521
>>> racluster -r argus.out -w - - tcp | racluster -m proto -s loss -r -
>>>      60047
>>>
>>> At a minimum I would expect the numbers to stay the same (either no
>>> retransmitted packets crossed any status flow boundaries, or
>>> racluster doesn't try to find any new retransmitted packets).  The
>>> number going down doesn't make any sense to me.  Maybe someone can
>>> explain to me what is going on.
>>>
>>> Nick
>>>
>>>
>>>
>>>
>>> Stewart Gray wrote:
>>>>
>>>> Hey Guys,
>>>>
>>>> How does racluster handle argus files which have been
>>>> periodically split, when producing packet loss statistics? My
>>>> monitoring machine rotates the argus file every 5 minutes. When
>>>> using the following command, how skewed are the figures going to
>>>> be as a result of having an incomplete argus file (i.e. connections
>>>> that were current when the log file was rotated)?
>>>>
>>>> I've also noticed that sometimes the resulting figure is 0. It only
>>>> seems to do this in about 1/10 of the argus files I run the command on.
>>>>
>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>> 0
>>>>
>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>> 33036
>>>> Any ideas?
>>>>
>>>> Cheers,
>>>>
>>>> Stew
>>>>
>>>> From: Nick Diel [mailto:ndiel at engr.colostate.edu]
>>>> Sent: Wednesday, 12 March 2008 10:24 a.m.
>>>> To: Stewart Gray
>>>> Cc: Argus
>>>> Subject: Re: [ARGUS] Cheat sheet premiere
>>>>
>>>> How about:
>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>>
>>>> This should merge all records based on protocol (in this case  
>>>> only tcp because of the filter) and then print the loss column of  
>>>> all merged records.
>>>>
>>>> Nick
>>>>
>>>> Stewart Gray wrote:
>>>>>
>>>>> Awesome, that's a really good start. I've already been playing
>>>>> with a few of the options I hadn't toyed with before :)
>>>>>
>>>>> Is there an easy way to generate a raw count of packets
>>>>> lost/retransmitted rather than having it graphed?
>>>>>
>>>>> I figure we start with:
>>>>>
>>>>> racluster -s loss -r argus.arg -w -
>>>>>
>>>>> How are the figures totaled? Do we pipe it to rasort or ra?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Stewart
>>>>>
>>>>> From: Stéphane Peters [mailto:stephane.peters at forem.be]
>>>>> Sent: Saturday, 8 March 2008 11:06 a.m.
>>>>> To: Carter Bullard
>>>>> Cc: Stewart Gray; Argus
>>>>> Subject: Re: Re: [ARGUS] Cheat sheet premiere
>>>>>
>>>>> Hi Carter,
>>>>>
>>>>> I would love to see such a sheet in the distribution,
>>>>> and I also was hoping that you could check,
>>>>> if those examples made sense or were appropriate.
>>>>> So please go on !
>>>>>
>>>>>
>>>>> Some cosmetic work could be done too;
>>>>> for example to use everywhere some "standard" parameters like  
>>>>> this one :
>>>>>     file=argus-eth1.out
>>>>>     ra -r $file
>>>>> so it is easy to paste the line "as is".
>>>>> without forgetting the shell escapes ( \$srcid) like in:
>>>>>     rasplit -S $argushost  -M 1d -w /path/argus-\$srcid.%Y.%m.%d.log
>>>>>
>>>>> By the way, as another example given to the list, here are 3
>>>>> scripts I use.
>>>>> Setting PATH on the command line, rather than calling the full path,
>>>>> gives a nicer ps(1) output.
>>>>>
>>>>> start-argus
>>>>>> #!/bin/sh
>>>>>> interf=eth1
>>>>>> PATH=/sbin ifconfig $interf | grep UP || PATH=/sbin ifconfig $interf up
>>>>>> PATH=/usr/local/sbin argus -d -i $interf -e `hostname` -P 561 -U128 -mRS 30 -w argus-eth1.out
>>>>>
>>>>> rotate:
>>>>>> #!/bin/sh
>>>>>>
>>>>>> # Rotates server log files, without affecting users who may be
>>>>>> # connected to the server.
>>>>>>
>>>>>> # This can be run as a cron script
>>>>>>
>>>>>> DATE=`date +%Y-%m%d-%H%M`
>>>>>> LOGS='argus-eth1.out'
>>>>>>
>>>>>>  for i in $LOGS; do
>>>>>>    if [ -f "$i" ]; then
>>>>>>      mv "$i" "$i.$DATE"
>>>>>>      gzip -9 "$i.$DATE"
>>>>>>    fi
>>>>>>  done
>>>>>
>>>>> rotate-daily
>>>>>> #!/bin/sh
>>>>>> ./rotate
>>>>>> sleep 60 # sometimes the preceding command finishes too early
>>>>>> echo ./rotate-daily | at 0000 > /tmp/rotate-daily.log
>>>>>
>>>>> I use at(1) instead of cron(8) to cut the files closer to
>>>>> midnight, but rastream(1)'s extended "-w" option seems promising.
>>>>> A better solution could be to use argus(8) to preprocess the
>>>>> flows, and rastream(1) to write, "rotate" and compress the files.
>>>>> Another thread, perhaps.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Carter Bullard wrote :
>>>>>>
>>>>>> Hey Stephane,
>>>>>> This is great!!!!  I'll put this in the distribution, if you  
>>>>>> don't mind!!!!
>>>>>> And I'll also go through it to make sure that any changes in the
>>>>>> code actually don't break this, and I can add some of the ones
>>>>>> that I do.
>>>>>>
>>>>>> So Russell is asking for a wiki, and we already have one at:
>>>>>>
>>>>>> http://www.vorant.com/nsmwiki/index.php?title=Argus
>>>>>>
>>>>>>
>>>>>> Carter
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mar 7, 2008, at 2:24 PM, Stéphane Peters wrote:
>>>>>>
>>>>>>> Hi Stewart,
>>>>>>>
>>>>>>> I also think that a cheat sheet would be nice !
>>>>>>> Here is a good occasion to show mine...
>>>>>>>
>>>>>>> Please note, most of the stuff has been collected right from
>>>>>>> this argus list, so hopefully you won't have to browse all the
>>>>>>> (numerous) past messages.
>>>>>>>
>>>>>>> Any suggestions ?
>>>>>>>
>>>>>>> flow filtering on certain port range:
>>>>>>>    ra -r file - dst port \( gt 1024 and lt 2048 \)
>>>>>>> (...)
>>>>>>>
>>>>>>>
>>>>>>> Stewart Gray wrote:
>>>>>>>>
>>>>>>>> Awesome, that's more like what I was after :) Thanks for your
>>>>>>>> help again.
>>>>>>>>
>>>>>>>> As I mentioned earlier, I reckon it'd be neat to have some sort
>>>>>>>> of cheat sheet for doing common tasks. I bet there's lots of
>>>>>>>> stuff you know that others don't, having written the application
>>>>>>>> yourself. I don't know what I don't know!
>>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> -- 
>>>>>>> Stephane.Peters at forem.be, Postmaster at forem.be
>>>>>>
>>>>>
>>>>> Regards,
>>>>> -- 
>>>>> Stephane.Peters at forem.be
>>>>> #####################################################################################
>>>>> Important: This electronic message and attachments (if any) are  
>>>>> confidential and may be legally privileged. If you are not the  
>>>>> intended recipient do not copy, disclose or use the contents in  
>>>>> any way. Please let us know by return e-mail immediately and  
>>>>> then destroy this message.
>>>>> #####################################################################################
>>>>
>>>
>>
>> Carter Bullard
>> CEO/President
>> QoSient, LLC
>> 150 E. 57th Street Suite 12D
>> New York, New York 10022
>>
>> +1 212 588-9133 Phone
>> +1 212 588-9134 Fax
>>
>>
>>
>
