Argus/racluster fundamentals
Nick Diel
ndiel at engr.colostate.edu
Fri Mar 21 13:33:16 EDT 2008
Carter,
Great info. It appears filters are quite critical when merging on
things other than the flow key. Of course, now I have more questions.
Primarily, how does Argus choose which TCP DSR to use? I know you
can specify options with Argus to tell it to generate additional
records, and I am sure Argus uses different DSRs for singleton and
non-singleton records. I am just curious about the exact methods
Argus uses.
I am mostly thinking out loud here, but at least for TCP, why does Argus
look for packet loss by watching sequence numbers? Shouldn't the only
time this method shows a loss (over multiple status flows) be when the
connection is about to close abnormally (TCP out of sync, one
end down)? It seems to me that retransmitted packets should be the
only indication of loss. I am sure you have thought about this a whole
lot; I am just having a hard time understanding where this method would
be beneficial. Maybe this method accounts for retransmitted packets too?
Nick
Carter Bullard wrote:
> Hey Nick,
> When you merge two records together, the aggregation engine goes through
> each DSR (data specific record) in the two argus records and compares
> them for applicability, consistency, etc. If the two corresponding DSRs
> are incompatible, the aggregation engine will simply throw that DSR away.
>
> All the TCP information (base sequence numbers, acks, roundtrip times,
> window sizes, retransmissions, etc.) is contained in the
> ARGUS_NETWORK_DSR, which holds protocol specific information.
> If you merge an ICMP flow with a TCP flow, the aggregator just tosses
> the ARGUS_NETWORK_DSR away, because the two DSRs have different
> meanings and are not compatible, and you lose the TCP specific
> information.
> This can happen when the flow key is just "-m saddr daddr", and so
> flows between A and B, regardless of protocol, get merged together.
>
> If one argus record has, say, ethernet addresses in the ARGUS_MAC_DSR,
> but the other record to be merged doesn't have an ARGUS_MAC_DSR, for
> whatever reason, we'll toss the ARGUS_MAC_DSR when generating the
> resultant merged record. Now, there are conditions where we are
> "preserving" and we keep the DSR rather than throw it away. This
> happens with ARGUS_AGR_DSRs. Each DSR has its own set of rules for
> what to do.
>
> However, when the DSRs are compatible but different, you can get
> some interesting results. There are 3 types of ARGUS_NETWORK_DSRs for
> TCP data: ArgusTcpInit, ArgusTcpStatus, and ArgusTcpPerf, and only one
> has loss statistics. If you merge an ArgusTcpInit DSR (which only has
> base sequence numbers, flags and roundtrip times) with an ArgusTcpPerf
> DSR (which has everything), you are supposed to get an ArgusTcpPerf DSR,
> with some slight mods to the fields (state values get or'd, flags get
> or'd, base sequence numbers are checked to make sure they are the same,
> and if not the result is adjusted, total bytes transmitted are summed,
> etc.). The source code is pretty dense in this area, so there is a lot
> to talk about.
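
A toy illustration of the merge rules described above (the richer DSR
type wins, flags are or'd, byte counts are summed). The three-field
record and the function name merge_tcp_dsr are made up for this sketch;
they are not the real Argus structures or source code. The thread does
not say what happens for other type combinations, so the sketch
arbitrarily keeps the first type in that case.

```sh
# Toy model only: (type, flags, bytes) is a hypothetical stand-in for a
# TCP ARGUS_NETWORK_DSR, not the actual Argus record layout.
merge_tcp_dsr() {
    type1=$1 flags1=$2 bytes1=$3
    type2=$4 flags2=$5 bytes2=$6

    # Init merged with Perf should come out as Perf, per the description.
    if [ "$type1" = "ArgusTcpPerf" ] || [ "$type2" = "ArgusTcpPerf" ]; then
        merged_type=ArgusTcpPerf
    else
        merged_type=$type1    # arbitrary choice for cases not covered above
    fi

    # Flags are OR'd together and byte counts are summed.
    echo "$merged_type $(( flags1 | flags2 )) $(( bytes1 + bytes2 ))"
}

merge_tcp_dsr ArgusTcpInit 2 60 ArgusTcpPerf 16 1500
```

If the type were left as ArgusTcpInit after such a merge, the result
would have nowhere to carry loss or retransmission counts.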
>
> With loss, there is such a thing as negative loss. We see this with
> protocols like ESP and RTP quite often, when packets get out of order.
> Argus sees sequence number 23, then 25, and we need to report the
> flow, and so we report a loss of 1 packet. Well, the next packet
> that argus sees after sending the status report is packet number 24,
> then 26 and then 27. We need to report that 24 showed up, and so
> when we generate the next flow status record, we report a loss of -1.
> Later, when you merge the two status flow records together, the loss
> becomes zero.
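
The 23/25 then 24/26/27 example above can be sketched with a small awk
filter. This is only a toy model of gap counting, assuming input lines
of the form "interval sequence-number"; the function name
loss_per_interval is hypothetical and this is not a claim about how
Argus tracks sequence numbers internally.

```sh
# Toy model: count a gap as loss, and take one back when a late packet arrives.
loss_per_interval() {
    awk '
    NR == 1 { expected = $2 }        # the first packet sets the baseline
    {
        interval = $1; seq = $2
        if (seq > expected)
            loss[interval] += seq - expected   # gap: packets look lost
        else if (seq < expected)
            loss[interval] -= 1                # late arrival: negative loss
        if (seq >= expected)
            expected = seq + 1
        seen[interval] = 1
    }
    END { for (i in seen) print i, loss[i] + 0 }
    ' | sort -n
}

# Interval 1 sees 23 and 25; interval 2 sees 24, 26 and 27.
printf '%s\n' '1 23' '1 25' '2 24' '2 26' '2 27' | loss_per_interval
```

Interval 1 reports a loss of 1 and interval 2 a loss of -1, so summing
the two status records gives zero.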
>
> You won't see that too often with TCP, but you can get that kind of
> behavior, especially when the Far Status Interval is below 1 second.
>
> I'm thinking that this situation is caused by a bug, where we merge an
> ArgusTcpInit and an ArgusTcpPerf DSR together, and fail to redefine
> the DSR to ArgusTcpPerf, but leave it as ArgusTcpInit, which of course
> doesn't/can't have any retransmission stats. The newest client code
> that is on the server (refreshed yesterday) does have some additional
> logic to make it less likely to have this problem, but I have to
> double/triple check to see what is actually going on. Having data that
> generates the problem makes that much easier.
>
> This is a long topic, so keep sending questions, and we'll get something
> written down that may make some sense.
>
> Carter
>
>
> On Mar 18, 2008, at 12:30 PM, Nick Diel wrote:
>
>> Carter,
>>
>> First, thanks for everything you have done. Second, thanks for all
>> this great info; it has been extremely helpful as I learn Argus. We
>> will need a wiki page just for all the info you have given so far.
>>
>> Hopefully Stew can anonymize the data, so you can shed some light on
>> what is going on.
>>
>> Can you tell me/the rest of the list a little bit more about how
>> racluster handles IP attributes and TCP attributes? For instance, if
>> racluster is merging based on flow keys, will it attempt to find
>> additional retransmitted packets? For example, if a singleton is
>> actually a retransmitted packet for another non-singleton, would
>> racluster detect that and increase the loss count after they are
>> merged together?
>>
>> Nick
>>
>> Carter Bullard wrote:
>>> Gentlemen,
>>> Well, racluster() does modify the IP attributes and TCP attributes
>>> based on the records that are being merged together. Because you are
>>> modifying the flow key and then merging data together, some data may
>>> be ignored.
>>>
>>> As an example, if you merge a record that is a singleton with a
>>> non-singleton, your merged result may/could retain some singleton
>>> properties. A singleton is a flow with only one packet. One of the
>>> properties of a singleton is that it doesn't have any duration, and
>>> it also doesn't have any loss. Now, if you merge a singleton with a
>>> non-singleton you get a non-singleton as the result, so losing things
>>> like loss would, of course, be a bug.
>>>
>>> The best solution is to see if you can ranonymize() the data and
>>> get the same graph. Could you share that "primitive" data?
>>>
>>> Primitive data is the set of original flow records directly from
>>> argus().
>>>
>>> What do you think?
>>>
>>> Carter
>>>
>>>
>>>
>>> On Mar 18, 2008, at 12:02 AM, Stewart Gray wrote:
>>>
>>>> That's right, I'll show the example I'm working with:
>>>>
>>>> ra -m proto -s loss -r packet-dump-2008-03-18_08\:28.arg - tcp |
>>>> awk '{total=total+$1;} END {print total;}'
>>>> 33244
>>>>
>>>> racluster -m proto -s loss -r packet-dump-2008-03-18_08\:28.arg - tcp
>>>> 0
>>>>
>>>> Unfortunately I'm not able to distribute the data I'm working with
>>>> - it's customers flow logs. I'll see if I can replicate the issues
>>>> @ home so I can provide something to work with.
>>>>
>>>> Cheers,
>>>>
>>>> Stew
>>>> ------------------------------------------------------------------------
>>>> *From:* Nick Diel [mailto:ndiel at engr.colostate.edu]
>>>> *Sent:* Tuesday, 18 March 2008 4:53 p.m.
>>>> *To:* Carter Bullard
>>>> *Cc:* Stewart Gray; Argus
>>>> *Subject:* Re: [ARGUS] [Argus] Re: Packet Loss with racluster
>>>>
>>>> Carter,
>>>>
>>>> What you are saying makes sense (I think), but I think there is
>>>> something else going on here.
>>>>
>>>> Stew had a 2 minute file. If he used ra to look at just this file
>>>> he would see individual records that had positive values for loss
>>>> packet count. Then he used racluster to merge all status flow
>>>> records and it reported 0 loss packets. I think Stew was doing
>>>> this one file at a time.
>>>>
>>>> Basically, if a single file (regardless of how it was created) has
>>>> any status flows with a positive packet loss count, shouldn't
>>>> racluster be able to report this total for this file?
>>>>
>>>> ra -s loss -r argus.arg - tcp | awk '{total=total+$1;} END {print total;}'
>>>>     (total is greater than 0)
>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>>     (result is 0)
>>>>
>>>> I may be missing something, but this was how I interpreted Stew's
>>>> problem.
>>>>
>>>> Nick
>>>>
>>>> Carter Bullard wrote:
>>>>> Hey Guys,
>>>>> There are a lot of things going on that can affect the
>>>>> "distribution" of numbers on a time series graph when using flow
>>>>> data. Flows are not fixed length samples of network activity, and
>>>>> so you have to do some statistical mods to make the data generally
>>>>> useful. Programs like rasplit() and rabins() are critical to
>>>>> distributing load, rate, packet numbers, loss numbers, jitter,
>>>>> interpacket arrival times, etc. correctly into timed bins. Without
>>>>> the use of either rasplit() or rabins(), which are split/aggregate
>>>>> tools, you can end up with flows that are longer than the time
>>>>> interval they are supposed to represent, which skews the data in
>>>>> weird ways and can generate bins with no data in them.
>>>>>
>>>>> Loss doesn't have to be constant, and so the drop outs may
>>>>> actually be real. And there are no guarantees that there are
>>>>> actually tcp connections during those intervals (no TCP, no loss),
>>>>> so we have to look at the data to see if there is anything wrong.
>>>>>
>>>>> Remember, flows from argus() can be as long as the
>>>>> ARGUS_FAR_STATUS_INTERVAL. A flow that starts at 1:59:59.999999
>>>>> will be tallied in the 1:58:00 - 2:00:00 bin, even though its
>>>>> duration could extend well into the 2:00:00 - 2:02:00 interval.
>>>>>
>>>>> The trick is to split the data into strict time slots, and then
>>>>> aggregate those slots. rabins() is very good at this, which is why
>>>>> it's at the heart of ragraph().
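
A rough sketch of the "split into strict time slots, then aggregate"
idea. It assumes input lines of the form "start-seconds duration-seconds
count" and spreads each count evenly over the flow's duration; the
function name split_into_bins, the input layout, and the even spreading
are all assumptions for illustration, not a description of what rabins()
actually does.

```sh
# Spread each flow's count across fixed-width time bins in proportion to
# how much of the flow's duration falls inside each bin.
split_into_bins() {
    awk -v width="${1:-120}" '
    {
        start = $1; dur = $2; count = $3; stop = start + dur
        if (dur <= 0) {                  # singleton: all of it lands in one bin
            bins[int(start / width) * width] += count
            next
        }
        for (b = int(start / width) * width; b < stop; b += width) {
            lo = (start > b) ? start : b
            hi = (stop < b + width) ? stop : b + width
            bins[b] += count * (hi - lo) / dur
        }
    }
    END { for (b in bins) printf "%d %.2f\n", b, bins[b] }
    ' | sort -n
}

# A 20 second flow starting 10 seconds before a 2 minute boundary.
echo '110 20 10' | split_into_bins 120
```

Here half of the count is credited to the bin starting at 0 and half to
the bin starting at 120, instead of all of it landing in the first bin.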
>>>>>
>>>>> If I can get some of the data used to generate the graph in the
>>>>> email, I can see if using rabins() would remove the drop outs.
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>>
>>>>> On Mar 17, 2008, at 8:40 PM, Stewart Gray wrote:
>>>>>
>>>>>> I just feed the values into cacti, it's a base metric I can use
>>>>>> for spotting anomalies. Even if it's not 100% accurate, the
>>>>>> accuracy should be pretty consistent even if argus
>>>>>> inflates/deflates the figure slightly on files which have been
>>>>>> sliced up.
>>>>>>
>>>>>> I'm running this argus instance on a busy section of our network
>>>>>> and there is a constant flow of between 80-140mb/s. I ran the
>>>>>> rate/load/loss command and got:
>>>>>>
>>>>>> 17949.785637 94528448 0
>>>>>>
>>>>>> You can see the blips this morning. The file is actually split
>>>>>> every 2mins on this particular box.
>>>>>>
>>>>>> <Outlook.jpg>
>>>>>>
>>>>>> It's a bit unusual, if I run 'ra -m proto -s loss -r argus.arg -
>>>>>> tcp' there are quite a number of losses/retransmits. Might be an
>>>>>> issue with how racluster is aggregating these?
>>>>>>
>>>>>> Stew
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* Nick Diel [mailto:ndiel at engr.colostate.edu]
>>>>>> *Sent:* Tuesday, 18 March 2008 12:10 p.m.
>>>>>> *To:* Stewart Gray
>>>>>> *Cc:* Argus
>>>>>> *Subject:* [Argus] Re: Packet Loss with racluster
>>>>>>
>>>>>> Stew,
>>>>>>
>>>>>> I think the first question is what are you using this number
>>>>>> for. If you are just using it as an indicator of congestion or
>>>>>> other network problems then the 5 minute boundary will most
>>>>>> likely not be a problem.
>>>>>>
>>>>>> I believe Argus just counts the number of retransmitted packets
>>>>>> to get a loss/drop count; I don't think it is doing any triple
>>>>>> duplicate ack or tcp timeout checks (if I am wrong, someone
>>>>>> please say so). Since retransmissions will occur in a time
>>>>>> window of a few seconds, you should capture most retransmitted
>>>>>> packets in your 5 minute boundaries. So even if a flow crosses
>>>>>> that boundary, you still have a good chance of counting
>>>>>> retransmitted packets correctly.
>>>>>>
>>>>>> For cases where you are receiving a count of 0, I would look at
>>>>>> packet rate and bit rate; it is possible the link just doesn't
>>>>>> have much traffic on it at that time:
>>>>>>
>>>>>> racluster -m proto -s rate load loss -r argus.arg - tcp
>>>>>>
>>>>>> Though I did notice something unusual on my end. The command I
>>>>>> gave you should be a strong estimate, but doesn't account for
>>>>>> retransmitted packets over status flow boundaries within the file
>>>>>> (though the same argument above applies). So to get an exact count
>>>>>> on the file (assuming racluster reanalyzes the status flow
>>>>>> records for retransmissions) you would need something like:
>>>>>>
>>>>>> racluster -r argus.arg -w - - tcp | racluster -m proto -s loss -r -
>>>>>>
>>>>>> (first merge status flow records, then count retransmitted
>>>>>> packets). Though this is the output I get:
>>>>>>
>>>>>> racluster -m proto -s loss -r argus.out - tcp
>>>>>> 62521
>>>>>> racluster -r argus.out -w - - tcp | racluster -m proto -s loss -r -
>>>>>> 60047
>>>>>>
>>>>>> At a minimum I would expect the numbers to stay the same (either
>>>>>> no retransmitted packets crossed any status flow boundaries, or
>>>>>> racluster doesn't try to find any new retransmitted packets). The
>>>>>> number going down doesn't make any sense to me. Maybe someone can
>>>>>> explain to me what is going on.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Stewart Gray wrote:
>>>>>>> Hey Guys,
>>>>>>>
>>>>>>> How does racluster handle argus files which have been
>>>>>>> periodically split when producing packet loss statistics? My
>>>>>>> monitoring machine rotates the argus file every 5 minutes. When
>>>>>>> using the following command, how skewed are the figures going to
>>>>>>> be as a result of having an incomplete argus file (i.e.
>>>>>>> connections that were current when the log file was rotated)?
>>>>>>>
>>>>>>> I also note that sometimes the resulting figure is 0. It only
>>>>>>> seems to do this in about 1/10 of the argus files I run the
>>>>>>> command on.
>>>>>>>
>>>>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>>>>> 0
>>>>>>>
>>>>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>>>>> 33036
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Stew
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>> *From:* Nick Diel [mailto:ndiel at engr.colostate.edu]
>>>>>>> *Sent:* Wednesday, 12 March 2008 10:24 a.m.
>>>>>>> *To:* Stewart Gray
>>>>>>> *Cc:* Argus
>>>>>>> *Subject:* Re: [ARGUS] Cheat sheet premiere
>>>>>>>
>>>>>>> How about:
>>>>>>> racluster -m proto -s loss -r argus.arg - tcp
>>>>>>>
>>>>>>> This should merge all records based on protocol (in this case
>>>>>>> only tcp because of the filter) and then print the loss column
>>>>>>> of all merged records.
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> Stewart Gray wrote:
>>>>>>>> Awesome, that's a really good start. I've already been playing
>>>>>>>> with a few of the options I hadn't toyed with before :)
>>>>>>>>
>>>>>>>> Is there an easy way to generate a raw count of packets
>>>>>>>> lost/retransmitted rather than having it graphed?
>>>>>>>>
>>>>>>>> I figure we start with:
>>>>>>>>
>>>>>>>> racluster -s loss -r argus.arg -w -
>>>>>>>>
>>>>>>>> How are the figures totaled? Do we pipe it to rasort or ra?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Stewart
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> *From:* Stéphane Peters [mailto:stephane.peters at forem.be]
>>>>>>>> *Sent:* Saturday, 8 March 2008 11:06 a.m.
>>>>>>>> *To:* Carter Bullard
>>>>>>>> *Cc:* Stewart Gray; Argus
>>>>>>>> *Subject:* Re: Re: [ARGUS] Cheat sheet premiere
>>>>>>>>
>>>>>>>> Hi Carter,
>>>>>>>>
>>>>>>>> I would love to see such a sheet in the distribution,
>>>>>>>> and I also was hoping that you could check,
>>>>>>>> if those examples made sense or were appropriate.
>>>>>>>> So please go on !
>>>>>>>>
>>>>>>>>
>>>>>>>> Some cosmetic work could be done too;
>>>>>>>> for example, to use some "standard" parameters everywhere, like
>>>>>>>> this one:
>>>>>>>> file=argus-eth1.out
>>>>>>>> ra -r $file
>>>>>>>> so it is easy to paste the line "as is",
>>>>>>>> without forgetting the shell escapes (\$srcid) like in:
>>>>>>>> rasplit -S $argushost -M 1d -w /path/argus-\$srcid.%Y.%m.%d.log
>>>>>>>>
>>>>>>>> By the way, as another example given to the list, here are 3
>>>>>>>> scripts I use.
>>>>>>>> The PATH vars make for nicer ps(1) output.
>>>>>>>>
>>>>>>>> start-argus
>>>>>>>>> #!/bin/sh
>>>>>>>>> interf=eth1
>>>>>>>>> PATH=/sbin ifconfig $interf | grep UP || PATH=/sbin ifconfig $interf up
>>>>>>>>> PATH=/usr/local/sbin argus -d -i $interf -e `hostname` -P 561 -U128 -mRS 30 -w argus-eth1.out
>>>>>>>>
>>>>>>>> rotate:
>>>>>>>>> #!/bin/sh
>>>>>>>>>
>>>>>>>>> # Rotates server log files, without affecting users who may be
>>>>>>>>> # connected to the server.
>>>>>>>>>
>>>>>>>>> # This can be run as a cron script
>>>>>>>>>
>>>>>>>>> DATE=`date +%Y-%m%d-%H%M`
>>>>>>>>> LOGS='argus-eth1.out'
>>>>>>>>>
>>>>>>>>> for i in $LOGS; do
>>>>>>>>>     if [ -f "$i" ]; then
>>>>>>>>>         mv "$i" "$i.$DATE"
>>>>>>>>>         gzip -9 "$i.$DATE"
>>>>>>>>>     fi
>>>>>>>>> done
>>>>>>>>
>>>>>>>> rotate-daily
>>>>>>>>> #!/bin/sh
>>>>>>>>> ./rotate
>>>>>>>>> sleep 60 # sometimes the preceding command finishes too early
>>>>>>>>> echo ./rotate-daily | at 0000 > /tmp/rotate-daily.log
>>>>>>>>
>>>>>>>> I use at(1) instead of cron(8) to cut the files closer to
>>>>>>>> midnight,
>>>>>>>> but rastream(1)'s extended "-w" option seems promising.
>>>>>>>> A better solution could be to use argus(8) to preprocess the
>>>>>>>> flows,
>>>>>>>> and rastream(1) to write, "rotate" and compress the files.
>>>>>>>> Another thread, perhaps.
>>>>>>>>
>>>>>>>> Carter Bullard wrote :
>>>>>>>>> Hey Stephane,
>>>>>>>>> This is great!!!! I'll put this in the distribution, if you
>>>>>>>>> don't mind!!!!
>>>>>>>>> And I'll also go through it to make sure that any changes in the
>>>>>>>>> code actually don't break this, and I can add some of the ones
>>>>>>>>> that I do.
>>>>>>>>>
>>>>>>>>> So Russell is asking for a wiki, and we already have one at:
>>>>>>>>>
>>>>>>>>> http://www.vorant.com/nsmwiki/index.php?title=Argus
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Carter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mar 7, 2008, at 2:24 PM, Stéphane Peters wrote:
>>>>>>>>>
>>>>>>>>>> Hi Stewart,
>>>>>>>>>>
>>>>>>>>>> I also think that a cheat sheet would be nice !
>>>>>>>>>> Here is a good occasion to show mine...
>>>>>>>>>>
>>>>>>>>>> Please note, most of the stuff has been collected right from
>>>>>>>>>> this argus list,
>>>>>>>>>> so hopefully you won't have to browse all the (numerous) past
>>>>>>>>>> messages.
>>>>>>>>>>
>>>>>>>>>> Any suggestions ?
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> flow filtering on certain port range:
>>>>>>>>>> ra -r file - dst port \( gt 1024 and lt 2048 \)
>>>>>>>>>> (...)
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Stewart Gray wrote:
>>>>>>>>>>> awesome, that's more like what I was after :) Thanks for your help
>>>>>>>>>>> again.
>>>>>>>>>>>
>>>>>>>>>>> As I mentioned earlier, I reckon it'd be neat to have some sort of cheat
>>>>>>>>>>> sheet for doing common tasks. I bet there's lots of stuff you know that
>>>>>>>>>>> others don't, having written the application yourself. I don't know what
>>>>>>>>>>> I don't know!
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> --
>>>>>>>>>> Stephane.Peters at forem.be, Postmaster at forem.be
>>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> --
>>>>>>>> Stephane.Peters at forem.be
>>>>>>>> #####################################################################################
>>>>>>>> Important: This electronic message and attachments (if any) are
>>>>>>>> confidential and may be legally privileged. If you are not the
>>>>>>>> intended recipient do not copy, disclose or use the contents in
>>>>>>>> any way. Please let us know by return e-mail immediately and
>>>>>>>> then destroy this message.
>>>>>>>> #####################################################################################
>>>>>>>
>>>>>>
>>>>>
>>>>> Carter Bullard
>>>>> CEO/President
>>>>> QoSient, LLC
>>>>> 150 E. 57th Street Suite 12D
>>>>> New York, New York 10022
>>>>>
>>>>> +1 212 588-9133 Phone
>>>>> +1 212 588-9134 Fax
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>