Argus tweaking and design considerations

Carter Bullard carter at qosient.com
Thu Feb 22 09:51:06 EST 2001


Hey Scott (et al),
   I'm forwarding this to the mailing list, as guys like Peter
will want to see it.  The Linux performance you see matches my
experience: it does well with packet throughput.

   The tweaks we are making to ArgusModeler and ArgusUtil are
important, so in the FreeBSD situation we need to find a good
set of numbers.  I'll elaborate on the concepts here so that
we can all contribute.

   Argus has a single circular flow queue.  The queue is
sorted in arrival time order (we add to the end of the queue).
As packets come in, we find the flow using a decent hash.  If
it doesn't exist, we add a new flow to the end of the queue.
Then we check whether its status timers say we need to write
its contents out, and finally we update its values, leaving it
in the queue.
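
   To make the per-packet path concrete, here is a minimal sketch
in Python.  It is not the Argus C code: the names Flow, FlowCache,
on_packet and status_interval are made up for illustration, and a
Python OrderedDict stands in for the hash table plus the queue.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Flow:
    key: tuple            # hypothetical flow key, e.g. (src, dst, proto)
    first_seen: float
    last_seen: float
    last_report: float
    pkts: int = 0
    nbytes: int = 0


class FlowCache:
    """Hashed lookup plus a queue kept in flow-arrival order (illustrative)."""

    def __init__(self, status_interval=60.0):
        # OrderedDict gives hashed lookup and preserves insertion order,
        # so a new flow always lands at the end of the queue.
        self.flows = OrderedDict()
        self.status_interval = status_interval   # assumed status timer, seconds
        self.written = []                        # stands in for the output stream

    def on_packet(self, key, length, now):
        flow = self.flows.get(key)
        if flow is None:
            flow = Flow(key, now, now, now)
            self.flows[key] = flow               # appended at the tail
        # Write the record out if its status timer says so ...
        if now - flow.last_report >= self.status_interval:
            self.written.append((flow.key, flow.pkts, flow.nbytes))
            flow.last_report = now
        # ... then update its values, leaving it where it is in the queue.
        flow.pkts += 1
        flow.nbytes += length
        flow.last_seen = now
        return flow
```

Whether counters get reset after a status write is left out here,
since that is a detail of the real implementation.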

   When flows are active, they tend to manage themselves.
When they are done, or are relatively idle, the flows are
visited by a queue processor that runs periodically, looking
at everything in the queue to see if it needs to be written
out, or if it needs to be deleted.

   Currently in beta.6, the queue processor runs once a second.
We get a packet or timeout from the packet queue and we discover
that it's time to go "groom" the queue.  Because we have to get
back to the packet queue pretty quickly so that we don't drop
packets, we can only process so many flows at a time.  Currently,
beta.6 will process the whole queue if it holds up to 2048 flows,
and beyond that it will process some fraction of the queue.  These
values are probably causing your packet loss problems.

   The tweaks I am trying cause us to process the queue more
often.  I'm shooting for maybe as high as 8 times a second, and
at each turn, we should process some flows.  We can process 2048
flows a turn without much load on the machine, so that gets us
16K flows per second (fps).  We don't need to look at every flow in the queue 8
times a second, so we'll want to do some fraction of the queue
so that we see them all, maybe 2 times a second, if we can.
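
   A bounded grooming turn could look roughly like the sketch
below.  This is only an illustration under assumptions: the function
name groom, the dict layout with a 'last_seen' field, and the single
idle_timeout are hypothetical, and live flows are simply rotated to
the tail of the ring so the next turn picks up where this one stopped.

```python
from collections import deque

MAX_PER_TURN = 2048      # flows examined per turn (figure from the text)


def groom(queue, now, idle_timeout, budget=MAX_PER_TURN):
    """Examine at most `budget` flows from the head of a circular queue.

    `queue` is a deque of dicts carrying a 'last_seen' timestamp (an
    assumed layout).  Flows idle for `idle_timeout` seconds or more are
    removed and returned; the rest rotate to the tail.
    """
    expired = []
    for _ in range(min(budget, len(queue))):
        flow = queue.popleft()
        if now - flow["last_seen"] >= idle_timeout:
            expired.append(flow)      # would be written out, then freed
        else:
            queue.append(flow)        # still live: back into the ring
    return expired
```

Calling groom() 8 times a second with the default budget gives the
16K flows per second mentioned above; the budget is the knob that
trades grooming speed against time away from the packet queue.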

   Now when the queue is large, if we can get through the whole
queue every 16 seconds, we're doing fine.  What this means is that
once a flow has closed, or become idle, it will take at most
16 seconds for that record to make its way out of Argus.
This is a reasonable number, but some may want it to be sooner.
With 2048 flows per turn, 8 turns a second, this gets us up to
256K (262144) flows per 16 seconds. (I like binary numbers, what
can I say ;o)
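
   For anyone who wants to play with the numbers, the arithmetic
is easy to put in a few lines of Python.  The constant and function
names are just mine for this sketch.

```python
FLOWS_PER_TURN = 2048     # flows examined per grooming turn
TURNS_PER_SECOND = 8      # grooming turns per second
SWEEP_SECONDS = 16        # target time to get through the whole queue

flows_per_second = FLOWS_PER_TURN * TURNS_PER_SECOND    # 16K
flows_per_sweep = flows_per_second * SWEEP_SECONDS      # 256K


def sweep_time(queue_len, per_turn=FLOWS_PER_TURN, turns=TURNS_PER_SECOND):
    """Seconds needed to visit every flow in a queue of `queue_len` flows."""
    return queue_len / (per_turn * turns)
```

So sweep_time(262144) comes out at 16 seconds, and a queue twice
that size would take 32 seconds to get through at the same rates.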

   Ok, so when we get above 256K flows, we'll need to do one of
three things.

          Solution                          Impact
   1. no change in processing         increase memory use
   2. process more per turn           increase packet loss
   3. delete flows earlier            changes argus behavior

Solution #1 doesn't do it for me.  If we don't go any higher
than 2048 flows per turn, then the queue will get larger, because
new flows will be coming in faster than we'll be getting rid of
them.  This threatens the whole thing, as Argus will exit if it
can't allocate memory.
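
To see how quickly solution #1 gets into trouble, here is a toy
model in Python.  It is a best-case sketch: it assumes every flow
the groomer examines can be deleted, and the arrival rate is a
made-up input, not a measurement.

```python
def queue_size_over_time(start, new_flows_per_sec, seconds,
                         per_turn=2048, turns_per_sec=8):
    """Queue length after each second, with fixed grooming capacity.

    Best case: every examined flow is removable, so the queue shrinks
    by per_turn * turns_per_sec flows each second at most.
    """
    capacity = per_turn * turns_per_sec
    size = start
    sizes = []
    for _ in range(seconds):
        size = max(0, size + new_flows_per_sec - capacity)
        sizes.append(size)
    return sizes
```

With a hypothetical 20000 new flows a second against 16384 groomed,
the queue gains 3616 flows every second and never recovers.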

Solution #2 is nice.  We can increase the number of flows we
process per turn, and that will cause us to lose packets.  Accuracy
goes down, but we don't run out of memory.

The 3rd solution is to delete records to bring the queue size
down, either by shortening timeout values, or through "random discard".
This is probably the best solution, but it changes the way argus
behaves, and that may confuse client programs, people, etc.
Also, the changes are somewhat complex, because we will delete active
flows, which will cause us to go through more memory allocs and
frees.
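
A "random discard" pass might be sketched like this in Python.
This is an assumption about what such a strategy could look like,
not a design for Argus: the function name random_discard and the
high_water parameter are hypothetical.

```python
import random


def random_discard(flows, high_water, rng=random):
    """Drop randomly chosen flows until at most `high_water` remain.

    Returns (kept, discarded).  The kept flows stay in their original
    order; the discarded ones would be written out before being freed.
    """
    excess = len(flows) - high_water
    if excess <= 0:
        return list(flows), []
    victims = set(rng.sample(range(len(flows)), excess))
    kept = [f for i, f in enumerate(flows) if i not in victims]
    discarded = [f for i, f in enumerate(flows) if i in victims]
    return kept, discarded
```

Note that the victims can include active flows, which is exactly
the behavior change (and the extra alloc/free churn) described above.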
  
All have their impacts.  Any opinions?  My guess is that
we can do strategy #2 for Argus-2.0 and put in the discard
strategy in Argus-2.1 if it is attractive.

Suggestions, opinions?
Sorry for the length of the mail.

Carter

Carter Bullard
QoSient, LLC
300 E. 56th Street, Suite 18K
New York, New York  10022

carter at qosient.com
Phone +1 212 588-9133
Fax   +1 212 588-9134

   

> -----Original Message-----
> From: Scott A. McIntyre [mailto:scott at xs4all.nl]
> Sent: Thursday, February 22, 2001 5:42 AM
> To: Carter Bullard
> Subject: Re: BPF tweak; negative impact?
> 
> 
> 
> 
> Carter,
> 
> You may find these statistics interesting, at least from a platform
> point of view.  After a number of kernel tweaks, as well as your
> modifications for ArgusModeler and ArgusUtil, I see the following data
> on a FreeBSD box:
> 
> 22 Feb 01 11:31:44    man  pkts    511875  bytes    191153492  drops 19496  flows    111635    closed       21284       CON
> 22 Feb 01 11:32:44    man  pkts    500470  bytes    182602171  drops 24115  flows    111103    closed       19925       CON
> 22 Feb 01 11:33:44    man  pkts    502983  bytes    178531118  drops 15175  flows    111163    closed       20212       CON
> 22 Feb 01 11:34:44    man  pkts    525180  bytes    199970219  drops 27454  flows    111369    closed       20805       CON
> 22 Feb 01 11:35:44    man  pkts    563226  bytes    251351926  drops 38399  flows    111961    closed       20593       CON
> 22 Feb 01 11:36:44    man  pkts    518892  bytes    200339374  drops 21098  flows    112588    closed       21954       CON
> 
> As you can see, a considerable number of drops.
> 
> However, on a Linux box with a 2.4.1 kernel, running the unmodified
> argus beta6 code with no special tweaks to the kernel, I see:
> 
> 22 Feb 01 11:31:23    man  pkts    578798  bytes    250947641  drops 0  flows    109448    closed       22448       CON
> 22 Feb 01 11:32:23    man  pkts    542846  bytes    246244317  drops 0  flows    110807    closed       21004       CON
> 22 Feb 01 11:33:23    man  pkts    558492  bytes    262522469  drops 0  flows    109383    closed       23330       CON
> 22 Feb 01 11:34:23    man  pkts    636428  bytes    286972665  drops 0  flows    109640    closed       25048       CON
> 22 Feb 01 11:35:23    man  pkts    671920  bytes    310826036  drops 0  flows    110330    closed       25258       CON
> 22 Feb 01 11:36:23    man  pkts    698575  bytes    339087402  drops 0  flows    110937    closed       27996       CON
> 
> 
> Now, it's of course possible that linux is lying, that it really is
> dropping, but if that isn't the case, this does present an interesting
> tidbit to ponder when deciding what OS I'll be implementing future
> sensors on.  I was thinking more FreeBSD, but with stats like that, I
> don't think I can justify it.
> 
> Interestingly, the argusarchive script dies on FreeBSD because it is
> unable to perform the RASORT -- memory limitations.  The file sizes
> are the same on the Linux box, but for whatever reason the script
> completes there without incident.
> 
> Scott
> 
> 