Argus tweaking and design considerations
Carter Bullard
carter at qosient.com
Thu Feb 22 09:51:06 EST 2001
Hey Scott (et al),
I'm forwarding this to the mailing list, as guys like Peter
will want to see it. The Linux performance you see matches my
experience: it does well with packet throughput.
The tweaks we are making to ArgusModeler and ArgusUtil are
important, so in the FreeBSD situation we need to find a good
set of numbers. I'll elaborate on the concepts here so that
we can all contribute.
Argus has a single circular flow queue. The queue is
sorted in arrival time order (we add to the end of the queue).
As packets come in, we find the flow using a decent hash. If
it doesn't exist, we add a new flow to the end of the queue.
Then we check whether its status timers say we need to write
its contents out, and finally we update its values, leaving it
in the queue.
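To make the per-packet path concrete, here is a rough sketch in Python. It is only a toy model of the steps just described, not the ArgusModeler code: the names (FlowTable, on_packet, status_interval) and the 60 second status interval are made up for illustration.

```python
from collections import OrderedDict


class FlowTable:
    """Toy model: hash lookup plus a queue kept in arrival order."""

    def __init__(self, status_interval=60.0):
        # OrderedDict gives us a hash lookup and keeps insertion (arrival) order.
        self.flows = OrderedDict()
        self.status_interval = status_interval  # assumed value, in seconds
        self.written = []  # stands in for the output stream

    def on_packet(self, key, size, now):
        flow = self.flows.get(key)
        if flow is None:
            # Unknown flow: add it to the end of the queue.
            flow = {"pkts": 0, "bytes": 0, "last_report": now, "last_seen": now}
            self.flows[key] = flow
        if now - flow["last_report"] >= self.status_interval:
            # Status timer says so: write the current contents out.
            self.written.append((key, flow["pkts"], flow["bytes"]))
            flow["last_report"] = now
        # Update the values; the flow keeps its place in the queue.
        flow["pkts"] += 1
        flow["bytes"] += size
        flow["last_seen"] = now
        return flow
```

Note that an update to an existing flow does not move it, so the queue stays in arrival order.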
When flows are active, they tend to manage themselves.
When they are done, or are relatively idle, the flows are
visited by a queue processor that runs periodically, looking
at everything in the queue to see if it needs to be written
out, or if it needs to be deleted.
Currently in beta.6, the queue processor runs once a second.
We get a packet or timeout from the packet queue and we discover
that it's time to go "groom" the queue. Because we have to get
back to the packet queue pretty quickly so that we don't drop
packets, we can only process so many flows at a time. Currently,
beta.6 will process the whole queue if it holds up to 2048 flows;
above that, it will process some fraction of the queue. These
values are probably causing your packet loss problems.
The tweaks I am trying cause us to process the queue more
often. I'm shooting for maybe as high as 8 times a second, and
at each turn, we should process some flows. We can process 2048
flows a turn without much load on the machine, so that gets us
16K fps. We don't need to look at every flow in the queue 8
times a second, so we'll want to do some fraction of the queue
so that we see them all, maybe 2 times a second, if we can.
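A sketch of what a budgeted grooming turn could look like, in Python. This is an illustration under my own assumptions, not the actual ArgusUtil code: the function name groom, the idle_timeout parameter and its 30 second default are hypothetical, and rotating surviving flows to the tail is just one way to model a cursor walking a circular queue.

```python
from collections import deque


def groom(queue, now, budget=2048, idle_timeout=30.0):
    """Visit at most `budget` flows, starting at the head of the queue.

    Idle flows are removed and returned so the caller can write them out.
    Flows that are still live are rotated to the tail, so the next turn
    picks up where this one stopped.
    """
    expired = []
    for _ in range(min(budget, len(queue))):
        flow = queue.popleft()
        if now - flow["last_seen"] >= idle_timeout:
            expired.append(flow)
        else:
            queue.append(flow)
    return expired
```

The point of the budget is that each turn has a bounded cost, so we get back to the packet queue quickly no matter how long the flow queue is.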
Now when the queue is large, if we can get through the whole
queue every 16 seconds, we're doing fine. What this means is that
once a flow has closed, or become idle, it will take at most
16 seconds for that record to make its way out of Argus.
This is a reasonable number, but some may want it to be sooner.
With 2048 flows per turn, 8 turns a second, this gets us up to
256K (262144) flows per 16 seconds. (I like binary numbers, what
can I say ;o)
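The arithmetic behind those numbers, spelled out in Python. The three constants come from the figures above; the helper sweep_time is a name I made up to show how long a full pass takes for a given queue length.

```python
FLOWS_PER_TURN = 2048    # flows processed per turn
TURNS_PER_SECOND = 8     # target grooming rate
SWEEP_SECONDS = 16       # acceptable time to get through the whole queue

flows_per_second = FLOWS_PER_TURN * TURNS_PER_SECOND
flows_per_sweep = flows_per_second * SWEEP_SECONDS


def sweep_time(queue_len):
    """Seconds needed to visit every flow once at the rates above."""
    return queue_len / flows_per_second


print(flows_per_second)  # 16384  (16K fps)
print(flows_per_sweep)   # 262144 (256K flows per 16 seconds)
```

So a queue of 262144 flows is exactly the size at which a full pass takes 16 seconds; anything larger takes longer unless one of the rates changes.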
Ok, so when we get above 256K flows, we'll need to do one of
three things.
    Solution                        Impact
 1. no change in processing         increase memory use
 2. process more per turn           increase packet loss
 3. delete flows earlier            changes argus behavior
Solution #1 doesn't do it for me. If we don't go any higher
than 2048 flows per turn, then the queue will get larger, because
new flows will be coming in faster than we'll be getting rid of
them. This threatens the whole thing, as Argus will exit if it
can't allocate memory.
Solution #2 is nice. We can increase the number of flows we
process per turn, and that will cause us to lose packets. Accuracy
goes down, but we don't run out of memory.
The 3rd solution is to delete records to bring the queue size
down, either by shortening timeout values, or through "random discard".
This is probably the best solution, but it changes the way argus
behaves, and that may confuse client programs, people, etc.
Also, the changes are somewhat complex, because we will delete active
flows, which will cause us to go through more memory allocs and
frees.
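To give a feel for the two variants of the 3rd solution, here is a hedged Python sketch. Nothing like this exists in Argus today; the function names, the high_water threshold (set to the 256K figure from above) and the 1 second floor are all assumptions for the sake of discussion.

```python
import random


def shrink_timeout(base_timeout, queue_len, high_water=262144, floor=1.0):
    """Scale the idle timeout down once the queue exceeds high_water."""
    if queue_len <= high_water:
        return base_timeout
    return max(floor, base_timeout * high_water / queue_len)


def random_discard(flows, high_water=262144, rng=random):
    """Drop randomly chosen flows until at most high_water remain.

    Returns (kept, dropped); both lists preserve the original order.
    """
    excess = len(flows) - high_water
    if excess <= 0:
        return flows, []
    victims = set(rng.sample(range(len(flows)), excess))
    kept = [f for i, f in enumerate(flows) if i not in victims]
    dropped = [f for i, f in enumerate(flows) if i in victims]
    return kept, dropped
```

The first variant only changes when idle flows age out. The second can hit active flows, which is where the extra alloc and free traffic comes from, since a discarded flow that is still sending packets gets recreated on its next packet.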
All have their impacts. Any opinions? My guess is that
we can do strategy #2 for Argus-2.0 and put in the discard
strategy if it is attractive in Argus-2.1.
Suggestions, opinions?
Sorry for the length of the mail.
Carter
Carter Bullard
QoSient, LLC
300 E. 56th Street, Suite 18K
New York, New York 10022
carter at qosient.com
Phone +1 212 588-9133
Fax +1 212 588-9134
> -----Original Message-----
> From: Scott A. McIntyre [mailto:scott at xs4all.nl]
> Sent: Thursday, February 22, 2001 5:42 AM
> To: Carter Bullard
> Subject: Re: BPF tweak; negative impact?
>
>
>
>
> Carter,
>
> You may find these statistics interesting, at least from a platform
> point of view. After a number of kernel tweaks, as well as your
> modifications for ArgusModeler and ArgusUtil, I see the following data
> on a FreeBSD box:
>
> 22 Feb 01 11:31:44 man pkts 511875 bytes 191153492 drops 19496 flows 111635 closed 21284 CON
> 22 Feb 01 11:32:44 man pkts 500470 bytes 182602171 drops 24115 flows 111103 closed 19925 CON
> 22 Feb 01 11:33:44 man pkts 502983 bytes 178531118 drops 15175 flows 111163 closed 20212 CON
> 22 Feb 01 11:34:44 man pkts 525180 bytes 199970219 drops 27454 flows 111369 closed 20805 CON
> 22 Feb 01 11:35:44 man pkts 563226 bytes 251351926 drops 38399 flows 111961 closed 20593 CON
> 22 Feb 01 11:36:44 man pkts 518892 bytes 200339374 drops 21098 flows 112588 closed 21954 CON
>
> As you can see, a considerable number of drops.
>
> However, on a Linux box (2.4.1 kernel) running the unmodified argus
> beta6 code, with no special tweaks to the kernel, I see:
>
> 22 Feb 01 11:31:23 man pkts 578798 bytes 250947641 drops 0 flows 109448 closed 22448 CON
> 22 Feb 01 11:32:23 man pkts 542846 bytes 246244317 drops 0 flows 110807 closed 21004 CON
> 22 Feb 01 11:33:23 man pkts 558492 bytes 262522469 drops 0 flows 109383 closed 23330 CON
> 22 Feb 01 11:34:23 man pkts 636428 bytes 286972665 drops 0 flows 109640 closed 25048 CON
> 22 Feb 01 11:35:23 man pkts 671920 bytes 310826036 drops 0 flows 110330 closed 25258 CON
> 22 Feb 01 11:36:23 man pkts 698575 bytes 339087402 drops 0 flows 110937 closed 27996 CON
>
>
> Now, it's of course possible that linux is lying, that it really is
> dropping, but if that isn't the case, this does present an interesting
> tidbit to ponder when deciding what OS I'll be implementing future
> sensors on. I was thinking more FreeBSD, but with stats like that, I
> don't think I can justify it.
>
> Interestingly, the argusarchive script dies on FreeBSD due to being
> unable to perform the RASORT -- memory limitations. The file sizes are
> the same on the linux box, but for whatever reason, it completes there
> without incident.
>
> Scott
>
>