Argus tweaking and design considerations
Peter Van Epp
vanepp at sfu.ca
Thu Feb 22 11:43:51 EST 2001
> Hey Scott (et al),
> I'm forwarding this to the mailing list, as guys like Peter
> will want to see it. The Linux performance you see is also my
> experience, that it does well with packet throughput.
>
> The tweaks we are making to ArgusModeler and ArgusUtil are
> important, so in the FreeBSD situation we need to find a good
> set of numbers. I'll elaborate on the concepts here so that
> we can all contribute.
>
> Argus has a single circular flow queue. The queue is
> sorted in arrival time order (we add to the end of the queue).
> As packets come in, we find the flow using a decent hash, if
> it doesn't exist we add a new flow to the end of the queue,
> then we check to see if we need to write its contents out
> if its status timers say to, and then we update its values,
> leaving it in the queue.
	I'd guess that multiple flow queues would be a flow classification
nightmare? That might be one way of reducing the load. Another might be
a processor that classifies the flows into the queue and then sends the
update over another link to the next box that actually processes the queue
(without the load of the bpf capture sucking cycles in the background). The
link overhead may limit this one, but we would presumably get a volume
reduction from the classification into a queue slot, and it might be doable.
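	For my own understanding, here is roughly the per-packet path being
described above. Everything in this sketch (names, types, the tiny hash and
queue) is invented for illustration; it is not the actual ArgusModeler or
ArgusUtil code.

    #include <stdlib.h>

    struct flowkey { unsigned int src, dst; unsigned short sport, dport; };

    struct flow {
        struct flowkey key;
        struct flow   *hash_next;    /* hash chain                     */
        struct flow   *q_next;       /* circular queue, arrival order  */
        unsigned long  pkts, bytes;
        long           last_report;  /* for the status timers          */
    };

    #define HASHSIZE 4096
    static struct flow *hashtab[HASHSIZE];
    static struct flow *queue_tail;        /* new flows go on the end  */

    static unsigned int flow_hash(const struct flowkey *k)
    {
        return (k->src ^ k->dst ^ k->sport ^ k->dport) % HASHSIZE;
    }

    void process_packet(const struct flowkey *k, unsigned int len, long now)
    {
        unsigned int h = flow_hash(k);
        struct flow *f;

        for (f = hashtab[h]; f != NULL; f = f->hash_next)   /* find the flow */
            if (f->key.src == k->src && f->key.dst == k->dst &&
                f->key.sport == k->sport && f->key.dport == k->dport)
                break;

        if (f == NULL) {                          /* new flow: create it    */
            if ((f = calloc(1, sizeof(*f))) == NULL)
                return;
            f->key = *k;
            f->hash_next = hashtab[h];
            hashtab[h] = f;
            /* append to the end of the (here much simplified) flow queue */
            f->q_next = queue_tail ? queue_tail->q_next : f;
            if (queue_tail != NULL)
                queue_tail->q_next = f;
            queue_tail = f;
            f->last_report = now;
        }

        if (now - f->last_report >= 60) {         /* status timer due?      */
            /* this is where the status record would be written out */
            f->last_report = now;
        }
        f->pkts++;                                /* then update the flow   */
        f->bytes += len;
    }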
>
> When flows are active, they tend to manage themselves.
> When they are done, or are relatively idle, the flows are
> visited by a queue processor that runs periodically, looking
> at everything in the queue to see if it needs to be written
> out, or if it needs to be deleted.
>
> Currently in beta.6, the queue processor runs once a second.
> We get a packet or timeout from the packet queue and we discover
> that it's time to go "groom" the queue. Because we have to get
> back to the packet queue pretty quick so that we don't drop
> packets, we can only process so many flows at a time. Currently,
> beta.6 will process the whole queue, up to 2048, and then it will
> process some fraction of the queue. These values are probably
> causing your packet loss problems.
>
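	For my own notes, this is the shape I picture for that grooming pass.
The names, the helpers and the resume pointer are my invention, not the real
ArgusUtil code:

    struct flow;                            /* defined elsewhere in the sketch */
    extern struct flow *flow_queue_head;    /* arrival-ordered flow queue      */
    extern struct flow *next_in_queue(struct flow *);
    extern int  flow_needs_report(struct flow *, long now);
    extern int  flow_is_idle(struct flow *, long now);
    extern void write_flow_record(struct flow *);
    extern void delete_flow(struct flow *);

    #define GROOM_LIMIT 2048   /* bound the work so we get back to packets */

    /* remember where the last pass stopped, so every flow is eventually
     * seen even when the queue holds more than GROOM_LIMIT entries */
    static struct flow *resume;

    void groom_queue(long now)
    {
        int i;
        struct flow *f = resume ? resume : flow_queue_head;

        for (i = 0; f != NULL && i < GROOM_LIMIT; i++) {
            struct flow *next = next_in_queue(f);
            if (flow_needs_report(f, now))
                write_flow_record(f);   /* status interval expired: report  */
            if (flow_is_idle(f, now))
                delete_flow(f);         /* closed or idle: drop it entirely */
            f = next;
        }
        resume = f;   /* NULL means start again at the head on the next turn */
    }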
	Hmmm, perhaps (although maybe in 2.1 :-)) we need a dual drop counter
then? One for what bpf reports as kernel->bpf drops and another for what
argus dropped due to overwriting in the queue? I'd like to know (although I
suppose when I really want to know I can instrument both argus and the kernel
and find out :-)) who is dropping things, so I can address it with more memory
or a bigger machine (or by begging for algorithm changes to increase
performance :-)). I'd really like three: the third for what the Ethernet
interface knows it dropped at the kernel level, but at present I don't even
know how to get that out of the kernel, let alone to the argus task.
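	The bpf end of that at least looks reachable today: pcap_stats()
already reports the kernel->bpf drop count, and on some systems an interface
drop count too. The queue-overwrite counter would have to be something argus
maintains itself; the variable below is made up just to show what I mean:

    #include <pcap.h>
    #include <stdio.h>

    /* Sketch only: pcap_stats() is real libpcap.  ps_drop is the
     * kernel->bpf drop count and ps_ifdrop is the interface drop count
     * (where the OS actually fills it in).  The queue-overwrite counter
     * is hypothetical; argus would have to keep it itself. */
    void report_drops(pcap_t *pd, unsigned long argus_queue_overwrites)
    {
        struct pcap_stat ps;

        if (pcap_stats(pd, &ps) == 0) {
            printf("kernel->bpf drops: %u\n", ps.ps_drop);
            printf("interface drops:   %u (if the OS reports them)\n",
                   ps.ps_ifdrop);
        }
        printf("argus queue overwrites: %lu\n", argus_queue_overwrites);
    }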
> The tweaks I am trying cause us to process the queue more
> often. I'm shooting for maybe as high as 8 times a second, and
> at each turn, we should process some flows. We can process 2048
> flows a turn without much load on the machine, so that gets us
> 16K fps. We don't need to look at every flow in the queue 8
> times a second, so we'll want to do some fraction of the queue
> so that we see them all, maybe 2 times a second, if we can.
	Would extra threads on extra processors help with this? I expect
memory bandwidth will become a problem, but money will solve that (perhaps
an inordinate amount of money, but it is solvable). As always, I'm looking
ahead to when my link speeds climb. Topspeed boxes will do some of it by
letting me process in parallel, but I expect more raw horsepower is going
to be part of the answer too. I'm lucky enough to have bosses who recognize
(even if they are not always able to fund it) that they have to spend money
on diagnostic tools when they buy an increase in link speed.
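	Just to keep the turn arithmetic straight for myself (the numbers
below are the ones from this thread; the helper function is purely
hypothetical):

    /* Illustrative numbers, not actual argus defaults:
     *   2048 flows/turn * 8 turns/sec = 16384 flows/sec (the 16K fps above)
     *   16384 flows/sec * 16 sec      = 262144 flows    (the 256K ceiling)
     * i.e. at 2048 per turn, a full sweep of a 256K-entry queue takes 16 sec.
     */
    #define TURNS_PER_SEC       8
    #define MAX_FLOWS_PER_TURN  2048

    /* Hypothetical helper: how many flows must be visited per turn so the
     * whole queue is swept once every full_pass_secs seconds. */
    unsigned int flows_per_turn(unsigned int queue_len,
                                unsigned int full_pass_secs)
    {
        unsigned int needed = queue_len / (TURNS_PER_SEC * full_pass_secs) + 1;
        return (needed < MAX_FLOWS_PER_TURN) ? needed : MAX_FLOWS_PER_TURN;
    }

Once the 2048-per-turn cap kicks in, the sweep time stretches past the target,
which is presumably where the three choices below come from.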
>
> Now when the queue is large, if we can get through the whole
> queue every 16 seconds, we're doing fine. What this means is that
> once a flow has closed, or become idle, it will take at most
> 16 seconds for that record to make its way out of Argus.
> This is a reasonable number, but some may want it to be sooner.
> With 2048 flows per turn, 8 turns a second, this gets us up to
> 256K (262144) flows per 16 seconds. (I like binary numbers, what
> can I say ;o)
>
> Ok, so when we get above 256K flows, we'll need to do one of
> three things.
>
>         Solution                        Impact
>         1. no change in processing      increase memory use
	I favor this one. I can buy more memory (if not necessarily cheaply),
and I dislike packet loss since the packet I lose may be vital :-). Will
additional threads/CPUs processing pieces of the queue in parallel allow this
to grow with or without more memory? I expect most folks with big fast
links (and thus deep pockets!) are going to be in favor of this one.
	I understand your point that just more memory won't help if we can't
process the queue faster than the average input rate (because memory use would
grow without bound over time), which points to the requirement for faster
processing of the queue somehow (along with more memory for smoothing out the
bumps). We may be down to a Topspeed-like solution (i.e. an input mux that
distributes a consistent part of the input stream to multiple argi to keep the
individual argus load within bounds), but the larger we can make the individual
limit the better. The rub here is of course classifying that "consistent part
of the input stream" at input wire speed in the mux box (a rough sketch of what
I mean is below the three options). To some extent this just moves the problem,
but that may be what needs to happen.
>         2. process more per turn        increase packet loss
Packet loss is undesirable to me.
>         3. delete flows earlier         changes argus behavior
>
	This one is less desirable too; I of course want it all :-)
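	On the mux idea from my comments under option 1: by a "consistent part
of the input stream" I mean something as simple as a symmetric hash of the flow
key taken modulo the number of collectors, so both directions of a conversation
always land on the same argus. A toy version (not argus or Topspeed code):

    /* Toy flow splitter for the mux box: hash the addresses and ports
     * symmetrically so both directions of a conversation map to the same
     * downstream argus, then take the hash modulo the number of collectors.
     * Purely illustrative. */
    unsigned int pick_collector(unsigned int src_ip, unsigned int dst_ip,
                                unsigned short sport, unsigned short dport,
                                unsigned char proto, unsigned int n_collectors)
    {
        /* XOR is symmetric, so (src,dst) and (dst,src) hash identically */
        unsigned int h = (src_ip ^ dst_ip)
                       ^ ((unsigned int)(sport ^ dport) << 16)
                       ^ proto;
        return h % n_collectors;
    }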
<snip>
Peter Van Epp / Operations and Technical Support
Simon Fraser University, Burnaby, B.C. Canada