hardware for argus with 10Gb link

Dave Plonka plonka at doit.wisc.edu
Fri Apr 23 15:55:44 EDT 2010


Hi Michael,

On Fri, Apr 23, 2010 at 12:14:02PM -0700, Michael Sanderson wrote:
> We've recently had our campus network connection upgraded to 10Gb and 
> are now looking at our options for tapping that connection

This is our situation too.
What I describe below is our experiment with hardware that was made
available to me (but not chosen solely for this purpose), certainly
not a prescription...

> and getting 
> argus collecting data again.  We'll be tapping our fibre links with most 
> likely with NetOptics taps, but we're looking for suggestions of 
> appropriate 10Gb NICs for our argus "sensor" box.  We definitely won't 
> be pushing the 10Gb link initially, but I expect that we'll see periods 
> where we approach 2Gb sustained.

We see this regularly, ours varying between ~2Gb at our low points
and ~5.5Gb at peaks.  What I've tried, and am not happy with at present,
is a VSS Monitoring device (vssmonitoring.com) that splits the 10Gbps
optical tap out to 10 1GigE copper ports.  You then configure the
device to select traffic by VLAN, address, or protocol/port
characteristics, and it hashes the packets (somehow) to deliver them in
a balanced way across those copper ports.  I've tried running either
tcpdump-based packet capture or argus (w/96-byte capture, IIRC)
across 3 machines with 4 total interfaces (each 1 GigE).
The 3 machines are Dell 1850s, circa 2005, so not state of the art.
(I think they each have two dual-core 64-bit Xeon processors at 2.8GHz.)
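
About that hashing: I don't know exactly what the VSS box hashes on,
so take the following as a conceptual sketch only (in Python; the port
count, field names, and the md5-based hash are all just illustrative,
not what the device actually does):

  # Sketch of flow-consistent load balancing across the copper ports.
  # A hash of the 5-tuple keeps every packet of a given flow on the
  # same output port, so a single downstream sensor sees whole flows.
  import hashlib

  NUM_PORTS = 10  # the 10 x 1GigE copper ports in this setup

  def output_port(src_ip, dst_ip, proto, src_port, dst_port):
      # Sort the endpoints so both directions of a flow hash to the
      # same port -- important if argus is to see both halves of a
      # TCP connection on one interface.
      a = (src_ip, src_port)
      b = (dst_ip, dst_port)
      lo, hi = (a, b) if a <= b else (b, a)
      key = "%s/%s/%s/%s/%s" % (lo[0], lo[1], hi[0], hi[1], proto)
      digest = hashlib.md5(key.encode()).hexdigest()
      return int(digest, 16) % NUM_PORTS

  print(output_port("10.0.0.1", "192.0.2.7", "tcp", 51234, 80))
  print(output_port("192.0.2.7", "10.0.0.1", "tcp", 80, 51234))  # same port

Whether the VSS device can be made to hash symmetrically like that,
I honestly don't know; it's one of the things I'd want to verify before
trusting the per-port flow records.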

Across 4 ethernet interfaces, I've seen 3 argus processes (one reading
two interfaces on the same machine) report a total of ~250K pps in
the resulting flow files.  However, the Linux ethernet interfaces
(collected via sar/sadc) show ~350K pps during that time, so there is
significant loss (roughly 100K pps, or nearly 30%, unaccounted for).

<snip>
> Does anyone recall the specific issues with interrupt coalescing and 
> argus?  There will definitely be potential for out of order TCP 
> connection startup/shutdown.  What is the impact on the flow reporting?
> 
> Are there other options to Endace's cards that solve the timestamp 
> problem, either other vendors or system/kernel configuration?

The best information I've seen on this is from Luca Deri.
In a private conversation about it and PF_RING, he mentioned that
the trick at 10G is to make sure you properly balance traffic and
interrupts across the cores so that you do not invalidate the cache,
and referred me to this paper:

  http://luca.ntop.org/MulticorePacketCapture.pdf
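
To make that concrete on Linux, the balancing he describes boils down
to pinning each NIC receive queue's interrupt to a specific core and
keeping the capture process on that same core.  A rough sketch (the
IRQ numbers and mapping below are placeholders; find the real ones in
/proc/interrupts, and pin the capture process itself with taskset or
similar):

  # Rough sketch: pin each receive-queue IRQ to one core by writing a
  # hex CPU bitmask to /proc/irq/<N>/smp_affinity, so the core that
  # services the interrupt is the one whose cache holds the packets.
  IRQ_TO_CPU = {        # placeholder IRQ numbers -- see /proc/interrupts
      60: 0,            # e.g. eth2-rx-0 -> core 0
      61: 1,            # e.g. eth2-rx-1 -> core 1
      62: 2,            # e.g. eth2-rx-2 -> core 2
      63: 3,            # e.g. eth2-rx-3 -> core 3
  }

  for irq, cpu in IRQ_TO_CPU.items():
      mask = 1 << cpu   # smp_affinity wants a hexadecimal CPU bitmask
      with open("/proc/irq/%d/smp_affinity" % irq, "w") as f:
          f.write("%x\n" % mask)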

Personally, my strategy was to set up the tap, monitor the ethernet
statistics first, compare them to what the router reported via SNMP
polling, and then run argus writing the data to files.  I set up
some stuff to put each of those into RRD files, and unfortunately what
I see is argus pkts/bytes < Linux interface pkts/bytes << router pkts/bytes,
so I definitely see loss at every level.
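
For what it's worth, the middle number in that comparison (the Linux
interface counters) is easy to sample even without sar/sadc; a minimal
sketch of where those pkts/sec figures come from (the interface name
is a placeholder for one of the 1GigE monitor ports):

  # Minimal sketch: sample the rx packet counter for an interface from
  # /proc/net/dev twice and turn the delta into packets/sec.  These are
  # the "Linux interface pkts" that get graphed in RRD and compared
  # against the argus flow totals and the router's SNMP counters.
  import time

  def rx_packets(iface):
      with open("/proc/net/dev") as f:
          for line in f:
              if line.strip().startswith(iface + ":"):
                  fields = line.split(":", 1)[1].split()
                  return int(fields[1])   # 2nd receive field = packets
      raise ValueError("interface %s not found" % iface)

  IFACE = "eth2"      # placeholder; one of the 1GigE copper monitor ports
  INTERVAL = 10       # seconds between samples

  before = rx_packets(IFACE)
  time.sleep(INTERVAL)
  after = rx_packets(IFACE)
  print("%s: %.0f rx pkts/sec" % (IFACE, (after - before) / float(INTERVAL)))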

So, not that you don't already seem to know this, but it seems to
require a lot of attention to get the high-performance capture one
wants.  If someone else can provide details about a successful hardware
config (with one or multiple machines tapping the 10GigE), I'd love to
hear it.

Dave

-- 
plonka at cs.wisc.edu  http://net.doit.wisc.edu/~plonka/  Madison, WI


