Multi-Instanced Argus

Craig Merchant cmerchant at responsys.com
Wed Mar 12 19:39:34 EDT 2014


We're running Argus and Snort on PF_RING's DNA/Libzero drivers.  We decided to use Libzero because the standard DNA drivers limit the number of memory "queues" containing network traffic to 16.  Each queue can only be accessed by a single process, and our sensors have 32 cores, so we wouldn't be able to run the maximum number of Snort instances without it.

We use the pfdnaclustermaster app to spread flows across 28 queues for snort and also maintain a copy of all flows in a queue for Argus.

To get it to work, all I had to do was make a slight edit to ArgusSource.c so that Argus would recognize DNA/Libzero queues as valid interfaces.

Somewhere around line 4191 (for argus 3.0.7):


-   if ((strstr(device->name, "dag")) || (strstr(device->name, "napa"))) {
+   if (strstr(device->name, "dag") || strstr(device->name, "nap") ||
+       strstr(device->name, "dna") ||
+       (strstr(device->name, "eth") && strstr(device->name, "@"))) {
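For reference, the patched test can be exercised on its own.  Here is a standalone sketch (the function wrapper and its name are mine; the condition is the one from the patch above) showing which device names now pass:

```c
#include <assert.h>
#include <string.h>

/* Standalone copy of the patched interface-name test from ArgusSource.c:
 * accept DAG, Napatech, and DNA devices, plus Libzero queue names such
 * as "eth3@2". */
static int is_valid_capture_device(const char *name)
{
    return strstr(name, "dag") != NULL
        || strstr(name, "nap") != NULL
        || strstr(name, "dna") != NULL
        || (strstr(name, "eth") != NULL && strstr(name, "@") != NULL);
}
```

So "dna0" and "eth3@2" are accepted, while a plain "eth0" or "bond0" still falls through to the default handling.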

Our data centers do around 4-8 Gbps 24/7.  From what I recall, there is (or was) a bug in PF_RING that caused Argus to run at 100% CPU all of the time, but in my experience Argus wasn't having problems keeping up with our volume of data.  We did see an unusually high number of flows that Argus couldn't determine the direction of, but we weren't seeing gaps in the packets or anything else to suggest that Argus couldn't handle the volume.

How much traffic are you sending to Argus?  Have you tried searching your Argus records for flows that have gaps in them?  That would be a pretty good indicator that Argus may be having trouble keeping up.  Or that your SPAN port can't handle the load...

Thx.

Craig

From: argus-info-bounces+cmerchant=responsys.com at lists.andrew.cmu.edu On Behalf Of Carter Bullard
Sent: Wednesday, March 12, 2014 1:57 PM
To: Reynolds, Jeffrey
Cc: Argus
Subject: Re: [ARGUS] Multi-Instanced Argus

Hey Jeffrey,
Good so far.   This seems like the link for accelerating snort with PF_RING DNA ??
http://www.ntop.org/pf_ring/accelerating-snort-with-pf_ring-dna/

I'm interested in the symmetric RSS and if it works properly.
Are you running the PF_RING DNA DAQ ????

It would seem that we'll have to modify argus to use this facility ???

Carter

On Mar 12, 2014, at 3:26 PM, Reynolds, Jeffrey <JReynolds at utdallas.edu> wrote:


First, before we dive into it too deeply, how is the performance ??

This actually seems like a great place to start.  Before getting too heavy into PF_RING integration, maybe I should offer a bit of backstory.  Our main goal is just to archive traffic.  We have a server running CentOS 6 that receives traffic from two SPAN ports.  The only thing we want to accomplish is to maintain a copy of that traffic for some period of time.  Argus was used because it seemed to be the best tool for the price, and it comes with a lot of great features that, while we may not use them now, we may use later (again, for right now all we want is a copy of the traffic to be able to perform forensics on).

Now, I put up a single instance of Argus and pointed it at the interface that was the master of our two bonded physical NICs (eth0 and eth1 are bonded to bond0).  I let it run for an hour to get some preliminary numbers.  I ran racount against my output file and got the following stats:

racount -t 2014y3m12d05h -r argus-out
racount   records    total_pkts  src_pkts   dst_pkts   total_bytes    src_bytes      dst_bytes
   sum    14236180   187526800   98831765   88695035   212079839908   102889789820   109190050088

However, the switch sending that traffic reported that it had sent a total of 421,978,297 packets to both interfaces, and a total of 371,307,051,815 bytes for that time frame.  I could be interpreting something incorrectly, so maybe the best first thing for me to confirm is that we are in fact losing a lot of traffic.  But it seems that a single argus instance can't keep up with the traffic.

I've seen this happen with Snort, and our solution was to plug Snort into PF_RING to allow the traffic to be intelligently forwarded via the Snort Data Acquisition Library (DAQ).  From the perspective of someone who hasn't had a lot of exposure to this level of hardware configuration, it was relatively easy to plug the configuration parameters in at the Snort command line to point them all at the same traffic source, so that each individual process didn't run through the same traffic.  My hope was that there might just be some parameters to set within the argus.conf file which would tell each process to pull from a single PF_RING source.  However, it looks like this might not be as easy as I had once thought.
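A quick sanity check on the two sets of counters above (assuming both cover the same one-hour window) suggests roughly 44% of the packets and 57% of the bytes made it into the argus records:

```c
/* Back-of-the-envelope check: fraction of the switch's traffic that
 * shows up in the racount totals.  Numbers are the ones quoted above;
 * both counter sets are assumed to cover the same window. */
static double capture_ratio(double seen_by_argus, double sent_by_switch)
{
    return seen_by_argus / sent_by_switch;
}

/* packets: 187526800    / 421978297     -> ~0.444 */
/* bytes:   212079839908 / 371307051815  -> ~0.571 */
```

So if the switch counters are trustworthy, a bit more than half the traffic is being dropped somewhere between the SPAN port and the argus output file.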

Am I on the right track or does this make even a little sense?

Thanks,

Jeff



From: Carter Bullard <carter at qosient.com>
Date: Wednesday, March 12, 2014 at 9:54 AM
To: "Reynolds, Jeffrey" <JReynolds at utdallas.edu>
Cc: Argus <argus-info at lists.andrew.cmu.edu>
Subject: Re: [ARGUS] Multi-Instanced Argus

Hey Jeffrey,
I am very interested in this approach, but I have no experience with this PF_RING feature, so I'll have to give you the "design response".  Hopefully, we can get this to where it's doing exactly what anyone would want it to do, and get us a really fast argus on the cheap.

First, before we dive into it too deeply, how is the performance ??  Are you getting bi-directional flows out of this scheme ??  Are you seeing all the traffic ???  If so, then congratulations !!!  If the performance is good and you're seeing all the traffic, but you're only getting uni-directional flows, then we may have some work to do, but still congratulations !!!  If you're not getting all the traffic, then we have some real work to do, as one of the purposes of argus is to monitor all the traffic.

OK, so my understanding is that PF_RING can do some packet routing to a non-overlapping set of tap interfaces.  Routing is based on some classification scheme, designed to make this usable.  The purpose is to provide coarse-grain parallelism for packet processing.  The idea, as much as I can tell, is to prevent multiple readers from having to read from the same queue, eliminating the locking issues which kill performance, etc...

So, I'm not sure what you mean by "pulling from the same queue".  If you do have multiple argi reading the same packet, you will end up counting a single packet multiple times.  Not a terrible thing, but not recommended.  It's not that you're creating multiple observation domains using this PF_RING technique; you're really splitting a single packet observation domain into a multi-sensor facility.  Eventually you will want to combine the total argus output into a single output stream that represents the single packet observation domain.  At least that is my thinking, and I would recommend that you use radium to connect to all of your argus instances, rather than writing the argus output to a set of files.  Radium will generate a single argus data output stream, representing the argus data from the single observation domain.
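As a concrete sketch of that topology (directive names as in the stock radium.conf; the port numbers are hypothetical, and each argus instance would need a matching ARGUS_ACCESS_PORT in its own conf file):

```
# radium.conf sketch: one radium merging five local argus instances
# into a single output stream on the standard argus port.
RADIUM_DAEMON=yes
RADIUM_ACCESS_PORT=561
RADIUM_ARGUS_SERVER=localhost:5561
RADIUM_ARGUS_SERVER=localhost:5562
RADIUM_ARGUS_SERVER=localhost:5563
RADIUM_ARGUS_SERVER=localhost:5564
RADIUM_ARGUS_SERVER=localhost:5565
```

Clients (ra, rasplit, etc.) would then attach to radium on port 561 and see one merged stream.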

The design issue of using the PF_RING function is "how is PF_RING classifying packets to do the routing?".
We would like for it to send packets that belong to the same bi-directional flow to the same virtual interface, so argus can do its bi-directional thing.  PF_RING claims that you can provide your own classifier logic, which we can do to make this happen.  We have a pretty fast bidirectional hashing scheme which we can try out.
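For illustration, a direction-insensitive hash only needs to canonicalize the two endpoints before mixing.  A minimal sketch (illustrative only; this is NOT the hash argus actually uses):

```c
#include <stdint.h>

/* Direction-insensitive flow hash: the 5-tuple and its reverse produce
 * the same value, so both directions of a flow land in the same queue. */
static uint32_t symmetric_flow_hash(uint32_t a_addr, uint16_t a_port,
                                    uint32_t b_addr, uint16_t b_port,
                                    uint8_t proto)
{
    uint32_t lo_addr, hi_addr;
    uint16_t lo_port, hi_port;

    /* Canonicalize: smaller (addr, port) endpoint first, so (A,B) and
     * (B,A) present identical input to the mixer. */
    if (a_addr < b_addr || (a_addr == b_addr && a_port <= b_port)) {
        lo_addr = a_addr; lo_port = a_port;
        hi_addr = b_addr; hi_port = b_port;
    } else {
        lo_addr = b_addr; lo_port = b_port;
        hi_addr = a_addr; hi_port = a_port;
    }

    /* FNV-1a style mixing over the canonical words; any decent mixer
     * works once the inputs are canonical. */
    uint32_t words[4] = { lo_addr, hi_addr,
                          ((uint32_t)lo_port << 16) | hi_port, proto };
    uint32_t h = 2166136261u;
    for (int i = 0; i < 4; i++)
        h = (h ^ words[i]) * 16777619u;
    return h;
}
```

A classifier plugged into PF_RING would then route each packet to queue `h % n_queues`.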

We have a number of people that are using netmap instead of PF_RING.  My understanding is that it also has this same type of feature.  If we can get some people talking about that, that would help a bit.

Carter



On Mar 12, 2014, at 1:03 AM, Reynolds, Jeffrey <JReynolds at utdallas.edu> wrote:

Howdy All,

So after forever and a day, I've finally found time to start working on my multi-instanced argus configuration. Here is my setup:

-CentOS 6.5 x64
-pfring driver compiled from source
-pfring capable Intel NICs (currently using the ixgbe driver version 3.15.1-k)
(these NICs are in a bonded configuration under a device named bond0)

I've configured my startup script to start 5 instances of Argus, each with their own /etc/argusX.conf file (argus1.conf, argus2.conf, etc).  The startup script correctly assigns the proper pid file to each instance, and everything starts and stops smoothly.  Each instance is writing an output file to /var/argus in the format of argusX.out.  When I first tried running my argus instances, I ran them with a version of PF_RING I had installed from an RPM obtained from the ntop repo.  Things didn't seem to work correctly, so I tried again after compiling from source.  After compiling from source, I got the following output in /var/log/messages when I started argus:

Mar 11 17:48:16 argus kernel: No module found in object
Mar 11 17:49:16 argus kernel: [PF_RING] Welcome to PF_RING 5.6.3 ($Revision: 7358$)
Mar 11 17:49:16 argus kernel: (C) 2004-14 ntop.org
Mar 11 17:49:16 argus kernel: [PF_RING] registered /proc/net/pf_ring/
Mar 11 17:49:16 argus kernel: NET: Registered protocol family 27
Mar 11 17:49:16 argus kernel: [PF_RING] Min # ring slots 4096
Mar 11 17:49:16 argus kernel: [PF_RING] Slot version     15
Mar 11 17:49:16 argus kernel: [PF_RING] Capture TX       Yes [RX+TX]
Mar 11 17:49:16 argus kernel: [PF_RING] Transparent Mode 0
Mar 11 17:49:16 argus kernel: [PF_RING] IP Defragment    No
Mar 11 17:49:16 argus kernel: [PF_RING] Initialized correctly
Mar 11 17:49:35 argus kernel: Bluetooth: Core ver 2.15
Mar 11 17:49:35 argus kernel: NET: Registered protocol family 31
Mar 11 17:49:35 argus kernel: Bluetooth: HCI device and connection manager initialized
Mar 11 17:49:35 argus kernel: Bluetooth: HCI socket layer initialized
Mar 11 17:49:35 argus kernel: Netfilter messages via NETLINK v0.30.
Mar 11 17:49:35 argus argus[13918]: 11 Mar 14 17:49:35.643243 started
Mar 11 17:49:35 argus argus[13918]: 11 Mar 14 17:49:35.693930 started
Mar 11 17:49:35 argus kernel: device bond0 entered promiscuous mode
Mar 11 17:49:35 argus kernel: device em1 entered promiscuous mode
Mar 11 17:49:35 argus kernel: device em2 entered promiscuous mode
Mar 11 17:49:35 argus argus[13918]: 11 Mar 14 17:49:35.721490 ArgusGetInterfaceStatus: interface bond0 is up
Mar 11 17:49:36 argus argus[13922]: 11 Mar 14 17:49:36.349202 started
Mar 11 17:49:36 argus argus[13922]: 11 Mar 14 17:49:36.364625 started
Mar 11 17:49:36 argus argus[13922]: 11 Mar 14 17:49:36.383623 ArgusGetInterfaceStatus: interface bond0 is up
Mar 11 17:49:37 argus argus[13926]: 11 Mar 14 17:49:37.045224 started
Mar 11 17:49:37 argus argus[13926]: 11 Mar 14 17:49:37.060689 started
Mar 11 17:49:37 argus argus[13926]: 11 Mar 14 17:49:37.079706 ArgusGetInterfaceStatus: interface bond0 is up
Mar 11 17:49:37 argus argus[13930]: 11 Mar 14 17:49:37.753278 started
Mar 11 17:49:37 argus argus[13930]: 11 Mar 14 17:49:37.768613 started
Mar 11 17:49:37 argus argus[13930]: 11 Mar 14 17:49:37.785691 ArgusGetInterfaceStatus: interface bond0 is up
Mar 11 17:49:38 argus argus[13934]: 11 Mar 14 17:49:38.449229 started
Mar 11 17:49:38 argus argus[13934]: 11 Mar 14 17:49:38.466365 started
Mar 11 17:49:38 argus argus[13934]: 11 Mar 14 17:49:38.485675 ArgusGetInterfaceStatus: interface bond0 is up

Aside from the "No module found in object" error, everything seems like it's working OK.  The only problem is that I don't seem to have my argus instances configured to pull traffic from the same queue.  In other words, I have five output files from five argus instances with the same traffic in all of them.  I haven't made any changes to my argus config files, aside from telling them to write to different locations and the name of the interface.  I know I'm missing something, but I'm not quite sure what it is.  If someone might be able to tell me how to configure these five instances to pull from the same PF_RING queue, I'd be mighty obliged.  Let me know if I need to submit any additional information.
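For reference, the five conf files presumably differ only in a couple of directives, something like this (directive names from the stock argus.conf; the paths are hypothetical):

```
# /etc/argus1.conf -- per-instance bits only; everything else shared
ARGUS_INTERFACE=bond0
ARGUS_OUTPUT_FILE=/var/argus/argus1.out
ARGUS_SET_PID=yes
ARGUS_PID_PATH=/var/run
```

With all five instances pointed at bond0 like this, each one independently sees the full traffic stream, which would explain the duplicate output files.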

Thanks,

Jeff Reynolds


