Multi-Instanced Argus
Reynolds, Jeffrey
JReynolds at utdallas.edu
Thu Mar 27 23:26:18 EDT 2014
Craig,
Thanks for your response. Up front, I originally believed that I would need multiple argus instances to support our throughput; however, I clearly see now that is not the case. If you have experience running a single argus instance capable of capturing between 3 and 7 Gbps, then I take it we'll be fine with one as well. I've reloaded my kernel modules with a few different parameters, namely the ones you gave, and I'm always able to get packets to register correctly with pfcount and n2disk, but so far argus seems to simply not see any traffic. I've tried running argus with the -D 8 parameter and so far I've gotten no data, nor any explanation as to why. I'm wondering if I botched something in the source when I modified it to add the dna device name, or if there are additional modifications to make. Though argus doesn't seem to have a problem with "dna0", it did throw an error to /var/log/messages when I tried to specify "dnacluster:10" as an interface.
Anyways, I'll wail at it more tomorrow and see if I can't come up with anything.
Thanks!
Jeff
________________________________________
From: Craig Merchant [cmerchant at responsys.com]
Sent: Thursday, March 27, 2014 7:23 PM
To: Reynolds, Jeffrey
Cc: Argus
Subject: RE: [ARGUS] Multi-Instanced Argus
Hey, Jeffrey...
The configuration questions for the pf_ring and ixgbe drivers may be better answered on the ntop forums... But I'll do my best. Here is how I load the drivers:
insmod /lib/modules/2.6.32-220.el6.x86_64/updates/pf_ring.ko
/sbin/modprobe ixgbe MQ=0,0 RSS=1,1 num_rx_slots=32768
ifconfig dna0 up promisc
ethtool -K dna0 tso off
ethtool -K dna0 gro off
ethtool -K dna0 lro off
ethtool -K dna0 gso off
ethtool -G dna0 tx 32768
ethtool -G dna0 rx 32768
One thing I'm not clear on from your config is why you are using pfdnacluster_master at all... That daemon is designed to split up flows and/or make copies of them to distribute to other applications. I don't think it's meant to aggregate two interfaces into one stream. Normally it's run with a -n parameter to tell it how many queues you want traffic divided up into. We use:
pfdnacluster_master -d -c 10 -n 28,1 -m 0 -i dna0
In this case, -n says "divide up one copy of the traffic into 28 queues" and "create one copy of all the traffic on the last queue". The apps accessing the first 28 queues (Snort) would connect to dnacluster:10@0 through dnacluster:10@27. Argus connects to dnacluster:10@28 and would see a copy of all of the traffic.
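The queue numbering described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme in this message (queues numbered consecutively across the -n groups, written as dnacluster:<cluster>@<queue>); the helper name cluster_queue_names is hypothetical and is not part of PF_RING.

```python
def cluster_queue_names(cluster_id, queue_counts):
    """Return one list of queue names per -n group.

    Assumption: queues are numbered consecutively across groups, as in the
    "-c 10 -n 28,1" example (28 balanced queues, then 1 full-copy queue).
    """
    groups = []
    next_queue = 0
    for count in queue_counts:
        names = [f"dnacluster:{cluster_id}@{q}"
                 for q in range(next_queue, next_queue + count)]
        groups.append(names)
        next_queue += count
    return groups


# -c 10 -n 28,1: the first group feeds Snort, the second feeds Argus.
snort_queues, argus_queues = cluster_queue_names(10, [28, 1])
print(snort_queues[0], snort_queues[-1], argus_queues[0])
```

With this layout, the name passed to argus -i would be the single entry in argus_queues.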
If all you are looking to do is combine traffic from two interfaces into one, why not just run argus with -i dna0,dna1?
For testing, I would try the following to see where you might be having problems:
pfcount -i dna0
pfcount -i dna1
pfcount -i dna0,dna1
pfcount -i dnacluster:10
pfcount -i dnacluster:10@0
Let me know if that helps...
Craig
-----Original Message-----
From: Reynolds, Jeffrey [mailto:JReynolds at utdallas.edu]
Sent: Thursday, March 27, 2014 3:18 PM
To: Craig Merchant; Carter Bullard
Cc: Argus
Subject: Re: [ARGUS] Multi-Instanced Argus
So I understand this is from a while ago, but here is what I have. Craig, maybe you can show me how I'm doing it wrong.
I finally got PF_Ring and libzero licensed correctly so that pfdnacluster isn't limited to 5 minutes of capture. I downloaded the Argus source, installed the dependencies, and compiled after making the change you noted below. However, I don't seem to be properly attaching argus to my devices to allow it to capture. I have a feeling it's something to do with my PF_Ring or dna-ixgbe conf files. We have two interfaces to monitor, which I've previously combined into one by using pfdnacluster_master. However, it looks like I can't get Argus to hook into that or a single dna interface. Anyway, after make installing, I run the following command with the following result:
#pfdnacluster_master -i dna0,dna1 -c 10
#argus -i dnacluster:10 -s 1500 -w /var/data/argus-out
My /var/log/messages says that the specified interface doesn't exist, which I kind of expected.
So I tried this (without pfdnacluster running):
#argus -i dna0 -s 1500 -w /var/data/argus-out
This time argus appears to have started, but my output file is barely growing (it initially starts at 128 bytes and increases by that same amount every 30 seconds or so).
In case this happens to be the parameters I'm loading with my kernel modules, here they are:
pf_ring.ko transparent_mode=2 (I've also tried 0, with similar results)
ixgbe.ko RSS=1,1,1,1 (I wasn't seeing all of the traffic from my interfaces with the default config, and the ntop folks recommended this. I need to dig further into the docs to learn more about these parameters.)
To answer your original question, I'm only monitoring about 2 Gbps, significantly less than you are. I'm not sure if what I've noticed would be considered "gaps", but we do see exchanges where the server appears to initiate conversations by sending a response to a client, which the client doesn't appear to have requested. I'm guessing the missing request was most likely a packet that didn't get captured.
Any configuration suggestions would be much appreciated.
Thanks,
Jeff
From: Craig Merchant <cmerchant at responsys.com>
Date: Wednesday, March 12, 2014 at 6:39 PM
To: Carter Bullard <carter at qosient.com>, Jeff Reynolds <jjr140030 at utdallas.edu>
Cc: Argus <argus-info at lists.andrew.cmu.edu>
Subject: RE: [ARGUS] Multi-Instanced Argus
We're running Argus and Snort on PF_RING's DNA/Libzero drivers. We decided to use Libzero because the standard DNA drivers limit the number of memory "queues" containing network traffic to 16. Each queue can only be accessed by a single process and our sensors have 32 cores, so we wouldn't be able to run the maximum number of Snort instances without it.
We use the pfdnacluster_master app to spread flows across 28 queues for Snort and also maintain a copy of all flows in a queue for Argus.
To get it to work, all I had to do was make a slight edit to ArgusSource.c so that Argus would recognize DNA/Libzero queues as a valid interface.
Somewhere around line 4191 (for argus 3.0.7):
- if ((strstr(device->name, "dag")) || (strstr(device->name, "napa"))) {
+ if (strstr(device->name, "dag") || strstr(device->name, "nap") ||
+ strstr(device->name, "dna") || (strstr(device->name, "eth") &&
+ strstr(device->name, "@"))) {
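To see which interface names the patched condition accepts, here is a small Python mirror of the substring tests in the diff. It is a sketch for reasoning about names only: the function name matches_patched_check is hypothetical, and what Argus does inside that branch is not shown here.

```python
def matches_patched_check(name):
    """Mirror the strstr() tests from the patched condition.

    Each C strstr(device->name, "x") becomes a Python substring test.
    """
    return ("dag" in name
            or "nap" in name
            or "dna" in name
            or ("eth" in name and "@" in name))


# "dnacluster:10@28" matches because it contains the substring "dna".
for candidate in ["dna0", "dnacluster:10@28", "eth2@3", "eth0", "bond0"]:
    print(candidate, matches_patched_check(candidate))
```

Under this reading, plain eth0 and bond0 do not match, while both dna0 and any dnacluster name do.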
Our data centers do around 4-8 Gbps 24/7. From what I recall, there is (or was) a bug in PF_RING that caused Argus to run at 100% all of the time, but in my experience Argus wasn't having problems keeping up with our volume of data. We did see an unusually high number of flows that Argus couldn't determine the direction of, but we weren't seeing gaps in the packets or anything else to suggest that Argus couldn't handle the volume.
How much traffic are you sending at Argus? Have you tried searching your Argus records for flows that have gaps in them? That would be a pretty good indicator that Argus may have trouble keeping up. Or that your SPAN port can't handle the load...
Thx.
Craig
From: argus-info-bounces+cmerchant=responsys.com at lists.andrew.cmu.edu On Behalf Of Carter Bullard
Sent: Wednesday, March 12, 2014 1:57 PM
To: Reynolds, Jeffrey
Cc: Argus
Subject: Re: [ARGUS] Multi-Instanced Argus
Hey Jeffrey,
Good so far. This seems like the link for accelerating snort with PF_RING DNA ??
http://www.ntop.org/pf_ring/accelerating-snort-with-pf_ring-dna/
I'm interested in the symmetric RSS and if it works properly.
Are you running the PF_RING DNA DAQ ????
It would seem that we'll have to modify argus to use this facility ???
Carter
On Mar 12, 2014, at 3:26 PM, Reynolds, Jeffrey <JReynolds at utdallas.edu> wrote:
First, before we dive into it too deep, how is the performance ??
This actually seems like a great place to start. Before getting too heavy into PF_RING integration, maybe I should offer a bit of backstory. Our main goal is just to archive traffic. We have a server running CentOS 6 that receives traffic from two SPAN ports. The only thing we want to accomplish is to maintain a copy of that traffic for some period of time. Argus was used because it seemed to be the best tool for the price, and it comes with a lot of great features that while we may not use now, we may use later (again, for right now all we want is a copy of the traffic to be able to perform forensics on).
Now, I put up a single instance of Argus and pointed it at the interface that was the master of our two bonded physical NICs (eth0 and eth1 are bonded to bond0). I let it run for an hour to get some preliminary numbers. I ran racount against my output file and got the following stats:
racount -t 2014y3m12d05h -r argus-out
racount records   total_pkts  src_pkts  dst_pkts  total_bytes   src_bytes     dst_bytes
    sum 14236180  187526800   98831765  88695035  212079839908  102889789820  109190050088
However, the switch sending that traffic reported that it had sent a total of 421,978,297 packets to both interfaces, and a total of 371,307,051,815 bytes for that time frame. I could be interpreting something incorrectly, so maybe the best first thing for me to confirm is that we are in fact losing a lot of traffic. But it seems that a single argus instance can't keep up with the traffic. I've seen this happen with Snort, and our solution was to plug Snort into PF_RING to allow the traffic to be intelligently forwarded via the Snort Data Acquisition Library (DAQ). From the perspective of someone who hasn't had a lot of exposure to this level of hardware configuration, it was relatively easy to plug the configuration parameters in at the Snort command line to have them all point at the same traffic source so that each individual process didn't run through the same traffic. My hope was that there might just be some parameters to set within the argus.conf file which would tell each process to pull from a single PF_RING source. However, it looks like this might not be as easy as I had once thought.
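The comparison above can be checked with a few lines of arithmetic. This sketch only divides the racount totals by the switch counters quoted in this message; it assumes both were taken over the same hour and count packets and bytes the same way, which the message itself flags as uncertain.

```python
# Totals reported by racount for the one-hour capture.
argus_pkts = 187_526_800
argus_bytes = 212_079_839_908

# Totals the switch reported sending to both SPAN interfaces.
switch_pkts = 421_978_297
switch_bytes = 371_307_051_815

pkt_ratio = argus_pkts / switch_pkts
byte_ratio = argus_bytes / switch_bytes

print(f"packets seen by argus: {pkt_ratio:.1%}")
print(f"bytes seen by argus:   {byte_ratio:.1%}")
```

On these numbers argus accounted for somewhat under half of the packets and somewhat over half of the bytes, which is consistent with substantial loss if the counters are comparable.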
Am I on the right track or does this make even a little sense?
Thanks,
Jeff
From: Carter Bullard <carter at qosient.com>
Date: Wednesday, March 12, 2014 at 9:54 AM
To: "Reynolds, Jeffrey" <JReynolds at utdallas.edu>
Cc: Argus <argus-info at lists.andrew.cmu.edu>
Subject: Re: [ARGUS] Multi-Instanced Argus
Hey Jeffrey,
I am very interested in this approach, but I have no experience with this PF_RING feature, so I'll have to give you the "design response". Hopefully, we can get this to where it's doing exactly what anyone would want it to do, and get us a really fast argus, on the cheap.
First, before we dive into it too deep, how is the performance ?? Are you getting bi-directional flows out of this scheme ?? Are you seeing all the traffic ??? If so, then congratulations !!! If the performance is good, you're seeing all the traffic, but you're only getting uni-directional flows, then we may have some work to do, but still congratulations !!! If you're not getting all the traffic then we have some real work to do, as one of the purposes of argus is to monitor all the traffic.
OK, so my understanding is that the PF_RING can do some packet routing to a non-overlapping set of tap interfaces. Routing is based on some classification scheme, designed to make this usable. The purpose is to provide coarse grain parallelism for packet processing. The idea, as much as I can tell, is to prevent multiple readers from having to read from the same queue; eliminating locking issues, which kills performance etc...
So, I'm not sure what you mean by "pulling from the same queue". If you do have multiple argi reading the same packet, you will end up counting a single packet multiple times. Not a terrible thing, but not recommended. It's not that you're creating multiple observation domains using this PF_RING technique. You're really splitting a single packet observation domain into a multi-sensor facility ... eventually you will want to combine the total argus output into a single output stream that represents the single packet observation domain. At least that is my thinking, and I would recommend that you use radium to connect to all of your argus instances, rather than writing the argus output to a set of files. Radium will generate a single argus data output stream, representing the argus data from the single observation domain.
The design issue of using the PF_RING function is "how is PF_RING classifying packets to do the routing?".
We would like for it to send packets that belong to the same bi-directional flow to the same virtual interface, so argus can do its bi-directional thing. PF_RING claims that you can provide your own classifier logic, which we can do to make this happen. We have a pretty fast bidirectional hashing scheme which we can try out.
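One way to get both directions of a flow onto the same queue is to hash a key that does not depend on direction. The Python sketch below illustrates that idea only: it is not Argus's hashing scheme and not PF_RING's classifier interface, and the function name bidirectional_queue, the key layout, and the use of CRC32 are all assumptions made for the example.

```python
import zlib


def bidirectional_queue(src_ip, src_port, dst_ip, dst_port, proto, num_queues):
    """Pick a queue index that is identical for both directions of a flow.

    The two endpoints are sorted before hashing, so swapping source and
    destination produces the same key and therefore the same queue.
    """
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{proto}|{low[0]}:{low[1]}|{high[0]}:{high[1]}".encode()
    return zlib.crc32(key) % num_queues


# Request and response of the same TCP connection land on the same queue.
forward = bidirectional_queue("10.0.0.5", 51514, "192.0.2.10", 443, 6, 28)
reverse = bidirectional_queue("192.0.2.10", 443, "10.0.0.5", 51514, 6, 28)
print(forward == reverse)
```

A real classifier would work on binary addresses and a faster hash, but the ordering step is what makes the result symmetric.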
We have a number of people that are using netmap instead of PF_RING. My understanding is that it also has this same type of feature. If we can get some people talking about that, that would help a bit.
Carter
On Mar 12, 2014, at 1:03 AM, Reynolds, Jeffrey <JReynolds at utdallas.edu> wrote:
Howdy All,
So after forever and a day, I've finally found time to start working on my multi-instanced argus configuration. Here is my setup:
-CentOS 6.5 x64
-pfring driver compiled from source
-pfring capable Intel NICs (currently using the ixgbe driver version 3.15.1-k) (these NICs are in a bonded configuration under a device named bond0)
I've configured my startup script to start 5 instances of Argus, each with their own /etc/argusX.conf file (argus1.conf, argus2.conf, etc). The startup script correctly assigns the proper pid file to each instance, and everything starts and stops smoothly. Each instance is writing an output file to /var/argus in the format of argusX.out. When I first tried running my argus instances, I ran them with a version of PF_RING I had installed from an RPM obtained from the ntop repo. Things didn't seem to work correctly, so I tried again after I had compiled from source. After compiling from source, I got the following output in /var/log/messages when I started argus:
Mar 11 17:48:16 argus kernel: No module found in object
Mar 11 17:49:16 argus kernel: [PF_RING] Welcome to PF_RING 5.6.3 ($Revision: 7358$)
Mar 11 17:49:16 argus kernel: (C) 2004-14 ntop.org
Mar 11 17:49:16 argus kernel: [PF_RING] registered /proc/net/pf_ring/
Mar 11 17:49:16 argus kernel: NET: Registered protocol family 27
Mar 11 17:49:16 argus kernel: [PF_RING] Min # ring slots 4096
Mar 11 17:49:16 argus kernel: [PF_RING] Slot version 15
Mar 11 17:49:16 argus kernel: [PF_RING] Capture TX Yes [RX+TX]
Mar 11 17:49:16 argus kernel: [PF_RING] Transparent Mode 0
Mar 11 17:49:16 argus kernel: [PF_RING] IP Defragment No
Mar 11 17:49:16 argus kernel: [PF_RING] Initialized correctly
Mar 11 17:49:35 argus kernel: Bluetooth: Core ver 2.15
Mar 11 17:49:35 argus kernel: NET: Registered protocol family 31
Mar 11 17:49:35 argus kernel: Bluetooth: HCI device and connection manager initialized
Mar 11 17:49:35 argus kernel: Bluetooth: HCI socket layer initialized
Mar 11 17:49:35 argus kernel: Netfilter messages via NETLINK v0.30.
Mar 11 17:49:35 argus argus[13918]: 11 Mar 14 17:49:35.643243 started
Mar 11 17:49:35 argus argus[13918]: 11 Mar 14 17:49:35.693930 started
Mar 11 17:49:35 argus kernel: device bond0 entered promiscuous mode
Mar 11 17:49:35 argus kernel: device em1 entered promiscuous mode
Mar 11 17:49:35 argus kernel: device em2 entered promiscuous mode
Mar 11 17:49:35 argus argus[13918]: 11 Mar 14 17:49:35.721490 ArgusGetInterfaceStatus: interface bond0 is up
Mar 11 17:49:36 argus argus[13922]: 11 Mar 14 17:49:36.349202 started
Mar 11 17:49:36 argus argus[13922]: 11 Mar 14 17:49:36.364625 started
Mar 11 17:49:36 argus argus[13922]: 11 Mar 14 17:49:36.383623 ArgusGetInterfaceStatus: interface bond0 is up
Mar 11 17:49:37 argus argus[13926]: 11 Mar 14 17:49:37.045224 started
Mar 11 17:49:37 argus argus[13926]: 11 Mar 14 17:49:37.060689 started
Mar 11 17:49:37 argus argus[13926]: 11 Mar 14 17:49:37.079706 ArgusGetInterfaceStatus: interface bond0 is up
Mar 11 17:49:37 argus argus[13930]: 11 Mar 14 17:49:37.753278 started
Mar 11 17:49:37 argus argus[13930]: 11 Mar 14 17:49:37.768613 started
Mar 11 17:49:37 argus argus[13930]: 11 Mar 14 17:49:37.785691 ArgusGetInterfaceStatus: interface bond0 is up
Mar 11 17:49:38 argus argus[13934]: 11 Mar 14 17:49:38.449229 started
Mar 11 17:49:38 argus argus[13934]: 11 Mar 14 17:49:38.466365 started
Mar 11 17:49:38 argus argus[13934]: 11 Mar 14 17:49:38.485675 ArgusGetInterfaceStatus: interface bond0 is up
Aside from the "No module found in object" error, everything seems like it's working OK. The only problem is that I don't seem to have my argus instances configured to pull traffic from the same queue. In other words, I have five output files from five argus instances with the same traffic in all of them. I haven't made any changes to my argus config files, aside from telling them to write to different locations and the name of the interface. I know I'm missing something but I'm not quite sure what it is. If someone might be able to tell me how to configure these five instances to pull from the same PF_RING queue, I'd be mighty obliged. Let me know if I need to submit any additional information.
Thanks,
Jeff Reynolds