Argus 3.0.6 and dnaclusters

Craig Merchant cmerchant at responsys.com
Tue Dec 18 13:43:39 EST 2012


So far it's been working great.  PF_RING 5.5.1 still has the bug that causes argus to run at 100% CPU, but other than that there have been no problems.

What's the best way to figure out if argus is failing to keep up with the volume of traffic on a particular interface?

Thx.

Craig

-----Original Message-----
From: Carter Bullard [mailto:carter at qosient.com] 
Sent: Monday, December 17, 2012 7:21 PM
To: Chris Wakelin
Cc: Craig Merchant; Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [ARGUS] Argus 3.0.6 and dnaclusters

Hey Guys,
Did the patches make for a reliable argus under PF_DNA?
I'll upload argus-3.0.7.2 with the changes as I have them.
Do I need to make the change to add "dna" to the list of exceptional interfaces?
Hope all is most excellent,

Carter

On Dec 13, 2012, at 8:22 PM, Chris Wakelin <c.d.wakelin at reading.ac.uk> wrote:

> (To enable debug in ARGUS - do a "touch .debug" in the source tree root
> before ./configure)
> 
> I don't have that problem with stopping and starting clients to
> pfdnacluster_master. In fact I leave it running non-stop on the live
> servers even though Suricata, Bro and ARGUS get restarted periodically.
> (I'm loth to restart the master in case it upsets the border switch as
> it did once!). Do you get the same problem if you use "pfcount" instead
> of tcpdump? Are you sure everything, tcpdump and ARGUS, is linked to
> the PF_RING pcap (it may even be best to remove any others)?
> 
> Best Wishes,
> Chris
> 
> On 14/12/12 01:15, Carter Bullard wrote:
>> Hey Craig,
>> Give argus a "-D 10" when you start it (assuming that you have debug compiled in).
>> That should tell us enough to know what is up.
>> 
>> Carter
>> 
>> 
>> On Dec 13, 2012, at 8:11 PM, Craig Merchant <cmerchant at responsys.com> wrote:
>> 
>>> So, I compiled 3.0.7 and made the changes to ArgusSource.c.
>>> 
>>> If I run it on eth0, I get the following (and CPU is low):
>>> 
>>> [root@ids01-dc1 bin]# argus -d -i eth0
>>> argus[16323]: 14 Dec 12 00:21:26.394339 started
>>> argus[16323]: 14 Dec 12 00:21:26.400452 ArgusGetInterfaceStatus: interface eth0 is up
>>> 
>>> I run Chris' modified version of pfdnacluster_master:
>>> 
>>> pfdnacluster_master -d -c 10 -r 0 -n 18 -m 0 -A 1 -i dna0
>>> 
>>> Snort sees traffic and so does tcpdump.  However, once a process has connected to a dnacluster:X@Y interface, stopping or killing the process makes that interface unavailable and pfdnacluster_master needs to be restarted.
>>> 
>>> If I run argus on dnacluster:10@18, I don't see the "interface X is up" message:
>>> 
>>> [root@ids01-dc1 bin]# ./argus -d -i dnacluster:10@18
>>> argus[8153]: 14 Dec 12 00:40:13.189119 started
>>> 
>>> The CPU runs at 100%.  ra -S 10.0.0.1:561 doesn't return any flows.
>>> 
>>> I ran Chris's script and the poll counts came back zero:
>>> 
>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9752-none.305
>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9768-none.306
>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9784-none.307
>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9800-none.308
>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9816-none.309
>>> 
>>> [root@ids01-dc1 ~]# ./check_script /proc/net/pf_ring/9816-none.309
>>> 2012-12-14 01:08:00 - Polls: 0, Polls/s: 0
>>> 2012-12-14 01:08:10 - Polls: 0, Polls/s: 0
>>> 2012-12-14 01:08:20 - Polls: 0, Polls/s: 0
>>> 
>>> Any ideas why Argus doesn't seem to be able to bring up the dnacluster interface?
>>> 
>>> Thanks!
>>> 
>>> Craig
>>> 
>>> -----Original Message-----
>>> From: Chris Wakelin [mailto:c.d.wakelin at reading.ac.uk] 
>>> Sent: Thursday, December 13, 2012 3:45 PM
>>> To: Carter Bullard
>>> Cc: Craig Merchant; Argus (argus-info at lists.andrew.cmu.edu)
>>> Subject: Re: [ARGUS] Argus 3.0.6 and dnaclusters
>>> 
>>> I think it *is* selectable. PF_RING keeps a num_poll_calls count per process, and it's managing 1.2 million per second. What exactly it's counting, I'm not sure!
>>> 
>>> I've got a little shell script to track the rate:
>>> 
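>>>> #!/bin/sh
>>>> # Sum "Num Poll Calls" over the given PF_RING /proc files every 10 s
>>>> # and print the running total plus the per-second rate.
>>>> POLLS=0
>>>> while true; do
>>>>     DATE=$(date '+%Y-%m-%d %H:%M:%S')
>>>>     # polls+0 forces a numeric 0 when no line matches
>>>>     NPOLLS=$(cat "$@" | gawk -F":" '/^Num Poll Calls/{polls+=$2}END{print polls+0}')
>>>>     echo "$DATE - Polls: $NPOLLS, Polls/s: $(( ($NPOLLS - $POLLS) / 10 ))"
>>>>     POLLS=$NPOLLS
>>>>     sleep 10
>>>> done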
>>> 
>>> and used with something like "./pf_ring_polls.sh /proc/net/pf_ring/27702-none.41"
>>> 
>>> Craig, it would be interesting to know what you see?
>>> 
>>> Best Wishes,
>>> Chris
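
The measurement made by the shell script above can also be sketched in C. This is only an illustration, assuming the /proc/net/pf_ring files contain lines of the form "Num Poll Calls : <n>", as the gawk pattern in the script implies. The function names sum_poll_calls and polls_per_second are hypothetical and are not part of PF_RING or argus.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Add up the value of every "Num Poll Calls" line in an open stats file. */
static long long sum_poll_calls(FILE *fp)
{
    static const char key[] = "Num Poll Calls";
    char line[256];
    long long total = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        /* Only lines that start with the key are counted. */
        if (strncmp(line, key, sizeof key - 1) != 0)
            continue;
        const char *colon = strchr(line, ':');
        if (colon != NULL)
            total += strtoll(colon + 1, NULL, 10); /* skips leading blanks */
    }
    return total;
}

/* Integer rate between two samples taken interval_sec seconds apart. */
static long long polls_per_second(long long now, long long before,
                                  int interval_sec)
{
    assert(interval_sec > 0);
    return (now - before) / interval_sec;
}
```

A caller would open the stats file, call sum_poll_calls() once per interval, and pass consecutive totals to polls_per_second(). A rate that stays at zero, as in Craig's output, suggests the process is not polling that ring at all.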
>>> 
>>> On 13/12/12 23:36, Carter Bullard wrote:
>>>> Hey Chris,
>>>> argus should be getting its packets using the routine 
>>>> ArgusGetPacket(), reading packets from a "notselectable" interface, which starts on line 3823 in ArgusSource.c.
>>>> 
>>>> So, argus should try to read 4 packets, using pcap_next_ex() if it's 
>>>> there,
>>>> pcap_dispatch() if it's not, and if we don't get any packets (pkts == 
>>>> 0), then we're supposed to call nanosleep(), for 25 ms, on line 3820.
>>>> 
>>>> Interesting that it never hits this call ?
>>>> 
>>>> Carter
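
The read-then-sleep pattern Carter describes can be sketched roughly as follows. This is not the code from ArgusSource.c: poll_once, read_packet_fn and fake_source are hypothetical names, and the stub callback stands in for the real pcap_next_ex() or pcap_dispatch() call. Only the batch size of 4 and the 25 ms nanosleep() come from the description above.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_PKTS_PER_PASS 4                      /* packets tried per pass */
#define IDLE_SLEEP_NSEC   (25L * 1000L * 1000L)  /* 25 ms */

/* Hypothetical packet source: returns > 0 if a packet was read, else 0. */
typedef int (*read_packet_fn)(void *ctx);

/* One pass of the loop: read up to 4 packets, and sleep 25 ms if none
 * were available. Returns the packet count; *slept reports whether the
 * idle sleep was taken. */
static int poll_once(read_packet_fn read_packet, void *ctx, int *slept)
{
    int pkts = 0;

    assert(read_packet != NULL && slept != NULL);

    while (pkts < MAX_PKTS_PER_PASS && read_packet(ctx) > 0)
        pkts++;

    *slept = 0;
    if (pkts == 0) {
        struct timespec ts = { 0, IDLE_SLEEP_NSEC };
        nanosleep(&ts, NULL);   /* yield the CPU instead of spinning */
        *slept = 1;
    }
    return pkts;
}

/* Stand-in source for demonstration: hands out *ctx packets, then dries up. */
static int fake_source(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}
```

If the sleep branch is never reached, the process spins on an empty interface, which would be consistent with the 100% CPU reported in this thread.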
>>>> 
>>>> 
>>>> On Dec 13, 2012, at 5:37 PM, Chris Wakelin <c.d.wakelin at reading.ac.uk> wrote:
>>>> 
>>>>> Yes, it's a bug in DNA. I can't remember seeing a commit that claimed 
>>>>> to fix it; the last I saw on the topic, I think, was the developer's 
>>>>> reply to
>>>>> 
>>>>> http://listgateway.unipi.it/pipermail/ntop-misc/2012-September/003279.html
>>>>> 
>>>>> (and IPv6 is fine now BTW :-) )
>>>>> 
>>>>> As far as I remember, Bro IDS manages to use select() without 
>>>>> hitting the problem, perhaps by adding empty select() calls with 
>>>>> a timeout:
>>>>> 
>>>>> From Bro's IOSource.cc:
>>>>> 
>>>>>>      if ( all_idle )
>>>>>>              {
>>>>>>              // Interesting: when all sources are dry, simply sleeping a
>>>>>>              // bit *without* watching for any fd becoming ready may
>>>>>>              // decrease CPU load. I guess that's because it allows
>>>>>>              // the kernel's packet buffers to fill. - Robin
>>>>>>              timeout.tv_sec = 0;
>>>>>>              timeout.tv_usec = 20; // SELECT_TIMEOUT;
>>>>>>              select(0, 0, 0, 0, &timeout);
>>>>>>              }
>>>>> 
>>>>> I had a go at doing that in ARGUS but it made no difference (perhaps 
>>>>> I put it in the wrong place!).
>>>>> 
>>>>> I'm happy to try things out on the test server, now I've updated 
>>>>> everything (I'm using tcpreplay of a 10GB pcap over a 1Gb link from 
>>>>> another machine using Intel e1000e cards and the time-limited DNA 
>>>>> demo licence, so I can only test for 5 mins at a time).
>>>>> 
>>>>> Best Wishes,
>>>>> Chris
>>>>> 
>>>>> On 13/12/12 22:11, Carter Bullard wrote:
>>>>>> If I remember, the 100% CPU was a bug in the DNA code itself?
>>>>>> Was there a resolution to that?
>>>>>> If you would be a guinea pig, we can play around with it?
>>>>>> 
>>>>>> Carter
>>>>>> 
>>>>>> 
>>>>>> On Dec 13, 2012, at 4:30 PM, Chris Wakelin <c.d.wakelin at reading.ac.uk> wrote:
>>>>>> 
>>>>>>> I've just tried 3.0.7.2 with latest PF_RING svn (post v5.5.1) and 
>>>>>>> DNA clusters on a test machine. It looks like we do still need the 
>>>>>>> name change (added "dna" to the list of interfaces that includes 
>>>>>>> "dag" and
>>>>>>> "napa") and it still uses 100% of CPU, but otherwise appears to work.
>>>>>>> 
>>>>>>> Best Wishes,
>>>>>>> Chris
>>>>> 
>>>>> --
>>>>> --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
>>>>> Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
>>>>> IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 8439
>>>>> Whiteknights, Reading, RG6 2AF, UK              Fax: +44 (0)118 975 3094
>>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
>>> Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
>>> IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 8439
>>> Whiteknights, Reading, RG6 2AF, UK              Fax: +44 (0)118 975 3094
>>> 
>> 
>> 
> 
> 
> -- 
> --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
> Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
> IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 8439
> Whiteknights, Reading, RG6 2AF, UK              Fax: +44 (0)118 975 3094
> 



