Argus 3.0.6 and dnaclusters

Chris Wakelin c.d.wakelin at reading.ac.uk
Tue Dec 18 14:13:04 EST 2012


On my test machine the extra nanosleep works at keeping CPU usage low,
but causes ARGUS to miss more packets at high speeds. I've tried
reducing the sleep time, but even 1 nanosecond (of course in practice it
probably takes longer than that to run the nanosleep call) causes losses.
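
A rough way to see how long a nominal 1 ns request really takes on a given box is to time the call itself. This is only a sketch, not anything from the ARGUS source: measure_nanosleep_ns() is a made-up helper name, and it assumes a POSIX system with clock_gettime() and CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not part of ARGUS): return the wall-clock time in
 * nanoseconds consumed by a single nanosleep() request of `requested_ns`
 * nanoseconds, or -1 if the clock could not be read. */
static int64_t measure_nanosleep_ns(long requested_ns)
{
    /* tv_nsec must stay below one second. */
    assert(requested_ns >= 0 && requested_ns < 1000000000L);

    struct timespec req = { .tv_sec = 0, .tv_nsec = requested_ns };
    struct timespec before, after;

    if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
        return -1;
    nanosleep(&req, NULL);  /* an early return on a signal is ignored here */
    if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
        return -1;

    return (int64_t)(after.tv_sec - before.tv_sec) * 1000000000LL
         + (int64_t)(after.tv_nsec - before.tv_nsec);
}
```

Calling measure_nanosleep_ns(1) a few thousand times and averaging the results should show the real cost of the call, which is what matters for the packet loss rather than the requested 1 ns.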

I think you do need to add "dna" to exceptional interfaces or it will
fail to open it.
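
To illustrate what I mean by the name change, here is a sketch of a prefix check. It assumes the list is matched by name prefix, which is my guess rather than a description of the actual ARGUS code; is_exceptional_interface() and exceptional_prefixes are made-up names, and only "dag", "napa" and "dna" come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix list: "dag" and "napa" are the existing entries named
 * in the thread, "dna" is the proposed addition. */
static const char *const exceptional_prefixes[] = { "dag", "napa", "dna" };

/* Return 1 if `ifname` starts with one of the exceptional prefixes, else 0. */
static int is_exceptional_interface(const char *ifname)
{
    assert(ifname != NULL);

    size_t count = sizeof exceptional_prefixes / sizeof exceptional_prefixes[0];
    for (size_t i = 0; i < count; i++) {
        const char *prefix = exceptional_prefixes[i];
        if (strncmp(ifname, prefix, strlen(prefix)) == 0)
            return 1;
    }
    return 0;
}
```

With a prefix match like this, both "dna0" and "dnacluster:10@18" would be caught by the single "dna" entry, while "eth0" would not.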

Best Wishes,
Chris

On 18/12/12 18:43, Craig Merchant wrote:
> So far it's been working great.  PF_RING 5.5.1 still has the bug that causes argus to run at 100% CPU, but other than that there have been no problems.
> 
> What's the best way to figure out if argus is failing to keep up with the volume of traffic on a particular interface?
> 
> Thx.
> 
> Craig
> 
> -----Original Message-----
> From: Carter Bullard [mailto:carter at qosient.com] 
> Sent: Monday, December 17, 2012 7:21 PM
> To: Chris Wakelin
> Cc: Craig Merchant; Argus (argus-info at lists.andrew.cmu.edu)
> Subject: Re: [ARGUS] Argus 3.0.6 and dnaclusters
> 
> Hey Guys,
> Did the patches make for a reliable argus under PF_DNA?
> I'll upload argus-3.0.7.2 with the changes as I have them.
> Do I still need to make the change to add "dna" to the list of exceptional interfaces?
> Hope all is most excellent,
> 
> Carter
> 
> On Dec 13, 2012, at 8:22 PM, Chris Wakelin <c.d.wakelin at reading.ac.uk> wrote:
> 
>> (To enable debug in ARGUS - do a "touch .debug" in the source tree root
>> before ./configure)
>>
>> I don't have that problem with stopping and starting clients to
>> pfdnacluster_master. In fact I leave it running non-stop on the live
>> servers even though Suricata, Bro and ARGUS get restarted periodically.
>> (I'm loth to restart the master in case it upsets the border switch as
>> it did once!). Do you get the same problem if you use "pfcount" instead
>> of tcpdump? Are you sure everything, tcpdump and ARGUS included, is linked
>> to the PF_RING pcap (it may even be best to remove any others)?
>>
>> Best Wishes,
>> Chris
>>
>> On 14/12/12 01:15, Carter Bullard wrote:
>>> Hey Craig,
>>> Give argus a "-D 10" when you start it (assuming that you have debug compiled in).
>>> That should tell us enough to know what is up.
>>>
>>> Carter
>>>
>>>
>>> On Dec 13, 2012, at 8:11 PM, Craig Merchant <cmerchant at responsys.com> wrote:
>>>
>>>> So, I compiled 3.0.7 and made the changes to ArgusSource.c.
>>>>
>>>> If I run it on eth0, I get the following (and CPU is low):
>>>>
>>>> [root@ids01-dc1 bin]# argus -d -i eth0
>>>> argus[16323]: 14 Dec 12 00:21:26.394339 started
>>>> argus[16323]: 14 Dec 12 00:21:26.400452 ArgusGetInterfaceStatus: interface eth0 is up
>>>>
>>>> I run Chris' modified version of pfdnacluster_master:
>>>>
>>>> pfdnacluster_master -d -c 10 -r 0 -n 18 -m 0 -A 1 -i dna0
>>>>
>>>> Snort sees traffic and so does tcpdump.  However, once a process has connected to a dnacluster:X@Y interface, stopping or killing the process makes that interface unavailable until pfdnacluster_master is restarted.
>>>>
>>>> If I run argus on dnacluster:10@18, I don't see the "interface X is up" message:
>>>>
>>>> [root@ids01-dc1 bin]# ./argus -d -i dnacluster:10@18
>>>> argus[8153]: 14 Dec 12 00:40:13.189119 started
>>>>
>>>> The CPU runs at 100%.  ra -S 10.0.0.1:561 doesn't return any flows.
>>>>
>>>> I ran Chris's script and the poll counts came back zero:
>>>>
>>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9752-none.305
>>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9768-none.306
>>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9784-none.307
>>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9800-none.308
>>>> -r--r--r-- 1 root root 0 Dec 14 01:07 9816-none.309
>>>>
>>>> [root@ids01-dc1 ~]# ./check_script /proc/net/pf_ring/9816-none.309
>>>> 2012-12-14 01:08:00 - Polls: 0, Polls/s: 0
>>>> 2012-12-14 01:08:10 - Polls: 0, Polls/s: 0
>>>> 2012-12-14 01:08:20 - Polls: 0, Polls/s: 0
>>>>
>>>> Any ideas why Argus doesn't seem to be able to bring up the dnacluster interface?
>>>>
>>>> Thanks!
>>>>
>>>> Craig
>>>>
>>>> -----Original Message-----
>>>> From: Chris Wakelin [mailto:c.d.wakelin at reading.ac.uk] 
>>>> Sent: Thursday, December 13, 2012 3:45 PM
>>>> To: Carter Bullard
>>>> Cc: Craig Merchant; Argus (argus-info at lists.andrew.cmu.edu)
>>>> Subject: Re: [ARGUS] Argus 3.0.6 and dnaclusters
>>>>
>>>> I think it *is* selectable. PF_RING keeps a num_poll_calls count per process, and it's managing 1.2 million per second. What exactly it's counting, I'm not sure!
>>>>
>>>> I've got a little shell script to track the rate:
>>>>
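>>>>> #!/bin/sh
>>>>> # Print the cumulative "Num Poll Calls" count, and the rate per second,
>>>>> # for the /proc/net/pf_ring/ files given as arguments, every 10 seconds.
>>>>> INTERVAL=10
>>>>> POLLS=0
>>>>> while true; do
>>>>>     DATE=$(date '+%Y-%m-%d %H:%M:%S')
>>>>>     # Sum the counter over all files; "+0" prints 0 when nothing matches.
>>>>>     NPOLLS=$(gawk -F":" '/^Num Poll Calls/{polls+=$2}END{print polls+0}' "$@")
>>>>>     echo "$DATE - Polls: $NPOLLS, Polls/s: $(( (NPOLLS - POLLS) / INTERVAL ))"
>>>>>     POLLS=$NPOLLS
>>>>>     sleep "$INTERVAL"
>>>>> done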
>>>>
>>>> and used with something like "./pf_ring_polls.sh /proc/net/pf_ring/27702-none.41"
>>>>
>>>> Craig, it would be interesting to know what you see?
>>>>
>>>> Best Wishes,
>>>> Chris
>>>>
>>>> On 13/12/12 23:36, Carter Bullard wrote:
>>>>> Hey Chris,
>>>>> argus should be getting its packets using the routine
>>>>> ArgusGetPacket(), reading packets from a "notselectable" interface, which starts on line 3823 in ArgusSource.c.
>>>>>
>>>>> So, argus should try to read 4 packets, using pcap_next_ex() if it's
>>>>> there, or pcap_dispatch() if it's not, and if we don't get any packets
>>>>> (pkts == 0), then we're supposed to call nanosleep(), for 25 mSecs, on line 3820.
>>>>>
>>>>> Interesting that it never hits this call?
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>> On Dec 13, 2012, at 5:37 PM, Chris Wakelin <c.d.wakelin at reading.ac.uk> wrote:
>>>>>
>>>>>> Yes, it's a bug in DNA. I can't remember seeing a commit that claimed
>>>>>> to fix it; the last I saw on the topic, I think, was the developer's
>>>>>> reply to
>>>>>>
>>>>>> http://listgateway.unipi.it/pipermail/ntop-misc/2012-September/003279.html
>>>>>>
>>>>>> (and IPv6 is fine now BTW :-) )
>>>>>>
>>>>>> As far as I remember, for some reason Bro IDS manages to use select()
>>>>>> without hitting the problem, perhaps by adding empty select() calls
>>>>>> with a timeout:
>>>>>>
>>>>>> From Bro's IOSource.cc:
>>>>>>
>>>>>>>      if ( all_idle )
>>>>>>>              {
>>>>>>>              // Interesting: when all sources are dry, simply sleeping a
>>>>>>>              // bit *without* watching for any fd becoming ready may
>>>>>>>              // decrease CPU load. I guess that's because it allows
>>>>>>>              // the kernel's packet buffers to fill. - Robin
>>>>>>>              timeout.tv_sec = 0;
>>>>>>>              timeout.tv_usec = 20; // SELECT_TIMEOUT;
>>>>>>>              select(0, 0, 0, 0, &timeout);
>>>>>>>              }
>>>>>>
>>>>>> I had a go at doing that in ARGUS but it made no difference (perhaps 
>>>>>> I put it in the wrong place!).
>>>>>>
>>>>>> I'm happy to try things out on the test server, now I've updated 
>>>>>> everything (I'm using tcpreplay of a 10GB pcap over a 1Gb link from 
>>>>>> another machine using Intel e1000e cards and the time-limited DNA 
>>>>>> demo licence, so I can only test for 5 mins at a time).
>>>>>>
>>>>>> Best Wishes,
>>>>>> Chris
>>>>>>
>>>>>> On 13/12/12 22:11, Carter Bullard wrote:
>>>>>>> If I remember, the 100% CPU was a bug in the DNA code itself?
>>>>>>> Was there a resolution to that?
>>>>>>> If you would be a guinea pig, we can play around with it?
>>>>>>>
>>>>>>> Carter
>>>>>>>
>>>>>>>
>>>>>>> On Dec 13, 2012, at 4:30 PM, Chris Wakelin <c.d.wakelin at reading.ac.uk> wrote:
>>>>>>>
>>>>>>>> I've just tried 3.0.7.2 with latest PF_RING svn (post v5.5.1) and 
>>>>>>>> DNA clusters on a test machine. It looks like we do still need the 
>>>>>>>> name change (added "dna" to the list of interfaces that includes 
>>>>>>>> "dag" and
>>>>>>>> "napa") and it still uses 100% of CPU, but otherwise appears to work.
>>>>>>>>
>>>>>>>> Best Wishes,
>>>>>>>> Chris
>>>>>>
>>>>>> --
>>>>>> --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
>>>>>> Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
>>>>>> IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 8439
>>>>>> Whiteknights, Reading, RG6 2AF, UK              Fax: +44 (0)118 975 3094
>>>>>>


-- 
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 2908
Whiteknights, Reading, RG6 6AF, UK              Fax: +44 (0)118 975 3094