Argus 3.0.6 and dnaclusters

Chris Wakelin c.d.wakelin at reading.ac.uk
Thu Dec 13 20:22:23 EST 2012


(To enable debug in ARGUS - do a "touch .debug" in the source tree root
before ./configure)

I don't have that problem with stopping and starting clients to
pfdnacluster_master. In fact I leave it running non-stop on the live
servers even though Suricata, Bro and ARGUS get restarted periodically.
(I'm loth to restart the master in case it upsets the border switch as
it did once!). Do you get the same problem if you use "pfcount" instead
of tcpdump? Are you sure everything, tcpdump and ARGUS included, is linked
to the PF_RING pcap (it may even be best to remove any others)?

Best Wishes,
Chris

On 14/12/12 01:15, Carter Bullard wrote:
> Hey Craig,
> Give argus a "-D 10" when you start it (assuming that you have debug compiled in).
> That should tell us enough to know what is up.
> 
> Carter
> 
> 
> On Dec 13, 2012, at 8:11 PM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
>> So, I compiled 3.0.7 and made the changes to ArgusSource.c.
>>
>> If I run it on eth0, I get the following (and CPU is low):
>>
>> [root@ids01-dc1 bin]# argus -d -i eth0
>> argus[16323]: 14 Dec 12 00:21:26.394339 started
>> argus[16323]: 14 Dec 12 00:21:26.400452 ArgusGetInterfaceStatus: interface eth0 is up
>>
>> I run Chris' modified version of pfdnacluster_master:
>>
>> pfdnacluster_master -d -c 10 -r 0 -n 18 -m 0 -A 1 -i dna0
>>
>> Snort sees traffic and so does tcpdump.  However, once a process has connected to a dnacluster:X@Y interface, stopping or killing that process makes the interface unavailable, and pfdnacluster_master needs to be restarted.
>>
>> If I run argus on dnacluster:10@18, I don't see the "interface X is up" message:
>>
>> [root@ids01-dc1 bin]# ./argus -d -i dnacluster:10@18
>> argus[8153]: 14 Dec 12 00:40:13.189119 started
>>
>> The CPU runs at 100%.  ra -S 10.0.0.1:561 doesn't return any flows.
>>
>> I ran Chris's script and the poll counts came back zero:
>>
>> -r--r--r-- 1 root root 0 Dec 14 01:07 9752-none.305
>> -r--r--r-- 1 root root 0 Dec 14 01:07 9768-none.306
>> -r--r--r-- 1 root root 0 Dec 14 01:07 9784-none.307
>> -r--r--r-- 1 root root 0 Dec 14 01:07 9800-none.308
>> -r--r--r-- 1 root root 0 Dec 14 01:07 9816-none.309
>>
>> [root@ids01-dc1 ~]# ./check_script /proc/net/pf_ring/9816-none.309
>> 2012-12-14 01:08:00 - Polls: 0, Polls/s: 0
>> 2012-12-14 01:08:10 - Polls: 0, Polls/s: 0
>> 2012-12-14 01:08:20 - Polls: 0, Polls/s: 0
>>
>> Any ideas why Argus doesn't seem to be able to bring up the dnacluster interface?
>>
>> Thanks!
>>
>> Craig
>>
>> -----Original Message-----
>> From: Chris Wakelin [mailto:c.d.wakelin at reading.ac.uk] 
>> Sent: Thursday, December 13, 2012 3:45 PM
>> To: Carter Bullard
>> Cc: Craig Merchant; Argus (argus-info at lists.andrew.cmu.edu)
>> Subject: Re: [ARGUS] Argus 3.0.6 and dnaclusters
>>
>> I think it *is* selectable. PF_RING keeps a num_poll_calls count per process, and it's managing 1.2m per second. What exactly it's counting, I'm not sure!
>>
>> I've got a little shell script to track the rate:
>>
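>>> #!/bin/sh
>>> # Print the cumulative PF_RING poll count and the rate over each 10 s interval.
>>> POLLS=0
>>> while true; do
>>>  DATE=$(date '+%Y-%m-%d %H:%M:%S')
>>>  # Sum "Num Poll Calls" across every /proc/net/pf_ring file given as an argument.
>>>  NPOLLS=$(cat "$@" | gawk -F":" '/^Num Poll Calls/{polls+=$2}END{print polls+0}')
>>>  echo "$DATE - Polls: $NPOLLS, Polls/s: $(( (NPOLLS - POLLS) / 10 ))"
>>>  POLLS=$NPOLLS
>>>  sleep 10
>>> done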
>>
>> and used with something like "./pf_ring_polls.sh /proc/net/pf_ring/27702-none.41"
>>
>> Craig, it would be interesting to know what you see?
>>
>> Best Wishes,
>> Chris
>>
>> On 13/12/12 23:36, Carter Bullard wrote:
>>> Hey Chris,
>>> argus should be getting its packets using the routine 
>>> ArgusGetPacket(), reading packets from a " notselectable " interface, which starts on line 3823 in ArgusSource.c.
>>>
>>> So, argus should try to read 4 packets, using pcap_next_ex() if it's
>>> there, or pcap_dispatch() if it's not, and if we don't get any packets
>>> (pkts == 0), then we're supposed to call nanosleep(), for 25 ms, on
>>> line 3820.
>>>
>>> Interesting that it never hits this call?
>>>
>>> Carter
>>>
>>>
>>> On Dec 13, 2012, at 5:37 PM, Chris Wakelin <c.d.wakelin at reading.ac.uk> wrote:
>>>
>>>> Yes, it's a bug in DNA. I can't remember seeing a commit that claimed
>>>> to fix it; the last I saw on the topic, I think, was the developer's
>>>> reply to
>>>>
>>>> http://listgateway.unipi.it/pipermail/ntop-misc/2012-September/003279.html
>>>>
>>>> (and IPv6 is fine now BTW :-) )
>>>>
>>>> As far as I remember, Bro IDS manages to use select() without
>>>> hitting the problem, perhaps by adding empty select() calls with a
>>>> timeout:
>>>>
>>>> From Bro's IOSource.cc:
>>>>
>>>>>       if ( all_idle )
>>>>>               {
>>>>>               // Interesting: when all sources are dry, simply sleeping a
>>>>>               // bit *without* watching for any fd becoming ready may
>>>>>               // decrease CPU load. I guess that's because it allows
>>>>>               // the kernel's packet buffers to fill. - Robin
>>>>>               timeout.tv_sec = 0;
>>>>>               timeout.tv_usec = 20; // SELECT_TIMEOUT;
>>>>>               select(0, 0, 0, 0, &timeout);
>>>>>               }
>>>>
>>>> I had a go at doing that in ARGUS but it made no difference (perhaps 
>>>> I put it in the wrong place!).
>>>>
>>>> I'm happy to try things out on the test server, now I've updated 
>>>> everything (I'm using tcpreplay of a 10GB pcap over a 1Gb link from 
>>>> another machine using Intel e1000e cards and the time-limited DNA 
>>>> demo licence, so I can only test for 5 mins at a time).
>>>>
>>>> Best Wishes,
>>>> Chris
>>>>
>>>> On 13/12/12 22:11, Carter Bullard wrote:
>>>>> If I remember, the 100% CPU was a bug in the DNA code itself?
>>>>> Was there a resolution to that?
>>>>> If you would be a guinea pig, we can play around with it?
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>> On Dec 13, 2012, at 4:30 PM, Chris Wakelin <c.d.wakelin at reading.ac.uk> wrote:
>>>>>
>>>>>> I've just tried 3.0.7.2 with latest PF_RING svn (post v5.5.1) and 
>>>>>> DNA clusters on a test machine. It looks like we do still need the 
>>>>>> name change (added "dna" to the list of interfaces that includes 
>>>>>> "dag" and
>>>>>> "napa") and it still uses 100% of CPU, but otherwise appears to work.
>>>>>>
>>>>>> Best Wishes,
>>>>>> Chris
>>>>
>>>> --
>>>> --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
>>>> Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
>>>> IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 8439
>>>> Whiteknights, Reading, RG6 2AF, UK              Fax: +44 (0)118 975 3094
>>>>
>>>
>>
>>
>>
> 
> 
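
The read loop Carter describes in the quoted text above (try up to four
packets, then sleep 25 ms if none arrived) can be sketched in C roughly as
follows. This is an illustration only, not the code from ArgusSource.c: the
names poll_source_once, read_one_fn, fake_source and fake_read_one are
hypothetical, and a plain callback stands in for pcap_next_ex() or
pcap_dispatch() so that the sketch builds without libpcap.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical packet source: returns 1 if a packet was read, 0 if none was
 * available. In Argus this role would be played by pcap_next_ex() or
 * pcap_dispatch(); a callback is used here purely for illustration. */
typedef int (*read_one_fn)(void *ctx);

enum { BATCH_SIZE = 4 };                                 /* packets tried per pass */
static const long IDLE_SLEEP_NS = 25L * 1000L * 1000L;   /* 25 ms */

/* Try to read up to BATCH_SIZE packets. If none arrived, sleep briefly so
 * that a non-selectable interface does not spin the CPU.
 * Returns the number of packets read in this pass. */
static int poll_source_once(read_one_fn read_one, void *ctx)
{
    assert(read_one != NULL);

    int pkts = 0;
    for (int i = 0; i < BATCH_SIZE; i++) {
        if (read_one(ctx) <= 0)
            break;                 /* source is dry (or failed): end this pass */
        pkts++;
    }

    if (pkts == 0) {
        struct timespec ts = { .tv_sec = 0, .tv_nsec = IDLE_SLEEP_NS };
        nanosleep(&ts, NULL);      /* an interrupted sleep is simply cut short */
    }
    return pkts;
}

/* Stand-in source for trying the loop out: hands out `remaining` packets,
 * then reports that nothing is available. */
struct fake_source { int remaining; };

static int fake_read_one(void *ctx)
{
    struct fake_source *s = ctx;
    if (s->remaining <= 0)
        return 0;
    s->remaining--;
    return 1;
}
```

Called repeatedly on a fake_source holding six packets, poll_source_once()
returns 4, then 2, then 0, and only the last call sleeps. If a capture
process sits at 100% CPU, one thing to check is whether the idle branch is
ever reached.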


-- 
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin,                           c.d.wakelin at reading.ac.uk
IT Services Centre, The University of Reading,  Tel: +44 (0)118 378 8439
Whiteknights, Reading, RG6 2AF, UK              Fax: +44 (0)118 975 3094
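
The Bro excerpt quoted above sleeps by calling select() with no descriptors
at all, so the call acts as a pure timeout. A minimal C sketch of that idea,
assuming a POSIX system; idle_sleep_usec is a hypothetical helper name, and
nothing here is taken from the Bro or Argus sources beyond the shape of the
select(0, 0, 0, 0, &timeout) call:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Sleep for `usec` microseconds by calling select() with no descriptor sets,
 * i.e. a timeout that watches nothing. Returns 0 when the timeout expires
 * and -1 on error (for example EINTR if a signal arrives first). */
static int idle_sleep_usec(long usec)
{
    assert(usec >= 0);

    struct timeval timeout;
    timeout.tv_sec  = usec / 1000000L;
    timeout.tv_usec = usec % 1000000L;

    /* nfds = 0 and all three sets NULL: nothing can become ready, so the
     * call can only return via the timeout or an error. */
    return select(0, NULL, NULL, NULL, &timeout);
}
```

A capture loop would call idle_sleep_usec(20) only when every source reported
no packets, mirroring the all_idle branch in the excerpt. Whether this helps
with the DNA problem is untested here; Chris reports above that a similar
change made no difference in ARGUS.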


