Endace DAG 8.1 and Argus problem

Leif Tishendorf ltishend at gmail.com
Thu Feb 24 17:51:02 EST 2011


Hey Carter,

At the time, I had my 6 instances of Argus running with Radium 
connecting to them to aggregate the interfaces and then rasplit 
listening to Radium to log the data.  After the error I noticed rasplit 
was no longer logging data to disk.  To try to figure out what was 
going on, I attempted to read from the Radium interface directly using ra 
and ratop and got no data out.  I shut down Radium and tried 
reading from each Argus instance directly with the same results.  I 
restarted the instances of Argus and tried the procedure above again 
with the same results.  Starting up new instances of Argus 3.0.2 and 
3.0.3.22 yielded the same results.  Lastly I restarted the box and was 
again able to get data out of Argus. No other functions, or programs 
using the Dag card, were affected like this.  Somewhere between the data 
queue and getting the data out of the queue, there was a disconnect.
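
For anyone wanting to repeat the "is anything coming out of the socket at
all" check without ra or ratop, here is a minimal sketch using plain TCP.
It only counts raw bytes and does not parse Argus records.  The function
name, timeout, and byte limit are made up for illustration, and I am
assuming a healthy argus or radium sends something shortly after a client
connects.

```python
import socket


def bytes_received(host, port, timeout=5.0, max_bytes=4096):
    """Connect to host:port and report how many bytes arrive before the timeout.

    Returns 0 if the connection succeeds but nothing is sent (or the peer
    closes immediately), and None if the connection fails or is reset.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            try:
                # A single read is enough to tell "silent" from "sending".
                return len(sock.recv(max_bytes))
            except socket.timeout:
                return 0
    except OSError:
        return None
```

Calling bytes_received("172.22.5.123", 568) would probe the address and
port that appear in the debug output quoted below; the ports of the six
production instances are not listed in this thread, so substitute your own.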

By the way, thanks again for all the continued help on this.

-Leif

On 02/24/2011 02:27 PM, Carter Bullard wrote:
> Hey Leif,
> Sorry for the delay.  Argus is getting packets, it is processing flow data and it's queueing data to the output sockets.  The error messages indicate that each output queue has 200,000 flow records ready to be written out, but no one appears to be reading the data.
>
> When the queues get long and argus complains, it is usually because your reader cannot keep up with the output stream.  Argus thinks that no one is reading and so it closes the output sockets.  The segfault is a real problem, and I'll try to figure that out tonight.
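
As a rough illustration of the mechanism Carter describes (this is not
Argus code), the toy model below counts how many ticks it takes for a
backlog to pass a limit when records are produced faster than a reader
drains them.  The limit of 200000 is taken from the log messages; the
rates and the function name are invented, and whether Argus applies its
limit exactly this way is an assumption.

```python
def ticks_until_overflow(produce_per_tick, drain_per_tick, limit, max_ticks=1_000_000):
    """Return the first tick at which the backlog exceeds `limit`, or None.

    Each tick the writer queues `produce_per_tick` records and the reader
    removes up to `drain_per_tick` of them.
    """
    depth = 0
    for tick in range(1, max_ticks + 1):
        depth += produce_per_tick
        depth = max(0, depth - drain_per_tick)
        if depth > limit:
            return tick
    return None
```

With no reader at all, ticks_until_overflow(1000, 0, 200000) returns 201.
A reader that is only slightly too slow reaches the same limit, just
later, which is why a stalled reader and a slow one can produce the same
message.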
>
> Not sure that I understand the current scenario.  You say it is not writing anything but you are getting queue exceed errors?
>
> Carter
>
>
> On Feb 23, 2011, at 6:54 PM, Leif Tishendorf<ltishend at gmail.com>  wrote:
>
>> Carter,
>>
>> So I ran into an interesting problem this morning.  I ran ratop against the new patched 3.0.3.22 for testing and after a couple of minutes it crashed.  I didn't look into it any further at the time because I was at jury duty.  Now that I've had some time to do some back checking, it would appear the new argus caused a kernel error, and then shortly after all 6 running instances of argus 3.0.2 threw the following error in short succession.
>>
>> ========================================
>> Feb 23 08:32:40 goldfinger kernel: [2840126.624223] argus[25584]: segfault at 188 ip 00007f58cf3ccf7c sp 00007f58bdf80718 error 6 in libc-2.12.1.so[7f58cf346000+17a000]
>> Feb 23 08:32:55 goldfinger argus[17777]: 23 Feb 11 08:32:55.413984 ArgusWriteOutSocket(0xd6fa3010) maximum errors exceeded 200000
>> Feb 23 08:32:55 goldfinger argus[17777]: 23 Feb 11 08:32:55.414030 ArgusWriteOutSocket(0xd6fa3010) maximum errors exceeded 200001
>> Feb 23 08:32:59 goldfinger argus[30178]: 23 Feb 11 08:32:59.464853 ArgusWriteOutSocket(0xc2016010) maximum errors exceeded 200000
>> Feb 23 08:32:59 goldfinger argus[30178]: 23 Feb 11 08:32:59.464893 ArgusWriteOutSocket(0xc2016010) maximum errors exceeded 200001
>> Feb 23 08:32:59 goldfinger argus[16029]: 23 Feb 11 08:32:59.673235 ArgusWriteOutSocket(0x80fda010) maximum errors exceeded 200000
>> Feb 23 08:32:59 goldfinger argus[16029]: 23 Feb 11 08:32:59.673276 ArgusWriteOutSocket(0x80fda010) maximum errors exceeded 200001
>> Feb 23 08:33:00 goldfinger argus[29048]: 23 Feb 11 08:33:00.290119 ArgusWriteOutSocket(0xda789010) maximum errors exceeded 200000
>> Feb 23 08:33:00 goldfinger argus[29048]: 23 Feb 11 08:33:00.290158 ArgusWriteOutSocket(0xda789010) maximum errors exceeded 200001
>> Feb 23 08:33:00 goldfinger argus[12721]: 23 Feb 11 08:33:00.683134 ArgusWriteOutSocket(0x5db96010) maximum errors exceeded 200000
>> Feb 23 08:33:00 goldfinger argus[12721]: 23 Feb 11 08:33:00.683165 ArgusWriteOutSocket(0x5db96010) maximum errors exceeded 200001
>> Feb 23 08:33:00 goldfinger argus[13677]: 23 Feb 11 08:33:00.925746 ArgusWriteOutSocket(0xc37ba010) maximum errors exceeded 200000
>> Feb 23 08:33:00 goldfinger argus[13677]: 23 Feb 11 08:33:00.925785 ArgusWriteOutSocket(0xc37ba010) maximum errors exceeded 200001
>> ===============================================
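
To see at a glance which instances hit the limit, a small sketch like the
following can group these syslog lines by argus PID.  The regular
expression is written against the exact message format shown above; the
function and field names are hypothetical.

```python
import re

# Matches lines such as:
#   ... argus[17777]: ... ArgusWriteOutSocket(0xd6fa3010) maximum errors exceeded 200000
PATTERN = re.compile(
    r"argus\[(?P<pid>\d+)\]: .*ArgusWriteOutSocket\((?P<sock>0x[0-9a-fA-F]+)\) "
    r"maximum errors exceeded (?P<count>\d+)"
)


def summarize(lines):
    """Return {pid: {"socket": str, "messages": int, "highest": int}}."""
    summary = {}
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue  # e.g. the kernel segfault line, which has another format
        pid = int(match.group("pid"))
        entry = summary.setdefault(
            pid, {"socket": match.group("sock"), "messages": 0, "highest": 0}
        )
        entry["messages"] += 1
        entry["highest"] = max(entry["highest"], int(match.group("count")))
    return summary
```

Feeding it the lines above gives one entry per argus process, each with
two messages, which matches the six 3.0.2 instances all failing within a
few seconds of each other.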
>>
>> I have scripts in place to deal with the crashing and timestamp fits of argus 3.0.2 and so everything kept going, but I just noticed it is no longer actually writing data to the interfaces.  Restarting them doesn't change it and 3.0.3.22 is behaving the same way.  I also have continued to get the 'ArgusWriteOutSocket(0xd6fa3010) maximum errors exceeded' errors since, which I hadn't gotten before.
>>
>> I've double checked with tcpdump to make sure I'm still actually sending out data on the argus DAG interface data streams.
>>
>> Thanks,
>>
>> Leif
>>
>> On 02/21/2011 02:29 PM, Carter Bullard wrote:
>>> Hey Leif,
>>> I may have found the bug.  We have new compile directives, "HAVE_DAG", that we use when you are
>>> using the dag drivers, rather than using the libpcap interface.  The bug caused us to use the dag specific
>>> open routines, even though they were not linked in, which caused us to not try to open the interface at all.
>>>
>>> The fix is simple, but if you don't use patch() very often, it may be messy.  I've included a new ArgusSource.c,
>>> which you should copy over ./argus/ArgusSource.c.  I've included the patch, for those that like patch files.
>>> If this doesn't help you out, rerun argus using gdb(), and send the output again.
>>>
>>> Thanks, and sorry for the inconvenience,
>>>
>>> Carter
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ---- included patch ----
>>> ==== //depot/argus/argus/argus/ArgusSource.c#81 - /Users/carter/argus/argus/argus/ArgusSource.c ====
>>> 295a296
>>> > #ifdef HAVE_DAG
>>> 296a298
>>> > #endif
>>> 347a350
>>> > #ifdef HAVE_DAG
>>> 349d351
>>> <   #ifdef HAVE_DAG
>>> 368a371
>>> >     }
>>> 370d372
>>> <      }
>>> ---- end patch ----
>>>
>>> On Feb 18, 2011, at 12:27 PM, Leif Tishendorf wrote:
>>>
>>>> Carter,
>>>>
>>>>> Sorry, you need to run without the DAEMON mode on.  Also add a -D1 just to verify that there is some activity.
>>>>> So try:
>>>>>
>>>>>      run -D1 -d
>>>>
>>>> Ah, ok, did that and here's the output now
>>>>
>>>> ----
>>>> Reading symbols from /root/argus-3.0.3.22/bin/argus...done.
>>>> (gdb) run -D1 -d
>>>> Starting program: /root/argus-3.0.3.22/bin/argus -D1 -d
>>>> [Thread debugging using libthread_db enabled]
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:44.777281 ArgusNewModeler() returning 0x671010
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:44.777427 ArgusNewOutput() returning retn 0x671d20
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:44.782451 setArgusID(0x7ffff690f040, 0xac16057b) done
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:44.782472 setArgusID(0x7ffff690f040, 0xac16057b) done
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:44.782478 setArgusID(0x7ffff690f040, 0xac16057b) done
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:44.782503 ArgusParseResourceFile: ArgusBindAddr "(null)"
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:46.990235 ArgusParseResourceFile (/etc/argus.conf) returning
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:46.990277 setArgusInterfaceStatus(0x7ffff690f010, 1)
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:46.991267 ArgusEstablishListen(0x671d20, 0x7fffffffd090) binding: 172.22.5.123:568 family: 2
>>>> [New Thread 0x7ffff5f61700 (LWP 26405)]
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:46.992196 ArgusInitOutput() done
>>>> argus[26368]: 18 Feb 11 09:19:46.992222 started
>>>> argus[26368.0017f6f5ff7f0000]: 18 Feb 11 09:19:46.992246 ArgusOutputProcess(0x671d20) starting
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:46.994594 ArgusOpenInterface(0x7ffff5356010, 'dag0:36') returning 0
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:46.994606 ArgusInitSource: no packet sources for this device.
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:46.994611 ArgusInitSource(0x7ffff5356010) returning 0
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:47.994704 main() ArgusSourceProcess returned: shuting down
>>>>
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:47.994747 ArgusShutDown(Normal Shutdown)
>>>>
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:47.994756 ArgusCloseSource(0x7ffff690f010) starting
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:47.994775 ArgusCloseEvents() done
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:47.994783 ArgusCloseOutput(0x671d20) scheduling closure after 0 records
>>>> argus[26368.0017f6f5ff7f0000]: 18 Feb 11 09:19:48.093424 ArgusOutputProcess(0x671d20) exiting
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:48.100631 ArgusCloseOutput(0x671d20) done
>>>> [Thread 0x7ffff5f61700 (LWP 26405) exited]
>>>> argus[26368.00d7fef7ff7f0000]: 18 Feb 11 09:19:48.100705 ArgusShutDown()
>>>>
>>>> Program exited normally.
>>>> ----
>>>>
>>>> Thanks,
>>>>
>>>> --Leif
>>>>
>>>> On 02/18/2011 06:31 AM, Carter Bullard wrote:
>>>>> Hey Leif,
>>>>> Sorry, you need to run without the DAEMON mode on.  Also add a -D1 just to verify that there is some activity.
>>>>> So try:
>>>>>
>>>>>     run -D1 -d
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>> On Feb 17, 2011, at 4:32 PM, Leif Tishendorf wrote:
>>>>>
>>>>>> Carter,
>>>>>>
>>>>>> Here is the output from gdb:
>>>>>>
>>>>>> ----
>>>>>> Starting program: /root/argus-3.0.3.22/bin/argus -F ../support/Config/argus.conf
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> [New Thread 0x7ffff5f61700 (LWP 25329)]
>>>>>> argus[25294]: 17 Feb 11 10:50:06.455798 started
>>>>>> [Thread 0x7ffff5f61700 (LWP 25329) exited]
>>>>>>
>>>>>> Program exited normally.
>>>>>> ----
>>>>>>
>>>>>> Though I've never run anything through gdb before so that's just a straight run command.  If there is more you'd like me to do just let me know.
>>>>>>
>>>>>> Also, in the debug output I was wondering about the line:
>>>>>>
>>>>>> ----
>>>>>> argus[13042.00172305347f0000]: 16 Feb 11 11:59:48.506743 ArgusOpenInterface(0x7f3402599010, 'dag0:62') returning 0
>>>>>> ----
>>>>>>
>>>>>> Is Argus not finding the dag interface?
>>>>>>
>>>>>> --Leif
>>>>>>
>>>>>>
>>>>>> On 02/17/2011 04:17 AM, Carter Bullard wrote:
>>>>>>> Hey Leif,
>>>>>>> I suspect that your packet source thread is crashing, and the rest of argus is doing its thing.  Run argus under gdb to see if it tells you more about the problem.
>>>>>>>
>>>>>>> To compile with symbols, create the development tag and reconfigure and remake:
>>>>>>>     % touch .devel
>>>>>>>     % ./configure
>>>>>>>     % make clean
>>>>>>>     % make
>>>>>>>     % gdb ./bin/argus
>>>>>>>
>>>>>>> Be sure to run without daemon mode.
>>>>>>> Carter
>>>>>>>
>>>>>>>
>>>>>>> On Feb 15, 2011, at 5:21 PM, Leif Tishendorf<ltishend at gmail.com>     wrote:
>>>>>>>
>>>>>>>> Carter,
>>>>>>>>
>>>>>>>> I should probably start a different thread for this but it's the same system as the 3.0.3.22 issue and didn't want to clutter things up too much.  I just recently installed 3.0.2 on this same box, and originally I thought it was functioning normally. However, after more testing I've noticed there are a couple issues and was wondering if you had any suggestions.
>>>>>>>>
>>>>>>>> 1.  I have 6 load balanced streams to break up the traffic on a Dag 8.1 card and an argus process on each.  Over time the argus processes will exit without error.
>>>>>>>>
>>>>>>>> 2.  Time stamps get extremely skewed over time (it starts out putting year ranges from 1912 to 2057).  This seems to be worse with higher load.  Currently each process is running at about 20% CPU or less (8 core, 16 hyper-threaded).  I have Snort, nTop and tcpdump running on other streams and they don't experience the time skew issue.
>>>>>>>>
>>>>>>>> Ideally I'd rather be using 3.0.3.22 (3.0.4 when it's released) to take advantage of its multiple interface handling and multi-core support, and not do too much troubleshooting on an older code base.  If there is anything I can test or try, or information I can provide, I'd be happy to do so.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> --Leif
>>>>>>>>
>>>>>>>> On 02/14/2011 12:31 PM, Carter Bullard wrote:
>>>>>>>>> Hey Leif,
>>>>>>>>> It could be a bug.  Argus has run on many versions of the DAG, but I don't test
>>>>>>>>> each dev release against DAGs as I no longer have access to one.
>>>>>>>>>
>>>>>>>>> The easiest test is to make sure tcpdump gets packets from that interface.  If
>>>>>>>>> so, then running argus with the "-D debugLevel" option will give us some detail
>>>>>>>>> printing on what is happening.
>>>>>>>>>
>>>>>>>>> Try with "-D 6" to start, and if that doesn't help, increase to get more info, and don't run
>>>>>>>>> in daemon mode.
>>>>>>>>>
>>>>>>>>> Be sure and put the "-D 6" as the first option, so you get debug printing for parsing the
>>>>>>>>> command line options, etc......
>>>>>>>>>
>>>>>>>>> To compile debug support into argus, in the argus distribution directory:
>>>>>>>>>     % touch .debug
>>>>>>>>>     % ./configure
>>>>>>>>>     % make clean
>>>>>>>>>     % make
>>>>>>>>>
>>>>>>>>> Carter
>>>>>>>>>
>>>>>>>>> On Feb 14, 2011, at 3:15 PM, Leif Tishendorf wrote:
>>>>>>>>>
>>>>>>>>>> Hello all,
>>>>>>>>>>
>>>>>>>>>> I'm running an Endace DAG 8.1 card and I'm having difficulty getting Argus to work with it.  I've compiled Argus 3.0.3.22 against the DAG-enabled libpcap files and Argus will run if I set it to eth0, which is the management interface, but if I set it to a DAG stream, e.g. ARGUS_INTERFACE=dag0:8, the daemon says it starts, and prints to syslog that it starts, but it doesn't actually start.
>>>>>>>>>>
>>>>>>>>>> I was wondering if anyone may have had a similar issue and be able to offer some pointers.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> --Leif
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> --Leif
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> --Leif
>>>>>>
>>>>>
>>>>
>>>> --
>>>> --Leif
>>>>
>>>
>>
>> --
>> --Leif
>>

-- 
--Leif


