argus fails to allocate memory...

Carter Bullard carter at qosient.com
Tue Apr 14 10:38:15 EDT 2009


Hey Michael and Russell,
Sorry this patch isn't 'out' yet; since we found out it was a kernel bug, I was hoping it would just go away ;o)

Yes, the system clock jumps waaayyyy into the future for a brief moment. We reject packets that arrive waaayyy out of time (hence the error messages), but the bad system times cause us to insert flows into the FIFO flow cache queue marked with bad insertion times. These records eventually become the "last" record in the flow status FIFO queue, and the algorithm waits until the queue insertion timestamp plus the flow status interval is "<" the current time (the timeout).

A flow record stamped in the future will not be processed until we finally get to the future. And with that record plugging up the pipeline, we just start piling up flow records behind it.
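
To make the stall concrete, here is a minimal sketch in C of a head-of-FIFO timeout loop like the one described above. This is not Argus source; the struct and function names are invented for illustration:

   #include <stddef.h>
   #include <sys/time.h>    /* struct timeval, timercmp() */

   /* Hypothetical, simplified FIFO node: each record carries the
    * timestamp of when it was inserted into the status queue. */
   struct node {
      struct timeval qtime;      /* queue insertion timestamp */
      struct node   *next;
   };

   /* Report and dequeue records whose insertion time plus the status
    * interval is "<" the current time.  If the head record's qtime is
    * in the future, the timeout test below never succeeds, the head is
    * never dequeued, and every record behind it piles up in memory. */
   void flush_expired(struct node **head, struct timeval *now, time_t interval)
   {
      while (*head != NULL) {
         struct timeval deadline = (*head)->qtime;
         deadline.tv_sec += interval;

         if (!timercmp(&deadline, now, <))
            break;           /* head not timed out; nothing behind it runs */

         /* ... write the flow status record here, then move on ... */
         *head = (*head)->next;
      }
   }

A record stamped even a few hours ahead keeps that test false until the wall clock catches up, which matches the memory growth reported below.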

The strategy for the fix is to test the queue insertion timestamp when a record gets to the end of the FIFO status queue. We check whether its queue insertion time is currently in the future. If so, we go ahead and write the record with status "ARGUS_ERROR", delete it, and move on.

The idea is that you should still get the flow activity reported out of argus, just with bad timestamps. These records will be marked as bad, so you can pick them out of the stream (ra filter keyword 'error')?

I have this fix in argus-3.0.2.beta.2, but have had no time to test. Give this patch a try to see if it helps.

Carter

==== //depot/argus/argus/argus/ArgusModeler.c#64 - /home/carter/argus/argus/argus/ArgusModeler.c ====
1267c1267,1271
<                   if (ArgusCheckTimeout(model, &flow->qhdr.qtime, getArgusFarReportInterval(model))) {
---
>
>                   struct timeval now = model->ArgusGlobalTime;
>                   if ((flow->qhdr.qtime.tv_sec  > now.tv_sec) ||
>                      ((flow->qhdr.qtime.tv_sec == now.tv_sec) &&
>                       (flow->qhdr.qtime.tv_usec > now.tv_usec))) {
1270,1272c1274,1283
<                         ArgusSendFlowRecord (model, flow, ARGUS_STATUS);
<                      ArgusPushQueue(model->ArgusStatusQueue, &flow->qhdr, ARGUS_LOCK);
<                      queue->reclaim++;
---
>                         ArgusSendFlowRecord (model, flow, ARGUS_ERROR);
>                      ArgusDeleteObject (flow);
>                   } else {
>                      if (ArgusCheckTimeout(model, &flow->qhdr.qtime, getArgusFarReportInterval(model))) {
>                         ArgusRemoveFromQueue(queue, &flow->qhdr, ARGUS_LOCK);
>                         if (!(flow->status & ARGUS_RECORD_WRITTEN))
>                            ArgusSendFlowRecord (model, flow, ARGUS_STATUS);
>                         ArgusPushQueue(model->ArgusStatusQueue, &flow->qhdr, ARGUS_LOCK);
>                         queue->reclaim++;
>                      }
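
For readability, here is the same change rendered as straight code rather than an ed-style diff. This is my reconstruction: the two unchanged context lines between the hunks (the remove-from-queue call and the record-written check) are not shown in the diff, so I've inferred them from the else branch:

   struct timeval now = model->ArgusGlobalTime;

   if ((flow->qhdr.qtime.tv_sec  > now.tv_sec) ||
      ((flow->qhdr.qtime.tv_sec == now.tv_sec) &&
       (flow->qhdr.qtime.tv_usec > now.tv_usec))) {
      /* queue insertion time is in the future: report the flow as
       * ARGUS_ERROR, delete it, and move on */
      ArgusRemoveFromQueue(queue, &flow->qhdr, ARGUS_LOCK);   /* inferred context */
      if (!(flow->status & ARGUS_RECORD_WRITTEN))             /* inferred context */
         ArgusSendFlowRecord (model, flow, ARGUS_ERROR);
      ArgusDeleteObject (flow);
   } else {
      /* normal path: the original timeout test and status reporting */
      if (ArgusCheckTimeout(model, &flow->qhdr.qtime, getArgusFarReportInterval(model))) {
         ArgusRemoveFromQueue(queue, &flow->qhdr, ARGUS_LOCK);
         if (!(flow->status & ARGUS_RECORD_WRITTEN))
            ArgusSendFlowRecord (model, flow, ARGUS_STATUS);
         ArgusPushQueue(model->ArgusStatusQueue, &flow->qhdr, ARGUS_LOCK);
         queue->reclaim++;
      }
   }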


On Apr 14, 2009, at 4:00 AM, Michael Sanderson wrote:

> Hi Russell, I have seen this behaviour with 4.4 (and 4.3) (Jan 26, 09 posting to the list).  Argus appears to be getting packets with time stamps that are significantly out to lunch.  The daemon then selectively pushes out flow data and accumulates lots of data in memory.  If you kill -HUP it, it will happily flush all the flows out and exit.  In my experience, looking at that data will show one flow with a time stamp that makes no sense.  I've implemented some things to watch the daemon and kill -HUP/restart it when it gets "too big".  I haven't found the time to tcpdump the system and deal with rotating those logs so that I don't run out of space before I catch the packet and find out if this is an OBSD problem or some weird argus problem.
>
> I believe that a similar problem had been seen by Martijn van Oosterhout back in Dec 08/Jan 09, but on Linux.  The root cause there appears to have been a Linux kernel bug.
>
>    Michael Sanderson
>
> Russell Fulton wrote:
>> Hi Folks,
>> I'm afraid I've been totally snowed under for the last 18 months and haven't really been following what's been happening in the argus world.  I *have* registered that things have happened that I need to catch up on.  Carter has clearly not been idle :)
>> This post is prompted by problems I am having on a new OBSD 4.4 sensor.  The box has 1GB of memory; it is running snort and pf (part of the kernel), and without argus running it shows free memory at around 600MB.  After starting argus and letting it run for a while, free memory drops to 590MB and top shows argus using 10MB.  I will keep an eye on this now that I have some idea of what is happening, and post again if the behaviour changes.
>> I start argus and it runs fine for a few hours and then stops.  I could not find any errors in the logs, so I removed daemon from the config and started it, sending both stdout and stderr to a file.  Here are the head and tail of the file after it stopped:
>> -bash-3.2$ head argus.out
>>  ArgusWarning: argus[15833]: 11 Apr 09 20:34:51.271087 started
>>  ArgusWarning: argus[15833]: 11 Apr 09 20:34:51.271471 ArgusGetInterfaceStatus: interface em1 is up
>>  ArgusWarning: argus[15833]: 11 Apr 09 23:58:39.068381 ArgusInterface timestamps wayyy out of order: now -293216183 then 1239451118
>>  ArgusWarning: argus[15833]: 12 Apr 09 00:48:28.318861 ArgusInterface timestamps wayyy out of order: now -1668161463 then 1239454108
>>  ArgusWarning: argus[15833]: 12 Apr 09 00:48:28.602530 ArgusInterface timestamps wayyy out of order: now -1668161463 then 1239454108
>>  ArgusWarning: argus[15833]: 12 Apr 09 00:48:28.602602 ArgusInterface timestamps wayyy out of order: now -1668161463 then 1239454108
>>  ArgusWarning: argus[15833]: 12 Apr 09 07:33:24.759899 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>> -bash-3.2$ tail argus.out
>>  ArgusWarning: argus[15833]: 12 Apr 09 09:58:19.850584 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>>  ArgusWarning: argus[15833]: 12 Apr 09 09:58:19.850661 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>>  ArgusWarning: argus[15833]: 12 Apr 09 09:58:19.850753 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>>  ArgusWarning: argus[15833]: 12 Apr 09 09:58:19.850820 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>>  ArgusWarning: argus[15833]: 12 Apr 09 09:58:19.850886 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>>  ArgusWarning: argus[15833]: 12 Apr 09 09:58:19.850951 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>>  ArgusWarning: argus[15833]: 12 Apr 09 09:58:19.851016 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>>  ArgusWarning: argus[15833]: 12 Apr 09 09:58:19.851099 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>>  ArgusWarning: argus[15833]: 12 Apr 09 09:58:19.851168 ArgusNewFlow() ArgusMalloc error Cannot allocate memory.
>>  ArgusError: argus[15833]: 12 Apr 09 09:58:19.851195 ArgusNewFragFlow() returned NULL.
>> -bash-3.2$ grep ArgusNewFlow argus.out | wc -l
>> 12,058,190
>> server version 3.0.0
>> The sensor is on our resnet firewall, so there is all sorts of crap on the network.  Currently we are in the middle of a break, so the load is not that high and argus will stay running for up to 12 hours.  When the halls are full it runs for two to three hours.
>> Anyone have any idea what is going on?
>> Russell
>
>

Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York  10022

+1 212 588-9133 Phone
+1 212 588-9134 Fax





