Problem with argus under load not reopening output file.

Carter Bullard carter at qosient.com
Sun Jan 11 11:42:04 EST 2009


Hey Martijn,
If you don't mind, I'd like to keep the thread on the argus mailing
list.

Sounds like an ntp or rdate issue, where the machine's time is
getting set to a bogus value.  This is of course an old way
of attacking a machine ;o)

Argus can attempt to catch jumps in time by testing the timestamps
in the packets against the system's global time, or by tracking the
delta between the two.  But how to respond to a jump is a problem,
because we don't know which clock (packet capture or host OS) is correct.
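
A minimal C sketch of the two checks described above, assuming the packet timestamp and the host clock are both available as struct timeval. The function names and the 60-second threshold are hypothetical, not argus's own code. The first check compares the two clocks directly; the second tracks their offset and flags only an abrupt change in it, which tolerates a small constant skew. Neither check can tell you which clock is right.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical threshold: how far (in seconds) the clocks may disagree
 * before we call it a time jump. */
#define TIME_JUMP_THRESHOLD_SEC 60

/* Signed difference pkt - host, in microseconds. */
static int64_t timeval_diff_usec(const struct timeval *pkt,
                                 const struct timeval *host)
{
    return ((int64_t)pkt->tv_sec - (int64_t)host->tv_sec) * 1000000
         + ((int64_t)pkt->tv_usec - (int64_t)host->tv_usec);
}

/* Check 1: packet timestamp vs. host global time.  Only detects the
 * disagreement; it cannot say which side is wrong. */
static bool time_jump_detected(const struct timeval *pkt,
                               const struct timeval *host)
{
    int64_t d = timeval_diff_usec(pkt, host);
    if (d < 0)
        d = -d;
    return d > (int64_t)TIME_JUMP_THRESHOLD_SEC * 1000000;
}

/* Check 2: track the pkt - host offset and flag an abrupt change in it.
 * A constant offset (e.g. a slightly skewed capture clock) is tolerated. */
static bool offset_jump_detected(int64_t prev_offset_usec,
                                 int64_t cur_offset_usec)
{
    int64_t change = cur_offset_usec - prev_offset_usec;
    if (change < 0)
        change = -change;
    return change > (int64_t)TIME_JUMP_THRESHOLD_SEC * 1000000;
}
```

For example, a packet stamped 4380 seconds (1 hour 13 minutes) ahead of the host clock trips both checks, while half a second of skew trips neither.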

Carter

On Jan 11, 2009, at 8:17 AM, Martijn van Oosterhout wrote:

> On Sat, Jan 10, 2009 at 5:01 PM, Carter Bullard <carter at qosient.com> wrote:
>> Hey Martijn,
>> Sorry for the delayed response.  I've gone over the code a bit, and I
>> don't see how this jump can occur.  But that is the nature of bugs,
>> sometimes.
>>
>> There are a few things that I need to proceed.  What platform, OS,
>> 64-bit?  What are we connecting to: 1Gbps?  10Gbps?  And what is the
>> processor load for argus?  And the interrupt rate: any sense as to how
>> many packets per second?
>
> The load is about 30%, IIRC. The line is 200-250 Mbps, so high but not
> exceptional. In the meantime another theory has arisen: namely that
> under certain circumstances the Linux kernel version we're running
> (2.6.20) can, for a very short time, return a time 1 hour 13 minutes
> in the future. If argus happens to pick that one as GlobalTime, it
> will cause the symptoms seen.
>
> As these things go, as soon as you try to verify this is the problem,
> the problem stops happening... I hope to have evidence one way or the
> other on Monday.
>
>> When this occurs, are we getting any packets at all?  (Your strace
>> should show packet reads, since we do a select().)
>
> It's using the linux PCAP ringbuffer, so there are no kernel calls for
> fetching the packets, only writes.
>
>> My suspicion is that if all is as it should be, but all of a sudden we
>> get a leap in our global time, argus may be so loaded that it is
>> getting behind.  But that is a very preliminary guess.
>
> That sounds like the most plausible idea so far. It's possible to work
> around this in argus; the question is whether we should. In ArgusWrite*,
> if you change the test against lastwritten to not-equal (!=) rather
> than less-than (<), the issue shouldn't arise either.
>
> I hope to have more info soon.
>
> Have a nice day,
> -- 
> Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/
>
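
Martijn's theory is that the 2.6.20 kernel can briefly return a wall-clock time 1 hour 13 minutes (4380 seconds) in the future. One way to catch a bogus reading like that, sketched here under stated assumptions, is to sample CLOCK_REALTIME alongside CLOCK_MONOTONIC, which is not affected by discontinuous jumps in the system time, and compare how far each advanced between samples. This is not something argus is known to do; the struct, the function names, and the tolerance are hypothetical. It also assumes the glitch affects only the wall clock: a clocksource bug that distorts both clocks would slip past it. On older glibc, clock_gettime() needs -lrt at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* One paired reading of the wall clock and the monotonic clock. */
struct clock_sample {
    struct timespec wall;  /* CLOCK_REALTIME: can be stepped (ntp, rdate) */
    struct timespec mono;  /* CLOCK_MONOTONIC: not stepped by settimeofday */
};

static int64_t ts_to_usec(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000 + ts->tv_nsec / 1000;
}

/* Read both clocks back to back.  Returns 0 on success, -1 on error. */
static int take_sample(struct clock_sample *s)
{
    if (clock_gettime(CLOCK_REALTIME, &s->wall) != 0)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &s->mono) != 0)
        return -1;
    return 0;
}

/* Between two samples the wall clock should advance by about as much as
 * the monotonic clock.  A larger disagreement suggests the wall clock was
 * stepped or returned a bogus value. */
static bool wall_clock_stepped(const struct clock_sample *prev,
                               const struct clock_sample *cur,
                               int64_t tolerance_usec)
{
    int64_t wall_adv = ts_to_usec(&cur->wall) - ts_to_usec(&prev->wall);
    int64_t mono_adv = ts_to_usec(&cur->mono) - ts_to_usec(&prev->mono);
    int64_t skew = wall_adv - mono_adv;
    if (skew < 0)
        skew = -skew;
    return skew > tolerance_usec;
}
```

A caller could take a sample each time it refreshes its notion of global time and refuse to adopt a wall-clock value when wall_clock_stepped() reports a jump larger than, say, 100 ms.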
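
The effect of Martijn's proposed change to the lastwritten test in ArgusWrite* can be simulated. This sketch does not reproduce argus's actual code: should_act_lt(), should_act_ne(), and count_skipped() are hypothetical stand-ins, assuming the check runs about once per second and that passing it is what lets argus do its periodic work, such as reopening the output file. A single bogus reading 4380 seconds ahead leaves lastwritten in the future, so the less-than test keeps failing until real time catches up; the not-equal test recovers on the next call.

```c
#include <stdbool.h>
#include <time.h>

/* Original style: act only when time has moved forward. */
static bool should_act_lt(time_t lastwritten, time_t now)
{
    return lastwritten < now;
}

/* Proposed style: act whenever the second has changed, in either direction. */
static bool should_act_ne(time_t lastwritten, time_t now)
{
    return lastwritten != now;
}

/* Simulate one call per second for `seconds` calls starting at `start`,
 * where the single call at index `glitch_at` sees a clock 4380 s
 * (1 h 13 min) ahead.  Returns how many calls skipped the action. */
static int count_skipped(bool (*test)(time_t, time_t), time_t start,
                         int seconds, int glitch_at)
{
    time_t lastwritten = start - 1;
    int skipped = 0;

    for (int i = 0; i < seconds; i++) {
        time_t now = start + i;
        if (i == glitch_at)
            now += 4380;           /* one bogus reading from the kernel */
        if (test(lastwritten, now))
            lastwritten = now;     /* action taken; remember when */
        else
            skipped++;
    }
    return skipped;
}
```

With a glitch at call 100 in a 6000-call run, the less-than version skips 4380 calls (about 1 hour 13 minutes of missed checks) and the not-equal version skips none. The trade-off is that != also fires when the clock steps backwards, which in this situation is the desired behavior.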

Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York  10022

+1 212 588-9133 Phone
+1 212 588-9134 Fax





