Problem with argus under load not reopening output file.

Martijn van Oosterhout kleptog at gmail.com
Sun Jan 11 12:56:40 EST 2009


Sorry, hit the wrong button. No, it's actually a kernel bug:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d8bb6f4c1670c8324e4135c61ef07486f7f17379

It causes the time as read by gettimeofday() to jump an hour into the
future and back for a packet or two.

At least, I hope it's this, otherwise I don't know.

Have a nice day,

On Sun, Jan 11, 2009 at 5:42 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey Martijn,
> If you don't mind, I'd like to keep the thread in the argus mailing list.
>
> Sounds like an ntp or a rdate() issue, where the machine's time is
> getting modified to a bogus value.  This is of course an old way
> of attacking a machine ;o)
>
> Argus can attempt to catch jumps in time by testing the timestamp
> in the packets against the system's global time, or a delta of the
> two.  But how to respond to the jump is a problem, because
> we don't know which system (packet capture/host OS) is correct.
>
> Carter
>
> On Jan 11, 2009, at 8:17 AM, Martijn van Oosterhout wrote:
>
>> On Sat, Jan 10, 2009 at 5:01 PM, Carter Bullard <carter at qosient.com>
>> wrote:
>>>
>>> Hey Martijn,
>>> Sorry for the delayed response.  I've gone over the code a bit, and I
>>> don't see how this jump can occur.  But that is the nature of bugs,
>>> sometimes.
>>>
>>> There are a few things that I need to proceed.  What platform, OS,
>>> 64-bit?  What are we connecting to?  1Gbps?  10Gbps?  And what is the
>>> processor load for argus?  And the interrupt rate, any sense as to how
>>> many packets per second?
>>
>> The load is about 30%, IIRC. The line is 200-250 Mbps, so high but not
>> exceptional. In the meantime there has arisen another theory: namely
>> that under certain circumstances the Linux kernel version we're
>> running (2.6.20) can for a very short time return a time 1 hour 13
>> minutes in the future. If argus happens to pick that one as
>> GlobalTime, it will cause the symptoms seen.
>>
>> As these things go, as soon as you try to verify this is the problem,
>> the problem stops happening... I hope on Monday I'll have evidence one
>> way or the other.
>>
>>> When this occurs, are we getting any packets at all? (Your strace should
>>> show packet reads, since we do a select().)
>>
>> It's using the Linux PCAP ring buffer, so there are no kernel calls for
>> fetching the packets, only writes.
>>
>>> My suspicion is that if all is as it should be, but we all of a sudden
>>> get a leap in our global time, argus may be so loaded that it is getting
>>> behind.  But that is a very preliminary guess.
>>
>> That sounds like the most plausible idea so far. It's possible to work
>> around this in argus; the question is whether we should. In ArgusWrite*,
>> if you change the test with lastwritten to be not-equal (!=) rather than
>> less-than (<), the issue shouldn't arise either.
>>
>> I hope to have more info soon.
>>
>> Have a nice day,
>> --
>> Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/
>>
>
> Carter Bullard
> CEO/President
> QoSient, LLC
> 150 E 57th Street Suite 12D
> New York, New York  10022
>
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
>
>
>
>



-- 
Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/



More information about the argus mailing list