Problem with argus under load not reopening output file.
Carter Bullard
carter at qosient.com
Sun Jan 11 11:42:04 EST 2009
Hey Martijn,
If you don't mind, I'd like to keep the thread in the argus mailing
list.
Sounds like an ntp or an rdate() issue, where the machine's time is
getting set to a bogus value. This is, of course, an old way
of attacking a machine ;o)
Argus can attempt to catch jumps in time by testing the timestamps
in the packets against the system's global time, or by tracking the
delta between the two. But how to respond to a jump is a problem,
because we don't know which clock (packet capture or host OS) is correct.
Carter
On Jan 11, 2009, at 8:17 AM, Martijn van Oosterhout wrote:
> On Sat, Jan 10, 2009 at 5:01 PM, Carter Bullard <carter at qosient.com> wrote:
>> Hey Martijn,
>> Sorry for the delayed response. I've gone over the code a bit, and I don't
>> see how this jump can occur. But that is the nature of bugs, sometimes.
>>
>> There are a few things that I need to proceed. What platform, OS, 64-bit?
>> What are we connecting to: 1Gbps? 10Gbps? And what is the processor
>> load for argus? And the interrupt rate: any sense as to how many packets
>> per second?
>
> The load is about 30%, IIRC. The line is 200-250 Mbps, so high but not
> exceptional. In the meantime there has arisen another theory: namely
> that under certain circumstances the Linux kernel version we're
> running (2.6.20) can for a very short time return a time 1 hour 13
> minutes in the future. If argus happens to pick that one as
> GlobalTime, it will cause the symptoms seen.
>
> As these things go, as soon as you try to verify this is the problem,
> the problem stops happening... I hope Monday I'll have evidence one
> way or the other.
>
>> When this occurs, are we getting any packets at all? (Your strace should
>> show packet reads, since we do a select().)
>
> It's using the Linux PCAP ring buffer, so there are no system calls for
> fetching the packets, only writes.
>
>> My suspicion is that if all is as it should be, but all of a sudden we get
>> a leap in our global time, argus may be so loaded that it is getting
>> behind. But that is a very preliminary guess.
>
> That sounds like the most plausible idea so far. It's possible to work
> around this in argus; the question is whether we should. In ArgusWrite*,
> if you change the test on lastwritten to not-equal (!=) rather than
> less-than (<), the issue shouldn't arise either.
>
> I hope to have more info soon.
>
> Have a nice day,
> --
> Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/
>
Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York 10022
+1 212 588-9133 Phone
+1 212 588-9134 Fax