Stability problems.

Carter Bullard carter at qosient.com
Thu Jun 7 11:28:29 EDT 2001


Hey Chris,
   Sounds like you're truncating records on the argus end.
One possibility: you are probably forcibly deleting
records at the argus end to control the queue sizes, and
you may be running into a bug with that.  A record is partially
written, but because of queue load, we elect to delete it.
This would not be good, as only a partial record is written.
The receiving ra can detect this and recover, but only after a
period of time.
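
To make the failure mode concrete, here is a minimal sketch of a
queue trim that skips any record that is already partially written.
It is not the argus source: struct rec, struct queue, q_push and
q_trim are hypothetical names made up for illustration, and the max
field only plays the role of ArgusMaxListLength.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; NOT the real argus structures. */
struct rec {
    struct rec *next;
    size_t len;   /* total bytes in the record */
    size_t sent;  /* bytes already written to the client */
};

struct queue {
    struct rec *head, *tail;
    size_t count;
    size_t max;   /* stands in for ArgusMaxListLength */
};

/* Append a record to the tail of the queue. */
void q_push(struct queue *q, struct rec *r) {
    assert(q != NULL && r != NULL);
    r->next = NULL;
    if (q->tail) q->tail->next = r; else q->head = r;
    q->tail = r;
    q->count++;
}

/*
 * Unlink the oldest records until count <= max, but never unlink a
 * record that is partially written (0 < sent < len), since the client
 * would then see a truncated record. Returns the number unlinked.
 * Storage is owned by the caller; nothing is freed here.
 */
size_t q_trim(struct queue *q) {
    size_t dropped = 0;
    struct rec *prev = NULL, *r;
    assert(q != NULL);
    r = q->head;
    while (r && q->count > q->max) {
        struct rec *next = r->next;
        if (r->sent > 0 && r->sent < r->len) {
            prev = r;  /* in flight: keep it on the queue */
        } else {
            if (prev) prev->next = next; else q->head = next;
            if (q->tail == r) q->tail = prev;
            q->count--;
            dropped++;
        }
        r = next;
    }
    return dropped;
}
```

With a limit of 1 and three queued records, the first of which is
partially sent, q_trim unlinks the two untouched records and leaves
the in-flight one in place.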

   Look in your /var/log/messages for argus messages,
especially "Queue Exceeded Max" messages.  This would indicate
that you are throwing records away.

   Changing the value of ArgusMaxListLength should help.
Use this patch:

Index: ArgusUtil.c
===================================================================
RCS file: /usr/local/cvsroot/argus/server/ArgusUtil.c,v
retrieving revision 1.77.2.4
diff -r1.77.2.4 ArgusUtil.c
800c800
< int ArgusMaxListLength = 16384;
---
> int ArgusMaxListLength = 262144;


The value doesn't have to be a power of two, I just happen
to like them. I'll take a look at the delete logic.

Carter

Carter Bullard
QoSient, LLC
300 E. 56th Street, Suite 18K
New York, New York  10022

carter at qosient.com
Phone +1 212 588-9133
Fax   +1 212 588-9134
http://qosient.com


-----Original Message-----
From: Chris Newton [mailto:newton at unb.ca] 
Sent: Thursday, June 07, 2001 8:57 AM
To: Carter Bullard
Subject: Stability problems.


Hi Carter.

  Since I moved into client/server mode, I have had a few bumps of
instability.

  I'm running the most current code.

  The sensor is a Linux 2.4.x Red Hat 7.1 box, 512 MB RAM, 600 MB swap,
dual 800 MHz CPUs.

  The receiving end has dual 1 GHz CPUs and 1.2 GB RAM.  Ra is running
on this, dumping to local files.

  We are monitoring a link that has a possible traffic rate of a
full-duplex 100 Mbit connection.

  Sometimes we receive DoS attacks that cause the server to grow and
grow and grow... it doesn't appear to dump its records to the attached
client at a fast enough rate to keep the box from running out of memory.

  When the server gets into this state, it starts sending invalid
records to the client.  Some of these records have incredible duration
times (one had a 135-year duration).

  Today, I'm not sure what occurred, but:

01-06-07 08:43:34 0.000000 Fs 131 1.4.0.104  <-> 0.144.8.0  991914214 180000 180000 3459164706 CON


  0 duration.  The other IP involved was 0.144.8.0 (not possible).  The
991M src packets are in 30 seconds..., but only 180K.
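
  A record like the one above can be flagged mechanically.  The sketch
below is only an illustration: struct flow_counts, flow_is_implausible
and counter_as_utc are hypothetical names, not part of argus or ra, and
reading 180000 as the source byte count is an assumption about the
column order.  It also decodes a counter as if it were a Unix timestamp.
For 991914214 that gives 2001-06-07 11:43:34 UTC, the same minute and
second as the record's 08:43:34 start time, which is consistent with
(but does not prove) a field being read from the wrong offset.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical flattened view of one flow record; NOT argus's layout. */
struct flow_counts {
    double   duration;   /* seconds */
    uint64_t src_pkts;
    uint64_t src_bytes;  /* assumed column mapping, see the text above */
};

/* Returns nonzero when the counters cannot describe real traffic. */
int flow_is_implausible(const struct flow_counts *f) {
    assert(f != NULL);
    if (f->src_bytes < f->src_pkts)
        return 1;  /* every packet carries at least one byte */
    if (f->duration == 0.0 && f->src_pkts > 1)
        return 1;  /* many packets in zero elapsed time */
    return 0;
}

/*
 * Decode a counter as if it were a Unix timestamp, in UTC.
 * Returns 0 on success, -1 if the value cannot be converted.
 * gmtime() uses static storage, so this is not thread-safe.
 */
int counter_as_utc(uint64_t counter, struct tm *out) {
    time_t t = (time_t)counter;
    struct tm *tm;
    assert(out != NULL);
    tm = gmtime(&t);
    if (tm == NULL)
        return -1;
    *out = *tm;
    return 0;
}
```

  A receiver could run checks of this kind on each record and discard
or log the ones that fail, rather than writing them to the output file.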

  For a number of minutes after an event like this, the ra client has
trouble getting anything meaningful out of the server... often only
outputting flow record files (for 30 seconds) with very few flows in
them, then the next file with lots of flows, and so on.


Attached is the flow record file.

Let me know how I can help you track down this problem.

Thanks Carter

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

Chris Newton, Systems Analyst
Computing Services, University of New Brunswick
newton at unb.ca 506-447-3212(voice) 506-453-3590(fax)

"The best way to have a good idea is to have a lot of ideas."
Linus Pauling (1901 - 1994) US chemist



