ArgusBug: argus and high speed networks
JOSE.JEREZ.EXT
jose.jerez.ext at juntadeandalucia.es
Mon Nov 24 04:22:04 EST 2003
>Description:
I'm having a problem that is driving me crazy. I have installed argus to
monitor the traffic going through our router. The WAN link is an ATM
155Mbps, though according to mrtg the bandwidth utilization is under 2% or
400Kbytes/s, so the network is not that "high-speed".
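As a quick sanity check, the two mrtg figures quoted above agree with each other. The sketch below assumes decimal units (1 Mbps = 10^6 bit/s, 1 Kbyte = 1000 bytes) and ignores ATM cell overhead; the function name is only illustrative.

```python
def utilization_to_kbytes_per_s(link_mbps, utilization):
    """Convert a link rate in Mbps and a utilization fraction to Kbytes/s.

    Assumes decimal units and no ATM cell overhead (both are assumptions,
    not something stated in the report).
    """
    bits_per_s = link_mbps * 1_000_000 * utilization
    return bits_per_s / 8 / 1000


# 2% of a 155Mbps link, the upper bound reported by mrtg
rate = utilization_to_kbytes_per_s(155, 0.02)
print(f"{rate:.1f} Kbytes/s")
```

That comes to roughly 390 Kbytes/s, which is in line with the 400Kbytes/s figure.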
The problem is that when I use ra or ragator to read the data generated by
argus, they segfault and die. They don't die straight away, but only on
reaching certain points in the data:
#ra -n -r argus.log
.....
20 Nov 03 13:58:16 arp 10.229.128.112 who-has 10.229.130.12 1 0 60 0 INT
20 Nov 03 13:58:17 arp 10.229.129.72 who-has 10.229.128.2 1 0 60 0 INT
20 Nov 03 13:58:16 arp 10.229.128.67 who-has 10.229.130.12 1 0 60 0 INT
20 Nov 03 13:58:15 llc 0:2:b3:f:a:63 nul -> 3:0:0:0:0:1 nul 1 0 180 0 INT
Segmentation fault (core dumped)
Different argus data files cause the fault at different points, so my guess
is that the problem is not in the clients but in the argus daemon itself,
although it doesn't die or send any messages to syslog.
If I skip the faulty point, the data are shown up to the next faulty point,
if there is one:
#ra -n -r argus.log -t 14:02-16:00
......
20 Nov 03 14:19:02 arp 10.229.128.2 who-has 10.229.128.11 2 0 120 0 INT
20 Nov 03 14:19:02 arp 10.229.181.16 who-has 10.229.181.68 1 0 60 0 INT
20 Nov 03 14:19:05 llc 0:40:96:55:ea:8 nul -> 1:0:c:cc:cc:cc nul 1 0 164 0 INT
Segmentation fault (core dumped)
Some files have only one faulty point, others have several, usually fewer
than four in a 24h data file. I can't find a pattern to understand or
foresee when the faulty points happen. They don't necessarily happen at
the moments of highest traffic or cpu load, although they occur mostly
during working hours (8:00-15:00), when the average load is higher.
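A minimal sketch of how the faulty points could be narrowed down automatically, instead of skipping them by hand with -t. It only uses the ra options shown above (-n, -r, -t) and the HH:MM-HH:MM range format from the example; whether ra accepts every generated range is an assumption. The function names are hypothetical, and the default hours are simply the working hours mentioned above.

```python
import signal
import subprocess


def hourly_windows(start_hour=8, end_hour=15):
    """Build -t style ranges such as '13:00-14:00', one per hour."""
    return [f"{h:02d}:00-{h + 1:02d}:00" for h in range(start_hour, end_hour)]


def crashed_windows(logfile, windows, ra="ra"):
    """Run ra over each time window; return the windows where it segfaulted."""
    bad = []
    for window in windows:
        proc = subprocess.run(
            [ra, "-n", "-r", logfile, "-t", window],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a negative return code means the child was killed by
        # that signal number, so -SIGSEGV indicates a segmentation fault.
        if proc.returncode == -signal.SIGSEGV:
            bad.append(window)
    return bad
```

Calling crashed_windows("argus.log", hourly_windows()) would list the hours that contain at least one faulty point; the same idea can be repeated with smaller windows inside a bad hour.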
I've got two other argus daemons collecting data on two different networks
and they work flawlessly; the difference is that those links are 256Kbps
frame relay, with much less traffic than the troublesome one, of course.
I have used different versions of the argus server and clients on two
different computers (Pentium IV, 2GHz, 512MB RAM, Intel EtherExpress 10/100)
with the same results:
-argus-server 2.0.5.beta.5-3 and argus-clients-2.0.6.beta.38 on Debian
woody with kernels 2.4.22 and 2.2.20
-argus server 2.0.6.beta.14 and argus clients 2.0.6.beta.47 on Mandrake 9
with kernel 2.4.19
I also reniced the argus daemon process to give it a higher priority, but it
doesn't usually go above 3% cpu usage.
>How-To-Repeat:
To reproduce the problem I managed to collect an argus data file of a
reasonable size (630k gzip compressed). You can get it here:
http://www.TrustClip.com/C5Q1AVTT6L14Ae67OBF1A5Q3EEK
>Fix:
No fix so far.
>Submitter-Id: <submitter ID>
>Originator: Jose Jerez
>Organization:
Ministry in the regional government of Andalusia (Spain) (Consejeria de
Obras Publicas y Transportes)
>Argus support: email support
>Release: argus-2.0
>Product: argus, ra, ragator
>Synopsis: argus fails in high-speed networks
>Class: sw-bug
>Severity: critical
>Priority: high
System: Linux monitor 2.4.22 #4 Mon Nov 10 10:38:38 CET 2003 i686 unknown
Arch: i686
Paths: /usr/local/bin/ra /usr/bin/make /usr/bin/gcc /usr/bin/cc
RA: Ra Version 2.0.6.beta.38
GCC: Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs
gcc version 2.95.4 20011002 (Debian prerelease)
LIBC:
lrwxrwxrwx 1 root root 13 Nov 6 09:03 /lib/libc.so.6 -> libc-2.2.5.so
-rwxr-xr-x 1 root root 1153784 Apr 8 2003 /lib/libc-2.2.5.so
-rw-r--r-- 1 root root 2391002 Apr 8 2003 /usr/lib/libc.a
-rw-r--r-- 1 root root 178 Apr 8 2003 /usr/lib/libc.so