[ARGUS] Segmentation Fault with 2.0.6rc2 on FreeBSD 4.9-RELEASE

Peter Van Epp vanepp at sfu.ca
Fri Apr 9 20:35:35 EDT 2004


On Fri, Apr 09, 2004 at 07:00:36PM -0500, eric wrote:
> On Fri, 2004-04-09 at 16:10:16 -0700, Peter Van Epp proclaimed...
> 
> > 	Is it possible to get a tcpdump of the input during one of these 
> > crashes? With that and tcpreplay on a test machine (a big test machine :-))
> > it may be possible to recreate the crash. Touching .devel and .debug in the
> > argus source directory and recompiling with symbols might help some as well
> > (it also may slow you down enough to work even worse though).
> 
> If I can find the exact timeframe, then yes, I could sniff for a few
> minutes. But anything more and we run out of disk space.

	Yes, I can see that would be a problem :-). Is there anything common
about the traffic around the crashes that you can see? Is some particular type
of traffic doing something ugly?

> 
> > 	It sounds like you are already writing via a socket from the sensor
> > box to ra on another box (if not this is worth doing because the disk I/O on
> > a single box is known to cause at least packet loss).
> 
> Nope, we're writing right to a file; using the following parameter.
> 
> ARGUS_OUTPUT_FILE=/path/to/output/file

	The CMU folks (the highest data rate argus site I know of) told me they
found packet loss when trying to write to disk on the sensor machine. They have
one machine as a sensor that runs argus_bpf and writes the data stream to a 
socket and another machine that runs ra listening to that socket and archiving 
the data to disk on the second machine. 
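
	As a rough sketch of that split (not CMU's actual configuration, which
I haven't seen): the sensor opens the argus access port in place of
ARGUS_OUTPUT_FILE, and ra on the second box connects to it and writes the
archive. The host name and archive path below are placeholders, and it is worth
checking the option letters against the ra man page for your version.

```shell
# --- sensor: argus.conf fragment, used instead of ARGUS_OUTPUT_FILE ---
# 561 is the default argus port.
ARGUS_ACCESS_PORT=561

# --- collector: command to run on the second machine ---
# (sensor.example.org and the output path are placeholders)
#   ra -S sensor.example.org -w /path/to/archive/file
```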


> 
> > netstat -i
> > netstat -m
> > 
> > after a crash would be good bets to see if the kernel is running out of mbufs.
> 
> $ netstat -i
> Name  Mtu   Network    Address            Ipkts Ierrs    Opkts Oerrs  Coll
> em0   1500  <Link#1>  00:00:00:xx:xx:xx 1645131079 20564  0     0      0
> em1   1500  <Link#2>  00:00:00:yy:yy:yy 1586377066 19322  0     0      0
> 
> $ netstat -m
> 770/1120/6144 mbufs in use (current/peak/max):
>         770 mbufs allocated to data
> 768/840/1536 mbuf clusters in use (current/peak/max)
> 1960 Kbytes allocated to network (42% of mb_map in use)
> 
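
	To put numbers on that output, here is a small sketch that turns the
netstat counters into percentages. The function names are made up for this
example, and the column positions assume the FreeBSD 4.x output format shown
above. On your figures it works out to roughly 0.00125% input errors on em0 and
0.00122% on em1, with mbuf clusters peaking at 840 of 1536 (about 55%).

```shell
# Input error rate per interface, from `netstat -i` output on stdin.
# Only the <Link#N> rows are used; they carry all nine columns.
ierr_rate() {
    awk '$3 ~ /^<Link/ && $5 > 0 { printf "%s %.5f%%\n", $1, 100 * $6 / $5 }'
}

# Current and peak use as a percentage of max, from `netstat -m` on stdin
# (the two "current/peak/max" lines).
mbuf_pct() {
    awk 'split($1, a, "/") == 3 && a[3] > 0 {
        printf "%s: current %.1f%%, peak %.1f%%\n",
               ($2 == "mbuf" ? "clusters" : "mbufs"),
               100 * a[1] / a[3], 100 * a[2] / a[3]
    }'
}

# On the sensor:
#   netstat -i | ierr_rate
#   netstat -m | mbuf_pct
```
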
> > I saw a kernel tuning page on the tcpreplay web page at sourceforge but a 
> > quick look at the FAQ only turned up "experiment with NMBCLUSTERS in the kernel
> > config file". I think there is another comment on boosting kernel buffer sizes
> > in general on the BSDs that may be worth looking at somewhere there.
> > 	I assume you have a sysctl setting such as 
> > 
> > /sbin/sysctl debug.bpf_bufsize=524288
> > 
> 
> Ok, I have debug.bpf_bufsize: 16384  -- should I increase this?
> 

	If you aren't seeing packet loss in the man stats then you shouldn't
need to boost this. An overflow of this buffer is where the bpf packet loss
stat comes from. (The kernel can also silently lose packets before that point
if the kernel buffers aren't big enough, although that should show up in the
netstat output, and yours looks fine.) I boost mine to the max so it has
headroom if something delays argus from reading the bpf buffer.
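
	If you do want the headroom, something like the following sketch would
print the sysctl command to run. The function name is made up for this example;
debug.bpf_maxbufsize is, on FreeBSD 4.x, the kernel's upper limit for the bpf
buffer. As far as I know a new value only applies to bpf devices opened
afterwards, so argus would need a restart to pick it up.

```shell
# Print the sysctl command needed to raise the bpf buffer to the maximum,
# or nothing if it is already there. Arguments: current size, maximum size.
bpf_boost_cmd() {
    cur=$1
    max=$2
    if [ "$cur" -lt "$max" ]; then
        echo "/sbin/sysctl debug.bpf_bufsize=$max"
    fi
}

# On the sensor (FreeBSD 4.x sysctl names):
#   bpf_boost_cmd "$(sysctl -n debug.bpf_bufsize)" "$(sysctl -n debug.bpf_maxbufsize)"
```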


> Here's the other performance tunings I've made...
> 
> kern.maxproc=10240
> kern.maxprocperuid=7680
> kern.maxusers=128
> kern.ipc.somaxconn=1024
> kern.ipc.nmbclusters=32768
> 
> > to boost the libpcap buffer to max size? I don't think any of these are likely
> > the base problem, but one or more might help if something ugly is happening 
> > before the traffic gets to argus.
> > 	Do you see any messages in syslog about
> > 
> > ArgusWriteOutSocket(0x%x) Queue Count %d
> > ArgusWriteOutSocket(0x%x) failed to create file %s
> > ArgusWriteOutSocket(0x%x) Exceeded Maximum Errors
> > ArgusWriteOutSocket(0x%x) Queue Exceeded Maximum Limit
> 
> Nope. Also, I increased the argus buffer size by using this patch in
> src/server
> 
> --- server/ArgusUtil.c.orig    Mon Apr  5 01:42:44 2004
> +++ server/ArgusUtil.c Mon Apr  5 01:42:50 2004
> @@ -815,9 +815,9 @@ ArgusDeleteSocket (struct ArgusSocketStr
>  #include <fcntl.h>
> 
>  #define ARGUS_MAXERROR         20000
> -#define ARGUS_MAXWRITENUM      2048
> +#define ARGUS_MAXWRITENUM      32768
> 
> -int ArgusMaxListLength = 262144;
> +int ArgusMaxListLength = 1028576;
> 
>  int ArgusReadSocket (struct ArgusSocketStruct *asock, ArgusHandler ArgusThisHandler, void *data)
> 
> > These are all in the area of code that should be the problem.
> 
> Thanks for the help.
> 

	So far I'm not sure I've been much help :-). Compiling argus with debug
(although I'd guess the debug output would kill you) and symbols might help
with poking around to see where it dies and what it was doing at the time.
Carter may be able to suggest some additional syslog messages around the area
where the failure seems to happen, to get more info on what's unhappy.
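
	In the meantime, a quick way to keep an eye on the four
ArgusWriteOutSocket messages is to count them in the log after each crash. This
is only a sketch: the function name is made up, and the log path depends on
where your syslog.conf sends argus messages (/var/log/messages is a guess).

```shell
# Count each ArgusWriteOutSocket message type in the given log file.
argus_write_msgs() {
    log=$1
    for pat in 'Queue Count' 'failed to create file' \
               'Exceeded Maximum Errors' 'Queue Exceeded Maximum Limit'; do
        # grep -c prints 0 but exits non-zero when nothing matches
        n=$(grep -c "ArgusWriteOutSocket(.*) $pat" "$log" || true)
        printf '%s: %s\n' "$pat" "$n"
    done
}

# On the sensor (path is an assumption):
#   argus_write_msgs /var/log/messages
```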


Peter Van Epp / Operations and Technical Support 
Simon Fraser University, Burnaby, B.C. Canada


