segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4

Peter Van Epp vanepp at sfu.ca
Thu Apr 30 22:30:26 EDT 2009


On Thu, Apr 30, 2009 at 11:09:40AM +0200, Gunnar Lindberg wrote:
> I'm new to this list and I hope my question is relevant for it. If
> the subject has been beaten to death please direct me to the archives.
> 
> We have 2 Argus collector machines, each with opto splitters and
> 2 Myricom 10GbE cards (using only the Rx side, of course). Then a
> third machine merging all data.
> 
> Recently both collectors have started to report as below (and stopped
> collecting data until argus is restarted). It happens roughly once a
> day and we're not happy - a little worse than just annoying.
> 
> 
> /var/log/messages (and dmesg)
> 
> Apr 28 15:38:07 argc kernel: argus[17240]: segfault at 0000000004a14000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
> Apr 28 15:18:42 argc kernel: argus[16262] general protection rip:3fabc696bd rsp:7fbffff5c0 error:0
> 
> Apr 29 15:50:21 argv kernel: argus[2511]: segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
> Apr 29 16:35:20 argv kernel: argus[2641] general protection rip:40efbf rsp:7fbffff778 error:0
> 
> 
> Bad memory?
> 
> 
> 	Gunnar Lindberg, IRT,
> 	Chalmers University of Technology,
> 	Gothenburg, Sweden
> 
<snip>

	In addition to Carter's suggestion to switch to the latest development
beta code, here are a few more suggestions (although, depending on your link
speed, some may cause problems :-). If you are familiar with packet capture
on fast links, you may well know most of these already.

1) touch .debug and .devel in the argus source directory and rebuild (this
   enables debugging via gdb and debug logging, at a cost in performance).
   Enable core dumps if they aren't already enabled (or the core size limit
   is 0), e.g. with ulimit -c unlimited in the shell that starts argus, so
   the segfault leaves a core file. A backtrace from that core will tell us
   where the segfault occurred.

2) This one will certainly hit performance: disable interrupt coalescing
   on the NICs (on Linux, via ethtool -C). With coalescing, a batch of
   packets is handed up per interrupt, so several packets end up with the
   same timestamp; that tends to confuse argus (and makes the statistics
   inaccurate). The correct solution here is Endace DAG cards, but they are
   very pricey.

3) Boost your kernel TCP buffers way, way up; the defaults aren't any good
   at gigabit speeds, let alone 10 gigabit. If you are HPC folks you probably
   already know all this :-).

4) Use Phil Wood's memory-mapped libpcap (the kernel support, PACKET_MMAP,
   is already present in most 2.6 kernels) or PF_RING from ntop.org (although
   it was a lot harder to get installed the last time I tried it). Give the
   capture ring as much memory as you can spare (I used to give it 2 gigs on
   a 4 gig machine).
   
5) Consider using PowerPC machines. Having network byte order as the
   machine's native order saves a good chunk of CPU time and memory
   bandwidth, since nothing has to be byte swapped. That said, processing a
   tcpdump file through argus on a Sun 4200 Opteron machine (with DDR3 RAM)
   appeared to be faster than on my IBM P510 Power5 PPC machine with DDR2
   memory, so faster memory may be the best answer. Memory is certainly
   going to be a bottleneck at very high line rates.

6) While from your description I think you are already doing this, don't
   store data to disk on the sensor machine. Let argus write its data to a
   socket and archive the data stream via ra (or radium, which I think is the
   new answer :-)) on a third machine. Writing to disk on the sensor machine
   will cause packet loss at around 50 to 70 megabits per second of traffic
   (and of course worse at higher speeds).
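
For item 1 above, the usual way to enable core dumps is ulimit -c unlimited in
the shell that launches argus. If argus is started from a wrapper script
instead, the same thing can be done there; below is a minimal Python sketch
(the function name is hypothetical, not part of argus) that raises the soft
core-file limit to the hard limit. Resource limits are inherited across fork
and exec, so argus must be started from this process or one of its children.
On Linux, where the core file lands is controlled by
/proc/sys/kernel/core_pattern.

```python
import resource
from typing import Tuple

def enable_core_dumps() -> Tuple[int, int]:
    """Raise this process's soft core-file size limit to its hard limit.

    Returns (old_soft, new_soft). An unprivileged process may always raise
    its soft limit up to the hard limit, and processes started afterwards
    (e.g. argus via os.execvp or subprocess) inherit the new limit.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return old_soft, hard
```

A wrapper would call enable_core_dumps() and then exec argus with whatever
command line you use today. If the hard limit itself is 0, only root can
raise it.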
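
For item 2 above, one rough way to see whether coalescing is biting is to
count how often consecutive packets in a capture carry exactly the same
timestamp. This Python sketch walks a classic libpcap (tcpdump -w) file with
only the standard library. It does not handle pcapng, and the function name
is hypothetical. A high repeat count on a busy link is consistent with batched
timestamps, but it is not proof on its own.

```python
import struct
from typing import BinaryIO, Tuple

# Classic pcap magic numbers: microsecond and nanosecond timestamp variants.
PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)

def count_repeated_timestamps(f: BinaryIO) -> Tuple[int, int]:
    """Return (packets, repeats), where repeats counts packets whose
    timestamp is identical to the previous packet's. Classic pcap only."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number tells us which byte order the file was written in.
    for endian in ("<", ">"):
        if struct.unpack(endian + "I", header[:4])[0] in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")

    record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
    packets = repeats = 0
    previous = None
    while True:
        raw = f.read(record.size)
        if len(raw) < record.size:
            break  # end of file (or a truncated trailing record)
        ts_sec, ts_frac, incl_len, _orig_len = record.unpack(raw)
        f.read(incl_len)  # skip over the captured packet bytes
        packets += 1
        if (ts_sec, ts_frac) == previous:
            repeats += 1
        previous = (ts_sec, ts_frac)
    return packets, repeats
```

Usage would be along the lines of opening a capture in binary mode and passing
the file object in; compare the result with coalescing on and off.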
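
For item 3 above, on Linux the relevant knobs are sysctls such as
net.core.rmem_max, net.core.netdev_max_backlog and net.ipv4.tcp_rmem. The
Python sketch below only reads them through /proc/sys and reports the ones
below a target. The target numbers are placeholders picked for illustration,
not tested recommendations, so size them for your own links and RAM. Raising
them is then a matter of sysctl -w or /etc/sysctl.conf.

```python
from pathlib import Path
from typing import Dict, List

# Placeholder targets for illustration only; tune for your own hardware.
TARGETS = {
    "net.core.rmem_max": 16 * 1024 * 1024,
    "net.core.netdev_max_backlog": 30000,
    "net.ipv4.tcp_rmem": 16 * 1024 * 1024,  # checked against its largest field
}

def read_sysctl(name: str, root: Path = Path("/proc/sys")) -> List[int]:
    """Read a numeric sysctl via procfs, e.g. net.core.rmem_max ->
    /proc/sys/net/core/rmem_max. Multi-value entries (tcp_rmem) give a list."""
    path = root.joinpath(*name.split("."))
    return [int(field) for field in path.read_text().split()]

def below_target(targets: Dict[str, int] = TARGETS,
                 root: Path = Path("/proc/sys")) -> Dict[str, int]:
    """Return {sysctl: current largest value} for settings under their target."""
    low = {}
    for name, want in targets.items():
        current = max(read_sysctl(name, root))
        if current < want:
            low[name] = current
    return low
```

Calling below_target() on the sensor prints nothing useful by itself; loop
over the returned dictionary to see which settings still look small.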
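
For item 4 above, a back-of-the-envelope calculation shows why the capture ring
wants lots of memory, and also why even 2 gigs only buys a short burst
allowance. This Python sketch assumes one fixed-size slot per packet (2048
bytes here, an assumption; the real per-frame overhead depends on the
libpcap/kernel ring layout and your snaplen), a saturated link, and an average
packet size of 1000 bytes, and it ignores Ethernet preamble and inter-frame
gap.

```python
def ring_seconds(ring_bytes: int, slot_bytes: int,
                 line_rate_bps: float, avg_packet_bytes: float) -> float:
    """Seconds of saturated traffic a capture ring absorbs if the consumer stalls.

    Assumes one fixed-size slot per packet and ignores link-layer framing
    overhead, so treat the answer as rough.
    """
    slots = ring_bytes // slot_bytes
    packets_per_second = line_rate_bps / (8 * avg_packet_bytes)
    return slots / packets_per_second

TWO_GIB = 2 * 1024**3
print(f"10 Gb/s: {ring_seconds(TWO_GIB, 2048, 10e9, 1000):.2f} s")  # 0.84 s
print(f" 1 Gb/s: {ring_seconds(TWO_GIB, 2048, 1e9, 1000):.2f} s")   # 8.39 s
```

With minimum-size 64-byte packets at 10 Gb/s the same ring lasts only about
0.05 s, so the ring smooths bursts but argus still has to keep up on average.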
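
For item 5 above, here is the byte-order point in miniature. Header fields
arrive big-endian (network order), so a little-endian host such as an Opteron
has to swap every multi-byte field it reads, while on a big-endian PowerPC the
conversion is a no-op. A short Python illustration using a TCP port number:

```python
import socket
import struct
import sys

wire = b"\x00\x50"  # TCP port 80 exactly as it appears in a packet header

as_host_order = struct.unpack("=H", wire)[0]     # reinterpret bytes in native order
as_network_order = struct.unpack("!H", wire)[0]  # what the field actually means

print(sys.byteorder)                # 'little' on x86/Opteron, 'big' on Power5
print(as_network_order)             # 80 on every host
print(as_host_order)                # 20480 (0x5000) on little-endian, 80 on big-endian
print(socket.ntohs(as_host_order))  # 80: a byte swap on little-endian, a no-op on big
```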
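
For item 6 above, the real tools are ra -S (or radium) running on the archive
machine, with argus listening for clients via its -P option. Purely to show
the shape of the split (the sensor serves, a separate box drains the socket to
disk), here is a generic Python sketch that connects to a TCP port and appends
whatever arrives to a file. It does not speak the argus protocol (no record
parsing, no authentication), the function name is hypothetical, and its
output should not be assumed to be a valid argus file.

```python
import socket

def drain_to_file(host: str, port: int, out_path: str, chunk: int = 1 << 16) -> int:
    """Connect to host:port and append every byte received to out_path until
    the sender closes the connection. Returns the number of bytes written.

    Generic TCP-to-disk loop: it does not parse or validate argus records.
    """
    written = 0
    with socket.create_connection((host, port)) as sock, open(out_path, "ab") as out:
        while True:
            data = sock.recv(chunk)
            if not data:  # an empty read means the peer closed the connection
                break
            out.write(data)
            written += len(data)
    return written
```

The point of the design is that only the archive machine touches its disks;
the sensor just pushes bytes onto the network.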

Peter Van Epp 


