segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
Peter Van Epp
vanepp at sfu.ca
Thu Apr 30 22:30:26 EDT 2009
On Thu, Apr 30, 2009 at 11:09:40AM +0200, Gunnar Lindberg wrote:
> I'm new to this list and I hope my question is relevant for it. If
> the subject has been beaten to death please direct me to the archives.
>
> We have 2 Argus collector machines, each with opto splitters and
> 2 Myricom 10GbE cards (using only the Rx side, of course). Then a
> third machine merging all data.
>
> Recently both collectors have started to report as below (and stopped
> collecting data until argus is restarted). It happens roughly once a
> day and we're not happy - a little worse than just annoying.
>
>
> /var/log/messages (and dmesg)
>
> Apr 28 15:38:07 argc kernel: argus[17240]: segfault at 0000000004a14000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
> Apr 28 15:18:42 argc kernel: argus[16262] general protection rip:3fabc696bd rsp:7fbffff5c0 error:0
>
> Apr 29 15:50:21 argv kernel: argus[2511]: segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
> Apr 29 16:35:20 argv kernel: argus[2641] general protection rip:40efbf rsp:7fbffff778 error:0
>
>
> Bad memory?
>
>
> Gunnar Lindberg, IRT,
> Chalmers University of Technology,
> Gothenburg, Sweden
>
<snip>
In addition to Carter's suggestion to switch to the latest development
beta code, here are a few more suggestions (although depending on your link
speed some may cause problems :-). If you are familiar with packet capture
on fast links you may well know most of these already.
1) touch .debug and .devel in the argus source directory (this enables
debugging via gdb and debug logging but at a cost in performance).
Enable core dumps if they aren't already enabled (or are set to 0 length) so
the seg fault will dump core. That will tell us where the seg fault occurred.
2) This one will certainly hit performance: disable Interrupt Coalescing
on the NICs. Multiple packets appearing with the same timestamp tend to
confuse argus (as well as introducing statistical inaccuracy). The correct
solution here is Endace DAG cards but they are very pricey.
3) Boost your kernel TCP buffers way, way up; the defaults aren't any good at
gig speeds, let alone 10 gig. If you are HPC folks you probably already know
all this :-).
4) Use Phil Wood's mmapped libpcap mod (the kernel mods are already present in
most 2.6 kernels) or use PF_RING from ntop.org (although it was a lot harder
to get installed last I tried it). Give it as much memory as you can spare
(I used to give it 2 gigs on a 4 gig machine).
5) Consider using PowerPC machines. Having network byte order in the machine
saves a good chunk of memory bandwidth and CPU time by not having to byte swap.
That said, processing a tcpdump file through argus on a Sun 4200 Opteron
machine (with DDR3 RAM) appeared to be faster than my IBM P510 Power5 PPC
machine using DDR2 memory, so faster memory may be the best answer. Memory
is certainly going to be a bottleneck at very high line rates.
6) While from your description I think you are already doing this, don't store
to disk on the sensor machine. Let argus write the data to a socket and
archive the data stream via ra (or radium which I think is the new answer
:-)) on a third machine. Writing to disk on the sensor machine will cause
packet loss at around the 50 to 70 megabit per second traffic level (and
of course worse at higher speeds).
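
For point 1, the core file size limit is normally raised with
"ulimit -c unlimited" in the shell that starts argus. The same check can be
sketched in Python. The helper name enable_core_dumps is made up for
illustration and is not part of argus. It only raises the soft limit as far
as the hard limit allows, so a hard limit of 0 still has to be changed by root.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Hypothetical helper for illustration only. Returns the (soft, hard)
    pair in effect afterwards. resource.RLIM_INFINITY means "unlimited".
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    if soft == 0:
        print("core dumps are still disabled (hard limit is 0)")
    elif soft == resource.RLIM_INFINITY:
        print("core file size is unlimited")
    else:
        print("core file size limited to", soft, "bytes")
```

Resource limits are inherited by child processes, so a wrapper that calls
this and then starts argus would pass the raised limit on.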
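
For point 3, one quick way to see whether the kernel limits are still at
their defaults is to ask for a large receive buffer and read back what was
actually granted. This is only a sketch: granted_rcvbuf is a hypothetical
helper, it uses an ordinary UDP socket rather than the capture socket, and
the behaviour assumed here (the request being silently capped by
net.core.rmem_max, and the reported value being double the stored request)
is specific to Linux.

```python
import socket


def granted_rcvbuf(requested):
    """Request a receive buffer of `requested` bytes and return what the
    kernel reports afterwards. Hypothetical helper for illustration only."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        except OSError:
            # Some systems reject an oversized request instead of capping it;
            # in that case the socket keeps its previous buffer size.
            pass
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    want = 64 * 1024 * 1024  # 64 MB, an arbitrary "way up" value
    got = granted_rcvbuf(want)
    print("requested", want, "bytes, kernel reports", got, "bytes")
    if got < want:
        print("the kernel limit is lower than the request")
```

If the reported size comes back far below the request, the sysctl limits are
what need raising, not the application.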
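
For point 5, the byte swap in question is the conversion from network (big
endian) order to the host's native order. A small sketch of what that means,
using an IPv4 address as it would appear on the wire. The function name
wire_to_host and the sample address are assumptions for illustration, not
anything taken from argus.

```python
import socket
import struct
import sys


def wire_to_host(raw):
    """Convert 4 bytes in network order to a host integer the way C code
    would: load the bytes natively, then apply ntohl()."""
    (native,) = struct.unpack("=I", raw)  # bytes interpreted in host order
    return socket.ntohl(native)           # a no-op on big endian hosts


if __name__ == "__main__":
    raw = bytes([192, 168, 1, 10])         # 192.168.1.10 as seen on the wire
    (expected,) = struct.unpack("!I", raw)  # "!" means network byte order
    print("host byte order:", sys.byteorder)
    print("swap needed:", sys.byteorder == "little")
    print(hex(wire_to_host(raw)), hex(expected))
```

On a big endian machine such as PowerPC the native load already yields the
right value and ntohl() does nothing. On x86 every such field has to be
swapped.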
Peter Van Epp