segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4

Peter Van Epp vanepp at sfu.ca
Sun May 17 23:28:42 EDT 2009


On Sun, May 17, 2009 at 01:00:18PM +0200, Gunnar Lindberg wrote:
> My business trip left me with a real cold, so I will probably not
> be at the office for a few days (I do this from home - too curious
> to stay away from mail :-).
> 
> We'll see which of the suggestions will apply (I've browsed forward
> so at least I'm aware of them). However, before I go back to bed:
> 
> calloc() et al. have their own area for bookkeeping - linked lists
> of what is in use and what is free etc. This is data within our own
> process, so if a pointer goes haywire we can write into it and
> affect the operation of these "solid routines". I think I mentioned
> such a "ticking bomb" before - the word nightmare comes to mind.

	Yep it is, unfortunately it's not easy to fix. Requesting less than the
full packet from pcap decreases the memory bandwidth needed to do the capture.
The trade off is that we may (with long headers) truncate the headers. If the
headers are truncated and argus assumes (which it does) they are complete, it's
possible to get garbage in (in which case we get garbage out, which is possibly
what we think we are seeing :-)). The only true fix is to capture the entire
packet and check the checksums (assuming of course that header checksums are
enabled), but that runs into performance problems on most everything except DAG
NICs (and at OC192 even there in most cases). You will sometimes see a 10 gig
ethernet capture solution which admits to being able to do 6.4 gigabits per
second, not the full 10 gigs. This indicates they have non-interleaved DDR2 RAM
in their capture path somewhere, since that is the calculated throughput of a
DDR2 DIMM. In the argus sensor machine case there is a NIC buffer to kernel
memory copy (usually of the full packet), then a kernel to user space memory to
memory copy in libpcap (except in the mmapped case, which avoids this copy),
plus the CPU is executing instructions out of that same 6.4 gigs of memory
bandwidth, so performance can go to hell very quickly at high line rates, and we
haven't even considered disk queuing yet :-).
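
	To make the truncation point concrete, here is a minimal sketch (in
Python, not argus code) of the kind of check a consumer of a short snaplen
capture would need before trusting an IPv4 header. It assumes untagged
Ethernet II framing; the function names and return strings are made up for
illustration only.

```python
import struct

ETH_HDR_LEN = 14           # untagged Ethernet II header (assumption)
ETHERTYPE_IPV4 = 0x0800
MIN_IPV4_HDR_LEN = 20


def ipv4_checksum_ok(header: bytes) -> bool:
    """True if the one's-complement sum over the whole header is 0xFFFF."""
    if len(header) % 2:
        return False
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                      # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def check_captured_ipv4(captured: bytes) -> str:
    """Classify a captured (possibly truncated) Ethernet frame.

    `captured` holds only the bytes pcap delivered, i.e. at most snaplen.
    """
    if len(captured) < ETH_HDR_LEN:
        return "truncated"
    ethertype = struct.unpack("!H", captured[12:14])[0]
    if ethertype != ETHERTYPE_IPV4:
        return "not-ipv4"
    if len(captured) < ETH_HDR_LEN + MIN_IPV4_HDR_LEN:
        return "truncated"
    version_ihl = captured[ETH_HDR_LEN]
    ihl = (version_ihl & 0x0F) * 4          # header length in bytes
    if version_ihl >> 4 != 4 or ihl < MIN_IPV4_HDR_LEN:
        return "malformed"
    if len(captured) < ETH_HDR_LEN + ihl:
        return "truncated"                  # IP options ran past the snaplen
    header = captured[ETH_HDR_LEN:ETH_HDR_LEN + ihl]
    return "ok" if ipv4_checksum_ok(header) else "bad-checksum"
```

Anything other than "ok" means the header fields should not be used as if they
were complete. Note this only covers the IPv4 header checksum; checking TCP or
UDP checksums needs the whole packet, which is the expensive part.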

> 
> Recalling off the top of my head: I think we capture 12 bytes of
> user data (which we drop almost at once - grabbing anything other
> than headers is really sensitive).
> 
> The protocols I'm aware of are IPv4 and IPv6, but since it's Ethernet
> (and we don't have full control over it) other "interesting" things
> may well have shown up.
> 
> Finally, the idea of catching the last packet in a file. Isn't that
> last packet saved in a buffer somewhere in 'core.18482'? Maybe
> someone can instruct me how to find that data (pointer name, or how to
> find it) and how to have gdb print the buffer.

	All of the incoming packets from pcap, which will include the 12 bytes
of user data, will be stored in the file. You will probably want to edit the
pcap file and blank the user data, or feed it to tcpdump and only supply the
header output if you can (the sensitivity of the input data makes debugging
exciting :-)). This is the source of the potential performance problems: you
are writing the pcap records to disk as well as the argus data, which increases
needed memory bandwidth substantially and will likely cause packet loss as the
NIC buffers don't get serviced fast enough. The only good part is that the
problem packet has to make it into the pcap buffer to cause the seg fault, so
there is a good chance that the offending packet will get caught :-). In times
past with these types of problems argus has managed to process a few more
packets before the damage causes the seg fault. Thus the packet that caused
the problem isn't necessarily the last packet (which is indeed in memory
somewhere) but one 2 or 3 before it, which unfortunately has already been
flushed from memory.
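
	As a rough sketch of the "blank the user data" step (in Python, again
not part of argus), something like the following keeps only the last few
records of a classic pcap savefile and zeroes every captured byte past a
chosen offset. The function name and the keep_bytes and last_n parameters are
hypothetical; where the headers actually end depends on the traffic, so
keep_bytes is a guess the caller has to make. It assumes the classic pcap
format (not pcapng).

```python
import struct

PCAP_GLOBAL_HDR_LEN = 24   # classic pcap file header
PCAP_RECORD_HDR_LEN = 16   # ts_sec, ts_usec (or nsec), incl_len, orig_len
PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)   # microsecond / nanosecond variants


def _byte_order(magic: bytes) -> str:
    """Work out the file's byte order from its magic number."""
    if len(magic) == 4:
        for order in ("<", ">"):
            if struct.unpack(order + "I", magic)[0] in PCAP_MAGICS:
                return order
    raise ValueError("not a classic pcap file")


def scrub_pcap(data: bytes, keep_bytes: int, last_n: int) -> bytes:
    """Return a pcap image with only the last `last_n` records, where every
    captured byte past the first `keep_bytes` of each packet is zeroed."""
    if keep_bytes < 0 or last_n < 1:
        raise ValueError("keep_bytes must be >= 0 and last_n >= 1")
    order = _byte_order(data[:4])
    records = []
    offset = PCAP_GLOBAL_HDR_LEN
    while offset + PCAP_RECORD_HDR_LEN <= len(data):
        rec_hdr = data[offset:offset + PCAP_RECORD_HDR_LEN]
        incl_len = struct.unpack(order + "IIII", rec_hdr)[2]
        start = offset + PCAP_RECORD_HDR_LEN
        packet = data[start:start + incl_len]
        if len(packet) < incl_len:
            break   # file ends mid-record, e.g. the writer died
        # Same length as before, so the record header stays valid.
        scrubbed = packet[:keep_bytes] + bytes(max(0, incl_len - keep_bytes))
        records.append(rec_hdr + scrubbed)
        offset = start + incl_len
    return data[:PCAP_GLOBAL_HDR_LEN] + b"".join(records[-last_n:])
```

For example, scrub_pcap(open("capture.pcap", "rb").read(), keep_bytes=54,
last_n=5) would keep the last five packets with 54 bytes each left intact
(Ethernet plus IPv4 and TCP headers without options); the file name and both
numbers are placeholders. Zeroing payload bytes will of course invalidate any
transport checksums in the scrubbed copy.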

Peter Van Epp


