segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
Peter Van Epp
vanepp at sfu.ca
Sun May 17 23:28:42 EDT 2009
On Sun, May 17, 2009 at 01:00:18PM +0200, Gunnar Lindberg wrote:
> My business trip left me with a real cold, so I will probably not
> be at the office in a few days (I do this from home - too curious
> to stay away from mail :-).
>
> We'll see which of the suggestions will apply (I've browsed forward
> so at least I'm aware of them). However, before I go back to bed:
>
> calloc() et al. have their own area for bookkeeping - linked lists
> of what is in use and what is free etc. This is data within our own
> process, so if a pointer goes haywire we can write into it and will
> affect the operation of these "solid routines". I think I mentioned
> such a "ticking bomb" before - the word nightmare comes to mind.
Yep it is; unfortunately it's not easy to fix. Requesting less than the
full packet from pcap decreases the memory bandwidth needed to do the capture.
The trade-off is that we may (with long headers) truncate the headers. If the
headers are truncated and argus assumes (which it does) that they are complete,
it's possible to get garbage in (in which case we get garbage out, which is
possibly what we are seeing :-)). The only true fix is to capture the entire
packet and check the checksums (assuming of course that header checksums are
enabled), but that runs into performance problems on most everything except
DAG NICs (and at OC192 even there in most cases).

You will sometimes see a 10 gig Ethernet capture solution which admits to
being able to do 6.4 gigabits per second, not the full 10 gigs. This indicates
they have non-interleaved DDR2 RAM in their capture path somewhere, since that
is the calculated throughput of a DDR2 DIMM. In the argus sensor machine case
there is a NIC buffer to kernel memory copy (usually of the full packet), then
a kernel to user space memory-to-memory copy in libpcap (except in the mmapped
case, which avoids this copy), plus the CPU is executing instructions out of
that same 6.4 gigs of memory, so performance can go to hell very quickly at
high line rates, and we haven't even considered disk queuing yet :-).
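To make the truncation risk above concrete, here is a minimal sketch of the kind of check a parser could run before trusting a captured IPv4 header. It is not how argus does it; the function names and the result strings are made up for illustration, and it assumes untagged Ethernet II frames. It refuses to read past the captured length and verifies the IPv4 header checksum, which only needs the header itself to be present.

```python
import struct

ETH_HLEN = 14            # Ethernet II header, no VLAN tag (assumption)
ETHERTYPE_IPV4 = 0x0800


def ipv4_header_checksum_ok(header: bytes) -> bool:
    """One's-complement sum over the whole header must fold to 0xFFFF."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def check_ipv4_capture(frame: bytes) -> str:
    """Classify a captured frame (hypothetical helper, not an argus API).

    Returns one of: "ok", "truncated", "not-ipv4", "bad-header",
    "bad-checksum".
    """
    if len(frame) < ETH_HLEN:
        return "truncated"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return "not-ipv4"
    if len(frame) < ETH_HLEN + 20:       # minimum IPv4 header is 20 bytes
        return "truncated"
    version_ihl = frame[ETH_HLEN]
    if version_ihl >> 4 != 4:
        return "bad-header"
    ihl = (version_ihl & 0x0F) * 4       # header length in bytes
    if ihl < 20:
        return "bad-header"
    if len(frame) < ETH_HLEN + ihl:      # options ran past the snaplen
        return "truncated"
    if not ipv4_header_checksum_ok(frame[ETH_HLEN:ETH_HLEN + ihl]):
        return "bad-checksum"
    return "ok"
```

Anything other than "ok" would be a packet to count and skip rather than parse. The same length-before-read idea applies to the TCP/UDP and IPv6 headers, which this sketch does not cover.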
>
> Recalling from the top of my head: I think we capture 12 bytes of
> user data (which we drop almost at once - it's really sensitive to
> grab anything else than headers).
>
> The protocols I'm aware of are IPv4 and IPv6, but since it's Ethernet
> (and we don't have full control over it) other "interesting" things
> may well have shown up.
>
> Finally, the idea of catching the last packet in a file. Isn't that
> last packet saved in a buffer somewhere in 'core.18482'? Maybe
> someone can instruct me how to find that data (pointer name, or how
> to find it) and how to have gdb print the buffer.
All of the incoming packets from pcap, which will include the 12 bytes
of user data, will be stored in the file. You will probably want to edit the
pcap file and blank the user data, or feed it to tcpdump and only supply the
header output if you can (the sensitivity of the input data makes debugging
exciting :-)). This is the source of the potential performance problems: you
are writing the pcap records to disk as well as the argus data, which increases
the needed memory bandwidth substantially and will likely cause packet loss as
the NIC buffers don't get serviced fast enough. The only good part is that the
problem packet has to make it into the pcap buffer to cause the seg fault, so
there is a good chance that the offending packet will get caught :-). In times
past with these types of problems argus has managed to process a few more
packets before the damage causes the seg fault, so the packet that caused the
problem isn't necessarily the last packet (which is indeed in memory somewhere)
but one 2 or 3 before it, which unfortunately has already been flushed from
memory.
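For blanking the user data before sharing the capture, something along these lines might do. It is only a sketch under assumptions: a classic (non-pcapng) pcap file with microsecond timestamps, Ethernet link type, and IPv4 carrying TCP or UDP. The names scrub_pcap and header_length are invented here. Anything it can't parse (IPv6 included) gets blanked right after the Ethernet header, which is the conservative choice but loses header detail, so it would need extending for those cases.

```python
import struct

PCAP_GLOBAL_HLEN = 24    # classic pcap file header
PCAP_RECORD_HLEN = 16    # per-packet record header
ETH_HLEN = 14            # Ethernet II, no VLAN tag (assumption)
DLT_EN10MB = 1           # pcap link type for Ethernet


def header_length(frame: bytes) -> int:
    """Bytes of Ethernet + IPv4 + TCP/UDP header to keep (hypothetical helper)."""
    if len(frame) < ETH_HLEN + 20 or frame[12:14] != b"\x08\x00":
        return min(len(frame), ETH_HLEN)
    ihl = (frame[ETH_HLEN] & 0x0F) * 4
    if ihl < 20:
        return ETH_HLEN
    l4 = ETH_HLEN + ihl                  # start of the transport header
    proto = frame[ETH_HLEN + 9]
    if proto == 17:                      # UDP: fixed 8-byte header
        return min(len(frame), l4 + 8)
    if proto == 6 and len(frame) > l4 + 12:   # TCP: data offset nibble
        return min(len(frame), l4 + (frame[l4 + 12] >> 4) * 4)
    return min(len(frame), l4)


def scrub_pcap(data: bytes) -> bytes:
    """Return a copy of a pcap file with everything after the headers zeroed."""
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype != DLT_EN10MB:
        raise ValueError("only Ethernet captures are handled")
    out = bytearray(data)
    offset = PCAP_GLOBAL_HLEN
    while offset + PCAP_RECORD_HLEN <= len(out):
        _sec, _usec, incl_len, _orig = struct.unpack_from(
            endian + "IIII", out, offset)
        start = offset + PCAP_RECORD_HLEN
        end = min(start + incl_len, len(out))
        keep = header_length(bytes(out[start:end]))
        out[start + keep:end] = bytes(end - start - keep)   # zero the payload
        offset = start + incl_len
    return bytes(out)
```

Record lengths and timestamps are left alone, so the scrubbed file should still replay the same header sequence through whatever reads it.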
Peter Van Epp