segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4

Gunnar Lindberg Gunnar.Lindberg at chalmers.se
Tue May 19 06:17:37 EDT 2009


I'll try the 128-byte snaplen trick and see what it gives. I haven't
read all the code yet, so maybe it already does what I'm about to say:

    Regardless of what capture size we use, we must handle the case
    where headers are incomplete; if for nothing else, the IPv6
    Next_Header chain (RFC 2460) can take us past the captured bytes.
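
A minimal sketch of that kind of bounds-checked Next_Header walk
(illustrative only; the function name and layout are assumptions, not
Argus's actual parser):

    /* Walk the IPv6 extension-header chain without reading past the
     * captured bytes.  Returns the upper-layer protocol number, or -1
     * if the chain runs past caplen (likely with a short snaplen). */
    #include <stdint.h>
    #include <stddef.h>
    #include <netinet/in.h>   /* IPPROTO_* */

    static int walk_ipv6_chain(const uint8_t *pkt, size_t caplen)
    {
        size_t off = 40;                  /* fixed IPv6 header */
        if (caplen < off)
            return -1;
        int nh = pkt[6];                  /* Next Header field */

        for (;;) {
            switch (nh) {
            case IPPROTO_HOPOPTS:
            case IPPROTO_ROUTING:
            case IPPROTO_DSTOPTS: {
                if (off + 2 > caplen)     /* need NH + HdrExtLen */
                    return -1;
                size_t len = ((size_t)pkt[off + 1] + 1) * 8;
                nh = pkt[off];
                if (off + len > caplen)
                    return -1;
                off += len;
                break;
            }
            case IPPROTO_FRAGMENT:
                if (off + 8 > caplen)
                    return -1;
                nh = pkt[off];
                off += 8;                 /* fragment header is fixed */
                break;
            default:
                return nh;                /* upper-layer protocol */
            }
        }
    }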

The other thing I plan to do is to inspect the last packet, to see
whether it is always the same (or similar); if it's some uncommon
type, maybe the parsing code has been less used/tested. A quick look
at the latest packet gave ip_proto == 47 (IPPROTO_GRE); whether or
not that is significant remains to be seen.
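
For what it's worth, a capture file can be scanned for that with a few
lines of libpcap; the helper below is hypothetical and assumes plain
Ethernet + IPv4 framing for brevity:

    /* Report the IP protocol of the last IPv4 packet in a capture
     * file (47 == IPPROTO_GRE). */
    #include <pcap/pcap.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        if (argc != 2)
            return 1;
        pcap_t *p = pcap_open_offline(argv[1], errbuf);
        if (!p) { fprintf(stderr, "%s\n", errbuf); return 1; }

        struct pcap_pkthdr *h;
        const u_char *data;
        int last_proto = -1;

        while (pcap_next_ex(p, &h, &data) == 1) {
            /* EtherType 0x0800 == IPv4; protocol byte at IP offset 9 */
            if (h->caplen >= 14 + 20 && data[12] == 0x08 && data[13] == 0x00)
                last_proto = data[14 + 9];
        }
        printf("last ip_proto = %d\n", last_proto);
        pcap_close(p);
        return 0;
    }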

All of this next week, sorry.

	Gunnar

>From argus-info-bounces+gunnar.lindberg=chalmers.se at lists.andrew.cmu.edu  Mon May 18 15:31:42 2009
>Message-Id: <C46F56C8-4BD3-4B1E-BAC7-A64E4F0B4145 at qosient.com>
>From: Carter Bullard <carter at qosient.com>
>To: Peter Van Epp <vanepp at sfu.ca>
>In-Reply-To: <20090518032842.GA21264 at sfu.ca>
>Date: Mon, 18 May 2009 09:30:54 -0400
>References: <098FF7AB-8F76-4A16-B816-D409E840B884 at qosient.com>
>	<200905171100.n4HB0IBL010755 at grunert.cdg.chalmers.se>
>	<20090518032842.GA21264 at sfu.ca>
>Cc: argus-info at lists.andrew.cmu.edu
>Subject: Re: [ARGUS] segfault at 000000000311c000 rip
>	000000000040fb46rsp	0000007fbffff830 error 4
>List-Id: "Open list for users of Argus audit SW -
>	http://www.qosient.com/argus" <argus-info.lists.andrew.cmu.edu>

>Hey Gunnar,
>Peter's suggestion is to capture all the packets that argus sees up
>to the fault.  Argus can write every packet it sees to a libpcap-based
>packet capture file.  The file can be rotated or deleted at any time
>in case it gets too big, and argus will recreate it.
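
The mechanism described here maps onto libpcap's standard dump API; a
minimal sketch (argus's own implementation will differ):

    /* Append every packet from a live handle to a capture file that
     * tcpdump/wireshark can read. */
    #include <pcap/pcap.h>

    void capture_to_file(pcap_t *live, const char *path, int npkts)
    {
        pcap_dumper_t *d = pcap_dump_open(live, path);
        if (!d)
            return;
        /* pcap_dump() has the pcap_handler signature, so it can be
         * passed straight to pcap_loop() as the callback. */
        pcap_loop(live, npkts, pcap_dump, (u_char *)d);
        pcap_dump_close(d);
    }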

>Yes, I recommend testing the snaplen first (argus -s 128), leaving
>all other options the same; if that doesn't change things, then
>putting in an "- ip" filter would be my next step.

>Thanks for all the help!!!!!!

>Carter

>On May 17, 2009, at 11:28 PM, Peter Van Epp wrote:

>> On Sun, May 17, 2009 at 01:00:18PM +0200, Gunnar Lindberg wrote:
>>> My business trip left me with a real cold, so I will probably not
>>> be at the office for a few days (I'm doing this from home - too
>>> curious to stay away from mail :-).
>>>
>>> We'll see which of the suggestions will apply (I've browsed ahead,
>>> so at least I'm aware of them). However, before I go back to bed:
>>>
>>> calloc() et al. have their own area for bookkeeping - linked lists
>>> of what is in use and what is free, etc. This is data within our
>>> own process, so if a pointer goes haywire we can write into it and
>>> affect the operation of these "solid routines". I think I mentioned
>>> such a "ticking bomb" before - the word nightmare comes to mind.
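
A contrived illustration of that ticking bomb (not Argus code): the
stray write itself is silent, and the crash surfaces much later inside
the allocator's own "solid" routines:

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *a = calloc(1, 16);
        char *b = calloc(1, 16);

        /* Writing past 'a' tramples the allocator's bookkeeping for
         * 'b'.  Nothing fails here... */
        memset(a, 0xff, 32);

        /* ...the bomb goes off much later, typically inside free(). */
        free(b);
        free(a);
        return 0;
    }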
>>
>> 	Yep it is; unfortunately it's not easy to fix. Requesting less
>> than the full packet from pcap decreases the memory bandwidth needed
>> to do the capture. The trade-off is that we may (with long headers)
>> truncate the headers. If the headers are truncated and argus assumes
>> (which it does) that they are complete, it's possible to get garbage
>> in (in which case we get garbage out, which is possibly what we
>> think we are seeing :-)). The only true fix is to capture the entire
>> packet and check the checksums (assuming, of course, that header
>> checksums are enabled), but that runs into performance problems on
>> most everything except DAG NICs (and at OC-192, even there in most
>> cases). You will sometimes see a 10 gigabit Ethernet capture
>> solution which admits to being able to do 6.4 gigabits per second,
>> not the full 10 gigabits. This indicates they have non-interleaved
>> DDR2 RAM in their capture path somewhere, since that is the
>> calculated throughput of a DDR2 DIMM. In the argus sensor machine
>> case there is a NIC buffer to kernel memory copy (usually of the
>> full packet), then a kernel to user space memory-to-memory copy in
>> libpcap (except in the mmapped case, which avoids this copy), plus
>> the CPU is executing instructions out of that same 6.4 gigs of
>> memory, so performance can go to hell very quickly at high line
>> rates, and we haven't even considered disk queuing yet :-).
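
The truncation is at least detectable in a libpcap callback, since the
per-packet header carries both the captured length and the on-the-wire
length; a sketch of the defensive check:

    #include <pcap/pcap.h>

    void handler(u_char *user, const struct pcap_pkthdr *h,
                 const u_char *bytes)
    {
        if (h->caplen < h->len) {
            /* Snapped short: anything past h->caplen is not present
             * in 'bytes', so every header access must be bounds-
             * checked against the captured length. */
        }
        /* ... parse at most h->caplen bytes ... */
        (void)user; (void)bytes;
    }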
>>
>>>
>>> Recalling from the top of my head: I think we capture 12 bytes of
>>> user data (which we drop almost at once - it's really sensitive to
>>> grab anything other than headers).
>>>
>>> The protocols I'm aware of are IPv4 and IPv6, but since it's
>>> Ethernet (and we don't have full control over it), other
>>> "interesting" things may well have shown up.
>>>
>>> Finally, the idea of catching the last packet in a file. Isn't that
>>> last packet saved in a buffer somewhere in 'core.18482'? Maybe
>>> someone can instruct me how to find that data (pointer name, or how
>>> to find it) and how to have gdb print the buffer.
>>
>> 	All of the incoming packets from pcap, which will include the
>> 12 bytes of user data, will be stored in the file. You will probably
>> want to edit the pcap file and blank the user data, or feed it to
>> tcpdump and supply only the header output if you can (the
>> sensitivity of the input data makes debugging exciting :-)). This is
>> the source of the potential performance problems: you are writing
>> the pcap records to disk as well as the argus data, which increases
>> the needed memory bandwidth substantially and will likely cause
>> packet loss as the NIC buffers don't get serviced fast enough. The
>> only good part is that the problem packet has to make it into the
>> pcap buffer to cause the segfault, so there is a good chance that
>> the offending packet will get caught :-). In times past with these
>> types of problems, argus has managed to process a few more packets
>> before the damage caused the segfault, so it isn't necessarily the
>> last packet (which is indeed in memory somewhere) that caused the
>> problem but, unfortunately, 2 or 3 packets before it, which have
>> already been flushed from memory.
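
One way to blank the user data before sharing a capture, as suggested
above; a hypothetical helper, not a tool the list provides (the fixed
'keep' cutoff is a crude stand-in for real per-packet header lengths):

    #include <pcap/pcap.h>
    #include <string.h>

    /* Copy a capture file, zeroing everything past the first 'keep'
     * bytes of each packet. */
    int blank_payload(const char *in, const char *out, unsigned keep)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        pcap_t *p = pcap_open_offline(in, errbuf);
        if (!p) return -1;
        pcap_dumper_t *d = pcap_dump_open(p, out);
        if (!d) { pcap_close(p); return -1; }

        struct pcap_pkthdr *h;
        const u_char *data;
        static u_char buf[65536];

        while (pcap_next_ex(p, &h, &data) == 1) {
            struct pcap_pkthdr hh = *h;
            if (hh.caplen > sizeof(buf))
                hh.caplen = sizeof(buf);      /* clamp oversized packets */
            memcpy(buf, data, hh.caplen);
            if (hh.caplen > keep)
                memset(buf + keep, 0, hh.caplen - keep); /* zero user data */
            pcap_dump((u_char *)d, &hh, buf);
        }
        pcap_dump_close(d);
        pcap_close(p);
        return 0;
    }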
>>
>> Peter Van Epp
>>


