segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4

Carter Bullard carter at qosient.com
Mon Jul 6 10:12:37 EDT 2009


Hey Gunnar,
Another strategy to try when you return, if at all possible, is to find
another platform, or a 32-bit machine, and see whether argus dies at the
same time or in the same manner.  At least that would help to rule out the
"it's the 64-bit platform that's the problem" theory.

I am going to go through the code this week to see if I can find anything
related to poor memory management.  I use valgrind this way:

    valgrind /usr/local/sbin/argus -d

(My argi run as daemons, and the "-d" option toggles that setting off for me.)
Any memory complaints will come out pretty quickly.  If we are doing something
wrong with memory, we'll do it many, many times before it tickles a part of
memory that kills argus.

The .devel tag in the root directory is important, as valgrind will tell us
which line numbers are involved if argus is compiled with the options that
the ".devel" tag generates.

Carter

On Jul 1, 2009, at 9:59 AM, Gunnar Lindberg wrote:

> The idea that this may be strings is interesting, so just like
> Carter I took out my old ASCII chart - but it didn't tell me much.
> And the next crash, which occurred yesterday afternoon, made me go for
> the 8-bit 8859 chart - i.e. I think it's more random data.
>
> {start = 0x2029706f742e656c, end = 0x702e73696874202b,
> {start = 0xb9d2bcfac6fa3f4c, end = 0x3d078cfe490497a0,
>
>
>> Hey Gunnar, any chance you can use valgrind to see if we're doing
>> something wrong with memory?
>
> # ll -o .devel .threads
> /bin/ls: .threads: No such file or directory
> -rw-rw-r--  1 root 0 May 15 07:13 .devel
>
> I assume you mean something like
>  /usr/bin/valgrind [-???] /usr/local/sbin/argus >& /var/log/xxx.log &
> and then we see what's in xxx.log after a crash. Right? We have some
> kind of cron-based watchdog and I guess we leave that as is, so that
> we get xxx.log once and are back in ordinary business after.
>
> Mixed news, good and bad... :-).
>
> 1) Sweden has longer holidays than the US and I'm looking forward
>   to my 5 weeks, starting on Mon Jul 6; back Mon Aug 10. And, for
>   once I'm going to stay away completely, not even email :-), so
>   I'll resist the tempting "let's just set it up, only once...".
>
> 2) Since I'm not at all familiar with valgrind I would appreciate
>   some advice on "-???".
>
> So, in mid-August I guess we can do a "valgrind thing".
>
> Possibly that could be combined with an idea we have to capture
> that last batch of data - I'm reluctant to write raw capture
> data to a file, but I think we can save what was actually written
> up to just before the crash; it just needs some watchdog adjustment.
>
> And, of course, we must stay open to the possibility that what we
> have is just a chunk of bad memory. Since argus crashes on both
> machines I consider a memory fault unlikely, but both are the same age,
> so it's not entirely impossible. What would be the best mem test?
>
> 	Gunnar Lindberg
>
> Latest
> -rwxrwxr-x  1 root 829739 Jun  1 07:47 argus
> -rw-r--r--  1 root 70807552 Jun 30 14:51 core.1584
> argc# gdb argus.1584 core.1584
> #0  0x0000003fabc705f2 in strcmp () from /lib64/tls/libc.so.6
> #1  0x0000003fabc81d50 in __tzstring () from /lib64/tls/libc.so.6
> #2  0x0000003fabc83b43 in __tzfile_compute () from /lib64/tls/libc.so.6
> #3  0x0000003fabc82c8b in __tz_convert () from /lib64/tls/libc.so.6
> #4  0x0000003fabcc5abe in vsyslog () from /lib64/tls/libc.so.6
> #5  0x0000003fabcc6066 in syslog () from /lib64/tls/libc.so.6
> #6  0x000000000041569c in ArgusLoadList (l1=0x659460, l2=0x65c0a0)
>    at ArgusUtil.c:273
> #7  0x000000000041a439 in ArgusOutputProcess (arg=0x6596c0)
>    at ArgusOutput.c:477
> #8  0x0000000000408339 in ArgusProcessPacket (src=0x2a95786010, p=0x65bb12 "",
>    length=105, tvp=0x7fbffff4b0, type=0) at ArgusModeler.c:1324
> #9  0x00000000004107db in ArgusEtherPacket (user=0x2a95786010 "",
>    h=0x7fbffff530, p=0x65bb12 "") at ArgusSource.c:716
> #10 0x0000003fac904bff in ?? () from /usr/lib64/libpcap.so.0.8.3
> #11 0x0000000000413cd2 in ArgusGetPackets (src=0x2a95786010)
>    at ArgusSource.c:2099
> #12 0x0000000000404c77 in main (argc=1, argv=0x7fbffffe08) at argus.c:535
>
> #6  0x000000000041569c in ArgusLoadList (l1=0x659460, l2=0x65c0a0)
>    at ArgusUtil.c:273
> 273     ArgusUtil.c: No such file or directory.
>        in ArgusUtil.c
> (gdb) print *l1
> $1 = {start = 0x1127f60, end = 0x112f9e0, count = 270, pushed = 348323494,
>  popped = 0, loaded = 348323224, outputTime = {tv_sec = 0, tv_usec = 0},
>  reportTime = {tv_sec = 0, tv_usec = 0}}
> (gdb) print *l2
> $2 = {start = 0xb9d2bcfac6fa3f4c, end = 0x3d078cfe490497a0,
>  count = 1664177081, pushed = 1214938457, popped = 734219693,
>  loaded = 323085166, outputTime = {tv_sec = -3436253370747411246,
>    tv_usec = 7772834831553979568}, reportTime = {
>    tv_sec = -3784555927640503799, tv_usec = -3225818299882675799}}
>
>
>> From carter at qosient.com Wed Jul  1 01:50:19 2009
>> From: Carter Bullard <carter at qosient.com>
>> To: Peter Van Epp <vanepp at sfu.ca>
>> CC: "argus-info at lists.andrew.cmu.edu" <argus-info at lists.andrew.cmu.edu>
>> Sender: "argus-info-bounces+gunnar.lindberg=chalmers.se at lists.andrew.cmu.edu"
>> 	<argus-info-bounces+gunnar.lindberg=chalmers.se at lists.andrew.cmu.edu>
>> Date: Wed, 1 Jul 2009 01:49:27 +0200
>> Subject: Re: [ARGUS] segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
>> Message-ID: <F0888929-AF98-42D7-85EB-9FAF15AB082E at qosient.com>
>> References: <78C956B9-F7C0-4E75-A37B-843A293386FF at qosient.com>
>> 	<200906290719.n5T7JvTF026686 at grunert.cdg.chalmers.se>
>> 	<20090630223342.GA27655 at sfu.ca>
>> In-Reply-To: <20090630223342.GA27655 at sfu.ca>
>
>> Hey Peter,
>> Something is writing over something; I just can't seem to find a handle
>> on it.  ArgusLoadList() passes ArgusListRecords from the Modeler to
>> the Output processor, and it just takes the two linked lists and combines
>> them.  If there is nothing in the receive list, it's just a "move the
>> pointers" and there you go.  The receive list should be empty if the
>> output processor is keeping ahead of the load.
>
>> My guess is that we're getting the length of an output record wrong,
>> which can happen if you're sloppy forming a DSR that you rarely use,
>> so it could still be a packet-specific bug, or we are using a buffer
>> that has been deallocated/reallocated and we're stomping on the new
>> user's buffer.
>
>> This can happen in threaded applications, so turning off the .threads
>> tag may be a good test.
>
>> Hey Gunnar, any chance you can use valgrind to see if we're doing
>> something wrong with memory?
>
>> Carter
>
>

Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York  10022

+1 212 588-9133 Phone
+1 212 588-9134 Fax


