segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
Gunnar Lindberg
Gunnar.Lindberg at chalmers.se
Tue May 12 01:09:40 EDT 2009
Usually it runs for a very long time (days), but a few times it has
crashed within 30 minutes or so.
Now, if we just accept gdb's idea of where it crashed, we find:
257 if (l2->start == NULL)
258 l2->start = l1->start;
259 else
-> 260 l2->end->nxt = l1->start;
It seems that ArgusLoadList() moves data from one list to another, so
maybe there is a condition somewhere else in the code where l2->end is
not filled in correctly, perhaps at high load? The "ticking bomb" that
is so hard to find.
Then I'm surprised that gcc doesn't barf on "l2->end->nxt", since I
haven't seen a "nxt" inside a "struct ArgusListStruct", which I think
is what "l2->end" is supposed to point at.
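
To make the two questions above concrete, here is a minimal sketch in C. The type and function names (ListRecord, ListHead, load_list_checked, layout_matches_disassembly) are hypothetical stand-ins, not the real Argus definitions; the field order of ListHead simply mirrors the "print *l1" output quoted further down. It assumes that start and end point at a record type whose first member is nxt, which would explain why gcc accepts "l2->end->nxt" and is consistent with the faulting instruction storing to offset 0 of whatever l2->end holds.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-ins, NOT the real Argus definitions. */
struct ListRecord {
    struct ListRecord *nxt;          /* assumed to be the first member */
    void *data;
};

struct ListHead {
    struct ListRecord *start, *end;  /* point at records, not at another head */
    unsigned int count, pushed, popped, loaded;
    struct timeval outputTime, reportTime;
};

/* On an LP64 machine this layout puts the members at the offsets
 * that appear in the quoted disassembly:
 *   0x8(%rsi)   load of l2->end
 *   0x10(%rsi)  add to l2->count
 *   0x1c(%rdi)  add to l1->loaded
 *   (%r9)       store to l2->end->nxt */
static int layout_matches_disassembly(void)
{
    return offsetof(struct ListHead, end) == 0x8 &&
           offsetof(struct ListHead, count) == 0x10 &&
           offsetof(struct ListHead, loaded) == 0x1c &&
           offsetof(struct ListRecord, nxt) == 0;
}

/* Move every record of l1 onto the tail of l2.
 * Returns 0 on success, -1 if a header is missing or l2 is inconsistent. */
static int load_list_checked(struct ListHead *l1, struct ListHead *l2)
{
    if (l1 == NULL || l2 == NULL)
        return -1;

    /* An empty list needs start == NULL, end == NULL and count == 0 together. */
    if ((l2->start == NULL) != (l2->end == NULL))
        return -1;
    if ((l2->start == NULL) != (l2->count == 0))
        return -1;

    /* Nothing to move: return early so l2->end is not overwritten with NULL. */
    if (l1->start == NULL)
        return 0;

    if (l2->start == NULL)
        l2->start = l1->start;
    else
        l2->end->nxt = l1->start;

    l2->end = l1->end;
    l2->count += l1->count;

    l1->loaded += l1->count;
    l1->start = NULL;
    l1->end = NULL;
    l1->count = 0;
    return 0;
}
```

The early return for an empty l1 covers one way "l2->end is not filled in correctly" could arise in the excerpt as quoted: with l1 empty and l2 not, "l2->end = l1->end" leaves l2 with a start but a NULL end. Whether any caller actually does that is not known from this thread. Note also that checks like these would not catch a header that has been overwritten wholesale, as the "print *l2" output suggests happened here.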
I'm out of the office for the rest of the week, so I'm not going
to replace the current argus with the -g version right away.
Hardware is Dell 2950.
OS is RHEL Linux
/etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
# /bin/uname -a
Linux argc.irt.chalmers.se 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 7 17:45:52 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
# /bin/arch
x86_64
gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10)
Gunnar Lindberg
>From carter at qosient.com Tue May 12 01:25:34 2009
>Cc: argus-info at lists.andrew.cmu.edu
>Message-Id: <5D9EB1B9-7EA5-4631-9B58-FF82865ADE70 at qosient.com>
>From: Carter Bullard <carter at qosient.com>
>To: Gunnar Lindberg <Gunnar.Lindberg at chalmers.se>
>In-Reply-To: <200905112110.n4BLAHXV017643 at grunert.cdg.chalmers.se>
>Subject: Re: [ARGUS] segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
>Date: Mon, 11 May 2009 19:24:46 -0400
>References: <200905112110.n4BLAHXV017643 at grunert.cdg.chalmers.se>
>Hey Gunnar,
>The way you turn on the "-g" option is to do this in the main
>distribution directory:
> % touch .devel
> % ./configure;make clean;make
>That will compile everything with the appropriate flags.
>Well, l2 really looks screwed up. What kind of machine is this 64-bit thing?
>I think we're having alignment problems, possibly. Does it run for any
>amount of time before it blows up?
>Carter
>On May 11, 2009, at 5:10 PM, Gunnar Lindberg wrote:
>> First of all, I re-ran make with "-g" and used that with the existing
>> core file; I can't tell whether that should be OK, but as far as I can
>> see it still makes some kind of sense.
>>
>>
>> [lindberg at argv ~]$ gdb argus-g core.14369
>> (gdb) where
>> #0 0x0000000000410bc2 in ArgusLoadList (l1=0x651460, l2=0x6540a0) at ArgusUtil.c:260
>> #1 0x000000000041557b in ArgusOutputProcess (arg=Variable "arg" is not available.
>> ) at ArgusOutput.c:477
>> #2 0x000000000040bb6c in ArgusProcessPacket (src=Variable "src" is not available.
>> ) at ArgusModeler.c:1324
>> #3 0x000000000040d006 in ArgusEtherPacket (user=0x2a95786010 "", h=Variable "h" is not available.
>> ) at ArgusSource.c:716
>> #4 0x00000034e2f04bff in ?? () from /usr/lib64/libpcap.so.0.8.3
>> #5 0x0000000000410759 in ArgusGetPackets (src=0x2a95786010) at ArgusSource.c:2093
>> #6 0x0000000000404f83 in main (argc=1, argv=0x7fbffffe08) at argus.c:535
>>
>>
>> (gdb) print *l1
>> $3 = {start = 0x18c21f0, end = 0x17e98b0, count = 589, pushed = 3044164,
>> popped = 0, loaded = 3043575, outputTime = {tv_sec = 0, tv_usec = 0},
>> reportTime = {tv_sec = 0, tv_usec = 0}}
>>
>> (gdb) print *l2
>> $5 = {start = 0x85d278d99c8b81d1, end = 0x63caa47a16f1492e,
>> count = 1579875320, pushed = 1880390013, popped = 2415426777,
>> loaded = 3722138485, outputTime = {tv_sec = 144115210689264557,
>> tv_usec = 14930315638210660}, reportTime = {tv_sec = -7084847654803835648,
>> tv_usec = -5817086215248780719}}
>>
>> argus/ArgusUtil.c
>> 246 void
>> 247 ArgusLoadList(struct ArgusListStruct *l1, struct ArgusListStruct *l2)
>> 248 {
>> 249 if (l1 && l2) {
>> 250 int count;
>> 251 #if defined(ARGUS_THREADS)
>> 252 pthread_mutex_lock(&l1->lock);
>> 253 pthread_mutex_lock(&l2->lock);
>> 254 #endif
>> 255 count = l1->count;
>> 256
>> 257 if (l2->start == NULL)
>> 258 l2->start = l1->start;
>> 259 else
>> 260 l2->end->nxt = l1->start;
>> 261
>> 262 l2->end = l1->end;
>> 263 l2->count += count;
>> 264
>> 265 l1->start = NULL;
>> 266 l1->end = NULL;
>> 267 l1->loaded += count;
>> 268 l1->count = 0;
>> 269
>> 270 #if defined(ARGUS_THREADS)
>> 271 pthread_mutex_unlock(&l2->lock);
>> 272 pthread_mutex_unlock(&l1->lock);
>> 273 #endif
>> 274
>> 275 #ifdef ARGUSDEBUG
>> 276 ArgusDebug (5, "ArgusLoadList (0x%x, 0x%x) load %d objects\n", l1, l2 , count);
>> 277 #endif
>> 278 }
>> 279 }
>>
>>
>> Gunnar Lindberg
>>
>>
>>> From SRS0=BzD3OK=BH=qosient.com=carter at srs.bis.na.blackberry.com
>>> Mon May 11 13:18:08 2009
>>> Message-ID: <2044323243-1242040666-cardhu_decombobulator_blackberry.rim.net-2042564372- at bxe1165.bisx.prod.on.blackberry
>>> >
>>> Reply-To: carter at qosient.com
>>> References: <E5F8710F-522D-4579-8569-A9DD5E130A06 at qosient.com><200905110551.n4B5pV62007936 at grunert.cdg.chalmers.se
>>> >
>>> In-Reply-To: <200905110551.n4B5pV62007936 at grunert.cdg.chalmers.se>
>>> Subject: Re: [ARGUS] segfault at 000000000311c000 rip
>>> 000000000040fb46 rsp 0000007fbffff830 error 4
>>> To: "Gunnar Lindberg" <Gunnar.Lindberg at chalmers.se>,
>>> argus-info-bounces+carter=qosient.com at lists.andrew.cmu.edu,
>>> "Argus" <argus-info at lists.andrew.cmu.edu>
>>> From: carter at qosient.com
>>> Date: Mon, 11 May 2009 11:19:44 +0000
>>
>>> Hey Gunnar,
>>> The C level debugging in gdb() is very good, and gives you quick
>>> access to the symbols and stack info.
>>>
>>> I have never seen problems with ArgusLoadList(), so if you have a
>>> core file, if you could load it into gdb() and type:
>>>
>>> (gdb) where
>>> (gdb) print *l1   (assuming it's in ArgusLoadList)
>>> (gdb) print *l2
>>>
>>> If not, if you could run it under gdb() until it stops, and type
>>> the same, that would give me a good start.
>>>
>>> Carter
>>>
>>>
>>> -----Original Message-----
>>> From: Gunnar Lindberg <Gunnar.Lindberg at chalmers.se>
>>>
>>> Date: Mon, 11 May 2009 07:51:31
>>> To: <argus-info at lists.andrew.cmu.edu>
>>> Subject: Re: [ARGUS] segfault at 000000000311c000 rip
>>> 000000000040fb46
>>> rsp 0000007fbffff830 error 4
>>>
>>>
>>> No .threads in argus-3.0.1.beta.3
>>>
>>> My gdb knowledge is limited, but I did quite a lot of C/machine code
>>> debugging in my early days (25 years ago; on the MC68000 I'd probably
>>> have been able to write the C code from the optimized assembler :-).
>>> But this is *86 - "same, same, but different"...
>>>
>>> Based on that I did the "disass" trick, and <<<=== indicates the
>>> machine code where the crash occurred. What beats me on *86 is
>>> which register is used for which C variable, but there seems to
>>> have been an offset "0x8(%rsi),%r9" involved just before - on the
>>> MC68000 that meant a member of a C struct, and I guess it still does.
>>>
>>> So we picked up something 8 bytes into a C struct and then tried
>>> to use it as a pointer "%r10,(%r9)" - and pooof.
>>>
>>> The most probable thing is that data/pointers got screwed up minutes
>>> ago and then the bomb goes off now because we just got to that data.
>>> However, before going through the linked list of data I'd like to ask
>>> about a line of C code:
>>>
>>> argus/ArgusUtil.c:
>>>
>>> void
>>> ArgusLoadList(struct ArgusListStruct *l1, struct ArgusListStruct *l2)
>>> {
>>> ...
>>> if (l2->start == NULL)
>>> l2->start = l1->start;
>>> else
>>> l2->end->nxt = l1->start; <=
>>> ...
>>> }
>>>
>>> The only "nxt" I find is within a "struct ArgusListRecord",
>>> but "l2" and "l2->end" point at a "struct ArgusListStruct".
>>> Could this be it?
>>>
>>> Or is there some condition where l2->end is not correctly set?
>>>
>>> Gunnar Lindberg
>>>
>>> May 7 16:33:30 argv kernel: argus[14369] general protection
>>> rip:410bc2 rsp:7fbffff308 error:0
>>>
>>> gdb argus.14369 /core.14369
>>> ...
>>> #0 0x0000000000410bc2 in ArgusLoadList ()
>>> (gdb) where
>>> #0 0x0000000000410bc2 in ArgusLoadList ()
>>> #1 0x000000000041557b in ArgusOutputProcess ()
>>> #2 0x000000000040bb6c in ArgusProcessPacket ()
>>> #3 0x000000000040d006 in ArgusEtherPacket ()
>>> #4 0x00000034e2f04bff in ?? () from /usr/lib64/libpcap.so.0.8.3
>>> #5 0x0000000000410759 in ArgusGetPackets ()
>>> #6 0x0000000000404f83 in main ()
>>> (gdb) disass 0x0000000000410bc2
>>> Dump of assembler code for function ArgusLoadList:
>>> 0x0000000000410ba0 <ArgusLoadList+0>: test %rdi,%rdi
>>> 0x0000000000410ba3 <ArgusLoadList+3>: setne %dl
>>> 0x0000000000410ba6 <ArgusLoadList+6>: xor %eax,%eax
>>> 0x0000000000410ba8 <ArgusLoadList+8>: test %rsi,%rsi
>>> 0x0000000000410bab <ArgusLoadList+11>: setne %al
>>> 0x0000000000410bae <ArgusLoadList+14>: test %eax,%edx
>>> 0x0000000000410bb0 <ArgusLoadList+16>: je 0x410be9 <ArgusLoadList+73>
>>> 0x0000000000410bb2 <ArgusLoadList+18>: cmpq $0x0,(%rsi)
>>> 0x0000000000410bb6 <ArgusLoadList+22>: mov 0x10(%rdi),%ecx
>>> 0x0000000000410bb9 <ArgusLoadList+25>: je 0x410bf0 <ArgusLoadList+80>
>>> 0x0000000000410bbb <ArgusLoadList+27>: mov 0x8(%rsi),%r9
>>> 0x0000000000410bbf <ArgusLoadList+31>: mov (%rdi),%r10
>>> 0x0000000000410bc2 <ArgusLoadList+34>: mov %r10,(%r9) <<<===
>>> 0x0000000000410bc5 <ArgusLoadList+37>: mov 0x8(%rdi),%r11
>>> 0x0000000000410bc9 <ArgusLoadList+41>: add %ecx,0x1c(%rdi)
>>> 0x0000000000410bcc <ArgusLoadList+44>: add %ecx,0x10(%rsi)
>>> 0x0000000000410bcf <ArgusLoadList+47>: movq $0x0,(%rdi)
>>> 0x0000000000410bd6 <ArgusLoadList+54>: movl $0x0,0x10(%rdi)
>>> 0x0000000000410bdd <ArgusLoadList+61>: mov %r11,0x8(%rsi)
>>> 0x0000000000410be1 <ArgusLoadList+65>: movq $0x0,0x8(%rdi)
>>> 0x0000000000410be9 <ArgusLoadList+73>: repz retq
>>> 0x0000000000410beb <ArgusLoadList+75>: data16
>>> 0x0000000000410bec <ArgusLoadList+76>: data16
>>> 0x0000000000410bed <ArgusLoadList+77>: nop
>>> 0x0000000000410bee <ArgusLoadList+78>: data16
>>> 0x0000000000410bef <ArgusLoadList+79>: nop
>>> 0x0000000000410bf0 <ArgusLoadList+80>: mov (%rdi),%r8
>>> 0x0000000000410bf3 <ArgusLoadList+83>: mov %r8,(%rsi)
>>> 0x0000000000410bf6 <ArgusLoadList+86>: jmp 0x410bc5 <ArgusLoadList+37>
>>> 0x0000000000410bf8 <ArgusLoadList+88>: data16
>>> 0x0000000000410bf9 <ArgusLoadList+89>: data16
>>> 0x0000000000410bfa <ArgusLoadList+90>: data16
>>> 0x0000000000410bfb <ArgusLoadList+91>: nop
>>> 0x0000000000410bfc <ArgusLoadList+92>: data16
>>> 0x0000000000410bfd <ArgusLoadList+93>: data16
>>> 0x0000000000410bfe <ArgusLoadList+94>: data16
>>> 0x0000000000410bff <ArgusLoadList+95>: nop
>>> End of assembler dump.
>>> (gdb) info registers
>>> rax 0x1 1
>>> rbx 0x174f450 24441936
>>> rcx 0x24d 589
>>> rdx 0x4a02f101 1241706753
>>> rsi 0x6540a0 6635680
>>> rdi 0x651460 6624352
>>> rbp 0x6516c0 0x6516c0
>>> rsp 0x7fbffff308 0x7fbffff308
>>> r8 0x69c6d 433261
>>> r9 0x63caa47a16f1492e 7190740599328295214
>>> r10 0x18c21f0 25960944
>>> r11 0x41a1320 68817696
>>> r12 0x3 3
>>> r13 0x651738 6625080
>>> r14 0x0 0
>>> r15 0x7fbffff510 548682069264
>>> rip 0x410bc2 0x410bc2 <ArgusLoadList+34>
>>> eflags 0x10286 66182
>>> cs 0x33 51
>>> ss 0x2b 43
>>> ds 0x0 0
>>> es 0x0 0
>>> fs 0x0 0
>>> gs 0x0 0
>>>
>>>
>>>
>>>> From carter at qosient.com Thu May 7 19:00:53 2009
>>>> Cc: argus-info at lists.andrew.cmu.edu
>>>> Message-Id: <E5F8710F-522D-4579-8569-A9DD5E130A06 at qosient.com>
>>>> From: Carter Bullard <carter at qosient.com>
>>>> To: Gunnar Lindberg <Gunnar.Lindberg at chalmers.se>
>>>> In-Reply-To: <200905071507.n47F7xeB026201 at grunert.cdg.chalmers.se>
>>>> Subject: Re: [ARGUS] segfault at 000000000311c000 rip
>>>> 000000000040fb46 rsp 0000007fbffff830 error 4
>>>> Date: Thu, 7 May 2009 13:00:42 -0400
>>>> References: <200905071507.n47F7xeB026201 at grunert.cdg.chalmers.se>
>>>
>>>> Hey Gunnar,
>>>> The gdb() commands of interest are:
>>>
>>>> (gdb) where
>>>
>>>> ArgusLoadList() is the routine that passes flow record status reports
>>>> from the packet processing engine to the output processor. This
>>>> definitely shouldn't have a problem, so it will be interesting to
>>>> figure out what the problem may be.
>>>
>>>> Are you running with threads enabled? (Is there a ./.threads file in
>>>> your root directory?)
>>>
>>>> Carter
>>>
>>>
>>