segfault at 000000000311c000 rip 000000000040fb46rsp 0000007fbffff830 error 4

Gunnar Lindberg Gunnar.Lindberg at chalmers.se
Mon Jun 29 03:19:57 EDT 2009


If you had asked me a week ago, everything would have been just fine.
No crash since Jun 1. Our students left at the end of May, which
probably changed the traffic pattern quite considerably.

However, a few days ago both our collector machines' Argus crashed,
in what you would call "stable and well tested routines" (like in
libc, so I do agree :-). I've just started the task of figuring out
what might have happened earlier, to make them go wrong.

Since I've changed the code, line numbers do not match any of the
original versions, i.e. don't trust them.

Finally, the argv# ArgusLoadList()->syslog()->etc stuff is actually
my code (but I'm afraid I don't think it crashed due to that). As
you may recall, I was suspicious about part of the original code, so
I added some syslog() calls. I was wrong, but the code I added
actually shows that the "269 else" part is almost never used.

    247 int l_ArgusLoadList;	/* loop, don't syslog() always */


    250 ArgusLoadList(struct ArgusListStruct *l1, struct ArgusListStruct *l2)
    251 {
    252    if (l1 && l2) {
    253       int count;
    254 #if defined(ARGUS_THREADS)
    255       pthread_mutex_lock(&l1->lock);
    256       pthread_mutex_lock(&l2->lock);
    257 #endif
    258       count = l1->count;
    259 
    260       if (l2->start == NULL)
    261       {
    262         if (l_ArgusLoadList == 0)
    263         {
    264          syslog(LOG_INFO,"ArgusLoadList %d EQ",l_ArgusLoadList);
    265          l_ArgusLoadList++;
    266         }
    267         l2->start = l1->start;
    268       }
    269       else
    270       {
    271         if (l_ArgusLoadList <= 2)
    272         {
    273          syslog(LOG_INFO,"ArgusLoadList %d NE",l_ArgusLoadList);
    274          l_ArgusLoadList++;
    275         }
    276         l2->end->nxt = l1->start;
    277       }

What we get is "ArgusLoadList 0 EQ" in syslog once, but the "NE"
text never appears. This time we were on our way to syslog such an
event, but by then we had already managed to write into some of the
internals of syslog(), so we crash. My 0.01c.

(gdb) print *l2
$1 = {start = 0x2029706f742e656c, end = 0x702e73696874202b, 
  count = 1852142177, pushed = 1986610292, popped = 1332768596, 
  loaded = 1702061670, outputTime = {tv_sec = 8463501140188347252, 
    tv_usec = 5647881665291251314}, reportTime = {
    tv_sec = 2319389263590420008, tv_usec = 8721921111256604730}}

(gdb) x/b 0x2029706f742e656c
0x2029706f742e656c: Cannot access memory at address 0x2029706f742e656c

The crash itself has nothing to do with that unreasonable pointer,
I think, but it might be worth asking "what made 'l2->start' point
far outside our memory?"
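
One observation on the dump above: the field values do not look random.
Read as raw bytes in little-endian order (an assumption, based on the
x86_64 addresses in the backtraces), they decode to printable ASCII.
The sketch below is illustrative only. append_le_ascii() and
decode_l2_dump() are made-up helper names, and it assumes that count,
pushed, popped and loaded are 32-bit integers laid out directly after
the two pointers, in the order gdb printed them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append the low `width` bytes of `value`, least significant byte first
 * (the in-memory order on a little-endian machine), to `out` as ASCII.
 * Non-printable bytes are shown as '.'. Returns the new string length. */
static size_t append_le_ascii(uint64_t value, size_t width,
                              char *out, size_t len)
{
    assert(width <= 8);
    for (size_t i = 0; i < width; i++) {
        unsigned char c = (unsigned char)(value >> (8 * i));
        out[len++] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[len] = '\0';
    return len;
}

/* Hypothetical helper: decode the first six fields of the dumped struct,
 * using the values from the gdb output. `out` needs at least 33 bytes. */
static void decode_l2_dump(char *out)
{
    size_t n = 0;
    n = append_le_ascii(0x2029706f742e656cULL, 8, out, n); /* start  */
    n = append_le_ascii(0x702e73696874202bULL, 8, out, n); /* end    */
    n = append_le_ascii(1852142177u, 4, out, n);           /* count  */
    n = append_le_ascii(1986610292u, 4, out, n);           /* pushed */
    n = append_le_ascii(1332768596u, 4, out, n);           /* popped */
    (void)append_le_ascii(1702061670u, 4, out, n);         /* loaded */
}
```

Calling decode_l2_dump() on a 33-byte buffer yields the string
"le.top) + this.parentDivTopOffse", which looks like a fragment of
script text. That would suggest a run of text (possibly packet payload,
though that is only a guess) was copied over the list header, rather
than a single stray pointer write.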

	Gunnar Lindberg

argv#
-rwxrwxr-x   1 root       829739 Jun  1 07:47 argus
-rw-r--r--   1 root     60231680 Jun 24 14:50 core.879
general protection rip:34e22705f2 rsp:7fbfffe6d8 error:0

argc#
-rwxrwxr-x   1 root       829739 Jun  1 07:47 argus
-rw-r--r--   1 root     72712192 Jun 25 15:55 core.1303
general protection rip:3fabc696bd rsp:7fbfffe830 error:0

argv# gdb argus core.879
(gdb) bt
#0  0x00000034e22705f2 in strcmp () from /lib64/tls/libc.so.6
#1  0x00000034e2281d50 in __tzstring () from /lib64/tls/libc.so.6
#2  0x00000034e2283b43 in __tzfile_compute () from /lib64/tls/libc.so.6
#3  0x00000034e2282c8b in __tz_convert () from /lib64/tls/libc.so.6
#4  0x00000034e22c5abe in vsyslog () from /lib64/tls/libc.so.6
#5  0x00000034e22c6066 in syslog () from /lib64/tls/libc.so.6
#6  0x000000000041569c in ArgusLoadList (l1=0x659460, l2=0x65c0a0)
    at ArgusUtil.c:273
#7  0x000000000041a439 in ArgusOutputProcess (arg=0x6596c0)
    at ArgusOutput.c:477
#8  0x0000000000408339 in ArgusProcessPacket (src=0x2a95786010, p=0x65bb12 "", 
    length=1514, tvp=0x7fbfffec50, type=0) at ArgusModeler.c:1324
#9  0x00000000004107db in ArgusEtherPacket (user=0x2a95786010 "", 
    h=0x7fbfffecd0, p=0x65bb12 "") at ArgusSource.c:716
#10 0x00000034e2f04bff in ?? () from /usr/lib64/libpcap.so.0.8.3
#11 0x0000000000413cd2 in ArgusGetPackets (src=0x2a95786010)
    at ArgusSource.c:2093
#12 0x0000000000404c77 in main (argc=1, argv=0x7fbffff5a8) at argus.c:535

argc# gdb argus core.1303
(gdb) bt
#0  0x0000003fabc696bd in _int_malloc () from /lib64/tls/libc.so.6
#1  0x0000003fabc6b420 in calloc () from /lib64/tls/libc.so.6
#2  0x0000000000423994 in ArgusCalloc (nitems=5, bytes=4) at argus_util.c:1385
#3  0x00000000004202a2 in ArgusUpdateAppState (model=0x659010, 
    flowstr=0x25d0810, state=16 '\020') at ArgusApp.c:278
#4  0x000000000040b00f in ArgusUpdateState (model=0x659010, flowstr=0x25d0810, 
    state=16 '\020') at ArgusModeler.c:2443
#5  0x000000000040a0ae in ArgusUpdateFlow (model=0x659010, flow=0x25d0810, 
    state=16 '\020') at ArgusModeler.c:2068
#6  0x0000000000408317 in ArgusProcessPacket (src=0x2a95786010, p=0x65be02 "", 
    length=104, tvp=0x7fbfffec50, type=0) at ArgusModeler.c:1316
#7  0x00000000004107db in ArgusEtherPacket (user=0x2a95786010 "", 
    h=0x7fbfffecd0, p=0x65be02 "") at ArgusSource.c:716
#8  0x0000003fac904bff in ?? () from /usr/lib64/libpcap.so.0.8.3
#9  0x0000000000413cd2 in ArgusGetPackets (src=0x2a95786010)
    at ArgusSource.c:2099
#10 0x0000000000404c77 in main (argc=1, argv=0x7fbffff5a8) at argus.c:535


>From carter at qosient.com Fri Jun 26 21:35:37 2009
>From: Carter Bullard <carter at qosient.com>
>To: Gunnar Lindberg <gunnar.lindberg at chalmers.se>
>CC: "argus-info at lists.andrew.cmu.edu" <argus-info at lists.andrew.cmu.edu>
>Date: Fri, 26 Jun 2009 21:35:22 +0200
>Subject: Re: [ARGUS] segfault at 000000000311c000 rip 000000000040fb46rsp
>	0000007fbffff830 error 4
>Message-ID: <78C956B9-F7C0-4E75-A37B-843A293386FF at qosient.com>
>References: <200905301037.n4UAbUDs013514 at grunert.cdg.chalmers.se>
>In-Reply-To: <200905301037.n4UAbUDs013514 at grunert.cdg.chalmers.se>

>Hey Gunnar,
>Any new update on your problem?
>Carter

>On May 30, 2009, at 6:37 AM, Gunnar Lindberg wrote:

>> Great, thanks. First thing monday.
>>
>> Then, it's quite infrequent. Less than once a week by now. Which is
>> partly why my 0.0c is for strange packet data (I do part time IRT
>> work and expect packets with every illegal combination of flag bits).
>>
>> I have another idea to possibly catch the current/last packet, if
>> we encounter similar crashes again (most/all packets will pass via
>> ArgusGetPackets() so that should hold at most times). My plan is
>> to move these declarations to the beginning of ArgusGetPackets(),
>> i.e. have gdb be able to print the last packet data. Of course the
>> buffer could also be damaged, but maybe enough is left to deduce
>> what it was.
>>
>> Is there any reason not to do this (there seem to be two occurrences
>> of these and I will simply comment them both out)?
>>
>>   2126 /* libpcap workaround */
>>   2127                      struct pcap_pkthdr *header;
>>   2128                      const u_char *pkt_data;
>>
>> 	Gunnar Lindberg
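
The idea quoted above, keeping the last packet reachable from gdb, could
also be approached by copying each packet into a file-scope buffer
before it is processed. This is a different technique from hoisting the
two declarations, shown only as a minimal sketch: struct last_packet,
last_pkt, remember_packet() and LAST_PKT_MAX are hypothetical names,
not part of Argus or libpcap, and the 2048-byte limit is an arbitrary
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

#define LAST_PKT_MAX 2048   /* assumed upper bound on bytes kept */

/* Copy of the most recently seen packet, kept in static storage so
 * that "print last_pkt" in gdb can show it after a crash. */
struct last_packet {
    struct timeval ts;      /* capture timestamp */
    unsigned int caplen;    /* bytes actually stored in data[] */
    unsigned int len;       /* original length on the wire */
    unsigned char data[LAST_PKT_MAX];
};

static struct last_packet last_pkt;

/* Call this at the top of the per-packet callback, before any parsing. */
static void remember_packet(const struct timeval *ts, unsigned int caplen,
                            unsigned int len, const unsigned char *data)
{
    assert(ts != NULL && data != NULL);
    unsigned int n = caplen < LAST_PKT_MAX ? caplen : LAST_PKT_MAX;
    last_pkt.ts = *ts;
    last_pkt.caplen = n;
    last_pkt.len = len;
    memcpy(last_pkt.data, data, n);
}
```

The copy costs one memcpy() per packet, and the buffer can still be
damaged by a wild write, but it does not live on the heap or depend on
libpcap's own buffer still being intact when the core is written.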


