segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4

Peter Van Epp vanepp at sfu.ca
Tue Jun 30 18:33:42 EDT 2009


On Mon, Jun 29, 2009 at 09:19:57AM +0200, Gunnar Lindberg wrote:
> If you had asked me a week ago everything would have been just fine.
> No crash since Jun 1. Our students left at the end of May, which
> probably changed the traffic pattern quite considerably.
> 
> However, a few days ago both our collector machines' Argus crashed,
> in what you would call "stable and well tested routines" (like in
> libc, so I do agree :-). I've just started the task of figuring out
> what might have happened earlier, to make them go wrong.
> 
> Since I've changed the code, line numbers are not from any of the
> original versions, i.e. don't trust them.
> 
> Finally, the argv# ArgusLoadList()->syslog()->etc stuff is actually
> my code (but I'm afraid I don't think it crashed due to that). As
> you may recall I was suspicious about part of the original code so
> I added some syslog() calls - I was wrong, but the code I added
> actually shows that the "269 else" part is almost never used.
> 
>     247 int l_ArgusLoadList;	/* loop, don't syslog() always */
> 
> 
>     250 ArgusLoadList(struct ArgusListStruct *l1, struct ArgusListStruct *l2)
>     251 {
>     252    if (l1 && l2) {
>     253       int count;
>     254 #if defined(ARGUS_THREADS)
>     255       pthread_mutex_lock(&l1->lock);
>     256       pthread_mutex_lock(&l2->lock);
>     257 #endif
>     258       count = l1->count;
>     259 
>     260       if (l2->start == NULL)
>     261       {
>     262         if (l_ArgusLoadList == 0)
>     263         {
>     264          syslog(LOG_INFO,"ArgusLoadList %d EQ",l_ArgusLoadList);
>     265          l_ArgusLoadList++;
>     266         }
>     267         l2->start = l1->start;
>     268       }
>     269       else
>     270       {
>     271         if (l_ArgusLoadList <= 2)
>     272         {
>     273          syslog(LOG_INFO,"ArgusLoadList %d NE",l_ArgusLoadList);
>     274          l_ArgusLoadList++;
>     275         }
>     276         l2->end->nxt = l1->start;
>     277       }
> 
> What we get is "ArgusLoadList 0 EQ" in syslog once, but the "NE"
> text never appears. Now we were on our way to syslog such an event,
> but meanwhile we've been able to write into some of the internals
> of syslog(), so we crash. My 0.01c.
> 
> (gdb) print *l2
> $1 = {start = 0x2029706f742e656c, end = 0x702e73696874202b, 
>   count = 1852142177, pushed = 1986610292, popped = 1332768596, 
>   loaded = 1702061670, outputTime = {tv_sec = 8463501140188347252, 
>     tv_usec = 5647881665291251314}, reportTime = {
>     tv_sec = 2319389263590420008, tv_usec = 8721921111256604730}}
> 
> (gdb) x/b 0x2029706f742e656c
> 0x2029706f742e656c: Cannot access memory at address 0x2029706f742e656c
> 

	This looks like an ASCII string that has been used as an address
(a bad thing to be doing :-)):

 " )pot.elp.siht +"  when you combine the contents of the two pointers and
convert from hex to ASCII. x86-64 is little-endian, so in memory order the
same bytes read "le.top) + this.p". Unfortunately it doesn't look familiar to
me (although we can hope it does to you), but it may be profitable to search
for that string in the incoming packets, as it may point to which packet
caused the error (or of course it may just be the contents of some random
memory location that happens to contain something that looks like a string :-)).

Peter Van Epp


