segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4
Peter Van Epp
vanepp at sfu.ca
Tue Jun 30 18:33:42 EDT 2009
On Mon, Jun 29, 2009 at 09:19:57AM +0200, Gunnar Lindberg wrote:
> If you had asked me a week ago, everything would have been just fine.
> No crash since Jun 1. Our students left at the end of May, which
> probably changed the traffic pattern quite considerably.
>
> However, a few days ago both our collector machines' Argus crashed,
> in what you would call "stable and well tested routines" (like in
> libc, so I do agree :-). I've just started the task of figuring out
> what might have happened earlier, to make them go wrong.
>
> Since I've changed the code, line numbers are not from any of the
> original versions, i.e. don't trust them.
>
> Finally, the argv# ArgusLoadList()->syslog()->etc stuff is actually
> my code (but I'm afraid I don't think it crashed due to that). As
> you may recall I was suspicious about part of the original code so
> I added some syslog() calls - I was wrong, but the code I added
> actually shows that the "269 else" branch is almost never taken.
>
> 247 int l_ArgusLoadList; /* loop, don't syslog() always */
>
>
> 250 ArgusLoadList(struct ArgusListStruct *l1, struct ArgusListStruct *l2)
> 251 {
> 252     if (l1 && l2) {
> 253         int count;
> 254 #if defined(ARGUS_THREADS)
> 255         pthread_mutex_lock(&l1->lock);
> 256         pthread_mutex_lock(&l2->lock);
> 257 #endif
> 258         count = l1->count;
> 259
> 260         if (l2->start == NULL)
> 261         {
> 262             if (l_ArgusLoadList == 0)
> 263             {
> 264                 syslog(LOG_INFO, "ArgusLoadList %d EQ", l_ArgusLoadList);
> 265                 l_ArgusLoadList++;
> 266             }
> 267             l2->start = l1->start;
> 268         }
> 269         else
> 270         {
> 271             if (l_ArgusLoadList <= 2)
> 272             {
> 273                 syslog(LOG_INFO, "ArgusLoadList %d NE", l_ArgusLoadList);
> 274                 l_ArgusLoadList++;
> 275             }
> 276             l2->end->nxt = l1->start;
> 277         }
>
> What we get is "ArgusLoadList 0 EQ" in syslog once, but the "NE"
> text never appears. Now we were on our way to syslog such an event,
> but meanwhile we've managed to write into some of syslog()'s
> internals, so we crash. My 0.01c.
>
> (gdb) print *l2
> $1 = {start = 0x2029706f742e656c, end = 0x702e73696874202b,
> count = 1852142177, pushed = 1986610292, popped = 1332768596,
> loaded = 1702061670, outputTime = {tv_sec = 8463501140188347252,
> tv_usec = 5647881665291251314}, reportTime = {
> tv_sec = 2319389263590420008, tv_usec = 8721921111256604730}}
>
> (gdb) x/b 0x2029706f742e656c
> 0x2029706f742e656c: Cannot access memory at address 0x2029706f742e656c
>
This looks to be an ASCII string that has been used as an address
(a bad thing to be doing :-)). Combine the contents of the two pointers
and convert from hex to ASCII and you get " )pot.elp.siht +"; read in the
order the bytes actually sit in memory on this little-endian x86_64 box,
that is "le.top) + this.p". Unfortunately it doesn't look familiar
(although we can hope it may to you), but it may be profitable to search
for that string in the incoming packets, as it may point to the packet that
caused the error (or of course it may just be the contents of some random
memory location that happens to contain something that looks like a
string :-)).
Peter Van Epp