segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4

Carter Bullard carter at qosient.com
Tue Jun 30 19:49:27 EDT 2009


Hey Peter,
Something is writing over something, I just can't seem to find a handle on it.
ArgusLoadList() passes ArgusListRecords from the Modeler to the Output
processor; it just takes the two linked lists and combines them.  If there
is nothing in the receive list, it's just a "move the pointers" and there
you go.  The receive list should be empty if the output processor is
keeping ahead of the load.

My guess is that we're getting the length of an output record wrong,
which can happen if you're sloppy forming a DSR that you rarely use,
so it could still be a packet-specific bug, or we are using a buffer
that has been deallocated/reallocated and we're stomping on the new
user's buffer.

This can happen in threaded applications, so turning off the .threads
tag may be a good test.

Hey Gunnar, any chance you can use valgrind to see if we're doing
something wrong with memory?

Carter



On Jun 30, 2009, at 6:33 PM, Peter Van Epp wrote:

> On Mon, Jun 29, 2009 at 09:19:57AM +0200, Gunnar Lindberg wrote:
>> If you had asked me a week ago everything would have been just fine.
>> No crash since Jun 1. Our students left at the end of May, which
>> probably changed the traffic pattern quite considerably.
>>
>> However, a few days ago both our collector machines' Argus crashed,
>> in what you would call "stable and well tested routines" (like in
>> libc, so I do agree :-). I've just started the task of figuring out
>> what might have happened earlier, to make them go wrong.
>>
>> Since I've changed the code, line numbers are not from any of the
>> original versions, i.e. don't trust them.
>>
>> Finally, the argv# ArgusLoadList()->syslog()->etc stuff is actually
>> my code (but I'm afraid I don't think it crashed due to that). As
>> you may recall I was suspicious about part of the original code so
>> I added some syslog() calls  - I was wrong, but the code I added
>> actually tells that the "269 else" part is almost never used.
>>
>>    247 int l_ArgusLoadList;	/* loop, don't syslog() always */
>>
>>
>>    250 ArgusLoadList(struct ArgusListStruct *l1, struct ArgusListStruct *l2)
>>    251 {
>>    252    if (l1 && l2) {
>>    253       int count;
>>    254 #if defined(ARGUS_THREADS)
>>    255       pthread_mutex_lock(&l1->lock);
>>    256       pthread_mutex_lock(&l2->lock);
>>    257 #endif
>>    258       count = l1->count;
>>    259
>>    260       if (l2->start == NULL)
>>    261       {
>>    262         if (l_ArgusLoadList == 0)
>>    263         {
>>    264          syslog(LOG_INFO,"ArgusLoadList %d EQ",l_ArgusLoadList);
>>    265          l_ArgusLoadList++;
>>    266         }
>>    267         l2->start = l1->start;
>>    268       }
>>    269       else
>>    270       {
>>    271         if (l_ArgusLoadList <= 2)
>>    272         {
>>    273          syslog(LOG_INFO,"ArgusLoadList %d NE",l_ArgusLoadList);
>>    274          l_ArgusLoadList++;
>>    275         }
>>    276         l2->end->nxt = l1->start;
>>    277       }
>>
>> What we get is "ArgusLoadList 0 EQ" in syslog once, but the "NE"
>> text never appears. Now we were on our way to syslog such an event,
>> but meanwhile we've been able to write into some of the internals
>> of syslog(), so we crash. My 0.01c.
>>
>> (gdb) print *l2
>> $1 = {start = 0x2029706f742e656c, end = 0x702e73696874202b,
>>  count = 1852142177, pushed = 1986610292, popped = 1332768596,
>>  loaded = 1702061670, outputTime = {tv_sec = 8463501140188347252,
>>    tv_usec = 5647881665291251314}, reportTime = {
>>    tv_sec = 2319389263590420008, tv_usec = 8721921111256604730}}
>>
>> (gdb) x/b 0x2029706f742e656c
>> 0x2029706f742e656c: Cannot access memory at address 0x2029706f742e656c
>>
>
> 	This looks to be an ASCII string that has been used as an address
> (bad thing to be doing :-)):
>
> " )pot.elp.siht +"  when you combine the contents of the two pointers and
> convert from hex to ASCII. Unfortunately it doesn't look familiar (although
> it may to you, we can hope), but it may be profitable to search for that
> string in the incoming packets, as it may point to which packet caused the
> error (or of course it may just be the contents of some random memory
> location that happens to contain something that looks like a string :-)).
>
> Peter Van Epp
>

Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York  10022

+1 212 588-9133 Phone
+1 212 588-9134 Fax


