rasplit crash

Carter Bullard carter at qosient.com
Thu Mar 27 07:40:51 EDT 2014


Hey Jesse,
Grab argus-clients-3.0.7.23.tar.gz, the latest and greatest.
If you get the EAGAIN message again, turn off reliable connections in your .rarc and see if it goes away. I don't see anything obvious yet.

Carter
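
For anyone chasing the same EAGAIN message: pthread_create() returns its error code directly instead of setting errno. EAGAIN from it means the system lacked the resources to create another thread or a limit on the number of threads was reached. One common cause is threads that are never joined or detached. The following is a minimal sketch of a retry with backoff around pthread_create(), assuming a POSIX system. It is not the argus-clients code, and the names spawn_with_retry and worker are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical worker standing in for a connection thread. */
static void *worker(void *arg) {
    int *flag = arg;
    *flag = 1;              /* record that the thread actually ran */
    return NULL;
}

/*
 * Start a thread, sleeping and retrying while pthread_create() reports
 * EAGAIN. Returns 0 on success, otherwise the last error code returned
 * by pthread_create(). The error code is the return value, not errno.
 */
static int spawn_with_retry(pthread_t *tid, void *(*fn)(void *), void *arg,
                            int max_attempts) {
    assert(max_attempts > 0);
    int rc = EAGAIN;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        rc = pthread_create(tid, NULL, fn, arg);
        if (rc != EAGAIN)
            break;          /* success, or an error that retrying will not fix */
        fprintf(stderr, "pthread_create: %s (attempt %d of %d)\n",
                strerror(rc), attempt + 1, max_attempts);
        /* Exponential backoff: 100 ms, 200 ms, ... capped at 6.4 s. */
        long ms = 100L << (attempt < 6 ? attempt : 6);
        struct timespec delay = { .tv_sec = ms / 1000,
                                  .tv_nsec = (ms % 1000) * 1000000L };
        nanosleep(&delay, NULL);
    }
    return rc;
}
```

A caller would still need to pthread_join() or pthread_detach() every thread it starts. Retrying only helps if the shortage is temporary, so a leak of unjoined threads will hit the same limit again.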

> On Mar 26, 2014, at 8:31 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
> 
> Happened again (seems to be happening more often, after running just fine for weeks)...This time there was an actual segfault:
> 
> Mar 26 18:43:19 host kernel: rasplit[7644]: segfault at 7f582e3d1ff8 ip 00007f582d54a2f5 sp 00007fff83b35128 error 6 in libc-2.12.so[7f582d4c0000+18b000]
> 
> but not the previous "EAGAIN" message.
> 
> Perhaps a better clue? Or now it's all over the place? :)
> 
> Cheers,
> 
> Jesse
> 
> 
>> On Wed, Mar 26, 2014 at 8:35 AM, Jesse Bowling <jessebowling at gmail.com> wrote:
>> Here's the rasplit invocation I use:
>> 
>> /usr/local/bin/rasplit -M time 5m -S 127.0.0.1:561 -w /nsm/argus/data/\$srcid/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S -d
>> 
>> There are only a few changes in recent time that I can remember for this host:
>> 
>> * Upgraded argus and argus-clients to the latest (3.0.7.5 and 3.0.7.19) from 3.0.7.2 and 3.0.7.9
>> * Changed the rasplit from "-S publicIP" to "-S 127.0.0.1"
>> * Added the "-d" flag (previously just using &)
>> 
>> Thinking that perhaps something would jump out at you from the startup, I recompiled with .debug and captured the following...If there's anything else I can do to help please let me know...
>> 
>> Cheers,
>> 
>> Jesse
>> 
>> # /usr/local/bin/rasplit -D 8 -M time 5m -S 127.0.0.1:561 -w /tmp/a
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.494934 ArgusCalloc (1, 16) returning 0x1e17450
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.494967 ArgusAddModeList (time) returning 1
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.494976 ArgusCalloc (1, 16) returning 0x1e17410
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.494984 ArgusAddModeList (5m) returning 1
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495057 ArgusCalloc (1, 461728) returning 0x5df8a010
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495069 ArgusAddHostList (0x5dffb010, 127.0.0.1:561, 1, 6) returning 1
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495082 ArgusCalloc (1, 144) returning 0x1e167b0
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495092 ArgusNewList () returning 0x1e167b0
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495101 ArgusCalloc (1, 296) returning 0x1e17260
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495108 ArgusPushFrontList (0x1e167b0, 0x1e17260, 1) returning 0xc685
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495147 ArgusCalloc (1, 296) returning 0x1e17620
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495155 ArgusPushFrontList (0x1e167b0, 0x1e17620, 1) returning 0xc685
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495163 ArgusClientInit()
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495170 main: reading files completed
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495177 ArgusCalloc (1, 72) returning 0x1e16870
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495184 ArgusNewQueue () returning 0x1e16870
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495211 Trying 127.0.0.1 port 561 Expecting Argus records
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495287 connected
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495295 ArgusGetServerSocket (0x7f8a5df8a010) returning 3
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.508947 ArgusReadConnection() read 16 bytes
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.508988 ArgusCalloc (1, 4194304) returning 0x5cbef010
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509001 ArgusCalloc (1, 262144) returning 0x5df49010
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509362 ArgusParseInit(0x7f8a5dffb010 0x7f8a5df8a010
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509376 ArgusWriteConnection: write(3, 0x21b372a0, 7)
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509401 ArgusWriteConnection(0x5df8a010, 0x21b372a0, 7) returning 7
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509410 ArgusReadConnection(0x5df8a010, 2) returning 1
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509435 RaProcessRecord (0x5df8a630) done
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509442 RaScheduleRecord (0x7f8a5dffb010, 0x7f8a5df8a630) scheduled
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509451 ArgusHandleDatum (0x7f8a5df8a228, 0x7f8a5e07c7a8) returning 128
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509463 ArgusFree (0x1e16870)
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509470 ArgusDeleteQueue (0x1e16870) returning
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509479 ArgusReadStream(0x7f8a5dffb010) starting
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553196 ArgusReadStreamSocket (0x7f8a5df8a010) read 228 bytes
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553265 ArgusCalloc (1, 384) returning 0x1e17750
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553282 ArgusCalloc (1, 12) returning 0x1e173f0
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553292 ArgusCalloc (1, 80) returning 0x1e16870
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553302 ArgusCalloc (1, 36) returning 0x1e178e0
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553312 ArgusCalloc (1, 52) returning 0x1e17910
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553322 ArgusCalloc (1, 80) returning 0x1e17950
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553330 ArgusCalloc (1, 120) returning 0x1e179b0
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553338 ArgusCalloc (1, 20) returning 0x1e17a30
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553345 ArgusCalloc (1, 28) returning 0x1e17a50
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553352 ArgusCalloc (1, 12) returning 0x1e17a80
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553360 ArgusAlignRecord () returning 0x1e17750
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553386 RaProcessSplitOptions(/tmp/a.2014.03.26.08.30.00, 4096, 0x1e17750): returns 0
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553414 ArgusInitNewFilename(0x5dffb010, 0x1e17620, /tmp/a.2014.03.26.08.30.00) done
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553437 ArgusGenerateRecord (0x1e17750, 0) len 308
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553478 ArgusWriteNewLogfile (/tmp/a.2014.03.26.08.30.00, 0x21b270d0) fwrite 308 bytes
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553486 ArgusWriteNewLogfile (/tmp/a.2014.03.26.08.30.00, 0x21b270d0) returning 0
>> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553494 RaSendArgusRecord () returning 1
>> 
>> 
>> 
>>> On Wed, Mar 26, 2014 at 7:37 AM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>> Hi Carter,
>>> 
>>> I'm connecting directly to argus, but I'll have to verify later whether it's via the public address or the local host address...
>>> 
>>> Cheers,
>>> 
>>> Jesse
>>> 
>>>> On Mar 26, 2014, at 12:14 AM, Carter Bullard <carter at qosient.com> wrote:
>>>> 
>>>> we'll use the reliable connection strategy if we're connecting to the localhost, so that's not the problem.  just need to figure out how you're looping through the reliable connection logic if the connection is good.   are you connecting to radium or argus ???
>>>> 
>>>> carter
>>>> 
>>>>> On Mar 25, 2014, at 11:34 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>>> 
>>>>> In this case I'm actually using rasplit to connect to a local instance, so it seems unlikely...Is this a less-than-optimal configuration? My thought process was that using an rasplit process to write files locally would use fewer resources on the argus side than having argus write files directly.
>>>>> 
>>>>> If the setup is ok, then perhaps the error message is a red herring and we need more debug information. If so, what level of debug might provide the best information?
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Jesse
>>>>> 
>>>>> 
>>>>>> On Tue, Mar 25, 2014 at 11:27 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>>> Hey Jesse,
>>>>>> So, if we believe the error messages, you're spawning too many threads to
>>>>>> try to reestablish a connection to one of your remote argus data sources.
>>>>>> Does that sound plausible ??
>>>>>> 
>>>>>> Carter
>>>>>> 
>>>>>>> On Mar 25, 2014, at 8:50 AM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>>>>> 
>>>>>>> I'll bump this, as the issue occurred again last night. It would seem that whatever the problem is, it doesn't occur too often (10 days between runs), but it's certainly troubling. Again the error message was:
>>>>>>> 
>>>>>>> Mar 25 03:38:45 host rasplit[16519]: 03/25/14 03:38:45.687977 main: pthread_create ArgusConnectRemotes: EAGAIN 
>>>>>>> 
>>>>>>> Any help on tracking this down? Do I need to run with debug for a while?
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Jesse
>>>>>>> 
>>>>>>> 
>>>>>>>> On Sat, Mar 15, 2014 at 7:19 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> I had an rasplit process crash on me, and the only indication in the logs was:
>>>>>>>> 
>>>>>>>> Mar 15 16:26:10 rasplit[2987]: 03/15/14 16:26:10.939859 main: pthread_create ArgusConnectRemotes: EAGAIN 
>>>>>>>> 
>>>>>>>> Any hints on troubleshooting this?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Jesse
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Jesse Bowling
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Jesse Bowling
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Jesse Bowling
>> 
>> 
>> 
>> -- 
>> Jesse Bowling
> 
> 
> 
> -- 
> Jesse Bowling
> 