rasplit crash

Jesse Bowling jessebowling at gmail.com
Wed Mar 26 20:31:01 EDT 2014


It happened again (it seems to be happening more often now, after running
fine for weeks). This time there was an actual segfault:

Mar 26 18:43:19 host kernel: rasplit[7644]: segfault at 7f582e3d1ff8 ip
00007f582d54a2f5 sp 00007fff83b35128 error 6 in libc-2.12.so
[7f582d4c0000+18b000]

but not the previous "EAGAIN" message.
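
For reference on the earlier EAGAIN message: pthread_create() reports EAGAIN
when the system lacks the resources to create another thread, or when a limit
on the number of threads has been reached. One way to test whether threads
are piling up is to watch the thread count of the running rasplit process.
The following is a minimal sketch, assuming Linux with /proc mounted. The
helper names parse_thread_count and thread_report are illustrative and not
part of argus, and the limits reported are those of the process running the
script, not necessarily those of the target PID.

```python
import resource


def parse_thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no Threads: field found")


def thread_report(pid):
    """Return (thread count of pid, soft RLIMIT_NPROC, hard RLIMIT_NPROC).

    Assumption: Linux, with /proc/<pid>/status readable by the caller.
    RLIMIT_NPROC is read for the calling process and is a per-user limit,
    so it is only a rough guide to the limit rasplit itself runs under.
    """
    with open(f"/proc/{pid}/status") as fh:
        threads = parse_thread_count(fh.read())
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return threads, soft, hard
```

Calling thread_report() periodically with the rasplit PID and logging the
result would show whether the count climbs steadily before a crash.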

Perhaps a better clue? Or now it's all over the place? :)
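
The kernel line above can be decoded a little further. The sketch below, in
Python, subtracts the mapping base from the instruction pointer to get the
offset inside libc-2.12.so, and interprets the low three bits of the x86
page-fault error code. The function name decode_segfault and the regular
expression are illustrative assumptions; the log line is the one from this
message, joined with spaces where the mail client wrapped it.

```python
import re

# Kernel log line from the message, with the wrapped pieces joined by spaces.
LINE = ("rasplit[7644]: segfault at 7f582e3d1ff8 ip 00007f582d54a2f5 "
        "sp 00007fff83b35128 error 6 in libc-2.12.so [7f582d4c0000+18b000]")

# Tolerates optional whitespace between the object name and "[base+size]".
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) in (?P<obj>\S+?)"
    r"\s*\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Extract the faulting offset and access type from a segfault log line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m.group("err"))
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "object": m.group("obj"),
        # Offset of the faulting instruction within the mapped object.
        "offset": hex(ip - base),
        # x86 page-fault error code: bit 0 = page was present,
        # bit 1 = write access, bit 2 = fault raised in user mode.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


info = decode_segfault(LINE)
print(info)
```

For this line, error 6 decodes as a user-mode write to a page that was not
mapped, and the offset into libc can be looked up with a symbol tool against
the same libc build, if debug symbols are available.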

Cheers,

Jesse


On Wed, Mar 26, 2014 at 8:35 AM, Jesse Bowling <jessebowling at gmail.com> wrote:

> Here's the rasplit invocation I use:
>
> /usr/local/bin/rasplit -M time 5m -S 127.0.0.1:561 -w
> /nsm/argus/data/\$srcid/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S -d
>
> I can remember only a few recent changes to this host:
>
> * Upgraded argus and argus-clients to the latest (3.0.7.5 and 3.0.7.19)
> from 3.0.7.2 and 3.0.7.9
> * Changed the rasplit source from "-S publicIP" to "-S 127.0.0.1"
> * Added the "-d" flag (previously just using &)
>
> Thinking that something in the startup might jump out at you, I
> recompiled with .debug and captured the following. If there's anything
> else I can do to help, please let me know.
>
> Cheers,
>
> Jesse
>
> # /usr/local/bin/rasplit -D 8 -M time 5m -S 127.0.0.1:561 -w /tmp/a
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.494934 ArgusCalloc (1,
> 16) returning 0x1e17450
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.494967 ArgusAddModeList
> (time) returning 1
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.494976 ArgusCalloc (1,
> 16) returning 0x1e17410
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.494984 ArgusAddModeList
> (5m) returning 1
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495057 ArgusCalloc (1,
> 461728) returning 0x5df8a010
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495069 ArgusAddHostList
> (0x5dffb010, 127.0.0.1:561, 1, 6) returning 1
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495082 ArgusCalloc (1,
> 144) returning 0x1e167b0
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495092 ArgusNewList ()
> returning 0x1e167b0
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495101 ArgusCalloc (1,
> 296) returning 0x1e17260
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495108
> ArgusPushFrontList (0x1e167b0, 0x1e17260, 1) returning 0xc685
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495147 ArgusCalloc (1,
> 296) returning 0x1e17620
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495155
> ArgusPushFrontList (0x1e167b0, 0x1e17620, 1) returning 0xc685
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495163 ArgusClientInit()
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495170 main: reading
> files completed
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495177 ArgusCalloc (1,
> 72) returning 0x1e16870
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495184 ArgusNewQueue ()
> returning 0x1e16870
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495211 Trying 127.0.0.1
> port 561 Expecting Argus records
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495287 connected
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.495295
> ArgusGetServerSocket (0x7f8a5df8a010) returning 3
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.508947
> ArgusReadConnection() read 16 bytes
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.508988 ArgusCalloc (1,
> 4194304) returning 0x5cbef010
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509001 ArgusCalloc (1,
> 262144) returning 0x5df49010
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509362
> ArgusParseInit(0x7f8a5dffb010 0x7f8a5df8a010
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509376
> ArgusWriteConnection: write(3, 0x21b372a0, 7)
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509401
> ArgusWriteConnection(0x5df8a010, 0x21b372a0, 7) returning 7
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509410
> ArgusReadConnection(0x5df8a010, 2) returning 1
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509435 RaProcessRecord
> (0x5df8a630) done
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509442 RaScheduleRecord
> (0x7f8a5dffb010, 0x7f8a5df8a630) scheduled
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509451 ArgusHandleDatum
> (0x7f8a5df8a228, 0x7f8a5e07c7a8) returning 128
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509463 ArgusFree
> (0x1e16870)
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509470 ArgusDeleteQueue
> (0x1e16870) returning
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.509479
> ArgusReadStream(0x7f8a5dffb010) starting
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553196
> ArgusReadStreamSocket (0x7f8a5df8a010) read 228 bytes
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553265 ArgusCalloc (1,
> 384) returning 0x1e17750
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553282 ArgusCalloc (1,
> 12) returning 0x1e173f0
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553292 ArgusCalloc (1,
> 80) returning 0x1e16870
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553302 ArgusCalloc (1,
> 36) returning 0x1e178e0
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553312 ArgusCalloc (1,
> 52) returning 0x1e17910
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553322 ArgusCalloc (1,
> 80) returning 0x1e17950
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553330 ArgusCalloc (1,
> 120) returning 0x1e179b0
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553338 ArgusCalloc (1,
> 20) returning 0x1e17a30
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553345 ArgusCalloc (1,
> 28) returning 0x1e17a50
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553352 ArgusCalloc (1,
> 12) returning 0x1e17a80
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553360 ArgusAlignRecord
> () returning 0x1e17750
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553386
> RaProcessSplitOptions(/tmp/a.2014.03.26.08.30.00, 4096, 0x1e17750): returns
> 0
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553414
> ArgusInitNewFilename(0x5dffb010, 0x1e17620, /tmp/a.2014.03.26.08.30.00) done
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553437
> ArgusGenerateRecord (0x1e17750, 0) len 308
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553478
> ArgusWriteNewLogfile (/tmp/a.2014.03.26.08.30.00, 0x21b270d0) fwrite 308
> bytes
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553486
> ArgusWriteNewLogfile (/tmp/a.2014.03.26.08.30.00, 0x21b270d0) returning 0
> rasplit[50821.00370a5e8a7f0000]: 03/26/14 08:33:42.553494
> RaSendArgusRecord () returning 1
>
>
>
> On Wed, Mar 26, 2014 at 7:37 AM, Jesse Bowling <jessebowling at gmail.com> wrote:
>
>> Hi Carter,
>>
>> I'm connecting directly to argus, but I'll have to verify later whether
>> it's via the public address or the local host address...
>>
>> Cheers,
>>
>> Jesse
>>
>> On Mar 26, 2014, at 12:14 AM, Carter Bullard <carter at qosient.com> wrote:
>>
>> we'll use the reliable connection strategy if we're connecting to the
>> localhost, so that's not the problem.  just need to figure out how you're
>> looping through the reliable connection logic if the connection is good.
>> are you connecting to radium or argus?
>>
>> carter
>>
>> On Mar 25, 2014, at 11:34 PM, Jesse Bowling <jessebowling at gmail.com>
>> wrote:
>>
>> In this case I'm actually using rasplit to connect to a local instance,
>> so it seems unlikely. Is this a less-than-optimal configuration? My
>> thought process was that using an rasplit process to write files locally
>> would use fewer resources on the argus side than having argus write files
>> directly.
>>
>> If the setup is ok, then perhaps the error message is a red herring and
>> we need more debug information. If so, what level of debug might provide
>> the best information?
>>
>> Cheers,
>>
>> Jesse
>>
>>
>> On Tue, Mar 25, 2014 at 11:27 PM, Carter Bullard <carter at qosient.com> wrote:
>>
>>> Hey Jesse,
>>> So, if we believe the error messages, you're spawning too many threads to
>>> try to reestablish a connection to one of your remote argus data sources.
>>> Does that sound plausible?
>>>
>>> Carter
>>>
>>> On Mar 25, 2014, at 8:50 AM, Jesse Bowling <jessebowling at gmail.com>
>>> wrote:
>>>
>>> I'll bump this, as the issue occurred again last night. It would seem
>>> that whatever the problem is, it doesn't occur too often (10 days between
>>> crashes), but it's certainly troubling. Again the error message was:
>>>
>>> Mar 25 03:38:45 host rasplit[16519]: 03/25/14 03:38:45.687977 main:
>>> pthread_create ArgusConnectRemotes: EAGAIN
>>>
>>> Any help on tracking this down? Do I need to run with debug for a while?
>>>
>>> Cheers,
>>>
>>> Jesse
>>>
>>>
>>> On Sat, Mar 15, 2014 at 7:19 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I had an rasplit process crash on me, and the only indication in the
>>>> logs was:
>>>>
>>>> Mar 15 16:26:10 rasplit[2987]: 03/15/14 16:26:10.939859 main:
>>>> pthread_create ArgusConnectRemotes: EAGAIN
>>>>
>>>> Any hints on troubleshooting this?
>>>>
>>>> Thanks,
>>>>
>>>> Jesse
>>>>
>>>> --
>>>> Jesse Bowling
>>>>
>>>>
>>>
>>>
>>> --
>>> Jesse Bowling
>>>
>>>
>>>
>>
>>
>> --
>> Jesse Bowling
>>
>>
>
>
> --
> Jesse Bowling
>
>


-- 
Jesse Bowling