Rasplit and crashes

carter at qosient.com
Wed Jun 24 22:59:44 EDT 2009


Hey Eric,
I suspect that rasplit is crashing because of the reconnection failure, not because of a V2 record problem.  Rasplit could also be more efficient at writing records, so I'll need to track down the crash first and then figure out a way to improve the performance.

I do need to know what version of rasplit you are running and how you are calling it, so I can try a few things on my side.
If by any chance you have a core dump, that would be really helpful.  I also need to know the kind of machine and the OS.

Carter
Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: Eric Gustafson <subwire at gmail.com>

Date: Wed, 24 Jun 2009 15:41:04 
To: <carter at qosient.com>
Cc: <argus-info-bounces+carter=qosient.com at lists.andrew.cmu.edu>; Argus<argus-info at lists.andrew.cmu.edu>
Subject: Re: [ARGUS] Rasplit and crashes


Hey Carter,
I figured it was something to that effect.  We have two sensors, one under
heavy load, and one not, and only the heavy load one crashed.
This only started happening when we brought rasplit into the picture.
Previously, we were simply using argus' -w option to write out stuff to
files, and yanking those files every so often.  That was messy and
disorganized, however, and rasplit seemed to do the job in the most "kosher"
way possible.
And yes, we have an utterly ridiculous amount of load on the system, thus
our intent to move everything to Bivio :)
Dropping packets at the interface we can live with for now, as we have other
tools already running on Bivio such that if a few things fall off we're not
screwed, and so far we haven't failed to find any flows we were looking for.
We do, however, have a bit of a problem when things hang silently and
nobody's around to kickstart it again, of course.
Maybe I'll suggest we go add a second disk on which to handle all
argus-related stuff in the hopes we're not fighting for I/O with anything
else.  We've got some lying around, and it'll hold us until the Bivio option
becomes available.
What's interesting is, despite the configuration option to reconnect after 5
seconds, it doesn't seem to work.  We can, however, get it to work by
sigkilling the process and starting a new one.  We wrote a handy cronjob to
accomplish this, but we still lose data in the time between rasplit crashing
and our script noticing it's time to kill processes.  Again, would upgrading
Argus on the sensors to 3.0.1 help any along the lines of speed and
efficiency improvements?
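
As an illustration of the kill-and-restart workaround described above, here is a minimal Python sketch of a watchdog that restarts a collector when its output stops growing. It is not the cron job from this thread. The function names, the 120-second threshold, and the idea of using file modification times as the hang signal are all assumptions made for the example. It also assumes the command stays in the foreground so the watchdog can track it.

```python
import os
import subprocess
import time


def newest_mtime(root):
    """Return the latest modification time of any file under root, or None."""
    newest = None
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished between listing and stat
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def is_stalled(root, max_age, now=None):
    """True if nothing under root was written during the last max_age seconds."""
    now = time.time() if now is None else now
    newest = newest_mtime(root)
    return newest is None or now - newest > max_age


def supervise(cmd, root, max_age=120, poll=10):
    """Run cmd forever; restart it if it exits or its output goes stale."""
    proc = subprocess.Popen(cmd)
    started = time.time()
    while True:
        time.sleep(poll)
        exited = proc.poll() is not None
        # Give a fresh process max_age seconds before judging its output.
        past_grace = time.time() - started > max_age
        stalled = past_grace and is_stalled(root, max_age)
        if exited or stalled:
            if not exited:
                proc.kill()  # SIGKILL on POSIX, like the manual workaround
            proc.wait()
            proc = subprocess.Popen(cmd)
            started = time.time()
```

Calling `supervise(cmd, "/argus")` with the collector's command line as a list would poll every 10 seconds, which narrows the window of lost data compared with a cron job that runs once a minute or less often. It does not address why the process hangs in the first place.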

Thanks again,
Eric

On Wed, Jun 24, 2009 at 3:22 PM, <carter at qosient.com> wrote:

> Hey Eric,
> Sorry for the late response.
> It looks like rasplit is getting data faster than it can write out to disk,
> and rather than throwing records away, it's just closing the connection,
> retrying the connection, failing and then running into a bug.
>
> Is the load on the machine reflecting this kind of behavior? The bug is
> probably easy to fix, but dealing with the load is something we probably
> should work on.
>
> If the disk can't keep up, what would be the right thing to do? Drop records?
>
> Carter
>
> Sent from my Verizon Wireless BlackBerry
>
> ------------------------------
> *From*: Eric Gustafson
> *Date*: Wed, 24 Jun 2009 09:31:36 -0700
> *To*: Argus<argus-info at lists.andrew.cmu.edu>
> *Subject*: [ARGUS] Rasplit and crashes
> Hey guys,
> I've got another problem unrelated to Bivio, unfortunately.  We have been
> trying to fully automate and organize our gathering of argus records using
> rasplit, but have been seeing it "crash" after a seemingly random period of
> time, where the process would just hang, giving only a minimal indication of
> its dilemma.  We've tried this both running rasplit locally on each sensor,
> and running rasplit on the system with our storage array, and both seem to
> have the same effect.
>
> We don't have any clue why it's doing this, given it isn't logging in very
> much detail.
> Here's what I see on one of the sensors when using rasplit to attach to a
> local argus:
> We start rasplit like this:
>   /usr/local/bin/rasplit -d -S localhost -M time 1h -w /argus/%Y/%m/%d/argus.%Y.%m.%d.%H
>
> Here's it starting up:
> Jun 23 10:13:29 snort1 rasplit[20197]: 10:13:29.043344 started
> Jun 23 10:13:29 snort1 argus[3557]: connect from 127.0.0.1
> Here's it "crashing":
> Jun 23 23:48:36 snort1 argus[3557]: ArgusWriteOutSocket(0x8865b68) Queue Count 50001
> Jun 23 23:48:50 snort1 argus[20201]: ArgusHandleClientData: ArgusWriteOutSocket failed
> When running remotely and streaming records across the network, rasplit
> would log a "Connection refused" message before "crashing".
>
> The fun part is, we're running the latest argus-clients, but the argus
> sensors, which have been around for a long time, are running argus 2.0.6.  My
> co-workers are taking the totally understandable "if it ain't broke, don't
> fix it" mentality with these.  If this is likely the cause of the bug,
> however, upgrading shouldn't be a problem.
> The only other quirk to our setup is that we receive a huge amount of data
> (2.5 GB+ of uncompressed argus records per hour).
> What do you all think?
>
> Thanks,
> Eric
>
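
For readers unfamiliar with the -w pattern in the rasplit command quoted above, the following Python sketch shows how such a pattern would map a timestamp to an hourly output file, assuming the % fields are expanded with strftime-style conversions. The helper name `output_path` is hypothetical, the year 2009 is taken from the date of this thread, and the sketch says nothing about which clock or time zone rasplit itself uses.

```python
from datetime import datetime

# Pattern copied from the rasplit command line quoted in this thread.
PATTERN = "/argus/%Y/%m/%d/argus.%Y.%m.%d.%H"


def output_path(ts, pattern=PATTERN):
    """Expand the strftime-style fields in pattern for timestamp ts."""
    return ts.strftime(pattern)


# Timestamps of the "started" and "crashing" log lines, year assumed.
print(output_path(datetime(2009, 6, 23, 10, 13, 29)))
# /argus/2009/06/23/argus.2009.06.23.10
print(output_path(datetime(2009, 6, 23, 23, 48, 36)))
# /argus/2009/06/23/argus.2009.06.23.23
```

With a one-hour split, every record in the same hour lands in the same file, so a full day produces 24 files under one /argus/YYYY/MM/DD directory.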
