[ARGUS] Running argus under a watchdog
Peter Van Epp
vanepp at sfu.ca
Sat Apr 17 17:57:18 EDT 2004
On Sat, Apr 17, 2004 at 02:33:25AM -0500, eric wrote:
> Is anyone running argus under a watchdog?
>
> We've been migrating to a collector/log server model, and sometimes
> the collector dies (as per those nasty core dumps I've referenced
> earlier this week). I tried running argus under Dan Bernstein's
> daemontools, but there are some issues with the daemon erroring out
> without svscan realizing there's a program error. In other words,
> the process stays up, so svscan thinks all is well and does not
> restart the process.
>
> Thanks.
>
> - Eric
So the core dump still happens with argus_bpf running on the collector
(i.e. the data structure gets corrupted even when writing to a socket)? You
might try reducing the parameters you increased by a bit (or a lot :-)) and
see whether something that was boosted is overwriting the
file parameter space. It would probably also be worth dumping the
structure Carter asked about before there is a problem, so you know what it
should look like. From what I saw, it looked as though the entire data
structure had been zeroed by something like an over-length memory clear.
For your restart problem: when the process dies, do one or more of the
argus tasks also die? I'd expect so. I've seen the two slave tasks die, leaving
22781 below running, when I manage to do something stupid in perl that runs the
machine out of memory (this being from my 2.0.6 test machine :-)). Assuming
that's true, a perl script run out of cron should do what you need: it looks
for the 3 tasks and, if they aren't all there (and probably eating CPU as
well), kills all that remain and restarts argus. A similar script is probably
needed on the collector machine, doing the same for the ra task that should be
collecting the data (it could also check that argus.out is growing and get
concerned if it isn't :-)). You'll notice we are still slow enough that I can
write to disk on the collector machine, although I intend to move to the
two-machine model anyway before putting 2.x into production.
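The "is argus.out growing" check could be sketched roughly as follows. This is a minimal Python sketch rather than the perl suggested above. The function name `file_is_growing`, the 60 second default window, and the injectable `sleep` parameter are assumptions for illustration, not part of argus.

```python
import os
import time


def file_is_growing(path, wait_seconds=60.0, sleep=time.sleep):
    """Return True if the file at `path` got larger over `wait_seconds`.

    A missing or unreadable file counts as "not growing", since that is
    also a reason to be concerned about the writer.
    """
    try:
        before = os.path.getsize(path)
    except OSError:
        return False

    # Wait for the observation window; `sleep` is injectable for testing.
    sleep(wait_seconds)

    try:
        after = os.path.getsize(path)
    except OSError:
        return False
    return after > before
```

A cron job could call `file_is_growing("/data/argus.out")` and raise an alarm, or trigger a restart, when it returns False. On a quiet link the window would need to be long enough that an idle period is not mistaken for a dead process.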
% ps auxw | grep argus
root 22781 3.2 5.8 46608 45300 ?? Ss 5Apr04 668:58.91 /usr/local/bin/argus_bpf -dJR -i xl1 -w /data/argus.out
root 22783 1.2 0.1 2356 928 ?? S 5Apr04 370:25.04 /usr/local/bin/argus_bpf -dJR -i xl1 -w /data/argus.out
root 22782 0.0 0.1 2284 860 ?? S 5Apr04 103:18.47 /usr/local/bin/argus_bpf -dJR -i xl1 -w /data/argus.out
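
The cron-driven check described above (look for the 3 tasks; if they aren't all there, kill what remains and restart) might look roughly like this. It is a hedged sketch in Python rather than perl. It assumes `ps auxw` prints the 11-column layout shown in the listing above, reuses the argus_bpf command line from that listing, and does not implement the "eating CPU" part of the check. The function names are hypothetical.

```python
import os
import signal
import subprocess

EXPECTED_TASKS = 3  # assumption: the three argus_bpf tasks in the listing above
# Assumption: restart with the same command line that ps shows above.
ARGUS_CMD = ["/usr/local/bin/argus_bpf", "-dJR", "-i", "xl1",
             "-w", "/data/argus.out"]


def argus_pids(ps_output, name="argus_bpf"):
    """Return the PIDs in `ps auxw` output whose command basename is `name`."""
    pids = []
    for line in ps_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND...
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something unparseable
        command = fields[10].split()[0]
        if os.path.basename(command) == name:
            pids.append(int(fields[1]))
    return pids


def needs_restart(pids, expected=EXPECTED_TASKS):
    """True when the number of running tasks differs from what we expect."""
    return len(pids) != expected


def check_and_restart():
    """Kill any leftover argus tasks and start argus again if some are missing."""
    ps = subprocess.run(["ps", "auxw"], capture_output=True,
                        text=True, check=True)
    pids = argus_pids(ps.stdout)
    if not needs_restart(pids):
        return False
    for pid in pids:
        try:
            # SIGKILL because a wedged task may ignore a polite SIGTERM.
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone
    subprocess.Popen(ARGUS_CMD)  # -d makes argus put itself in the background
    return True
```

Calling `check_and_restart()` from a small script run every few minutes out of root's crontab would give the behaviour described. Matching on the command basename rather than grepping for "argus" keeps the `grep argus` process itself out of the count.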
Peter Van Epp / Operations and Technical Support
Simon Fraser University, Burnaby, B.C. Canada