[ARGUS] Running argus under a watchdog

Peter Van Epp vanepp at sfu.ca
Sat Apr 17 17:57:18 EDT 2004


On Sat, Apr 17, 2004 at 02:33:25AM -0500, eric wrote:
> Is anyone running argus under a watchdog? 
> 
> We've been migrating to a collector/log server model, and sometimes
> the collector dies (as per those nasty core dumps I've referenced
> earlier this week). I tried running argus under Dan Bernstein's
> daemontools, but there are some issues with the daemon erroring out
> without svscan realizing there's a program error. In other words,
> the process stays up, so svscan thinks all is well and does not
> restart the process.
> 
> Thanks. 
> 
> - Eric

	So the core dump still happens with argus_bpf running on the collector
(i.e. the data structure gets corrupted even when writing to a socket)? You
might try reducing the parameters you increased by a bit (or a lot :-)) and
see whether something that was boosted is overwriting the file parameter
space. It would probably also be interesting to dump the structure Carter
asked about before there is a problem, so you know what it should look like.
From what I saw, it looked like the entire data structure had been zeroed,
perhaps by something like an overlength memory clear.
	For your restart problem: when the process dies, do one or more of the
argus tasks also die? I'd expect so. On my 2.0.6 test machine I've seen the
two slave tasks die, leaving 22781 below running, when I manage to do
something stupid in perl that runs the machine out of memory :-). Assuming
that's true, a perl script running out of cron should do what you need: it
looks for the 3 tasks and, if they aren't all there (and probably eating CPU
as well), kills all that remain and restarts argus. A similar script is
probably needed on the collector machine, doing the same for the ra task that
should be collecting the data (it could also check that argus.out is growing
and get concerned if it isn't :-)). You'll notice we are still slow enough
that I can write to disk on the collector machine, although I intend to move
to the two-machine model anyway before putting 2.x into production.
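
The "is argus.out growing" check mentioned above could look roughly like the
following. This is only a sketch, written in Python rather than perl, and the
function name and the state-file path are hypothetical. Because a cron job
keeps no memory between runs, it assumes the previous size is saved in a small
state file. A file that has been rotated or truncated would also be reported
as not growing, which may or may not be what you want.

```python
import os

def output_grew(data_path, state_path):
    """Return True if data_path is larger than it was on the previous run.

    The previous size is kept in state_path (a hypothetical location chosen
    by whoever installs the cron job).
    """
    try:
        current = os.path.getsize(data_path)
    except OSError:
        current = -1  # missing or unreadable output never counts as growth

    try:
        with open(state_path) as fh:
            previous = int(fh.read().strip())
    except (OSError, ValueError):
        previous = None  # first run, or unreadable state: nothing to compare

    # Remember the current size for the next cron run.
    with open(state_path, "w") as fh:
        fh.write(str(current))

    if previous is None:
        return current >= 0  # first run: only require that the file exists
    return current > previous
```

A cron job would call something like output_grew("/data/argus.out",
"/var/run/argus.size") every few minutes and raise an alarm (or restart the
writer) when it returns False; the second path is made up for illustration.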

% ps auxw | grep argus
root   22781  3.2  5.8 46608 45300  ??  Ss    5Apr04 668:58.91 /usr/local/bin/argus_bpf -dJR -i xl1 -w /data/argus.out
root   22783  1.2  0.1  2356  928  ??  S     5Apr04 370:25.04 /usr/local/bin/argus_bpf -dJR -i xl1 -w /data/argus.out
root   22782  0.0  0.1  2284  860  ??  S     5Apr04 103:18.47 /usr/local/bin/argus_bpf -dJR -i xl1 -w /data/argus.out
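
A cron-driven watchdog along the lines described above might be sketched as
follows. It is in Python rather than perl, and every function name here is
hypothetical. It assumes the 11-column `ps auxw` layout shown in the listing
(command in the last column), takes the command line verbatim from that
listing, and only counts processes; the "eating CPU" check is left out.

```python
import os
import signal
import subprocess

ARGUS_CMD = ["/usr/local/bin/argus_bpf", "-dJR", "-i", "xl1",
             "-w", "/data/argus.out"]  # copied from the ps listing
EXPECTED_TASKS = 3  # one master plus two slave tasks, as in the listing

def argus_pids(ps_output, command=ARGUS_CMD[0]):
    """Return PIDs of `ps auxw` lines whose command starts with `command`."""
    pids = []
    for line in ps_output.splitlines():
        # Ten fixed columns (USER PID ... TIME), then the full command line.
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[10].startswith(command):
            pids.append(int(fields[1]))
    return pids

def check_and_restart(ps_output, kill, start):
    """If not all tasks are present, kill the survivors and restart.

    `kill` takes a PID and `start` takes no arguments; they are passed in
    so the decision logic can be exercised without touching real processes.
    Returns True when a restart was triggered.
    """
    pids = argus_pids(ps_output)
    if len(pids) == EXPECTED_TASKS:
        return False
    for pid in pids:
        kill(pid)
    start()
    return True

def run_from_cron():
    """Entry point a cron job could call (needs root to signal argus)."""
    ps = subprocess.run(["ps", "auxw"], capture_output=True,
                        text=True, check=True).stdout
    return check_and_restart(
        ps,
        kill=lambda pid: os.kill(pid, signal.SIGTERM),
        start=lambda: subprocess.run(ARGUS_CMD, check=True),
    )
```

Matching on the command column rather than grepping for "argus" keeps the
grep process itself (and anything else with argus in its arguments) out of
the count.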

Peter Van Epp / Operations and Technical Support 
Simon Fraser University, Burnaby, B.C. Canada


