[ARGUS] ra stops unexpectedly

Peter Van Epp vanepp at sfu.ca
Fri Oct 1 11:53:01 EDT 2004


	While I don't know how easy it would be, what I'd like to see
(likely as a selectable option) is reliable service from argus_bpf to a remote
ra, along with the suggested heartbeat from argus indicating that it thinks it
is working. I finally converted to 2.0.6 across 3 machines (sensor -> ra on an
archive host, over a crossover cable to avoid network outage issues -> into ssh
to a remote post processing / archive host).
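
	To make the heartbeat idea concrete, here is a rough sketch (in Python
rather than perl) of the check a cron watchdog could make. It assumes argus
emitted some periodic heartbeat whose arrival time the watchdog records. The
names is_stale, check and max_silence are hypothetical and nothing here is an
existing argus or ra interface.

```python
import time


def is_stale(last_heartbeat, now, max_silence=60.0):
    """True if no heartbeat has been seen for more than max_silence seconds."""
    return (now - last_heartbeat) > max_silence


def check(last_heartbeat, restart, now=None, max_silence=60.0):
    """Call restart() if the heartbeat is stale; return whether it was called."""
    if now is None:
        now = time.time()
    if is_stale(last_heartbeat, now, max_silence):
        restart()
        return True
    return False


# Example with fixed timestamps: restart is a stand-in that only records the
# request; a real watchdog would kill and restart the process here.
restarts = []
check(100.0, lambda: restarts.append("argus"), now=130.0)  # 30s of silence: ok
check(100.0, lambda: restarts.append("argus"), now=200.0)  # 100s: restart
```

The 60 second threshold is only a placeholder; it would need to be somewhat
longer than whatever interval the heartbeat is actually sent at.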
	As Carter suggests, there are a bunch of perl scripts running
from cron on all three machines, checking that something reasonable seems
to be happening and prepared to kill and restart argus, ra, or the transfer
processes as required. While I haven't seen ra terminate due to no MAR
records as Mike has, nor so far had an ra crash that I didn't intentionally
cause, should ra die and argus_bpf not, I will lose data that I don't have to.
(If argus_bpf dies I'm SOL, unless I have a tcpdump like function in front of
it which can buffer the input packets for a while as argus gets restarted by
the watchdog process, which is somewhat tempting :-).) So, what would be
desirable is for argus to buffer in memory at least a few transactions if a
"continuous" (as opposed to ad hoc) ra connection dies. When the ra
reconnects, argus flushes the buffer to the new ra instance so no data is
lost. This implies
that there is an ack mechanism of some kind between the ra instance and the
argus instance to keep flushing the buffer during normal operation. As a bonus
this same mechanism will let you ensure that the data argus_bpf is capturing
is actually making it to ra (if the queue starts to get too big then you may
need to restart ra, or load balance as Eric suggested).
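
	As a rough sketch of the buffer and ack idea (in Python, and not argus
code): the sender numbers each record, holds it until the reader acks it, and
replays whatever is still unacked when a new reader connects. The class
ReplayBuffer, its methods and the max_records limit are all hypothetical names
made up for illustration.

```python
from collections import deque


class ReplayBuffer:
    """Hold records in memory until the reader acknowledges them."""

    def __init__(self, max_records=1000):
        self.max_records = max_records
        self.pending = deque()  # (seq, record) pairs not yet acked
        self.next_seq = 0
        self.dropped = 0        # records lost because the buffer overflowed

    def append(self, record):
        """Queue a record and return its sequence number."""
        if len(self.pending) >= self.max_records:
            self.pending.popleft()  # buffer full: oldest unacked record is lost
            self.dropped += 1
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, record))
        return seq

    def ack(self, seq):
        """Discard every record with a sequence number <= seq."""
        while self.pending and self.pending[0][0] <= seq:
            self.pending.popleft()

    def replay(self):
        """Records to resend, oldest first, when a reader reconnects."""
        return list(self.pending)

    def backlog(self):
        """Number of unacked records; a growing value means the reader lags."""
        return len(self.pending)


buf = ReplayBuffer(max_records=3)
for rec in ("flow-a", "flow-b", "flow-c"):
    buf.append(rec)
buf.ack(0)             # reader confirmed flow-a
resend = buf.replay()  # reader died; a new one gets flow-b and flow-c
```

The backlog count is what a watchdog would look at to decide that ra has
fallen behind, and the fixed max_records bound means that a long enough
outage still loses the oldest records rather than exhausting memory.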
	It would also be good to have a daemon mode for ra (although all the
likely flags are taken already :-)) for this instance.
	I have discovered that I also need to have a watchdog process that is
capable of power cycling my Century tap in the event of a power failure
(unlikely, because of the UPS). The Century tap failed to bring up link on one
side of the monitor connection until I power cycled the tap once.

Peter Van Epp / Operations and Technical Support 
Simon Fraser University, Burnaby, B.C. Canada


