[ARGUS] ra stops unexpectedly
Peter Van Epp
vanepp at sfu.ca
Fri Oct 1 11:53:01 EDT 2004
While I don't know how easy it would be, what I'd like to see is
(likely as a selectable option) reliable service from argus_bpf to a remote
ra, and the suggestion of a heartbeat from argus indicating it thinks it is
working. I finally converted to 2.0.6 across 3 machines (sensor -> ra on an
archive host, over a crossover cable to avoid network outage issues -> ssh
to a remote post-processing / archive host).
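
As a rough illustration of the heartbeat idea, here is a minimal Python sketch of the check a cron-driven watchdog might make. It assumes the monitored process (or a wrapper around it) updates a file whenever it believes it is working. That file, the 120 second limit, and the function names are all hypothetical; argus 2.0.6 is not known to provide such a heartbeat itself.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Age of the file's last modification in seconds, or None if it is missing."""
    now = time.time() if now is None else now
    try:
        return now - os.stat(path).st_mtime
    except FileNotFoundError:
        return None


def needs_restart(path, max_silence=120.0, now=None):
    """True when the heartbeat file is missing or has gone quiet for too long."""
    age = seconds_since_update(path, now)
    return age is None or age > max_silence
```

A watchdog run from cron would call needs_restart() on each heartbeat file and kill and restart the matching process when it returns True.
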
As Carter suggests, there are a bunch of Perl scripts running
from cron on all three machines, checking that something reasonable seems
to be happening and prepared to kill and restart argus, ra, or the transfer
processes as required. I haven't seen ra terminate for lack of MAR
records as Mike has, nor so far had an ra crash that I didn't intentionally
cause. Still, should ra die and argus_bpf not, I will lose data that I don't
have to. (If argus_bpf dies I'm SOL, unless I have a tcpdump-like function in
front of it that can buffer the input packets for a while as argus gets
restarted by the watchdog process, which is somewhat tempting :-).) So, what
would be desirable is for argus to buffer in memory at least a few
transactions if a "continuous" (as opposed to ad hoc) ra connection dies.
When the ra reconnects, argus flushes the buffer to the new ra instance so no
data is lost. This implies
that there is an ack mechanism of some kind between the ra instance and the
argus instance to keep flushing the buffer during normal operation. As a bonus
this same mechanism will let you ensure that the data argus_bpf is capturing
is actually making it to ra (if the queue starts to get too big then you may
need to restart ra, or load balance as Eric suggested).
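
The buffer-and-ack scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how argus works: the class name, the sequence numbering, the cumulative ack, and the drop-oldest overflow policy are all assumptions made for the example.

```python
from collections import deque


class ReplayBuffer:
    """Bounded in-memory queue of records not yet acknowledged by the reader."""

    def __init__(self, max_records=1000):
        self.max_records = max_records
        self.pending = deque()   # (sequence number, record), oldest first
        self.next_seq = 0
        self.dropped = 0         # records lost because the buffer overflowed

    def append(self, record):
        """Queue a record for sending and return its sequence number."""
        if len(self.pending) >= self.max_records:
            self.pending.popleft()   # out of room: the oldest unacked record is lost
            self.dropped += 1
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, record))
        return seq

    def ack(self, seq):
        """Cumulative ack: discard every record numbered seq or lower."""
        while self.pending and self.pending[0][0] <= seq:
            self.pending.popleft()

    def unacked(self):
        """Records to resend, in order, when a reader reconnects."""
        return [record for _, record in self.pending]

    def backlog(self):
        """Number of records still waiting for an ack."""
        return len(self.pending)
```

In normal operation the sender calls append() for each record and ack() as acknowledgements arrive, so the queue stays short. If the reader dies, records pile up. When a new reader connects, unacked() gives the records to replay. A growing backlog() is the signal that the reader is falling behind, and a nonzero dropped count means data was lost anyway.
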
It would also be good to have a daemon mode for ra (although all the
likely flags are taken already :-)) for this instance.
I have discovered that I also need to have a watchdog process that is
capable of power cycling my Century tap in the event of a power failure
(unlikely, because of the UPS). The Century tap failed to bring up link on
one side of the monitor connection until I power cycled the tap once.
Peter Van Epp / Operations and Technical Support
Simon Fraser University, Burnaby, B.C. Canada