[ARGUS] ra stops unexpectedly
eric
eric-list-argus at catastrophe.net
Thu Sep 30 14:07:44 EDT 2004
On Thu, 2004-09-30 at 13:56:22 -0400, Carter Bullard proclaimed...
> None of this means we can't provide a "reconnect on failure"
> feature, but what are we going to specify when you're connected
> to 3 remote data sources? How do we notify the specific client
> that a source has been lost, or has not ever been connected?
Hey Carter et al,
So, let's see. What about tracking each client from a parent,
master, process? I know this would be a real pain, and may lead to
some race conditions if you're not careful, but it might serve the
purpose. So let's say PID 4550 establishes connections to three
servers for PIDs 4551, 4552, and 4553. Essentially you can drop the child
processes into a privilege-separated jail (adding more security
too!) and only let them communicate back to the parent through very
specific calls. The children should be given all rights to write to
disk, etc., then drop privileges. If the parent notices that one has
died or has been overloaded for X amount of time, it can send a SIGHUP
or kill and restart it. Perhaps you can just look for bind problems to the
server?
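
To make the parent/child idea a bit more concrete, here is a rough
sketch in Python. It is not Argus code: the Supervisor class, its
method names, and the per-source command lines are all made up for
illustration. It only covers the "notice a dead child and restart it"
part; privilege dropping, the jail, and overload detection are left out.

```python
import subprocess


class Supervisor:
    """Parent process that keeps one child per remote source running.

    Hypothetical sketch: `commands` maps a source name to the argv list
    used to start the client for that source.
    """

    def __init__(self, commands):
        self.commands = dict(commands)
        self.children = {}
        self.restarts = {name: 0 for name in self.commands}

    def _spawn(self, name):
        # One child process per remote data source.
        self.children[name] = subprocess.Popen(self.commands[name])

    def start_all(self):
        for name in self.commands:
            self._spawn(name)

    def check(self):
        """Restart every child that has exited; return the names restarted."""
        restarted = []
        for name, proc in list(self.children.items()):
            if proc.poll() is not None:  # poll() is None while still running
                self._spawn(name)
                self.restarts[name] += 1
                restarted.append(name)
        return restarted

    def stop_all(self):
        for proc in self.children.values():
            if proc.poll() is None:
                proc.terminate()
            proc.wait()
```

The parent would call check() on a timer. Because each child is tied
to one source name, the list it returns also tells you which specific
source was lost, which is the notification question raised above.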
What I've found is that it's more of a pain to actually find out
when we're losing data if we're still connected. So, that said....
...wouldn't it be great if the server summarized how many flow
records it has gathered and reported that as a status (stop me if we're
already doing this) in the form of a sequence number?
So, "Hey collector A, it's sensor B, I've seen 54141 flows, I'll see
you again in 30 seconds!"
Then add that same functionality into the ra() tools and report
errors and oddities.
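
On the collector side, the check could look something like the sketch
below. Again, this is Python for illustration only, not anything in
Argus today: SensorStatusTracker, its methods, and the "grace"
multiplier are assumptions of mine. The idea is that the client counts
the records it actually received, compares that against the running
total the sensor reports, and also notices when a status report is
overdue.

```python
class SensorStatusTracker:
    """Collector-side bookkeeping for one sensor (hypothetical sketch)."""

    def __init__(self, interval=30.0, grace=2.0):
        self.interval = interval      # seconds the sensor promised between reports
        self.grace = grace            # how many intervals to wait before complaining
        self.received = 0             # records this collector actually got
        self.last_report_time = None  # time of the most recent status report

    def record_received(self, n=1):
        self.received += n

    def on_status(self, reported_total, now):
        """Handle "I've seen N flows"; return the cumulative shortfall."""
        self.last_report_time = now
        return max(reported_total - self.received, 0)

    def is_overdue(self, now):
        """True if the sensor has been silent for too long."""
        if self.last_report_time is None:
            return False
        return now - self.last_report_time > self.interval * self.grace


tracker = SensorStatusTracker(interval=30.0)
tracker.record_received(54100)
missing = tracker.on_status(54141, now=0.0)  # sensor says it has seen 54141
```

A restart script could then act on either signal: a nonzero shortfall
means data is being lost while still connected, and an overdue report
means the connection is probably dead even if the socket looks fine.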
This would help scripting restarts of the clients, etc.