radium stops passing traffic
Carter Bullard
carter at qosient.com
Fri Sep 25 13:33:28 EDT 2009
Hey Jason,
There are a few reasons why radium could stop; most should leave a trail in a syslog file somewhere. The most likely is that radium() has terminated the connection to a client that isn't reading fast enough. Radium figures this out because its output queue gets too big; it assumes the remote has gone away or is too slow, and gives up. Generally, the client (which can be configured to retry the connection) gets a shutdown message, then reattaches, and all is goodness again.
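The slow-reader cutoff described above can be illustrated with a minimal Python sketch. It is not radium's code: `ClientQueue`, `fan_out`, and the `MAX_QUEUE_RECORDS` limit are hypothetical names, and the limit is deliberately tiny so the behavior is easy to see.

```python
from collections import deque

# Hypothetical limit for illustration only; radium's real threshold and how it
# is configured are not described in this thread.
MAX_QUEUE_RECORDS = 4


class ClientQueue:
    """Per-client output queue that gives up on a reader that falls too far behind."""

    def __init__(self, name, limit=MAX_QUEUE_RECORDS):
        self.name = name
        self.limit = limit
        self.pending = deque()
        self.connected = True

    def enqueue(self, record):
        """Queue one record; disconnect the client if its backlog is already full."""
        if not self.connected:
            return False
        if len(self.pending) >= self.limit:
            # Treat the remote as gone or too slow: discard the backlog and stop
            # serving it. (Per the description above, radium would also send the
            # client a shutdown message at this point.)
            self.pending.clear()
            self.connected = False
            return False
        self.pending.append(record)
        return True

    def drain(self, n):
        """Simulate the client reading up to n records from its queue."""
        out = []
        while self.pending and len(out) < n:
            out.append(self.pending.popleft())
        return out


def fan_out(records, clients):
    """Copy every record to every client that is still connected."""
    for rec in records:
        for client in clients:
            client.enqueue(rec)
```

A client that keeps draining its queue stays attached, while one that stops reading is cut off once its backlog hits the limit. A client configured to retry would then open a fresh connection and start with an empty queue, which is the recovery path described above.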
Of course, there could be bugs anywhere in this logic. So when this happens, there are a few quick questions.
Does radium have a connection to the client, but no data is being passed? If this is the case, we definitely have a bug, and the best strategy is to attach to the running radium() with gdb() and step through to see what the problem is.
Is radium still reading records from the remote argi? If not, radium may be fine but there isn't any data to transmit, or radium has lost its connections from the argi because it isn't processing fast enough itself to keep up with the record load.
When radium isn't passing records, is it responding to additional connection requests? This would test whether the radium output thread is completely dead or spinning in a loop.
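One way to answer that last question from another machine is a small probe like the following Python sketch. It assumes radium is listening on TCP port 561 (the conventional argus port; change `RADIUM_PORT` if yours differs), and it assumes a healthy server sends something soon after a client connects, which you should confirm against your own radium before trusting the result. A completed TCP connect alone proves little, because the kernel finishes the handshake for a listening socket even if the process never calls accept(); waiting for the first bytes is what separates a live output path from a stuck one. The name `probe` is hypothetical.

```python
import socket

# Assumption: 561 is the conventional argus/radium port; substitute the port
# your radium actually listens on.
RADIUM_PORT = 561


def probe(host, port=RADIUM_PORT, timeout=3.0):
    """Classify how a server on host:port reacts to a brand-new connection.

    Returns "refused" if the TCP connect fails, "silent" if the connect succeeds
    but no bytes arrive within `timeout` seconds, "closed" if the server hangs
    up without sending anything, and "responding" if data arrives.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "refused"
    with sock:
        try:
            data = sock.recv(1)  # honors the timeout set by create_connection
        except socket.timeout:
            return "silent"
        except OSError:
            return "refused"  # e.g. connection reset by the server
    return "responding" if data else "closed"
```

For example, `probe("bivio-host")` (a placeholder hostname) returning "silent" while existing clients also see no data would fit the "output thread dead or spinning" case.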
So check whether radium() is alive. Use netstat -na to see if the remote(s) still have active connections. Check the load to see if radium() is chewing up 100% of one of the processors (an infinite loop), and of course, check whether there are any syslog messages from radium() indicating that the queue limit was reached, that it's disconnecting, or whatever.
If this gets to be too much, let's see if I can log on and make some sense of it.
Carter
On Sep 25, 2009, at 1:12 PM, Jason Carr wrote:
> Hi guys,
>
> Here's what's going on right now. We've got two argus processes
> running on our Bivio unit plus two radium processes. One radium
> process runs on the Bivio to essentially multiplex and forward to
> the external radium process that runs on our long term storage
> machine.
>
> What happens currently is that after a dynamic amount of time the
> radium running on Bivio stops passing the traffic to the external
> radium process. Killing radium and restarting it fixes the problem
> immediately. Running radium in debug mode (-D 999) yields a 6.5G
> output file, so I don't think I'll be sending that one along.
>
> How much debugging needs to be on to get a good understanding of why
> this radium process would stop passing traffic?
>
> Thanks,
>
> Jason
More information about the argus mailing list