radium stops passing traffic

Carter Bullard carter at qosient.com
Wed Oct 7 09:40:19 EDT 2009


Hey Jason,
Did you check any syslog messages?  That is usually the giveaway.
If not, the best we can do is attach to radium() using gdb() to see what
it thinks is going on.
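
For the syslog check, here is a minimal sketch of one way to pull radium-related lines out of a log. It is only an illustration: the function name, the keyword list, and the log path in the usage note are assumptions, and the exact wording of radium's syslog messages is not given in this thread.

```python
import re
from typing import Iterable, List, Sequence

# Hypothetical keywords; radium's actual syslog wording may differ.
KEYWORDS = ("queue", "disconnect", "shutdown", "error")


def radium_syslog_lines(lines: Iterable[str],
                        keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Return lines tagged 'radium:' or 'radium[pid]:' that mention a keyword."""
    tag = re.compile(r"\bradium(\[\d+\])?:", re.IGNORECASE)
    hits = []
    for line in lines:
        lowered = line.lower()
        if tag.search(line) and any(k in lowered for k in keywords):
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open whichever file syslog writes to on the box (for example /var/log/messages or /var/log/syslog, depending on the system) and pass the file object to radium_syslog_lines().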

Carter

On Sep 28, 2009, at 2:48 PM, Jason Carr wrote:

> Checking on the process, it is not using 100% CPU when it gets to the
> point of no longer working.
>
> What other information might you need?
>
> Thanks,
>
> Jason
>
> On Sep 25, 2009, at 3:31 PM, Jason Carr wrote:
>
>> Once radium is in this state, it still accepts connections from
>> ra -S localhost:561 but no longer produces any output.  Connecting
>> directly to the arguses via ra -S 10.10.10.100:561 works fine and
>> does produce output.
>>
>> I'll check to see if the CPU load is high at the time, but I do not  
>> believe that it is.
>>
>> - Jason
>>
>> On Sep 25, 2009, at 1:33 PM, Carter Bullard wrote:
>>
>>> Hey Jason,
>>> There are a few reasons why radium could stop, most should leave a  
>>> trail in
>>> a syslog file somewhere.  The most likely is that radium() has  
>>> terminated the
>>> connection to a client that isn't reading fast enough.  Radium  
>>> figures this out
>>> because its output queue gets too big, assumes the remote has gone  
>>> or is too
>>> slow, and gives up.   Generally, the client (which can be  
>>> configured to retry
>>> the connection) gets a shutdown message, and then reattaches and  
>>> all is
>>> goodness again.
>>>
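
The slow-client behaviour described in the quoted paragraph above can be sketched roughly like this. It is a toy model, not radium's code: the class, its method names, and the record-count limit are all hypothetical, and the real implementation may measure its queue differently.

```python
from collections import deque


class ClientOutputQueue:
    """Bounded per-client record queue (hypothetical model, not radium's code)."""

    def __init__(self, max_records: int):
        self.max_records = max_records
        self.records = deque()
        self.connected = True

    def enqueue(self, record: bytes) -> bool:
        """Queue a record; give up on the client if the backlog passes the limit."""
        if not self.connected:
            return False
        self.records.append(record)
        if len(self.records) > self.max_records:
            # The client is assumed to be gone or too slow: drop it.
            self.records.clear()
            self.connected = False
            return False
        return True

    def drain(self, n: int) -> list:
        """Simulate the client reading up to n queued records."""
        out = []
        while self.records and len(out) < n:
            out.append(self.records.popleft())
        return out
```

A client that keeps calling drain() stays connected; one that stops reading is dropped once the backlog passes max_records, and it would then have to reconnect to get records again.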
>>> Of course there could be bugs anywhere in this logic.  So when  
>>> this happens
>>> there are a few quick questions.
>>>
>>> Does radium have a connection to the client, but no data is being  
>>> passed?  If this
>>> is the case, we definitely have a bug and the best strategy is to  
>>> attach to the running
>>> radium() with gdb() and  step through to see what the problem is.
>>>
>>> Is radium still reading records from the remote argi?  If not,
>>> radium may be fine but have no data to transmit, or radium has lost
>>> its connections to the argi because it isn't processing fast enough
>>> to keep up with the record load.
>>>
>>> When radium isn't passing records, is it responding to additional  
>>> connection
>>> requests?  This would test if the radium output thread is  
>>> completely dead,
>>> or spinning in a loop.
>>>
>>> So check if radium() is still alive.  Use netstat -na to see if the
>>> remote(s) still have active connections.  Check the load to see if
>>> radium() is chewing up 100% of one of the processors (infinite loop),
>>> and of course, check to see if there are any syslog messages from
>>> radium() indicating that the queue limit was reached, that it's
>>> disconnecting, or whatever.
>>>
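
As a rough companion to the netstat check, the sketch below tells "nothing listening" apart from "accepts the connection but sends nothing". The function name and the result labels are made up for illustration. It only looks at raw TCP behaviour, so a "data" result says nothing about whether the bytes are valid Argus records or whether records keep flowing afterwards.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP endpoint as 'unreachable', 'silent', 'closed', or 'data'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"  # no listener, or the connect attempt timed out
    with sock:
        sock.settimeout(timeout)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return "silent"   # connection accepted, but nothing sent in time
        except OSError:
            return "closed"   # connection reset while waiting
        # An empty read means the peer closed the connection cleanly.
        return "data" if chunk else "closed"
```

For example, probe("localhost", 561) returning "silent" would match the symptom reported earlier in the thread, where connections are accepted but no output is produced.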
>>> If this gets to be too much, let's see if I can log on and make some
>>> sense of it.
>>>
>>>
>>> Carter
>>>
>>> On Sep 25, 2009, at 1:12 PM, Jason Carr wrote:
>>>
>>>> Hi guys,
>>>>
>>>> Here's what's going on right now.  We've got two argus processes  
>>>> running on our Bivio unit plus two radium processes.  One radium  
>>>> process runs on the Bivio to essentially multiplex and forward to  
>>>> the external radium process that runs on our long term storage  
>>>> machine.
>>>>
>>>> What happens currently is that after a variable amount of time the  
>>>> radium running on Bivio stops passing the traffic to the external  
>>>> radium process.  Killing radium and restarting it fixes the  
>>>> problem immediately.  Running radium in debug mode (-D 999)  
>>>> yields a 6.5G output file, so I don't think I'll be sending that  
>>>> one along.
>>>>
>>>> How much debugging needs to be on to get a good understanding of  
>>>> why this radium process would stop passing traffic?
>>>>
>>>> Thanks,
>>>>
>>>> Jason
>>>>
>>>>
>>>>
>>>
>>
>>
>
>

Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York  10022

+1 212 588-9133 Phone
+1 212 588-9134 Fax


