Radium dropping connections to argi

Carter Bullard carter at qosient.com
Fri Feb 4 16:10:55 EST 2011


Hey Phillip,
I did find a problem, and this patch should fix radium() apparently giving up on reconnecting
after a while.  It's already in the distribution, but give it a try on your machine to see if it
corrects the problem.

Carter

==== //depot/argus/clients/common/argus_client.c#204 - /Users/carter/argus/clients/common/argus_client.c ====
2523a2524,2525
> 
>                      input->status &= ~ARGUS_CLOSED;
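
For readers following along, here is a minimal sketch of why clearing a status bit can matter for a reconnect loop. It is not the argus-clients source. Every name in it (SKETCH_CLOSED, struct sketch_input, sketch_run and the other functions) is hypothetical. It also assumes that a lingering "closed" flag is what keeps an input out of later reconnect passes, which the thread does not confirm. Only the 5 second and 1 second intervals and the bit-clearing idiom come from the messages here.

```c
/* Hypothetical names throughout; none of this is the argus-clients source. */
#define SKETCH_CLOSED 0x01u

struct sketch_input {
    unsigned int status;   /* bit flags, SKETCH_CLOSED among them */
    int attempts;          /* reconnect attempts made so far */
};

/* Retry interval described in the thread: 5 s if threaded, 1 s otherwise. */
static int sketch_retry_interval(int threaded) {
    return threaded ? 5 : 1;
}

/* One failed connect attempt against a peer that is unreachable. */
static void sketch_failed_attempt(struct sketch_input *in, int clear_closed) {
    in->attempts++;
    in->status |= SKETCH_CLOSED;        /* assumed: teardown marks the input closed */
    if (clear_closed)
        in->status &= ~SKETCH_CLOSED;   /* same idiom as the patch line above */
}

/* Run `passes` reconnect passes against a peer that never answers.
 * Returns the total number of attempts actually made. */
static int sketch_run(struct sketch_input *in, int passes, int clear_closed) {
    for (int i = 0; i < passes; i++) {
        if (in->status & SKETCH_CLOSED)
            continue;                   /* assumed: closed inputs are skipped */
        sketch_failed_attempt(in, clear_closed);
    }
    return in->attempts;
}
```

Under those assumptions, an input whose closed bit is left set is tried once and then skipped on every later pass, while an input whose bit is cleared after each failure is tried on every pass. A real loop would sleep for sketch_retry_interval() seconds between passes.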


On Feb 4, 2011, at 4:01 PM, Carter Bullard wrote:

> Hey Phillip,
> radium() doesn't have a retry counter; it should keep trying every 5 seconds if threaded and every 1 second if
> non-threaded, and it should try forever.  I've recreated a problem where, after the far side has gone
> away a few times, radium() loses the connection, so I'm working on this now.
> 
> Carter
> 
> 
> On Feb 4, 2011, at 12:11 PM, Phillip Deneault wrote:
> 
>> On 1/31/2011 1:59 PM, Phillip Deneault wrote:
>>> This might muddle the issue, but I'm having an odd issue with radium
>>> too.  The longer radium is running, the fewer and fewer records seem
>>> to be recorded.  It appears that the radium instance loses its
>>> connection to the argi one at a time and doesn't keep retrying and
>>> doesn't throw an error in the logs about any sort of failed
>>> connection.
>>> 
>>> There are quite a few nodes I'm connecting to (all on the local LAN),
>>> and this was happening in the 3.0.2 version of argus-clients as well as
>>> 3.0.3.21 (which I am running now).  I'm running CentOS 5.5.
>>> 
>>> I'm upping the debug level and trying to figure this out, but can
>>> anyone else confirm they see this problem?
>> 
>> So I'll assume no one else is seeing this problem.
>> 
>> Yesterday we had some network interruption and a number of the nodes
>> once again got disconnected from the radium instance.  It appears the
>> radium instance tried to reconnect 10 times, all with a 'no route to
>> host' before it appeared to stop retrying.
>> 
>> The number 10 sounds to me like a nice round number, is this a hardcoded
>> retry count in radium?
>> 
>> I might be getting ahead of myself but should I instead use the -p
>> option to kill radium if I drop a connection and use a process monitor
>> to restart it?
>> 
>> Thanks,
>> Phil
>> 
> 
