Radium dropping connections to argi

Phillip G Deneault deneault at WPI.EDU
Sat Mar 5 18:08:36 EST 2011


Actually, I spoke too soon.  It happened again last night after not 
happening for weeks.  I'm going to see if I can simulate this behavior 
tomorrow or Monday and try to get a packet capture of the behavior as it 
occurs.

Thanks,
Phil

On Sat, 5 Mar 2011, Carter Bullard wrote:

> Excellent !!!!   That is great news !!!!
> Carter
>
> On Mar 4, 2011, at 1:29 PM, Phillip Deneault <deneault at WPI.EDU> wrote:
>
>> Carter,
>>
>> I didn't forget about you.  I've been letting this run for a while then
>> ran it for a little more when you released .23.  It seems to have fixed
>> the bug as I still have not had any problems.
>>
>> Thanks,
>> Phil
>>
>> On 2/4/2011 4:10 PM, Carter Bullard wrote:
>>> Hey Phillip,
>>> I did find a problem, and this patch should fix radium() apparently ceasing to attempt
>>> reconnects after a while.  I've got it in the distribution, but give it a try on your
>>> machine to see whether it corrects the problem.
>>>
>>> Carter
>>>
>>> ==== //depot/argus/clients/common/argus_client.c#204 - /Users/carter/argus/clients/common/argus_client.c ====
>>> 2523a2524,2525
>>>>
>>>>                     input->status &= ~ARGUS_CLOSED;
>>>
>>>
>>> On Feb 4, 2011, at 4:01 PM, Carter Bullard wrote:
>>>
>>>> Hey Phillip,
>>>> radium() doesn't have a retry counter.  It should keep trying every 5 seconds if threaded and every 1 second if
>>>> non-threaded, and it should try forever.  I've recreated a problem where radium() loses the connection for good
>>>> after the far side has gone away a few times, so I'm working on this now.
>>>>
>>>> Carter
>>>>
>>>>
>>>> On Feb 4, 2011, at 12:11 PM, Phillip Deneault wrote:
>>>>
>>>>> On 1/31/2011 1:59 PM, Phillip Deneault wrote:
>>>>>> This might muddle the issue, but I'm having an odd issue with radium
>>>>>> too.  The longer radium is running, the fewer and fewer records seem
>>>>>> to be recorded.  It appears that the radium instance loses its
>>>>>> connection to the argi one at a time and doesn't keep retrying and
>>>>>> doesn't throw an error in the logs about any sort of failed
>>>>>> connection.
>>>>>>
>>>>>> There are quite a few nodes I'm connecting to (all on the local LAN),
>>>>>> and this was happening in version 3.0.2 of argus-clients as well as
>>>>>> 3.0.3.21 (which I am running now).  I'm running CentOS 5.5.
>>>>>>
>>>>>> I'm upping the debug level and trying to figure this out, but can
>>>>>> anyone else confirm they see this problem?
>>>>>
>>>>> So I'll assume no one else is seeing this problem.
>>>>>
>>>>> Yesterday we had some network interruption and a number of the nodes
>>>>> once again got disconnected from the radium instance.  It appears the
>>>>> radium instance tried to reconnect 10 times, all with a 'no route to
>>>>> host' before it appeared to stop retrying.
>>>>>
>>>>> The number 10 sounds to me like a nice round number, is this a hardcoded
>>>>> retry count in radium?
>>>>>
>>>>> I might be getting ahead of myself but should I instead use the -p
>>>>> option to kill radium if I drop a connection and use a process monitor
>>>>> to restart it?
>>>>>
>>>>> Thanks,
>>>>> Phil
>>>>>
>>>>
>>>
>>
>>
>
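
For anyone following the thread, here is a minimal sketch of the behavior
discussed above: retry every disconnected source on each pass with no attempt
cap, and clear a stale "closed" bit before retrying. This is not the code from
argus_client.c. Every name in it (SKETCH_CLOSED, SKETCH_CONNECTED,
sketch_input, sketch_retry_pass, sketch_connect_after and so on) is
hypothetical, and the connect function is a fake that only exists to make the
example self-contained. The 5 second and 1 second intervals are the ones
Carter mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags; stand-ins, not the real argus status bits. */
#define SKETCH_CONNECTED 0x01u
#define SKETCH_CLOSED    0x02u

struct sketch_input {
    unsigned int status;    /* bit flags for this remote source */
    unsigned int attempts;  /* connection attempts made so far */
};

/* Caller-supplied connect routine: returns true on success. */
typedef bool (*sketch_connect_fn)(struct sketch_input *in, void *ctx);

/* Record that a source's connection has dropped. */
void sketch_mark_closed(struct sketch_input *in)
{
    assert(in != NULL);
    in->status &= ~SKETCH_CONNECTED;
    in->status |= SKETCH_CLOSED;
}

/* Seconds to wait between passes, per the figures quoted in the thread. */
unsigned int sketch_retry_interval(bool threaded)
{
    return threaded ? 5u : 1u;
}

/*
 * One retry pass over all sources. Each disconnected source has its
 * closed bit cleared and is tried again; there is no attempt cap.
 * Returns the number of sources still disconnected after the pass.
 */
size_t sketch_retry_pass(struct sketch_input *inputs, size_t n,
                         sketch_connect_fn connect_fn, void *ctx)
{
    size_t remaining = 0;

    assert(inputs != NULL && connect_fn != NULL);
    for (size_t i = 0; i < n; i++) {
        struct sketch_input *in = &inputs[i];

        if (in->status & SKETCH_CONNECTED)
            continue;
        in->status &= ~SKETCH_CLOSED;  /* make the source eligible again */
        in->attempts++;
        if (connect_fn(in, ctx))
            in->status |= SKETCH_CONNECTED;
        else
            remaining++;
    }
    return remaining;
}

/* Fake connect for illustration: succeeds once attempts reach *ctx. */
bool sketch_connect_after(struct sketch_input *in, void *ctx)
{
    const unsigned int *needed = ctx;

    return in->attempts >= *needed;
}
```

A caller would run sketch_retry_pass() in a loop, sleeping
sketch_retry_interval(threaded) seconds between passes, for as long as the
process lives. The point of clearing the bit inside the pass is that a source
flagged closed by an earlier failure can never be silently skipped later.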



