Radium dropping connections to argi
Carter Bullard
carter at qosient.com
Sat Mar 5 11:13:44 EST 2011
Excellent!!!! That is great news!!!!
Carter
On Mar 4, 2011, at 1:29 PM, Phillip Deneault <deneault at WPI.EDU> wrote:
> Carter,
>
> I didn't forget about you. I've been letting this run for a while then
> ran it for a little more when you released .23. It seems to have fixed
> the bug as I still have not had any problems.
>
> Thanks,
> Phil
>
> On 2/4/2011 4:10 PM, Carter Bullard wrote:
>> Hey Phillip,
>> I did find a problem, and this patch should fix radium() apparently not attempting to reconnect
>> after a while. I've got it in the distribution but give it a try on your machine to see if it doesn't
>> correct the problem.
>>
>> Carter
>>
>> ==== //depot/argus/clients/common/argus_client.c#204 - /Users/carter/argus/clients/common/argus_client.c ====
>> 2523a2524,2525
>>>
>>> input->status &= ~ARGUS_CLOSED;
>>
>>
>> On Feb 4, 2011, at 4:01 PM, Carter Bullard wrote:
>>
>>> Hey Phillip,
>>> radium() doesn't have a retry counter; it should keep trying every 5 seconds if threaded and every 1 second if
>>> non-threaded, and it should try forever. I've recreated a problem where radium() loses the connection after the
>>> far side has gone away a few times, so I'm working on this now.
>>>
>>> Carter
>>>
>>>
>>> On Feb 4, 2011, at 12:11 PM, Phillip Deneault wrote:
>>>
>>>> On 1/31/2011 1:59 PM, Phillip Deneault wrote:
>>>>> This might muddle the issue, but I'm having an odd issue with radium
>>>>> too. The longer radium is running, the fewer and fewer records seem
>>>>> to be recorded. It appears that the radium instance loses its
>>>>> connection to the argi one at a time and doesn't keep retrying and
>>>>> doesn't throw an error in the logs about any sort of failed
>>>>> connection.
>>>>>
>>>>> There are quite a few nodes I'm connecting to (all on the local LAN),
>>>>> and this was happening in the 3.0.2 version of argus-clients as well as
>>>>> 3.0.3.21 (which I am running now). I'm running CentOS 5.5.
>>>>>
>>>>> I'm upping the debug level and trying to figure this out, but can
>>>>> anyone else confirm they see this problem?
>>>>
>>>> So I'll assume no one else is seeing this problem.
>>>>
>>>> Yesterday we had some network interruption and a number of the nodes
>>>> once again got disconnected from the radium instance. It appears the
>>>> radium instance tried to reconnect 10 times, all with a 'no route to
>>>> host' before it appeared to stop retrying.
>>>>
>>>> The number 10 sounds to me like a nice round number, is this a hardcoded
>>>> retry count in radium?
>>>>
>>>> I might be getting ahead of myself but should I instead use the -p
>>>> option to kill radium if I drop a connection and use a process monitor
>>>> to restart it?
>>>>
>>>> Thanks,
>>>> Phil
>>>>
>>>
>>
>
>
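The one-line patch quoted above clears a status bit with "input->status &= ~ARGUS_CLOSED". As a rough illustration of why a stale "closed" bit could stop retries, here is a minimal sketch in C. It is not the argus-clients code: the struct, the function names and the flag values (DEMO_CONNECTED, DEMO_CLOSED) are all hypothetical, and the idea that radium skips inputs flagged as closed is an assumption made for the example. Only the retry intervals (5 seconds threaded, 1 second non-threaded) and the bit-clearing idiom come from the thread.

```c
/* Hypothetical flag values for illustration only; the real
 * ARGUS_CLOSED definition is not shown in the thread. */
#define DEMO_CONNECTED 0x01u
#define DEMO_CLOSED    0x02u

/* Hypothetical stand-in for a remote data source record. */
struct demo_input {
    unsigned int status;   /* bit flags describing the connection */
};

/* Retry interval in seconds as described in the thread:
 * 5 when threaded, 1 when non-threaded. */
static unsigned int demo_retry_interval(int threaded)
{
    return threaded ? 5u : 1u;
}

/* Record that the far side went away. */
static void demo_mark_closed(struct demo_input *in)
{
    in->status &= ~DEMO_CONNECTED;
    in->status |= DEMO_CLOSED;
}

/* Assumption: a reconnect loop only considers inputs that are
 * not flagged as closed. Returns 1 if a retry would be attempted. */
static int demo_eligible_for_retry(const struct demo_input *in)
{
    return (in->status & DEMO_CLOSED) == 0;
}

/* Same idiom as the patch: clear only the closed bit and leave
 * every other status bit untouched. */
static void demo_reset_for_retry(struct demo_input *in)
{
    in->status &= ~DEMO_CLOSED;
}
```

Under these assumptions, an input that has been marked closed is never retried until the closed bit is cleared, which would match the symptom of sources dropping away one at a time without any logged error.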