Radium dropping connections to argi

Phillip Deneault deneault at WPI.EDU
Thu Mar 10 14:38:15 EST 2011


I managed to reproduce this problem with a sniffer running.  It didn't turn
up as much useful information as I would have liked.

For my test, I set up all of my sensors to restart argus once a day at
the same time via an init.d stop/start and set my tcpdump filter to look
like this:
port 561 and tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0
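
This filter keeps only port 561 segments that carry SYN, FIN or RST, so the
capture records connection setups, closes and resets rather than the data
stream itself.  For anyone post-processing the capture, the same flag test is
a one-line bitmask check in C.  The flag values below are the standard TCP
header bits; the helper name is hypothetical and not part of argus or libpcap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP flag bits, as found in byte 13 of the TCP header (RFC 793). */
#define TH_FIN 0x01u
#define TH_SYN 0x02u
#define TH_RST 0x04u
#define TH_PSH 0x08u
#define TH_ACK 0x10u

/* Hypothetical helper mirroring the BPF expression
 *   tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0
 * True for any segment that opens, closes or resets a connection,
 * false for ordinary data/ACK traffic. */
static bool is_setup_or_teardown(uint8_t tcp_flags)
{
    return (tcp_flags & (TH_SYN | TH_FIN | TH_RST)) != 0;
}
```

For example, is_setup_or_teardown(TH_SYN | TH_ACK) is true, while
is_setup_or_teardown(TH_PSH | TH_ACK) is false.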

I can see two sets of shutdowns: one on the 9th, after which all my
sensors came back, and one on the 10th, after which only 17 came back.
All the Argus daemons did restart and come online, but for some hosts
radium never even attempted to re-establish the connection.

The tcpdump capture is available upon request.

Thanks,
Phil


On 3/5/2011 6:08 PM, Phillip G Deneault wrote:
> Actually, I spoke too soon.  It happened again last night after not
> happening for weeks.  I'm going to see if I can simulate this behavior
> tomorrow or Monday and try to get a packet capture of it as it occurs.
> 
> Thanks,
> Phil
> 
> On Sat, 5 Mar 2011, Carter Bullard wrote:
> 
>> Excellent!!!!   That is great news!!!!
>> Carter
>>
>> On Mar 4, 2011, at 1:29 PM, Phillip Deneault <deneault at WPI.EDU> wrote:
>>
>>> Carter,
>>>
>>> I didn't forget about you.  I've been letting this run for a while, and
>>> then ran it a little longer after you released .23.  It seems to have
>>> fixed the bug, as I still have not had any problems.
>>>
>>> Thanks,
>>> Phil
>>>
>>> On 2/4/2011 4:10 PM, Carter Bullard wrote:
>>>> Hey Phillip,
>>>> I did find a problem, and this patch should fix radium() apparently
>>>> not attempting to reconnect after a while.  I've put it in the
>>>> distribution, but give it a try on your machine to see whether it
>>>> corrects the problem.
>>>>
>>>> Carter
>>>>
>>>> ==== //depot/argus/clients/common/argus_client.c#204 - /Users/carter/argus/clients/common/argus_client.c ====
>>>> 2523a2524,2525
>>>>>
>>>>>                     input->status &= ~ARGUS_CLOSED;
>>>>
>>>>
>>>> On Feb 4, 2011, at 4:01 PM, Carter Bullard wrote:
>>>>
>>>>> Hey Phillip,
>>>>> radium() doesn't have a retry counter; it should keep trying every
>>>>> 5 seconds if threaded and every 1 second if non-threaded, and it should
>>>>> try forever.  I've recreated a problem where radium(), after the far
>>>>> side has gone away a few times, loses the connection, so I'm working
>>>>> on this now.
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>> On Feb 4, 2011, at 12:11 PM, Phillip Deneault wrote:
>>>>>
>>>>>> On 1/31/2011 1:59 PM, Phillip Deneault wrote:
>>>>>>> This might muddle the issue, but I'm having an odd issue with radium
>>>>>>> too.  The longer radium runs, the fewer records seem to be recorded.
>>>>>>> It appears that the radium instance loses its connection to the argi
>>>>>>> one at a time, doesn't keep retrying, and doesn't log any error about
>>>>>>> a failed connection.
>>>>>>>
>>>>>>> There are quite a few nodes I'm connecting to (all on the local LAN),
>>>>>>> and this was happening in version 3.0.2 of argus-clients as well as
>>>>>>> 3.0.3.21 (which I am running now).  I'm running CentOS 5.5.
>>>>>>>
>>>>>>> I'm upping the debug level and trying to figure this out, but can
>>>>>>> anyone else confirm they see this problem?
>>>>>>
>>>>>> So I'll assume no one else is seeing this problem.
>>>>>>
>>>>>> Yesterday we had a network interruption, and a number of the nodes
>>>>>> once again got disconnected from the radium instance.  It appears the
>>>>>> radium instance tried to reconnect 10 times, each failing with 'no
>>>>>> route to host', before it apparently stopped retrying.
>>>>>>
>>>>>> Ten sounds to me like a nice round number; is this a hard-coded retry
>>>>>> count in radium?
>>>>>>
>>>>>> I might be getting ahead of myself, but should I instead use the -p
>>>>>> option to kill radium if I drop a connection, and use a process monitor
>>>>>> to restart it?
>>>>>>
>>>>>> Thanks,
>>>>>> Phil
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
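
Carter's explanation in the thread (no retry counter; a retry every 5 seconds
when threaded and every 1 second otherwise, forever), together with the
one-line patch that clears ARGUS_CLOSED, suggests how a stale status bit can
make radium silently give up on a sensor.  The C sketch below is a
hypothetical model of that failure mode, not the actual argus_client.c logic:
the struct, the flag value and every function name are assumptions.  In the
model, an input is only rescheduled when it drops while not already marked
closed, so if the bit survives a successful reconnect, the next drop is never
retried.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value for this model only; the real ARGUS_CLOSED value
 * and its exact meaning in argus_client.c are not assumed here. */
#define ARGUS_CLOSED 0x01u

/* Hypothetical, heavily simplified model of one radium input (one sensor). */
struct input {
    unsigned int status;   /* bit flags; ARGUS_CLOSED is one of them */
    bool connected;
    bool retry_pending;    /* a reconnect attempt is scheduled */
};

/* Retry interval as described in the thread: 5 s threaded, 1 s otherwise. */
static unsigned int retry_interval_secs(bool threaded)
{
    return threaded ? 5u : 1u;
}

/* Far side went away: schedule a retry unless the input is already marked
 * closed (the model then assumes a retry is already pending). */
static void on_disconnect(struct input *in)
{
    in->connected = false;
    if (!(in->status & ARGUS_CLOSED)) {
        in->status |= ARGUS_CLOSED;
        in->retry_pending = true;
    }
}

/* A retry succeeded; clear_flag selects whether the patched line runs. */
static void on_reconnect(struct input *in, bool clear_flag)
{
    in->connected = true;
    in->retry_pending = false;
    if (clear_flag)
        in->status &= ~ARGUS_CLOSED;   /* the one-line fix from the patch */
}

/* Drop and restore the sensor `drops` times; return whether a retry is
 * pending after the last drop. */
static bool retry_scheduled_after(int drops, bool clear_flag)
{
    struct input in = { 0u, true, false };

    assert(drops >= 1);
    for (int i = 0; i < drops; i++) {
        on_disconnect(&in);
        if (!in.retry_pending)
            return false;   /* nothing will ever reconnect this input */
        if (i + 1 < drops)
            on_reconnect(&in, clear_flag);
    }
    return true;
}
```

In this simplified model a single missed clear is enough:
retry_scheduled_after(2, false) is false, while retry_scheduled_after(2, true)
is true.  The real client may need several drops before the symptom appears,
as Carter observed.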




More information about the argus mailing list