[ARGUS] ra stops unexpectedly
Carter Bullard
carter at qosient.com
Thu Sep 30 14:12:07 EDT 2004
Sure, but let's not not talk about it. I can see a command-line
switch or a .rarc directive to reconnect on failure, but with limitations:
say, only one remote source, reconnecting only once every 5 seconds
or so, that kind of thing. Still, I think it's a lot more onerous than just
restarting on failure.
Carter
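A limited reconnect option like the one described above (one remote source, at most one attempt every 5 seconds or so) is essentially a throttle on reconnect attempts. The following is a minimal sketch of just that timing decision, assuming a single source; `reconnect_delay` and `RECONNECT_MIN_INTERVAL` are hypothetical names, not part of argus or ra, and nothing here implies that such a switch or .rarc directive exists.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical policy constant: minimum spacing between reconnect
 * attempts, from the "once every 5 seconds or so" suggestion. */
#define RECONNECT_MIN_INTERVAL 5

/* Returns how many seconds the client should wait before it may try to
 * reconnect its single remote source, given the current time and the
 * time of the previous attempt (0 if there has been none yet). */
time_t reconnect_delay(time_t now, time_t last_attempt)
{
    assert(now >= last_attempt);
    if (last_attempt == 0)
        return 0;                       /* never tried: go right away */

    time_t elapsed = now - last_attempt;
    return elapsed >= RECONNECT_MIN_INTERVAL
               ? 0
               : RECONNECT_MIN_INTERVAL - elapsed;
}
```

A client loop would call reconnect_delay() after a failure, sleep for the returned number of seconds, record the attempt time, and then try the connection again.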
> From: <slif at bellsouth.net>
> Date: Thu, 30 Sep 2004 13:59:32 -0400
> To: Carter Bullard <carter at qosient.com>, Peter Van Epp <vanepp at sfu.ca>,
> Argus <argus-info at lists.andrew.cmu.edu>
> Subject: Re: Re: [ARGUS] ra stops unexpectedly
>
> Thank you for the explanation.
> I work better with illumination!
> -Mike
>
>>
>> From: Carter Bullard <carter at qosient.com>
>> Date: 2004/09/30 Thu PM 01:56:22 EDT
>> To: <slif at bellsouth.net>,
>> Peter Van Epp <vanepp at sfu.ca>,
>> Argus <argus-info at lists.andrew.cmu.edu>
>> Subject: Re: [ARGUS] ra stops unexpectedly
>>
>> Hey Mike,
>> Well, you are projecting your desire for a feature and building
>> a rather obtuse religious argument for its justification. TCP
>> tries hard because that is its design, a reliable transport
>> protocol. Why does UDP not try so hard? Well that's its design.
>>
>> If you want to understand why engineering reliability into
>> transports where it's not needed is not necessarily a good thing,
>> look at the issues with using SCTP for non-reliable transport.
>> I think it's unnecessary, expensive and sometimes unpredictable.
>>
>> But the reality is simple. If you want the clients to have a
>> persistent connection feature, then we should talk about it.
>>
>> There are three specific reasons why it's not there now. The
>> first is that we want to have a simple, consistent failure model
>> for ra* clients. Once you advertise that you're "reliable", you
>> get into some complex code to actually provide the feature.
>>
>> Second, all ra() clients can connect to multiple sources
>> simultaneously, which makes a simple persistent connection feature
>> pretty complicated (if one fails, do you shutdown all of them and
>> start over?).
>>
>> The third is that it's not clear that all clients should
>> persistently connect to a remote data source, so do we need
>> to put it into the general strategy?
>>
>> None of this means we can't provide a "reconnect on failure"
>> feature, but what are we going to specify when you're connected
>> to 3 remote data sources? How do we notify the specific client
>> that a source has been lost, or has not ever been connected?
>>
>> Carter
>>
>>> From: <slif at bellsouth.net>
>>> Date: Thu, 30 Sep 2004 13:35:24 -0400
>>> To: Carter Bullard <carter at qosient.com>, Peter Van Epp <vanepp at sfu.ca>,
>>> Argus <argus-info at lists.andrew.cmu.edu>
>>> Subject: Re: Re: [ARGUS] ra stops unexpectedly
>>>
>>>
>>>>
>>>> From: Carter Bullard <carter at qosient.com>
>>>> Date: 2004/09/30 Thu AM 11:38:09 EDT
>>>> To: <slif at bellsouth.net>,
>>>> Peter Van Epp <vanepp at sfu.ca>,
>>>> Argus <argus-info at lists.andrew.cmu.edu>
>>>> Subject: Re: [ARGUS] ra stops unexpectedly
>>>>
>>>> The problem is that if you aren't receiving MAR records,
>>>> then the remote argus is probably dead, and you won't receive
>>>> anything ever again.
>>>
>>>
>>> Why does the TCP protocol try so hard?
>>> In part because the authors realized there are so many
>>> ways to make re-synchronizing painful and problematic.
>>>
>>> Stopping when one "feels" a far endpoint is no longer connected
>>> just doesn't seem right. Sure, I _can_ write yet another script
>>> to monitor this program, but I would prefer to indicate more
>>> than "Hey, I had to restart that process". I would likely not
>>> know the reason for the process terminating. Without that
>>> information, I will have more difficulty trying to apply
>>> a remedy.
>>>
>>> The solution that is localized to the problem is the easiest
>>> to maintain.
>>>
>>>>
>>>> So what's to keep the user from writing a script to respawn
>>>> ra(), if that's what the user wants it to do? That's pretty easy,
>>>> isn't it?
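The respawn approach can be sketched in C with fork(), execvp() and waitpid(). This is a hypothetical wrapper (`run_once` and `supervise` are illustrative names, not argus tools); unlike a bare shell loop it also logs whether the child exited normally or was killed by a signal, which speaks to the objection that a restart alone says nothing about why the process stopped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (searched on PATH) once and waits for it to stop.
 * Returns its exit status, 128 + the signal number if it was killed
 * by a signal, or -1 if it could not be started or waited for. */
int run_once(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        perror("execvp");          /* reached only if exec failed */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status)) {
        fprintf(stderr, "%s exited with status %d\n",
                argv[0], WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "%s killed by signal %d\n",
                argv[0], WTERMSIG(status));
        return 128 + WTERMSIG(status);
    }
    return -1;
}

/* Restarts the command every time it stops, pausing delay_sec seconds
 * between runs. max_runs == 0 means keep going until this process is
 * itself stopped; otherwise returns the number of runs made. */
unsigned supervise(char *const argv[], unsigned delay_sec, unsigned max_runs)
{
    unsigned runs = 0;
    while (max_runs == 0 || runs < max_runs) {
        (void)run_once(argv);      /* exit reason already logged */
        runs++;
        if (max_runs == 0 || runs < max_runs)
            sleep(delay_sec);
    }
    return runs;
}
```

For example, a main() could call `supervise((char *const[]){"ra", "-w", "FILE", "-S", "IP", NULL}, 60, 0)` to rerun the command from the original report a minute after each exit. Whether restarting ra with the same `-w FILE` appends to or replaces the file is not checked by this sketch.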
>>>>
>>>>> From: <slif at bellsouth.net>
>>>>> Date: Wed, 29 Sep 2004 18:00:56 -0400
>>>>> To: Peter Van Epp <vanepp at sfu.ca>, <argus-info at lists.andrew.cmu.edu>
>>>>> Subject: Re: Re: [ARGUS] ra stops unexpectedly
>>>>>
>>>>> I don't see the justification for stopping based on
>>>>> not seeing MAR records. If the connection was not reset by the peer,
>>>>> I would prefer the client do everything it possibly can
>>>>> to connect to its server.
>>>>>
>>>>> If the connection breaks, throw a log message and try again.
>>>>> If that fails, wait one minute.
>>>>> Repeat until an operator or user stops the client.
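The retry policy proposed above (log and retry at once, then wait one minute between attempts, never giving up on its own) fits in a small state machine. This sketch assumes one state struct per connection; `struct retry_state` and its functions are hypothetical and are not part of the argus clients.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define RETRY_WAIT_SECONDS 60      /* "wait one minute" */

/* Hypothetical per-connection retry state. */
struct retry_state {
    unsigned failures;   /* failed attempts since the connection broke */
};

/* Connection broke: log it and start a fresh retry sequence. */
void on_connection_lost(struct retry_state *st, const char *host)
{
    assert(st != NULL && host != NULL);
    fprintf(stderr, "connection to %s lost, reconnecting\n", host);
    st->failures = 0;
}

/* Seconds to wait before the next attempt: the first retry happens at
 * once, each later one after a minute. There is deliberately no limit;
 * the client keeps trying until an operator or user stops it. */
unsigned next_retry_delay(const struct retry_state *st)
{
    assert(st != NULL);
    return st->failures == 0 ? 0 : RETRY_WAIT_SECONDS;
}

/* Record the outcome of an attempt; success resets the sequence. */
void on_attempt_result(struct retry_state *st, bool connected)
{
    assert(st != NULL);
    st->failures = connected ? 0 : st->failures + 1;
}
```

A client would call on_connection_lost() when the socket closes, then loop: sleep for next_retry_delay(), try to connect, and report the result with on_attempt_result() until it succeeds. With several remote sources (the case Carter raises), each source would need its own retry_state, and the client would still have to decide what to do with the others while one is down.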
>>>>>
>>>>> Then again, I don't know whether the argus clients meet
>>>>> the expectations of other users.
>>>>>
>>>>>>
>>>>>> From: Peter Van Epp <vanepp at sfu.ca>
>>>>>> Date: 2004/09/29 Wed PM 05:28:06 EDT
>>>>>> To: argus-info at lists.andrew.cmu.edu
>>>>>> Subject: Re: [ARGUS] ra stops unexpectedly
>>>>>>
>>>>>> It looks like this shouldn't happen :-). Even on an idle link you
>>>>>> should be getting MAR records every reporting interval, and I'd expect
>>>>>> that (perhaps, anyway) to reset the counter. As a quick workaround (until
>>>>>> Carter can suggest what may really be wrong :-)), try commenting out the
>>>>>> timeout in argus_parse.c:
>>>>>>
>>>>>> at line 2737
>>>>>>
>>>>>> ArgusAdjustGlobalTime(&ArgusRealTime);
>>>>>>
>>>>>> /*
>>>>>> if (input->hostname && input->ArgusMarInterval) {
>>>>>>    if (input->ArgusLastTime.tv_sec) {
>>>>>>       if ((ArgusRealTime.tv_sec - input->ArgusLastTime.tv_sec) >
>>>>>>                         (3 * input->ArgusMarInterval)) {
>>>>>>          ArgusLog (LOG_WARNING, "ArgusReadStream %s: idle stream: closing",
>>>>>>                    input->hostname);
>>>>>>          ArgusCloseInput(input);
>>>>>>          ArgusRemoteFDs[i] = NULL;
>>>>>>       }
>>>>>>    }
>>>>>> }
>>>>>> */
>>>>>>
>>>>>> That should stop the timeout (it may also do something else
>>>>>> undesirable, though :-)). The trick would be to see where (and by what)
>>>>>>
>>>>>> input->ArgusLastTime.tv_sec
>>>>>>
>>>>>> is being updated. I'd expect MAR records to do that and thus avoid this.
>>>>>> All that said, my link must not get busy, because it doesn't happen here
>>>>>> (of course, the link between the two is a 3 ft crossover cable too). Could
>>>>>> you be seeing a link interruption between the sensor and the host that ra
>>>>>> is running on, so that you really don't see any MAR records for an
>>>>>> interval? That would be another possibility.
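The commented-out test amounts to this: a remote stream is closed once the time since ArgusLastTime exceeds three MAR intervals. Below is a minimal, self-contained restatement; `struct stream_input`, `stream_is_idle` and `note_record_received` are hypothetical stand-ins for the argus_parse.c structures, and the refresh-on-every-record behaviour in `note_record_received` is the expectation stated above, not something confirmed in the argus source. With a MAR interval of 60 seconds, for example, the stream would be closed only after more than 180 seconds with no refresh.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Simplified stand-in for the fields the argus_parse.c check reads;
 * the real input structure has many more members. */
struct stream_input {
    const char *hostname;   /* NULL for non-remote (file) input */
    time_t mar_interval;    /* MAR reporting interval in seconds, 0 if unknown */
    time_t last_time;       /* when last_time was last refreshed, 0 if never */
};

/* Mirrors the commented-out condition: only remote streams with a known
 * MAR interval and a recorded last time can be declared idle, and only
 * after more than three MAR intervals without a refresh. */
bool stream_is_idle(const struct stream_input *in, time_t now)
{
    assert(in != NULL);
    if (in->hostname == NULL || in->mar_interval == 0 || in->last_time == 0)
        return false;
    return (now - in->last_time) > 3 * in->mar_interval;
}

/* Hypothetical hook: the assumption is that every record read from the
 * socket, MAR records included, refreshes the timestamp, so an idle but
 * healthy link never trips the check. */
void note_record_received(struct stream_input *in, time_t now)
{
    assert(in != NULL);
    in->last_time = now;
}
```

If MAR records did not update the timestamp in the real code, a quiet but healthy link would be closed after three intervals, which would match the symptom reported here.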
>>>>>>
>>>>>> Peter Van Epp / Operations and Technical Support
>>>>>> Simon Fraser University, Burnaby, B.C. Canada
>>>>>>
>>>>>> On Wed, Sep 29, 2004 at 04:53:02PM -0400, slif at bellsouth.net wrote:
>>>>>>> The remote argus is from argus-2.0.6.fixes.1
>>>>>>>
>>>>>>> Running "ra -w FILE -S IP" from argus-clients-2.0.6.fixes.1
>>>>>>>
>>>>>>> "ra" will return unexpectedly.
>>>>>>> This message is displayed:
>>>>>>>
>>>>>>> "ArgusWarning: ra[PID]: ArgusReadStream IP: idle stream: closing"
>>>>>>>
>>>>>>>
>>>>>>> What can be done so that "ra" will not stop when the stream
>>>>>>> is apparently idle?
>>>>>>>
>>>>>>