[ARGUS] ra stops unexpectedly

Thu Sep 30 14:12:07 EDT 2004

Sure, but let's not not talk about it.  I can see a command-line
switch or a .rarc directive to reconnect on failure, but with limitations:
say, only with a single remote source, reconnecting at most once every
5 seconds or so, that kind of thing.  But I think it's a lot more onerous
than just restarting on failure.
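
The limits mentioned above could be sketched roughly as follows. This is only an illustration: `may_reconnect()` and both constants are hypothetical names, nothing here comes from the argus-clients source, and the 5-second gap is simply the figure floated in the message.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical policy knobs, not taken from argus-clients. */
#define RECONNECT_MIN_GAP_SECS 5   /* at most one attempt every 5 seconds */
#define RECONNECT_MAX_SOURCES  1   /* only reconnect with a single remote source */

/* Return true if a reconnect attempt is allowed at time `now`.
 * `last_attempt` is the time of the previous attempt (0 if none yet);
 * `nsources` is the number of remote sources the client was started with. */
bool may_reconnect(time_t now, time_t last_attempt, int nsources)
{
    if (nsources != RECONNECT_MAX_SOURCES)
        return false;               /* multi-source behaviour is left undefined */
    if (last_attempt == 0)
        return true;                /* the first attempt is always allowed */
    return (now - last_attempt) >= RECONNECT_MIN_GAP_SECS;
}
```

Keeping the decision in one small predicate leaves the open questions (what to do with several sources, how to notify the client) outside the reconnect path itself.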

Carter

> From: <slif at bellsouth.net>
> Date: Thu, 30 Sep 2004 13:59:32 -0400
> To: Carter Bullard <carter at qosient.com>, Peter Van Epp <vanepp at sfu.ca>, Argus
> <argus-info at lists.andrew.cmu.edu>
> Subject: Re: Re: [ARGUS] ra stops unexpectedly
> 
> Thank you for the explanation.
> I work better with illumination!
> -Mike
> 
>> 
>> From: Carter Bullard <carter at qosient.com>
>> Date: 2004/09/30 Thu PM 01:56:22 EDT
>> To: <slif at bellsouth.net>,
>> Peter Van Epp <vanepp at sfu.ca>,
>> Argus <argus-info at lists.andrew.cmu.edu>
>> Subject: Re: [ARGUS] ra stops unexpectedly
>> 
>> Hey Mike,
>> Well, you are projecting your desire for a feature and building
>> a rather obtuse religious argument for its justification.  TCP
>> tries hard because that is its design, a reliable transport
>> protocol.  Why does UDP not try so hard?  Well that's its design.
>> 
>> If you want to understand why engineering reliability into
>> transports where it's not needed is not necessarily a good thing,
>> look at the issues with using SCTP for non-reliable transport.
>> I think it's unnecessary, expensive and sometimes unpredictable.
>> 
>> But the reality is simple.  If you want the clients to have a
>> persistent connection feature, then we should talk about it.
>> 
>> There are three specific reasons why it's not there now.  The
>> first is that we want to have a simple, consistent failure model
>> for ra* clients.  Once you advertise that you're "reliable", you
>> get into some complex code to actually provide the feature.
>> 
>> Second, all ra() clients can connect to multiple sources
>> simultaneously, which makes a simple persistent connection feature
>> pretty complicated (if one fails, do you shut down all of them and
>> start over?).
>> 
>> The third is that it's not clear that all clients should
>> persistently connect to a remote data source, so do we need
>> to put it into the general strategy?
>> 
>> None of this means we can't provide a "reconnect on failure"
>> feature, but what are we going to specify when you're connected
>> to 3 remote data sources?  How do we notify the specific client
>> that a source has been lost, or has not ever been connected?
>> 
>> Carter
>> 
>> 
>> 
>> 
>>> From: <slif at bellsouth.net>
>>> Date: Thu, 30 Sep 2004 13:35:24 -0400
>>> To: Carter Bullard <carter at qosient.com>, Peter Van Epp <vanepp at sfu.ca>,
>>> Argus <argus-info at lists.andrew.cmu.edu>
>>> Subject: Re: Re: [ARGUS] ra stops unexpectedly
>>> 
>>> 
>>>> 
>>>> From: Carter Bullard <carter at qosient.com>
>>>> Date: 2004/09/30 Thu AM 11:38:09 EDT
>>>> To: <slif at bellsouth.net>,
>>>> Peter Van Epp <vanepp at sfu.ca>,
>>>> Argus <argus-info at lists.andrew.cmu.edu>
>>>> Subject: Re: [ARGUS] ra stops unexpectedly
>>>> 
>>>> The problem is that if you aren't receiving MAR records,
>>>> then the far argus is probably dead, and you won't receive
>>>> anything ever again.
>>> 
>>> 
>>> Why does the TCP protocol try so hard ?
>>> In part because the authors realized there are so many
>>> ways to make re-synchronizing painful and problematic.
>>> 
>>> Stopping when one "feels" a far end point is no longer connected
>>> just doesn't seem right.  Sure I _can_ write yet another script
>>> to monitor this program.  I would prefer to indicate more
>>> than "Hey, I had to restart that process".  I would likely not
>>> know the reason for the process terminating.  Without that
>>> information, I will have more difficulty trying to apply
>>> a remedy.
>>> 
>>> The solution that is localized to the problem is the easiest
>>> to maintain.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> So what's to keep the user from writing a script to respawn
>>>> ra(), if that's what the user wants it to do?  That's pretty easy,
>>>> isn't it?
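
A respawn wrapper of the kind suggested above might look something like this sketch. It is written in C rather than as a shell script, it assumes a POSIX system, and `run_once()` and `respawn()` are hypothetical names. It also records how the child ended, since not knowing why the process stopped is one of the objections raised in this thread.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given arguments and wait for it to finish.
 * Returns the child's exit code, 128 + signal number if it was killed
 * by a signal, or -1 if it could not be started or waited for. */
int run_once(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}

/* Start the command, and restart it up to `max_restarts` times after it
 * exits, logging the status of every run. `delay_secs` is the pause
 * between runs. Returns the status of the last run. */
int respawn(char *const argv[], int max_restarts, unsigned delay_secs)
{
    int rc = -1;
    for (int i = 0; i <= max_restarts; i++) {
        rc = run_once(argv);
        fprintf(stderr, "%s ended with status %d (run %d)\n",
                argv[0], rc, i + 1);
        if (i < max_restarts)
            sleep(delay_secs);
    }
    return rc;
}
```

For the command from the original report, the argument vector would be `{"ra", "-w", "FILE", "-S", "IP", NULL}`. The wrapper only sees the exit status, so the reason for the exit still has to be read from ra's own warning message.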
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> From: <slif at bellsouth.net>
>>>>> Date: Wed, 29 Sep 2004 18:00:56 -0400
>>>>> To: Peter Van Epp <vanepp at sfu.ca>, <argus-info at lists.andrew.cmu.edu>
>>>>> Subject: Re: Re: [ARGUS] ra stops unexpectedly
>>>>> 
>>>>> I don't see the justification for stopping based on
>>>>> not seeing MAR records.  If the connection was not reset by peer,
>>>>> I would prefer the client do everything it possibly can
>>>>> to connect to its server.
>>>>> 
>>>>> If the connection breaks, throw a log message and try again.
>>>>> If that fails, wait one minute.
>>>>> Repeat until an operator or user stops the client.
>>>>> 
>>>>> Then again, I don't know whether the argus clients meet
>>>>> the expectations of other users.
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> From: Peter Van Epp <vanepp at sfu.ca>
>>>>>> Date: 2004/09/29 Wed PM 05:28:06 EDT
>>>>>> To: argus-info at lists.andrew.cmu.edu
>>>>>> Subject: Re: [ARGUS] ra stops unexpectedly
>>>>>> 
>>>>>> It looks like this shouldn't happen :-). Even on an idle link you
>>>>>> should be getting MAR records every reporting interval, and that (perhaps
>>>>>> anyway) should reset the counter, I'd expect. As a quick workaround (until
>>>>>> Carter can suggest what may really be wrong :-)) try commenting out the
>>>>>> timeout in argus_parse.c:
>>>>>> 
>>>>>> at line 2737
>>>>>> 
>>>>>>                   ArgusAdjustGlobalTime(&ArgusRealTime);
>>>>>> 
>>>>>> /*          
>>>>>>                   if (input->hostname && input->ArgusMarInterval) {
>>>>>>                      if (input->ArgusLastTime.tv_sec) {
>>>>>>                         if ((ArgusRealTime.tv_sec - input->ArgusLastTime.tv_sec)
>>>>>>                               > (3 * input->ArgusMarInterval)) {
>>>>>>                            ArgusLog (LOG_WARNING, "ArgusReadStream %s: idle stream: closing",
>>>>>>                                      input->hostname);
>>>>>>                            ArgusCloseInput(input);
>>>>>>                            ArgusRemoteFDs[i] = NULL;
>>>>>>                         }
>>>>>>                      }
>>>>>>                   }
>>>>>> */
>>>>>> 
>>>>>> That should stop the timeout, (it may also do something else
>>>>>> undesirable though :-)). The trick would be to see where (and by what)
>>>>>> 
>>>>>> input->ArgusLastTime.tv_sec
>>>>>> 
>>>>>> is being updated. I'd expect MAR records to do that and thus avoid this.
>>>>>> All that said my link must not get busy, because it doesn't happen here
>>>>>> (of course the link between the two is a 3 ft crossover cable too).
>>>>>> Could you be seeing a link interruption between the sensor and the host
>>>>>> that ra is running on so you really don't see any MAR records for an
>>>>>> interval? That would be another possibility.
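
The timeout condition in the quoted snippet can be restated as a small stand-alone predicate, which may make it easier to reason about when the stream gets closed. This is a sketch only: `stream_is_idle()` and its parameters are hypothetical, and the check on `input->hostname` from the original code is left out.

```c
#include <stdbool.h>
#include <time.h>

/* Mirrors the quoted condition: a stream counts as idle once more than
 * three MAR intervals have passed since the last recorded time.
 * `last_sec` == 0 means no time has been recorded yet and
 * `mar_interval` == 0 means no interval is known; both disable the check. */
bool stream_is_idle(time_t now_sec, time_t last_sec, int mar_interval)
{
    if (mar_interval == 0 || last_sec == 0)
        return false;
    return (now_sec - last_sec) > (3 * (time_t)mar_interval);
}
```

With a 60-second MAR interval, for example, the stream is closed only when more than 180 seconds pass without the last-seen time being updated, so the question of what updates that time decides whether a quiet but healthy link is affected.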
>>>>>> 
>>>>>> Peter Van Epp / Operations and Technical Support
>>>>>> Simon Fraser University, Burnaby, B.C. Canada
>>>>>> 
>>>>>> On Wed, Sep 29, 2004 at 04:53:02PM -0400, slif at bellsouth.net wrote:
>>>>>>>        The remote argus is from argus-2.0.6.fixes.1
>>>>>>> 
>>>>>>>   Running "ra -w FILE -S IP" from argus-clients-2.0.6.fixes.1
>>>>>>> 
>>>>>>>  "ra" will return unexpectedly.
>>>>>>>  This message is displayed :
>>>>>>> 
>>>>>>>     "ArgusWarning: ra[PID]: ArgusReadStream IP: idle stream: closing"
>>>>>>> 
>>>>>>> 
>>>>>>>  What can be done so that "ra" will not stop when stream
>>>>>>>    is apparently idle ?
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
>