[ARGUS] ra stops unexpectedly

Carter Bullard carter at qosient.com
Thu Sep 30 13:56:22 EDT 2004

Hey Mike,
Well, you are projecting your desire for a feature and building
a rather obtuse religious argument for its justification.  TCP
tries hard because that is its design, a reliable transport
protocol.  Why does UDP not try so hard?  Well that's its design.

If you want to understand why engineering reliability into
transports where it's not needed is not necessarily a good thing,
look at the issues with using SCTP for non-reliable transport.
I think it's unnecessary, expensive and sometimes unpredictable.

But the reality is simple.  If you want the clients to have a
persistent connection feature, then we should talk about it.

There are three specific reasons why it's not there now.  The
first is that we want to have a simple, consistent failure model
for ra* clients.  Once you advertise that you're "reliable", you
get into some complex code to actually provide the feature.

Second, all ra() clients can connect to multiple sources
simultaneously, which makes a simple persistent connection feature
pretty complicated (if one fails, do you shut down all of them and
start over?).

The third is that it's not clear that all clients should
persistently connect to a remote data source, so do we need
to put it into the general strategy?

None of this means we can't provide a "reconnect on failure"
feature, but what are we going to specify when you're connected
to 3 remote data sources?  How do we notify the specific client
that a source has been lost, or has not ever been connected?
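
To make those questions concrete, here is a minimal sketch in C of
per-source state tracking.  Everything in it is hypothetical: the names
source, source_mark_down, source_mark_up and source_retry_due are
illustrations only and do not exist in the argus-clients code, and the
60 second delay is borrowed from the "wait one minute" proposal quoted
below.  It assumes that a failure affects only the source that failed,
and that the client learns through a callback whether a source was lost
or was never connected.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-source bookkeeping; not part of argus-clients. */
enum source_state { SRC_NEVER_CONNECTED, SRC_CONNECTED, SRC_LOST };

struct source {
    const char *host;          /* remote argus host */
    enum source_state state;   /* current connection state */
    time_t next_retry;         /* earliest time of the next attempt */
};

/* Assumed retry delay, from the "wait one minute" proposal. */
#define RETRY_DELAY_SECS 60

/* Callback that tells the client which source changed and its old state. */
typedef void (*source_notify_fn)(const struct source *src,
                                 enum source_state old_state);

/* Record a lost connection or a failed attempt for this source only.
 * A lost connection is retried at once; a failed attempt waits. */
void source_mark_down(struct source *src, time_t now, source_notify_fn notify)
{
    enum source_state old = src->state;

    if (old == SRC_CONNECTED) {
        src->state = SRC_LOST;
        src->next_retry = now;
    } else {
        /* SRC_NEVER_CONNECTED and SRC_LOST keep their state. */
        src->next_retry = now + RETRY_DELAY_SECS;
    }
    if (notify != NULL)
        notify(src, old);
}

/* Record a successful connection for this source. */
void source_mark_up(struct source *src, source_notify_fn notify)
{
    enum source_state old = src->state;

    src->state = SRC_CONNECTED;
    src->next_retry = 0;
    if (notify != NULL && old != SRC_CONNECTED)
        notify(src, old);
}

/* Nonzero if the source is down and its retry delay has elapsed. */
int source_retry_due(const struct source *src, time_t now)
{
    return src->state != SRC_CONNECTED && now >= src->next_retry;
}
```

In the callback, src->state distinguishes a lost source (SRC_LOST) from
one that has never been reached (SRC_NEVER_CONNECTED).  Whether the
remaining sources should keep running when one fails is still a policy
decision that the sketch does not settle.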


> From: <slif at bellsouth.net>
> Date: Thu, 30 Sep 2004 13:35:24 -0400
> To: Carter Bullard <carter at qosient.com>, Peter Van Epp <vanepp at sfu.ca>, Argus
> <argus-info at lists.andrew.cmu.edu>
> Subject: Re: Re: [ARGUS] ra stops unexpectedly
>> From: Carter Bullard <carter at qosient.com>
>> Date: 2004/09/30 Thu AM 11:38:09 EDT
>> To: <slif at bellsouth.net>,
>> Peter Van Epp <vanepp at sfu.ca>,
>> Argus <argus-info at lists.andrew.cmu.edu>
>> Subject: Re: [ARGUS] ra stops unexpectedly
>> The problem is that if you aren't receiving MAR records,
>> then the far argus is probably dead, and you won't receive
>> anything ever again.
> Why does the TCP protocol try so hard?
> In part because the authors realized there are so many
> ways to make re-synchronizing painful and problematic.
> Stopping when one "feels" a far end point is no longer connected
> just doesn't seem right.  Sure I _can_ write yet another script
> to monitor this program.  I would prefer to indicate more
> that "Hey, I had to restart that process".  I would likely not
> know the reason for the process terminating.  Without that
> information, I will have more difficulty trying to apply
> a remedy.
> The solution that is localized to the problem is the easiest
> to maintain.
>> So what's to keep the user from writing a script to respawn
>> ra(), if that's what the user wants it to do?  That's pretty easy
>> isn't it?
>>> From: <slif at bellsouth.net>
>>> Date: Wed, 29 Sep 2004 18:00:56 -0400
>>> To: Peter Van Epp <vanepp at sfu.ca>, <argus-info at lists.andrew.cmu.edu>
>>> Subject: Re: Re: [ARGUS] ra stops unexpectedly
>>> I don't see the justification for stopping based on
>>> not seeing MAR records.  If the connection was not reset by peer,
>>> I would prefer the client do everything it possibly can
>>> to connect to its server.
>>> If the connection breaks, throw a log message and try again.
>>> If that fails, wait one minute.
>>> Repeat until an operator or user stops the client.
>>> Then again, I don't know whether the argus clients meet
>>> the expectations of other users.
>>>> From: Peter Van Epp <vanepp at sfu.ca>
>>>> Date: 2004/09/29 Wed PM 05:28:06 EDT
>>>> To: argus-info at lists.andrew.cmu.edu
>>>> Subject: Re: [ARGUS] ra stops unexpectedly
>>>> It looks like this shouldn't happen :-). Even on an idle link you
>>>> should be getting MAR records every reporting interval and that (perhaps
>>>> anyway) should reset the counter I'd expect. As a quick workaround (until
>>>> Carter can suggest what may really be wrong :-)) try commenting out the
>>>> timeout in argus_parse.c at line 2737:
>>>>                   ArgusAdjustGlobalTime(&ArgusRealTime);
>>>> /*            
>>>>                   if (input->hostname && input->ArgusMarInterval) {
>>>>                      if (input->ArgusLastTime.tv_sec) {
>>>>                         if ((ArgusRealTime.tv_sec - input->ArgusLastTime.tv_sec)
>>>>                               > (3 * input->ArgusMarInterval)) {
>>>>                            ArgusLog (LOG_WARNING, "ArgusReadStream %s: idle stream: closing", input->hostname);
>>>>                            ArgusCloseInput(input);
>>>>                            ArgusRemoteFDs[i] = NULL;
>>>>                         }
>>>>                      }
>>>>                   }
>>>> */
>>>> That should stop the timeout (it may also do something else
>>>> undesirable though :-)). The trick would be to see where (and by what)
>>>> input->ArgusLastTime.tv_sec is being updated. I'd expect MAR records to
>>>> do that and thus avoid this. All that said my link must not get busy,
>>>> because it doesn't happen here (of course the link between the two is a
>>>> 3 ft crossover cable too).  Could you be seeing a link interruption
>>>> between the sensor and the host that ra is running on so you really
>>>> don't see any MAR records for an interval? That would be another
>>>> possibility.
>>>> Peter Van Epp / Operations and Technical Support
>>>> Simon Fraser University, Burnaby, B.C. Canada
>>>> On Wed, Sep 29, 2004 at 04:53:02PM -0400, slif at bellsouth.net wrote:
>>>>>        The remote argus is from argus-2.0.6.fixes.1
>>>>>   Running "ra -w FILE -S IP" from argus-clients-2.0.6.fixes.1
>>>>>  "ra" will return unexpectedly.
>>>>>  This message is displayed :
>>>>>     "ArgusWarning: ra[PID]: ArgusReadStream IP: idle stream: closing"
>>>>>  What can be done so that "ra" will not stop when stream
>>>>>    is apparently idle ?
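
For reference, the idle check in the argus_parse.c snippet quoted above
can be read as a small predicate.  The following is only a stand-alone
sketch of that logic in C: the struct idle_input and the function
stream_is_idle are hypothetical names, not the real Argus input
structure or API, and the fields merely mirror the ones used in the
quoted code (hostname, ArgusMarInterval, ArgusLastTime.tv_sec).

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the fields the quoted check reads. */
struct idle_input {
    const char *hostname;  /* NULL when the input is not a remote host */
    long mar_interval;     /* seconds between MAR records, 0 if unknown */
    time_t last_time;      /* time of the last record, 0 if none yet */
};

/* Nonzero when more than three MAR intervals have passed since the
 * last record, which is the condition that closes the stream in the
 * quoted code. */
int stream_is_idle(const struct idle_input *in, time_t now)
{
    if (in->hostname == NULL || in->mar_interval == 0)
        return 0;   /* check only applies to remote sources with MARs */
    if (in->last_time == 0)
        return 0;   /* nothing received yet, so no baseline */
    return (now - in->last_time) > 3 * in->mar_interval;
}
```

With a 60 second MAR interval, the stream counts as idle once more than
180 seconds pass without last_time being updated, so whatever updates
that field decides whether ra closes the stream.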
