[ARGUS] ra stops unexpectedly

slif at bellsouth.net slif at bellsouth.net
Thu Sep 30 13:59:32 EDT 2004


Thank you for the explanation.
I work better with illumination!
-Mike

> 
> From: Carter Bullard <carter at qosient.com>
> Date: 2004/09/30 Thu PM 01:56:22 EDT
> To: <slif at bellsouth.net>, 
> 	Peter Van Epp <vanepp at sfu.ca>, 
> 	Argus <argus-info at lists.andrew.cmu.edu>
> Subject: Re: [ARGUS] ra stops unexpectedly
> 
> Hey Mike,
> Well, you are projecting your desire for a feature and building
> a rather obtuse religious argument for its justification.  TCP
> tries hard because that is its design, a reliable transport
> protocol.  Why does UDP not try so hard?  Well that's its design.
> 
> If you want to understand why engineering reliability into
> transports where it's not needed is not necessarily a good thing,
> look at the issues with using SCTP for non-reliable transport.
> I think it's unnecessary, expensive and sometimes unpredictable.
> 
> But the reality is simple.  If you want the clients to have a
> persistent connection feature, then we should talk about it.
> 
> There are three specific reasons why it's not there now.  The
> first is that we want to have a simple, consistent failure model
> for ra* clients.  Once you advertise that you're "reliable", you
> get into some complex code to actually provide the feature.
> 
> Second, all ra() clients can connect to multiple sources
> simultaneously, which makes a simple persistent connection feature
> pretty complicated (if one fails, do you shut down all of them and
> start over?).
> 
> The third is that it's not clear that all clients should
> persistently connect to a remote data source, so do we need
> to put it into the general strategy?
> 
> None of this means we can't provide a "reconnect on failure"
> feature, but what are we going to specify when you're connected
> to 3 remote data sources?  How do we notify the specific client
> that a source has been lost, or has not ever been connected?
> 
> Carter
> 
> 
> 
> 
> > From: <slif at bellsouth.net>
> > Date: Thu, 30 Sep 2004 13:35:24 -0400
> > To: Carter Bullard <carter at qosient.com>, Peter Van Epp <vanepp at sfu.ca>, Argus
> > <argus-info at lists.andrew.cmu.edu>
> > Subject: Re: Re: [ARGUS] ra stops unexpectedly
> > 
> > 
> >> 
> >> From: Carter Bullard <carter at qosient.com>
> >> Date: 2004/09/30 Thu AM 11:38:09 EDT
> >> To: <slif at bellsouth.net>,
> >> Peter Van Epp <vanepp at sfu.ca>,
> >> Argus <argus-info at lists.andrew.cmu.edu>
> >> Subject: Re: [ARGUS] ra stops unexpectedly
> >> 
> > >> The problem is that if you aren't receiving MAR records,
> > >> then the far argus is probably dead, and you won't receive
> > >> anything ever again.
> > 
> > 
> > Why does the TCP protocol try so hard ?
> > In part because the authors realized there are so many
> > ways to make re-synchronizing painful and problematic.
> > 
> > Stopping when one "feels" a far end point is no longer connected
> > just doesn't seem right.  Sure I _can_ write yet another script
> > to monitor this program.  I would prefer to indicate more
> > that "Hey, I had to restart that process".  I would likely not
> > know the reason for the process terminating.  Without that
> > information, I will have more difficulty trying to apply
> > a remedy.
> > 
> > The solution that is localized to the problem is the easiest
> > to maintain.
> > 
> > 
> > 
> > 
> > 
> > 
> >> 
> > >> So what's to keep the user from writing a script to respawn
> > >> ra(), if that's what the user wants it to do?  That's pretty easy
> > >> isn't it?
> >> 
> >> 
> >> 
> >> 
> >>> From: <slif at bellsouth.net>
> >>> Date: Wed, 29 Sep 2004 18:00:56 -0400
> >>> To: Peter Van Epp <vanepp at sfu.ca>, <argus-info at lists.andrew.cmu.edu>
> >>> Subject: Re: Re: [ARGUS] ra stops unexpectedly
> >>> 
> >>> I don't see the justification for stopping based on
> > >>> not seeing MAR records.  If the connection was not reset by peer,
> >>> I would prefer the client do everything it possibly can
> >>> to connect to its server.
> >>> 
> >>> If the connection breaks, throw a log message and try again.
> >>> If that fails, wait one minute.
> >>> Repeat until an operator or user stops the client.
> >>> 
> >>> Then again, I don't know whether the argus clients meet
> >>> the expectations of other users.
> >>> 
> >>> 
> >>> 
> >>>> 
> >>>> From: Peter Van Epp <vanepp at sfu.ca>
> >>>> Date: 2004/09/29 Wed PM 05:28:06 EDT
> >>>> To: argus-info at lists.andrew.cmu.edu
> >>>> Subject: Re: [ARGUS] ra stops unexpectedly
> >>>> 
> > >>>> It looks like this shouldn't happen :-). Even on an idle link you
> > >>>> should be getting MAR records every reporting interval, and I'd expect
> > >>>> that (perhaps anyway) to reset the counter. As a quick workaround (until
> > >>>> Carter can suggest what may really be wrong :-)) try commenting out the
> > >>>> timeout in argus_parse.c:
> >>>> 
> >>>> at line 2737
> >>>> 
> >>>>                   ArgusAdjustGlobalTime(&ArgusRealTime);
> >>>> 
> >>>> /*            
> >>>>                   if (input->hostname && input->ArgusMarInterval) {
> >>>>                      if (input->ArgusLastTime.tv_sec) {
> >>>>                         if ((ArgusRealTime.tv_sec -
> >>>> input->ArgusLastTime.tv_sec)
> >>>>> (3 * input->ArgusMarInterval)) {
> >>>>                            ArgusLog (LOG_WARNING, "ArgusReadStream %s: idle
> >>>> stre
> >>>> am: closing", input->hostname);
> >>>>                            ArgusCloseInput(input);
> >>>>                            ArgusRemoteFDs[i] = NULL;
> >>>>                         }
> >>>>                      }
> >>>>                   }
> >>>> */
> >>>> 
> >>>> That should stop the timeout, (it may also do something else
> >>>> undesirable though :-)). The trick would be to see where (and by what)
> >>>> 
> >>>> input->ArgusLastTime.tv_sec
> >>>> 
> > >>>> is being updated. I'd expect MAR records to do that and thus avoid this.
> > >>>> All that said my link must not get busy, because it doesn't happen here
> > >>>> (of course the link between the two is a 3 ft crossover cable too).  Could
> > >>>> you be seeing a link interruption between the sensor and the host that ra
> > >>>> is running on so you really don't see any MAR records for an interval?
> > >>>> That would be another possibility.
> >>>> 
> >>>> Peter Van Epp / Operations and Technical Support
> >>>> Simon Fraser University, Burnaby, B.C. Canada
> >>>> 
> >>>> On Wed, Sep 29, 2004 at 04:53:02PM -0400, slif at bellsouth.net wrote:
> >>>>>        The remote argus is from argus-2.0.6.fixes.1
> >>>>> 
> >>>>>   Running "ra -w FILE -S IP" from argus-clients-2.0.6.fixes.1
> >>>>> 
> >>>>>  "ra" will return unexpectedly.
> >>>>>  This message is displayed :
> >>>>> 
> >>>>>     "ArgusWarning: ra[PID]: ArgusReadStream IP: idle stream: closing"
> >>>>> 
> >>>>> 
> >>>>>  What can be done so that "ra" will not stop when stream
> >>>>>    is apparently idle ?
> >>>>> 
> >>>> 
> >>> 
> >>> 
> >> 
> >> 
> >> 
> > 
> > 
> 
> 
> 



