Another vote for packet drop detection

Sun Jan 29 16:55:09 EST 2012

On Sun, Jan 29, 2012 at 08:56:09AM -0500, Carter Bullard wrote:
> OK, I think that we are already doing all that is needed in argus to report on
> suspicion of infrastructure loss.  I have written about this for many years, and
> I'm glad that we're talking about this topic now.  We have done so much work
> to make sure that we're pretty accurate in reporting loss, and many have put
> Argus through the wringer on this.  (If you think you see a bug in simply reporting
> loss, please send to the list a packet trace that demonstrates that, and I'll fix
> it).  If Argus is doing a good job estimating loss, then we're talking about only
> adding another metric to differentiate unobserved traffic vs loss, based on
> your criteria.
> 

	Correct, but in this specific case the loss being flagged is between
the monitor (tap) point and the argus sensor rather than in the monitored link.
With the current loss metrics I don't know how I would extract that specific
loss (not that it can't be done, I just don't know how :-)). I do agree it is
a valuable statistic to have, as it points to a sensor error on an ongoing
basis (as opposed to one-time testing of sensor capacity, which should be done
as well). As such I think the metric should go in the man records, as it is
sensor-related rather than link-related.


> So, let's talk about how we can estimate the amount of unobserved traffic.  TCP
> provides you with a number of possibilities for knowing what the offered load is/was,
> i.e. the amount of traffic that was actually transmitted.  The most reliable is the total
> number of bytes.  This is derived by differencing the closing sequence number
> and the initial base sequence number, adjusting for rollover.  This is an excellent
> number, as it tells you the exact number of bytes reliably transmitted (Br).
> 
> Now this is the TCP bytes. Argus tracks this stat, and reports today the TCP bytes
> observed (Bo).  This number is the TCP bytes for all the transmitted packets,
> original data (Od) and retransmissions (Rb).   If you can compare Od with Br,
> you will realize how many bytes you didn't see.  Now how many packets you
> didn't see will have to be estimated.
> 
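
The byte accounting quoted above could be sketched roughly as follows. This
is only an illustration of the arithmetic (Br from the sequence numbers,
Od = Bo - Rb, unobserved = Br - Od), not Argus code: the function names and
the `wraps` parameter are made up for the example, and it assumes the caller
has already accounted for the sequence numbers consumed by SYN and FIN.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and roll over


def reliable_bytes(base_seq, closing_seq, wraps=0):
    """Br: bytes reliably transmitted, from the sequence number difference.

    `wraps` (hypothetical) is the number of complete extra trips around the
    32-bit sequence space, for connections that moved more than 4 GiB.
    """
    return (closing_seq - base_seq) % SEQ_MOD + wraps * SEQ_MOD


def unobserved_bytes(base_seq, closing_seq, observed_bytes, retrans_bytes,
                     wraps=0):
    """Bytes the endpoints exchanged that the sensor never saw.

    observed_bytes is Bo, retrans_bytes is Rb, so Od = Bo - Rb.
    """
    br = reliable_bytes(base_seq, closing_seq, wraps)
    od = observed_bytes - retrans_bytes
    return max(br - od, 0)  # never report a negative amount


# A connection whose sequence numbers roll over once: Br is 1000 bytes,
# the sensor saw 1100 bytes of which 200 were retransmissions.
print(unobserved_bytes(4294967000, 704, observed_bytes=1100,
                       retrans_bytes=200))  # 100
```

As the quoted text says, this gives bytes only; the number of packets that
went unobserved still has to be estimated.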

	What wireshark is doing is seeing that the monitored end point acked
a packet that the sensor didn't see. That implies that the monitored link saw
the packet but for some reason, probably sensor link loss, the sensor didn't.
That is going to happen sometimes simply because the sensor link after the tap
is in effect UDP (nothing retransmits a frame that is damaged there), and the
bit error rate of the underlying link is sometimes going to cause errors in
valid packets. Not all that often we hope, but sometimes even on passive taps
with fibre, and much more often on copper links, active taps or span ports.
Those errors will cause the sensor to drop what is a valid packet on the
monitored link.

	It would be useful for management purposes to have a count of how many
packets the sensor sees (or more correctly didn't see :-)) in the man record.
It's then relatively easy to have a script running on the sensor that will
alert if the loss number gets too high, indicating that a problem has occurred
in the sensor link and that a human needs to take a look at it and fix it to
maintain accuracy. (This could be in argus itself if desired, but I think a
script is probably the better bet, since not everyone is this paranoid even
though they should be :-).) If you don't need the accuracy you can ignore this
as long as it doesn't get too large, but if you want it or need it the
capability is there (as is the data, even if you had been ignoring it :-)).
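
For illustration, a rough sketch of the kind of check described above: count
ACKs that acknowledge data the sensor never saw, then alert when the count
gets too high. Everything here is an assumption made for the example: the
packet tuple layout, the function names and the 0.1% threshold are invented,
and this is neither wireshark's nor argus's actual code.

```python
SEQ_MOD = 2 ** 32  # TCP sequence space is 32 bits wide


def seq_diff(a, b):
    """Signed distance a - b in 32-bit sequence space (rollover safe)."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def count_acked_unseen(packets):
    """Count ACKs for data the sensor did not observe on one connection.

    packets: iterable of (sender, seq, seq_len, ack) tuples, where sender is
    'A' or 'B', seq_len is the sequence space consumed (payload bytes plus
    one for a SYN or FIN) and ack is None when the ACK flag is not set.
    Returns (events, missed_bytes).
    """
    highest_seen = {}  # sender -> highest sequence number end seen so far
    events = 0
    missed = 0
    for sender, seq, seq_len, ack in packets:
        end = (seq + seq_len) % SEQ_MOD
        if sender not in highest_seen or seq_diff(end, highest_seen[sender]) > 0:
            highest_seen[sender] = end
        peer = 'B' if sender == 'A' else 'A'
        if ack is not None and peer in highest_seen:
            gap = seq_diff(ack, highest_seen[peer])
            if gap > 0:  # the peer's data was acked but the sensor missed it
                events += 1
                missed += gap
                highest_seen[peer] = ack  # don't count the same gap twice
    return events, missed


def sensor_link_alert(events, total_packets, threshold=0.001):
    """True when the unseen fraction exceeds the (hypothetical) threshold."""
    return total_packets > 0 and events / total_packets > threshold
```

A script on the sensor could run something like this over each reporting
interval and page a human when sensor_link_alert() returns True. Note that
this only catches a missed segment when its ACK reaches the sensor before any
later data from the same sender does.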

Peter Van Epp