Another vote for packet drop detection

Tue Jan 31 16:18:10 EST 2012

	I think this should meet the need. With the gap metric it should be
possible to produce a sensor (as opposed to monitored link) loss count for TCP.
Over time that should reveal any increase in sensor loss. There will always be
some sensor loss, simply because of the UDP nature of the link after the tap
point. A rising count tells the user to run further diagnostics on the link
(checking all the various error counts, perhaps putting a sniffer in line, or
getting out the optical power meter to check levels, etc.) and correct the
problem. If nothing is monitoring for loss you won't know you are having it,
and then it is hard to avoid. Thanks!

Peter Van Epp

On Tue, Jan 31, 2012 at 11:28:04AM -0500, Carter Bullard wrote:
> Gentle People,
> OK, so I now have all the metrics in place to report "unobserved bytes" for TCP.
> 
> This involves data elements in the ArgusTCPObject that, to date, have not been
> accessible from the ra* programs.  There just hasn't been a need to expose them yet.
> 
> Each TCP record already carries the base sequence number, the last sequence
> number seen, the last ACK value seen, total TCP bytes observed, total windows
> seen, the TCP connection establishment times, the retransmission tally, the last
> advertised window size, the flags, and any window scale option values.  From
> these values we can derive the number of bytes not observed for each side,
> something like (((ACK - 1) - BaseSeq) - TCP obs).  I have a lot of testing to
> do, but the basic support is now in.
> 
> There was a bug in prior argi in how we tallied one of the elements critical to
> our algorithm, sequence number turnover, so data from earlier versions of argus
> won't always be able to provide this metric.  If there was no sequence number
> roll-over, older data will still provide the values.  The data itself shows
> whether a record was affected by the bug, so we can still generate good values,
> just not for every TCP connection.
> 
> To track these metrics through every type of data aggregation that we may
> encounter, I need to modify the ArgusTCPObjectMetrics structure so that the
> sequence roll-over counter can be greater than 4GB.  Currently it's an unsigned
> int, and I need it to be an unsigned long long.  I will not make this change
> until the next minor version change, which will be the 3.1.0 release.
> 
> There are several ways to report the "TCP unobserved bytes" metric.  We can
> provide a new printable field; perhaps we call it "[sd]gap"?  I can also expose
> all the TCP metrics, but I'll need names for them all.
> 
> Using the packet traces from the wireshark site, we are accurately tracking the
> loss that they describe, so I'd say it's working.  It just needs more testing.
> 
> Please offer suggestions, and we'll make it official.
> 
> Carter
> 
> 
> 
> On Jan 30, 2012, at 9:14 PM, Peter Van Epp wrote:
> 
> > On Mon, Jan 30, 2012 at 11:23:17AM -0500, Carter Bullard wrote:
> >> Hey Peter,
> >> Yes, argus already does what you are suggesting: it tracks the bidirectional
> >> state of TCP to determine what is lost, what is retransmitted, what is not seen, etc.
> >> So, we can generate the data needed to drive this analytic, I believe. 
> >> 
> >> Argus can't do what I imagine wireshark is doing.  I suspect that wireshark stores
> >> all the sequence numbers seen on a specific TCP connection, and when an ACK shows up,
> >> it searches that list for any range of bytes in the window that it never saw.
> >> Even wireshark can't know how many packets are missing; I expect it
> >> assumes that all of its missing bytes are in the minimum number of packets.
> >> 
> > 
> > 	Without having looked at the wireshark code (only the suggested fix,
> > which I suspect is indeed the tip of the iceberg :-)), I expect you are
> > correct. I saw the addition of a single new variable to track acked state
> > on the monitored link separately from that on the wireshark link and thought
> > that shouldn't be too expensive. However, you are correct: wireshark is saving
> > all the packets (or as many of them as its buffer size allows) and thus has
> > much more data than argus can keep.
> > 	The packet trace(s) that demonstrate the problem in wireshark are 
> > available from the bug report at
> > 
> > https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=6081
> > 
> > in the test traces attachment, I believe. From the commentary, I think one
> > of those is a trace of an http connection that succeeded on the monitored
> > link, with one packet removed from the pcap to simulate loss on the tapped
> > link that didn't occur on the monitored link. That should demonstrate the
> > condition.
> > [ snip ]
> 
> > Peter
> > 
>