Another vote for packet drop detection

Carter Bullard carter at qosient.com
Mon Jan 30 11:23:17 EST 2012


Hey Peter,
Yes, argus already does what you are suggesting, tracking the bi-directional
state of TCP to determine what is lost, what is retransmitted, what is not seen, etc…
So, we can generate the data needed to drive this analytic, I believe. 

Argus can't do what I imagine wireshark is doing.  I suspect that wireshark is storing
all the sequence numbers seen on a specific TCP connection, and when an ACK shows up, it
then searches its list to check whether there is a sequence of bytes in the window that
it didn't see.  Even wireshark can't know the number of packets that are missing; I assume
that it assumes that all of the missing bytes are in a minimum number of packets.
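
To make that guess concrete, here is a minimal sketch of the kind of bookkeeping I am
describing.  It is not wireshark's actual code and not anything in argus.  The class name
SeenBytes, the helper min_missing_packets and the 1460 byte MSS are all assumptions made up
for illustration, and sequence number rollover is ignored to keep it short.

```python
import math


class SeenBytes:
    """Observed sequence ranges for one direction of one TCP connection (hypothetical)."""

    def __init__(self, base_seq):
        self.base = base_seq   # sequence number of the first data byte
        self.ranges = []       # sorted, non-overlapping [start, end) ranges

    def add_segment(self, seq, length):
        """Record an observed data segment covering [seq, seq + length)."""
        if length <= 0:
            return
        start, end = seq, seq + length
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))          # disjoint, keep as is
            else:
                start, end = min(s, start), max(e, end)  # overlap or touch, merge
        merged.append((start, end))
        merged.sort()
        self.ranges = merged

    def missing_below(self, ack):
        """Bytes in [base, ack) that were acknowledged but never observed."""
        missing = 0
        cursor = self.base
        for s, e in self.ranges:
            if s >= ack:
                break
            if s > cursor:
                missing += s - cursor
            cursor = max(cursor, min(e, ack))
        if cursor < ack:
            missing += ack - cursor
        return missing


def min_missing_packets(missing_bytes, mss=1460):
    """Lower bound on missing packets, assuming every missing packet was full sized."""
    return math.ceil(missing_bytes / mss)
```

Every observed segment has to be stored and merged, and every ACK triggers a walk over the
stored ranges, which is the per-connection cost I am worried about below.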

This type of algorithm can't perform at line rate (1Gbps +) while tracking 1M+ 
concurrent connections, which is a performance target for argus, so I'll try to
derive a metric from our existing loss algorithm.   We should be able to report 
a reliable "unobserved bytes" metric.

Argus, however, IMHO, is not capable of realizing that a specific dynamic in a
specific flow should be tallied as a sensor loss metric and reported in a MAR
record.  There just isn't enough information, at the time a packet shows
up, or when a flow record is exported, to be able to unambiguously declare that
the "haven't seen this packet" statistic is really sensor loss (the packet may actually
show up 1 nSec after we declared that it didn't show up, or we may not
know what the current libpcap drop rate is).

So, I will recommend that the sensor metric should be derived by an argus client
looking at a specific argus data stream.  It would look at the TCP gaps, (and any other
connection oriented protocol's gap metric, like RTP or UDT that we should also add),
compare it with the libpcap packet drop stat in the MAR status, and then figure out
if the gaps are hidden loss.
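
A rough sketch of the decision such a client might make is below.  Everything in it is an
assumption for illustration: the function name classify_gaps, the idea that the client has
already pulled a per-flow gap byte count out of the flow records, and the idea that it has
the libpcap dropped packet count from the MAR covering the same interval.  It does not use
any real argus client API, and a real client would want something less crude than a
zero/non-zero test.

```python
def classify_gaps(flow_gap_bytes, pcap_dropped_packets):
    """Classify one reporting interval (hypothetical inputs).

    flow_gap_bytes       -- iterable of per-flow "TCP bytes not seen" counts
    pcap_dropped_packets -- libpcap drop count reported for the same interval
    """
    total_gap = sum(flow_gap_bytes)
    if total_gap == 0:
        # Nothing was missed, so there is nothing to explain.
        return "no-gaps"
    if pcap_dropped_packets > 0:
        # libpcap admits to dropping packets; the gaps may simply be those drops.
        return "reported-drops"
    # Gaps exist but libpcap reports no drops: candidate hidden loss
    # somewhere between the tap and the sensor.
    return "possible-hidden-loss"
```

For example, classify_gaps([0, 1460, 0], 0) flags the interval as possible hidden loss,
while the same gaps with a non-zero libpcap drop count would not be flagged.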

I would like to report the new TCP flow dynamic "gaps" in the TCP performance
DSR, and then design a client that will read those records and figure it out.  Gaps
would be "TCP traffic not seen", which I'm hoping can be reported in bytes.
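
As a sketch of the byte arithmetic, using the Br and Od terms from my earlier message quoted
below: the function names, the rollovers parameter and the choice to treat base_seq as the
sequence number of the first data byte and closing_seq as the one just past the last data
byte are assumptions for illustration, not how argus computes it internally.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide


def reliable_bytes(base_seq, closing_seq, rollovers=0):
    """Br: bytes reliably transmitted, from the sequence number difference.

    The modulo handles a single wrap of the sequence space; rollovers is a
    hypothetical count of additional full wraps the caller has tracked.
    """
    return (closing_seq - base_seq) % SEQ_MOD + rollovers * SEQ_MOD


def unobserved_bytes(base_seq, closing_seq, original_bytes_observed, rollovers=0):
    """Gap in bytes: Br minus Od, the original (non-retransmitted) bytes observed."""
    br = reliable_bytes(base_seq, closing_seq, rollovers)
    # Clamp at zero so a counting quirk never yields a negative gap.
    return max(br - original_bytes_observed, 0)
```

So a flow whose sequence numbers span 1000 bytes, of which we saw 900 bytes of original
data, would report a gap of 100 bytes.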

So if I report TCP gaps, which I think is a good new metric, can you guys build a
client that will take that indication and figure it out?

Carter


On Jan 29, 2012, at 4:55 PM, Peter Van Epp wrote:

> On Sun, Jan 29, 2012 at 08:56:09AM -0500, Carter Bullard wrote:
>> OK, I think that we are already doing all that is needed in argus to report on
>> suspicion of infrastructure loss.  I have written about this for many years, and
>> I'm glad that we're talking about this topic now.  We have done so much work
>> to make sure that we're pretty accurate in reporting loss, and many have put
>> Argus through the wringer on this.  (if you think you see a bug in simply reporting
>> loss, please send to the list a packet trace that demonstrates that, and I'll fix
>> it).  If Argus is doing a good job estimating loss, then we're talking about only
>> adding another metric to differentiate unobserved traffic vs loss, based on
>> your criteria.
>> 
> 
> 	Correct, but in this specific case the loss being flagged is between
> the monitor (tap) point and the argus sensor rather than in the monitored link.
> With the current loss metrics I don't know how I would extract that specific
> loss (not that it can't be done, I just don't know how :-)). I do agree it is
> a valuable statistic to have as it points (on an ongoing basis as opposed to 
> one time testing of sensor capacity which should be done as well) to a sensor
> error. As such I think the metric should go in the man records as it is sensor
> rather than link related.
> 
> 
>> So, let's talk about how we can estimate the amount of unobserved traffic.  TCP
>> provides you with a number of possibilities for knowing what the offered load is/was,
>> i.e. the amount of traffic that was actually transmitted.  The most reliable is the total
>> number of bytes.  This is derived by differencing the closing sequence number
>> and the initial base sequence number, adjusting for rollover.  This is an excellent
>> number, as it tells you the exact number of bytes reliably transmitted (Br).
>> 
>> Now this is the TCP bytes. Argus tracks this stat, and reports today the TCP bytes
>> observed (Bo).  This number is the TCP bytes for all the transmitted packets,
>> both original data (Od) and retransmissions (Rb).   If you can compare Od with Br,
>> you will know how many bytes you didn't see.  How many packets you
>> didn't see will have to be estimated.
>> 
> 
> 	What wireshark is doing is seeing that the monitored end point acked
> a packet that the sensor didn't see. That implies that the monitored link saw
> the packet but for some reason, probably sensor link loss, the sensor didn't
> see the packet. That is going to happen sometimes simply because the sensor 
> link after the tap is UDP, and the bit error rate of the underlying link is
> sometimes going to cause errors in valid packets (not all that often we hope,
> but sometimes even on passive taps with fibre, much more often on either or 
> both of copper links and active taps or span ports) that will cause the sensor 
> to drop what is a valid packet on the monitored link. It would be useful for 
> management purposes to have a count of how many the sensor sees (or more 
> correctly didn't see :-)) in the man record. It's then relatively easy to have 
> a script running on the sensor (or in argus itself if desired, but I think a 
> script is probably the better bet, since not everyone is this paranoid even 
> though they should be :-), that will alert if the loss number gets too high 
> indicating that a problem has occurred in the sensor link and that a human 
> needs to take a look at it and fix it to maintain accuracy. If you don't need 
> the accuracy you can ignore this as long as it doesn't get too large, but if 
> you want it or need it the capability is there (as is the data even if you had
> been ignoring it :-)). 
> 
> Peter Van Epp
