Detect packet drops

Fri Jan 27 14:16:47 EST 2012

Hey /Elof,
OK, as I have mentioned before, we do distinguish between 'skipped' sequence numbers,
out-of-order sequence numbers, and retransmitted sequence numbers (data and ACKs). Duplicates,
such as multiple copies of the exact same packet, are detectable, and I put code in to do
this, but I don't have any packet files with the conditions that you describe, so I can't
verify whether the code is correct, and I haven't finished the support.

The problems in guaranteeing that you can count every drop, in this case for TCP, are these.

Because TCP is reliable, there aren't going to be any gaps if you see all the traffic. If you
see gaps, it is only because the sensor isn't seeing all the packets, and you have to try
to figure out why. Did the network not pass the traffic past your sensor, because of asymmetric
routing, striping, load balancing, or path failure, or did the sensor drop a packet that
actually came by? This is impossible to know unless you can find a pattern in
the loss.
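
To make the idea of a "gap" concrete, here is a rough Python sketch (not Argus code; the
function name and the input layout are made up for illustration). It takes the (seq, length)
pairs the sensor saw for one direction of a flow and returns the byte ranges that were never
seen. It assumes relative sequence numbers and ignores 32-bit wraparound.

```python
def find_gaps(segments):
    """Return (start, end) byte ranges never observed in one direction of a flow.

    segments: iterable of (seq, length) pairs, in any order.
    Assumption: relative sequence numbers, no 32-bit wraparound.
    """
    # Sort by starting sequence number so that arrival order does not matter.
    ranges = sorted((seq, seq + length) for seq, length in segments if length > 0)
    gaps = []
    covered_to = None  # highest byte offset covered so far
    for start, end in ranges:
        if covered_to is not None and start > covered_to:
            gaps.append((covered_to, start))  # bytes nobody showed us
        covered_to = end if covered_to is None else max(covered_to, end)
    return gaps


# Bytes 250..299 never show up, even though 200..249 arrived late.
print(find_gaps([(0, 100), (100, 100), (300, 100), (200, 50)]))  # [(250, 300)]
```

A non-empty result only says the sensor missed something; it cannot say whether the network
or the sensor is at fault.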

If the sensor is watching TCP traffic at a point in the network prior to the loss point,
the sensor will see retransmissions, that is, multiple instances of the same sequence numbers.
The sender will retransmit traffic because the receiver states that it hasn't seen the traffic.
But there is a race condition, where the receiver receives the packet late. No loss will have
occurred, but there are retransmissions anyway.
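
As an illustration of spotting "multiple instances of the same sequence numbers", the sketch
below counts data segments whose bytes had all been seen earlier in the capture. It is
hypothetical Python, not how Argus does it, and it assumes relative sequence numbers without
wraparound.

```python
def _covered(ranges, start, end):
    """True if [start, end) lies inside one already-seen range."""
    return any(s <= start and end <= e for s, e in ranges)


def _add(ranges, start, end):
    """Merge [start, end) into a list of disjoint (start, end) ranges."""
    merged = []
    for s, e in ranges:
        if e < start or s > end:
            merged.append((s, e))          # no overlap, keep as is
        else:
            start, end = min(s, start), max(e, end)  # absorb overlapping range
    merged.append((start, end))
    merged.sort()
    return merged


def count_retransmissions(segments):
    """Count segments, in arrival order, that carry only bytes seen before."""
    seen, count = [], 0
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        if _covered(seen, seq, seq + length):
            count += 1
        else:
            seen = _add(seen, seq, seq + length)
    return count


print(count_retransmissions([(0, 100), (100, 100), (0, 100), (200, 100), (100, 100)]))  # 2
```

Because of the race condition described above, a non-zero count does not prove that a
packet was actually lost.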

If the sensor is past the loss point, you won't see any drops, because TCP is reliable. You will
see out-of-order packets. So is out of order an indication of loss? Not necessarily; the
network can deliver packets out of order on its own. The timing of the out-of-order arrival
is the best way to tell what is going on.
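
One way to use the time domain, sketched in Python under stated assumptions: a segment that
fills a hole very soon after the hole appeared was probably just reordered, while one that
fills it much later (on the order of a round trip) was probably a retransmission of something
lost upstream. The `reorder_window` threshold and the labels are hypothetical and would need
tuning per network; this is not Argus's algorithm.

```python
def classify_late_segments(arrivals, reorder_window=0.010):
    """Label segments that arrive below the highest sequence number seen.

    arrivals: (timestamp_seconds, seq, length) tuples in capture order.
    Returns a list of (seq, label), label in
    {"reordered", "retransmitted", "duplicate"}.
    Assumptions: one direction of one flow, no sequence wraparound.
    """
    next_expected = None
    open_gaps = {}   # gap start -> (gap end, time the gap was first noticed)
    results = []
    for ts, seq, length in arrivals:
        end = seq + length
        if next_expected is None or seq == next_expected:
            next_expected = end                    # in order
            continue
        if seq > next_expected:
            open_gaps[next_expected] = (seq, ts)   # a hole just opened
            next_expected = end
            continue
        next_expected = max(next_expected, end)
        # Late segment: does it start inside a known hole?
        hit = next((g for g, (g_end, _) in open_gaps.items()
                    if g <= seq < g_end), None)
        if hit is None:
            results.append((seq, "duplicate"))     # bytes already seen
            continue
        g_end, noticed = open_gaps.pop(hit)
        delay = ts - noticed
        results.append((seq, "reordered" if delay < reorder_window
                        else "retransmitted"))
        # Keep whatever part of the hole is still unfilled.
        if seq > hit:
            open_gaps[hit] = (seq, noticed)
        if end < g_end:
            open_gaps[end] = (g_end, noticed)
    return results
```

For example, a hole filled 0.5 ms after it opened would be labelled "reordered", while one
filled 57 ms later would be labelled "retransmitted".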

Because there is generally more than one point where loss can occur, your sensor will see
all sorts of weird combinations of the above behavior.

The best way to see all indications of loss is to look at the ACK behavior from the receiver.
Selective ACK advertisements are the best way to track loss, as you'll get fine-grained reporting
of what the receiver didn't receive. Without selective ACK, you don't know how many packets
in a window were lost; you just know that at least one was lost.
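
To show what the fine-grained reporting looks like, here is a small Python sketch that turns
one ACK plus its SACK blocks (left edge, right edge, as defined in RFC 2018) into the byte
ranges the receiver is saying it has not received. The function name is made up, and the
sketch ignores sequence wraparound.

```python
def sack_holes(ack, sack_blocks):
    """Return byte ranges the receiver reports as missing.

    ack: cumulative ACK number (next byte the receiver expects).
    sack_blocks: (left_edge, right_edge) pairs from the SACK option.
    Assumption: relative sequence numbers, no 32-bit wraparound.
    """
    holes = []
    cursor = ack  # everything below this has been received
    for left, right in sorted(sack_blocks):
        if left > cursor:
            holes.append((cursor, left))  # receiver skipped over these bytes
        cursor = max(cursor, right)
    return holes


# Receiver has everything up to 1000, plus 1500..1999 and 3000..3999.
print(sack_holes(1000, [(3000, 4000), (1500, 2000)]))  # [(1000, 1500), (2000, 3000)]
```

With no SACK blocks the result is empty, which matches the point above: a plain duplicate
ACK tells you only that something starting at the ACK number is missing, not how much.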

Argus is doing what you are asking for. If you want specific counters to try to get more info, I
can report them. But beyond what Argus is already doing, I don't think there is anything more
that is possible to detect.

So, tell me what counters you want. In your example Argus is already doing better than Wireshark.

But I would also like to see the discrepancy between Argus and Wireshark. Argus gives a drop
count, regardless of how we calculate it. How does it compare to Wireshark's?

So what's the big deal? Are you so into the QoS part of this that each lost packet is important
to your analysis?

Carter

On Jan 26, 2012, at 5:37 AM, elof2 at sentor.se wrote:

> 
> On Wed, 25 Jan 2012, Peter Van Epp wrote:
>> On Wed, Jan 25, 2012 at 02:02:08PM +0100, elof2 at sentor.se wrote:
>>> Any more thoughts or progress with this?
>>> 
>>> I just realised that I can't even rely on Wireshark for an estimate
>>> of dropped packets, since Wireshark's Expert Info "ACKed lost
>>> segment" tags out-of-order FIN-packets as "ACKed lost segment".
>>> 
>>> What I'm looking for is not a 100% accurate system to count every
>>> missing packet (which is impossible to determine), but a flag on
>>> each session that argus knows is missing one or more packets.
>>> Just like the flag for retransmission doesn't say how many
>>> retransmissions there were in a tcp flow.
>> 
>> 	Checking the pcap reported loss rate (it's in the man records, which
>> you have to enable to see these days) will give you an indication. Although
>> it is only one of the several ways your sensor can be losing packets, it is one
>> good indication of how your sensor is doing. There is an explanation of a
>> number of the possible (and usually invisible) loss points in a sensor on
>> Carter's web site at http://www.qosient.com/argus/sensorPerformance.shtml as
>> well.
> 
> 
> Hi Peter.
> Thanks for your input.
> 
> Ah, didn't know about the hidden pcap drop counters. I will take a look at it.
> 
> However... Even though I can see the pcap drop count, I still think it would be nice if argus could tag individual flows where it has detected gaps.
> The tag would give us argus users a notification that not all traffic is monitored 100%. An informative tag just like the out-of-order tag or ECN tag.
> 
> I now realise that my suggestion of having tags like "dropped externally" and "dropped internally" is not feasible, since there's no way to correlate the pcap drop counter to specific flows, so ignore this.
> 
> Apart from simply being informed that the monitored traffic is not 100%, I would also very much like to be able to determine if the drops occur outside of the sensor, i.e. the switch drops lots of packets while the sensor drops nothing.
> With the tag above, and a pcap-drop-counter in the argus man-records it should be easier to spot that external drops occur.
> (Naturally, if you have both external drops and internal drops, it will be hard to investigate, but that's always the case. If I'm sure I have 0 drops within my sniffing machine, then all flows tagged with gaps must be due to drops in the external switch or tap, or faulty DAG/DAC drivers that don't report their own drop count, but that is a completely different matter.)
> 
> 
>> 	Comparing the RMON traffic counts reported by the switch feeding your
>> sensor against the argus counts is another way, although synchronizing the two
>> counts can be exciting :-). Both of these only indicate loss of data that makes
>> it as far as your sensor, of course, and aren't an indication of loss elsewhere
>> in the path, but that's a start ...
> 
> Hehe, this is not possible since in many cases the SPAN port is not managed by me. I just manage the sensor receiving the mirrored traffic, but it is someone else who has set up the SPAN configuration.
> So diffing the reported drop numbers is practically not feasible.
> 
>> 	As well using something like tcpreplay from a pcap file with suitable
>> hardware (which can get very hard at high speed of course :-)) feeding in to
>> your sensor can give you a known input traffic pattern to estimate sensor loss
>> as well.
> 
> Now you're rather talking about detecting local sensor loss. What I'm primarily asking for is a way to easily detect that there is external packet loss.
> 
> Currently I'm sniffing e.g. 100 000 packets with tcpdump, making sure nothing is dropped locally. In this case it took 3 seconds to gather 100 000 packets. I scp the pcap file to a machine running Wireshark. I open up the "Expert Info Composite" and look at "ACKed lost segment" and "Previous segment lost".
> In an environment where the traffic is mirrored correctly, these two counters give me an estimate as to how many gaps there are in the tcp flows in the pcap file (disregarding a couple of false positives at capture startup).
> ...that is, I can see if the people feeding me mirrored traffic have problems in their end.
> 
> This procedure is quite tiresome. Also, it is unreliable when the mirrored packets are received out of order (common in redundant/loadbalanced environments), then Wireshark will tag packets as lost even though they exist.
> 
> /Elof
