Another vote for packet drop detection

Charles Smutz csmutz at masonlive.gmu.edu
Sat Jan 28 12:55:41 EST 2012


Carter,

I'd like to put in another plug for packet drop detection in argus.
There are many people who could use this. In many cases, people are
running sensors where the various places where packet loss is reported
all show 0 (pcap drop, ifconfig drop, ethtool -S) but there is still loss
(in tapping infrastructure, link overflows not reported by the NIC, etc).

Note that I'm concerned about loss that occurs in the network monitor, 
not loss in the network. We all know packets are lost and that's dandy, 
but I want to make sure my network monitor is seeing everything 
traversing the network (and if it happens to see some things twice 
because normal drop occurs after my visibility point--that's the least 
of my worries--especially for network flow data). If packets do traverse
the network and I don't see them, I consider that a very bad thing. This
can happen in places, as I mentioned before, that are not reported and 
so often go unnoticed.

I've discussed methods for doing this in this blog post:
http://smusec.blogspot.com/2010/06/flushing-out-leaky-taps.html

Wireshark seems to have the best capabilities for doing this of any 
network monitoring tool that I know of, but as many have pointed out, 
these counters are actually often inaccurate :(

In addition to this thread, see 
https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=6081

Note that both Gyorgy and I have been clear in exactly what we're 
looking for and have even provided solid pcap examples.

What many want is to have a network monitoring tool report any packets
(or tcp streams if you need to be pedantic--you would be guessing on
the number of packets) where the network monitor saw the stream data
ACK'd but didn't see the stream data itself. In that case I can infer
with strong confidence that the endpoint thinks he saw data that the
monitor didn't. In the vast majority of cases, that will be because of
a network visibility or tapping issue. Despite the inherent
limitations, this sort of analysis is extremely valuable for quantifying
and debugging loss in network monitoring equipment (especially in places
where the debugger can't see reported loss, or where the equipment
reporting loss lies). Argus doesn't need to try to figure out where the
loss occurs, it just needs to be able to detect loss through tcp
"ack data not seen" inference. The user can compare this to other places
where he can quantify loss--the most interesting case being when
everything else is zero (usually means bad taps, etc).
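
To make the "ack data not seen" inference concrete, here is a minimal
Python sketch. It is not Argus code, and the record layout and function
names are made up for illustration. It assumes one TCP connection at a
time, segments given as (sender, seq, length, ack) tuples with sender
"A" or "B", no sequence number wraparound, and a length that counts the
sequence space a segment consumes (payload bytes, plus 1 for SYN or
FIN). It only evaluates at the end of the flow, so packets that merely
arrive at the monitor out of order are not counted as missing.

```python
def missing_below(ranges, limit):
    """Count sequence space below `limit` that no captured range covers.

    Counting starts at the first sequence number actually captured, so
    data sent before the capture began is not treated as lost.
    """
    if limit is None or not ranges:
        return 0
    ordered = sorted(ranges)
    cursor = ordered[0][0]
    missing = 0
    for start, end in ordered:
        if start >= limit:
            break
        if start > cursor:
            missing += start - cursor  # hole between captured ranges
        cursor = max(cursor, end)
    if cursor < limit:
        missing += limit - cursor  # acked past the last captured data
    return missing


def unseen_acked_bytes(segments):
    """Return, per sender, how much of its data the peer ACK'd
    although the monitor never captured it."""
    peer = {"A": "B", "B": "A"}
    seen = {"A": [], "B": []}             # captured [start, end) ranges
    highest_ack = {"A": None, "B": None}  # highest ack covering each sender
    for sender, seq, length, ack in segments:
        if length > 0:
            seen[sender].append((seq, seq + length))
        if ack is not None:
            target = peer[sender]
            if highest_ack[target] is None or ack > highest_ack[target]:
                highest_ack[target] = ack
    return {s: missing_below(seen[s], highest_ack[s]) for s in ("A", "B")}


# A sends 1..100 and 201..300; B acknowledges up to 301, so the 100
# bytes at 101..200 reached B without the monitor seeing them.
flow = [
    ("A", 1, 100, 1),
    ("B", 1, 0, 101),
    ("A", 201, 100, 1),
    ("B", 1, 0, 301),
]
print(unseen_acked_bytes(flow))
```

A nonzero value for a flow is the per-flow "gap" indication: it says
nothing about where the packets went, only that the endpoint
acknowledged data the monitor did not record.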

I'm not quite sure how easy this would be to implement in argus, and
certainly it would only work in cases where you see (or think you should
be seeing) bi-directional data. If argus could do this, possibly as a mar
record stat, that would make me a very happy man. In my opinion this
capability fits within argus at least as well as, if not better than, full
content analysis tools like wireshark because we're just dealing with
layer 4 metadata here--no need to look at content for this. Having argus
do this would make it easy to alert on and would allow me to debug stuff 
that I can't do very easily now. This capability would be useful for 
people who do go to great lengths to do things right (good taps, good 
sensors, etc) but who need to verify that everything is working well and 
alert when it isn't.

As always, thanks for a great tool,

Charles




On 1/28/2012 12:00 PM, argus-info-request at lists.andrew.cmu.edu wrote:
> Message: 1
> Date: Fri, 27 Jan 2012 14:16:47 -0500
> From: Carter Bullard<carter at qosient.com>
> Subject: Re: [ARGUS] Detect packet drops
> To: elof2 at sentor.se
> Cc: Argus Development<argus-info at lists.andrew.cmu.edu>
> Message-ID:<FC2964C1-C541-4554-844A-48BB224D84A8 at qosient.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hey /Elof,
> OK, as I have mentioned before, we do distinguish between 'skipped' sequence numbers,
> out of order sequence numbers, and retransmitted numbers (data and acks).  The duplicates,
> such as multiple copies of the exact same packet, are detectable and I put code in to do
> this, although I don't have any packet files that have the conditions that you describe to
> verify if they are correct or not, so I haven't finished the support.
>
> The problems in guaranteeing that you can count every drop, in this case for TCP are these.
>
> Because TCP is reliable, there aren't going to be any gaps, if you see all the traffic.  If you
> see gaps, it is only because the sensor isn't seeing all the packets, and  you have to try
> to figure out why.  Did the network not pass the traffic past your sensor, because of asymmetric
> routing, or stripping, load balancing, or path failure, or did the sensor drop a packet that
> actually came by.  This is an impossible thing to know, unless you can find a pattern for
> the loss.
>
> If the sensor is watching TCP traffic at a point in the network prior to the loss point,
> the sensor will see retransmissions, multiple instances of the same sequence numbers.
> The sender will retransmit traffic because the receiver states that he hasn't seen the traffic.
> But there is a race condition, where the receiver receives the packet late.  No loss will have
> occurred but there are retransmissions.
>
> If the sensor is past the loss point, you won't see any drops, because TCP is reliable.  You will
> see out of order packets.  So out of order is an indication of loss?  Not necessarily, the
> network can deliver them out of order.  The time domain for the out-of-order is the best
> way to tell what is going on.
>
> Because there are generally more than one point where loss can occur, your sensor will see
> all sorts of weird combinations of the above behavior.
>
> The best way to see all indications of loss is to look at the ACK behavior from the receiver.
> Selective ACK advertisements are the best way to track loss, as you'll get a fine grain reporting
> of what the receiver didn't receive.  Without selective ACK, you don't know how many packets
> in a window were lost, you just know that at least 1 was lost.
>
> Argus is doing what you are asking for.  If you want specific counters to try to get more info, I
> can report them.  But outside of what Argus is already doing, I'm thinking it is not possible
> to detect.
>
> So, tell me what counters you want.  In your example Argus is already doing better than wireshark.
>
> But I would also like to see the discrepancy between Argus and wireshark.  Argus gives a drop
> count, regardless of how we calculate it.  How does it compare to Wireshark's?
>
> So what's the big deal ?  Are you so into the QoS part of this that each packet lost is important
> to your analysis?
>
>
> Carter
>
> On Jan 26, 2012, at 5:37 AM, elof2 at sentor.se wrote:
>
>> On Wed, 25 Jan 2012, Peter Van Epp wrote:
>>> On Wed, Jan 25, 2012 at 02:02:08PM +0100, elof2 at sentor.se wrote:
>>>> Any more thoughts or progress with this?
>>>>
>>>> I just realised that I can't even rely on Wireshark for an estimate
>>>> of dropped packets, since Wireshark's Expert Info "ACKed lost
>>>> segment" tags out-of-order FIN-packets as "ACKed lost segment".
>>>>
>>>> What I'm looking for is not a 100% accurate system to count every
>>>> missing packet (which is impossible to determine), but a flag on
>>>> each session that argus knows is missing one or more packets.
>>>> Just like the flag for retransmission doesn't say how many
>>>> retransmissions there were in a tcp flow.
>>> 	Checking the pcap reported loss rate (it's in the man records, which
>>> you have to enable to see these days) will give you an indication. Although
>>> it is only one of the several ways your sensor can be losing packets, it is one
>>> good indication of how your sensor is doing. There is an explanation of a
>>> number of the possible (and usually invisible) loss points in a sensor on
>>> Carter's web site at http://www.qosient.com/argus/sensorPerformance.shtml as
>>> well.
>>
>> Hi Peter.
>> Thanks for your input.
>>
>> Ah, didn't know about the hidden pcap drop counters. I will take a look at it.
>>
>> However... Even though I can see the pcap drop count, I still think it would be nice if argus could tag individual flows where it has detected gaps.
>> The tag would give us argus users a notification that not all traffic is monitored 100%. An informative tag just like the out-of-order tag or ECN tag.
>>
>> I now realise that my suggestion of having tags like "dropped externally" and "dropped internally" is not feasible, since there's no way to correlate the pcap drop counter to specific flows, so ignore this.
>>
>> Apart from simply being informed that the monitored traffic is not 100%, I would also very much like to be able to determine if the drops occur outside of the sensor, i.e. the switch drops lots of packets while the sensor drops nothing.
>> With the tag above, and a pcap-drop-counter in the argus man-records it should be easier to spot that external drops occur.
>> (Naturally, if you have both external drops and internal drops, it will be hard to investigate, but that's always the case. If I'm sure I have 0 drops within my sniffing machine, then all flows tagged with gaps must be due to drops in the external switch or tap, or to faulty DAG/DAC drivers that don't report their own drop count, but that is a completely different matter.)
>>
>>
>>> 	Comparing the RMON traffic counts reported by the switch feeding your
>>> sensor against the argus counts is another way although synchronizing the two
>>> counts can be exciting :-). Both of these only indicate loss of data that makes
>>> it as far as your sensor of course and aren't an indication of loss elsewhere
>>> in the path but that's a start ...
>> Hehe, this is not possible since in many cases the SPAN port is not managed by me. I just manage the sensor receiving the mirrored traffic, but it is someone else who has set up the SPAN configuration.
>> So diffing the reported drop-numbers is practically not feasible.
>>
>>> 	As well using something like tcpreplay from a pcap file with suitable
>>> hardware (which can get very hard at high speed of course :-)) feeding in to
>>> your sensor can give you a known input traffic pattern to estimate sensor loss
>>> as well.
>> Now you're rather talking about detecting local sensor loss. What I'm primarily asking for is a way to easily detect that there is external packet loss.
>>
>> Currently I'm sniffing e.g. 100 000 packets with tcpdump, making sure nothing is dropped locally. In this case it took 3 seconds to gather 100 000 packets. I scp the pcap file to a machine running Wireshark. I open up the "Expert Info Composite" and look at "ACKed lost segment" and "Previous segment lost".
>> In an environment where the traffic is mirrored correctly, these two counters give me an estimate as to how many gaps there are in the tcp flows in the pcap file (disregarding a couple of false positives at capture startup).
>> ...that is, I can see if the people feeding me mirrored traffic have problems on their end.
>>
>> This procedure is quite tiresome. Also, it is unreliable when the mirrored packets are received out of order (common in redundant/loadbalanced environments): Wireshark will then tag packets as lost even though they exist.
>>
>> /Elof