Detect packet drops

elof2 at sentor.se
Wed Jan 25 08:02:08 EST 2012


Hi Carter!

Any more thoughts or progress with this?

I just realised that I can't even rely on Wireshark for an estimate of
dropped packets, since Wireshark's Expert Info tags out-of-order
FIN packets as "ACKed lost segment".

What I'm looking for is not a 100% accurate system to count every missing
packet (which is impossible to determine), but a flag on each session that
argus knows is missing one or more packets.
This is just like the flag for retransmission, which doesn't say how many
retransmissions there were in a tcp flow.

Since tcp is more or less always present in all generic sniffing 
scenarios, the new function in argus to detect packet drops would only 
need to bother with tcp.
To eliminate false-positives, I think you should only monitor tcp packets 
that don't include SYN or FIN (i.e. only analyze ACK and PSH ACK packets).
Experience has shown me that complex redundant/loadbalanced solutions
often SPAN the traffic incorrectly, like always receiving the SYNACK 
before the SYN, or in my current case, receiving the final ACK before the 
FINACK.
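
To make the idea concrete, here is a rough Python sketch of the per-flow
check I have in mind. It is not how argus works internally. The Segment
type and the has_gap() name are made up for illustration, and the sketch
assumes the whole flow direction is available at once and ignores
sequence number wraparound.

```python
from dataclasses import dataclass

# TCP flag bits (as defined in the TCP header)
FIN = 0x01
SYN = 0x02


@dataclass
class Segment:
    """Hypothetical record for one observed TCP packet in one direction."""
    seq: int      # TCP sequence number of the first payload byte
    length: int   # number of payload bytes
    flags: int    # TCP flag bits


def has_gap(segments):
    """Return True if the observed data leaves a hole in the sequence space.

    SYN and FIN packets are skipped, as are packets without payload
    (pure ACKs), so mis-ordered handshakes/teardowns cannot trigger it.
    Segments are sorted first, so out-of-order arrival alone is not
    reported as a missing packet.
    """
    data = [s for s in segments
            if not (s.flags & (SYN | FIN)) and s.length > 0]
    data.sort(key=lambda s: s.seq)

    expected = None  # next sequence number we expect to see
    for s in data:
        if expected is not None and s.seq > expected:
            return True  # bytes between expected and s.seq were never seen
        end = s.seq + s.length
        expected = end if expected is None else max(expected, end)
    return False
```

The result is only a yes/no flag per direction, in the same spirit as the
retransmission flag. If the sender retransmits the missing bytes and the
sniffer sees the retransmission, the hole is filled and nothing is flagged.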

If argus can tag flows with missing packets, the user can detect SPAN 
problems (dropped packets outside the machine or if the machine itself 
can't keep up with the received bandwidth). Also, the column for 
retransmissions/out-of-order in the 'proto' field could reflect just that. 
Currently the manual says drop OR retransmission.


I don't know if there is an API so that argus is always informed if the 
driver/kernel dropped any packets. If so, perhaps there should be 
different tags, like "dropped externally" and "dropped internally".


Also, Carter and I had a mail thread going regarding the possibility to 
detect pure duplicates, another common faulty SPAN setup where every 
packet is copied twice to the destination port.
Currently argus is tagging duplicate packets as retransmissions. This is 
not true, it is just the result of faulty SPAN configuration in the 
network.
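
One possible way to tell the two cases apart, sketched in Python below.
The assumption (mine, not something argus does today) is that a SPAN
duplicate is an identical copy, including the IPv4 identification field,
while a real retransmission from the sender normally carries a new IP ID.
That assumption will not hold for every stack, so treat this as a
heuristic. The function name and the tuple layout are hypothetical.

```python
def classify_repeats(packets):
    """Count pure duplicates and retransmissions in one flow direction.

    packets: iterable of (ip_id, seq, payload) tuples in arrival order,
             where payload is a bytes object.
    Returns a (duplicates, retransmissions) tuple.
    """
    seen_exact = set()   # (ip_id, seq, payload): identical copies
    seen_range = set()   # (seq, payload length): same data sent again
    duplicates = 0
    retransmissions = 0

    for ip_id, seq, payload in packets:
        if not payload:
            continue  # pure ACKs legitimately repeat the same seq
        exact_key = (ip_id, seq, payload)
        range_key = (seq, len(payload))
        if exact_key in seen_exact:
            duplicates += 1        # same IP ID: most likely a SPAN copy
        elif range_key in seen_range:
            retransmissions += 1   # new IP ID: most likely a real resend
        seen_exact.add(exact_key)
        seen_range.add(range_key)

    return duplicates, retransmissions
```

With a faulty SPAN that copies everything twice, the duplicate count would
be roughly half of all data packets, which is easy to spot.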


If argus were to have both packet drop detection and duplicate
detection, both would be categorized as SPAN issues.
Therefore I think you should add yet another column in the 'proto' field
in ra output, since SPAN issues have nothing to do with the current:
  argus records themselves
  protocol encapsulation
  icmp events
  retransmission/out-of-order
  window closure/suppression
  ecn
  fragmentation
  IP options

Perhaps the out-of-order tags should be moved from their current column to
this new SPAN-column since they too have more to do with the external 
environment?

You could then have:
            *         -  Both Src and Dst have duplicated packets
            s         -  Src see duplicated packets
            d         -  Dst see duplicated packets
            %         -  Both flow directions are missing packet(s)
            x         -  Src->Dst flow missing packet(s)
            y         -  Dst->Src flow missing packet(s)
            &         -  Both Src and Dst packet out of order
            i         -  Src packets out of order
            r         -  Dst packets out of order
            ?         -  Two or more of the designators above, (like s AND r)
                *     -  Both Src and Dst retransmission
                s     -  Src retransmissions
                d     -  Dst retransmissions

            ^         =  new column in ra output
                ^     =  existing column in ra output
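
As a sketch of how the new column could be chosen from per-direction
booleans, something like the Python below. The function name and its
arguments are hypothetical; the characters are the ones proposed in the
list above, and a blank means none of the conditions were seen.

```python
def span_flag(src_dup, dst_dup, src_miss, dst_miss, src_ooo, dst_ooo):
    """Return the single character for the proposed SPAN column.

    src_*/dst_* are booleans for duplicated packets, missing packets
    and out-of-order packets as seen for Src and Dst respectively.
    """
    designators = []
    for src, dst, both_char, src_char, dst_char in (
        (src_dup, dst_dup, "*", "s", "d"),     # duplicated packets
        (src_miss, dst_miss, "%", "x", "y"),   # missing packet(s)
        (src_ooo, dst_ooo, "&", "i", "r"),     # packets out of order
    ):
        if src and dst:
            designators.append(both_char)
        elif src:
            designators.append(src_char)
        elif dst:
            designators.append(dst_char)

    if not designators:
        return " "               # nothing to report
    if len(designators) > 1:
        return "?"               # two or more designators, like s AND r
    return designators[0]
```

A single character can only hold one designator, which is why any
combination collapses to '?'.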

Any thoughts?

/Elof


On Wed, 26 Oct 2011, Carter Bullard wrote:

> I would give it a try none the less.  As I said argus does differentiate between TCP sequence number loss, out of order packets and retransmission, so if the TCP doesn't have retransmissions, but the sensor doesn't see all the packets, argus will report loss.  The hard part is understanding that the loss is due to sensor loss rather than data path loss.
>
> Because argus provides TCP sequence numbers you can / could see that the total bytes observed vs the bytes successfully transmitted are different.
>
> I think you can do a first step guess with the existing tools. The argus data has most of what you need to figure it out, not sure you can do it just with the fields we print out.
>
> Carter
>
> On Oct 26, 2011, at 7:19 AM, elof2 at sentor.se wrote:
>
>>
>> Hi Carter!
>>
>> 1.
>> Hmmm, the manual says that the *loss fields count packet loss OR the number of retransmissions.
>> Since I'm only interested in detecting drops I don't see how this helps me.
>>
>> I know it is hard to detect drops when you don't have the original data to compare with, but it should be possible to get a rough drop estimate by analysing e.g. tcp counters. The easiest way would be to look at a tcp stream and note every gap there is. Gaps = packet loss.
>>
>> If argus could distinguish between Loss and Retransmission, one could more easily see the amount of "SPAN drops" and amount of retransmissions on the wire.
>>
>> (PS. From a discussion some years ago, in a perfect world, the retransmission counter should also be split in two: One counter for tcp retransmissions and one counter for duplicate packets, i.e. when the sniffer gets two copies of the exact same packet. The latter case is not really retransmissions but rather a faulty SPAN setup.)
>>
>>
>> Oh well, I know all of this is quite obscure, hard to fix and would inflict too many changes in the already existing *loss fields and in the protocol flags field, so I don't expect argus to handle things as perfectly as I would like.
>>
>> Therefore my original question remains:
>> Is there a commandline tool that shows me a rough estimate of drops in the sniffed traffic?
>>
>>
>>
>> 2.
>> I found yet another typo in the manual. It says:
>>           psloss      percent source pkts retransmitted or dropped.
>>           pdloss      percent destination pkts retransmitted or dropped.
>>    it should be
>>           sploss
>>           dploss
>>
>>
>> 3.
>> I'm just curious... How do the *loss counters in argus work?
>> If I have a single packet, a TCP SYN, this is registered by ra -Zb as:
>>
>> spkts dpkts sloss dloss state
>> 1     0     1     0     S_
>>
>> Why is sloss=1 when only one packet exists?
>>
>> /Elof
>>
>>
>>
>> On Tue, 25 Oct 2011, Carter Bullard wrote:
>>> You can print % loss for a number of flow types, TCP, RTP, ESP, but if you aggregate all the flow records to try to get a singular loss ratio for the whole "wire", the way aggregation is done, we may not retain loss, if the protocols merged don't all have loss metrics.
>>> This is a hard detection problem, but you should be able to detect large %loss situations with existing tools.  If you print loss as a percent:
>>>  racluster -r file -m proto -s stime dur sploss dploss - tcp
>>> Do you get anything that looks useful?
>>>
>>> Carter
>>> Carter Bullard, QoSient, LLC
>>> 150 E. 57th Street Suite 12D
>>> New York, New York 10022
>>> +1 212 588-9133 Phone
>>> +1 212 588-9134 Fax
>>>
>>> On Oct 25, 2011, at 10:17 AM, elof2 at sentor.se wrote:
>>>
>>>> Hi Carter and list!
>>>>
>>>> Is there any way to easily detect loss in SPAN-traffic?
>>>>
>>>> If I mirror two 1 Gbps full-duplex ports to a 1 Gbps SPAN port, in theory the switch could try to copy 4 Gbps onto it, resulting in dropped packets.
>>>>
>>>> The sniffer machine receiving the mirrored traffic could be heavily loaded and drop packets.
>>>>
>>>> In protocols such as TCP, these drops are detectable, due to gaps in the sequence counters.
>>>>
>>>> Generating a pcap-file, scp:ing it to a machine running wireshark and then looking at the expert info is such a hassle. I'm looking for a commandline tool that shows me when packets are missing (by printing a * for every missed packet) or gives me an estimated ratio of drops per minute.
>>>>
>>>> Is there such a tool?
>>>>
>>>>
>>>>
>>>>
>>>> I'm guessing that argus can't help me, since it doesn't distinguish between loss and retransmissions in the 'flgs' field:
>>>>              *     -  Both Src and Dst loss/retransmission
>>>>              s     -  Src loss/retransmissions
>>>>              d     -  Dst loss/retransmissions
>>>>
>>>> /Elof
>>>>
>>>>
>>>
>>
>


