Detect packet drops
elof2 at sentor.se
Wed Jan 25 08:02:08 EST 2012
Hi Carter!
Any more thoughts or progress with this?
I just realised that I can't even rely on Wireshark for an estimate of
dropped packets, since Wireshark's Expert Info tags out-of-order
FIN packets as "ACKed lost segment".
What I'm looking for is not a 100% accurate system that counts every missing
packet (which is impossible to determine), but a flag on each session that
argus knows is missing one or more packets,
just like the retransmission flag doesn't say how many retransmissions
there were in a tcp flow.
Since tcp is more or less always present in all generic sniffing
scenarios, the new function in argus to detect packet drops would only
need to bother with tcp.
To eliminate false-positives, I think you should only monitor tcp packets
that don't include SYN or FIN (i.e. only analyze ACK and PSH ACK packets).
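A rough Python sketch of such a per-session flag for one direction of a TCP flow follows. It is not argus code: the class name, the flag-string format ("S", "F", "A", "P") and the hole-tracking approach are assumptions. Following the suggestion above, SYN and FIN segments are ignored. A gap in the sequence space is recorded as a hole, and a later segment that fills it (reordering, or a retransmission after real network loss) removes it again, so only holes that are never filled mark the direction as missing packets. Sequence-number wraparound is not handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def _subtract(holes: List[Tuple[int, int]], start: int, end: int) -> List[Tuple[int, int]]:
    """Remove the byte range [start, end) from a list of holes."""
    result = []
    for h_start, h_end in holes:
        if end <= h_start or start >= h_end:
            result.append((h_start, h_end))      # no overlap, keep hole as-is
            continue
        if h_start < start:
            result.append((h_start, start))      # part of hole before the segment
        if end < h_end:
            result.append((end, h_end))          # part of hole after the segment
    return result


@dataclass
class DirectionState:
    """Sequence-gap tracking for one direction of a TCP flow (hypothetical)."""
    next_expected: Optional[int] = None
    holes: List[Tuple[int, int]] = field(default_factory=list)

    def observe(self, seq: int, payload_len: int, flags: str) -> None:
        # Ignore SYN/FIN segments (as suggested) and segments without payload.
        if "S" in flags or "F" in flags or payload_len == 0:
            return
        end = seq + payload_len
        if self.next_expected is None:
            self.next_expected = end             # first data segment: baseline
            return
        if seq > self.next_expected:
            # Bytes [next_expected, seq) have not been seen (yet).
            self.holes.append((self.next_expected, seq))
        # A late or retransmitted segment may fill all or part of a hole.
        self.holes = _subtract(self.holes, seq, end)
        self.next_expected = max(self.next_expected, end)

    @property
    def missing_packets(self) -> bool:
        return bool(self.holes)
```

A flow would keep one DirectionState per direction and set its "missing packets" flag at flow close if either direction still has holes.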
Experience has shown me that complex redundant/load-balanced solutions
often SPAN the traffic incorrectly, like always delivering the SYNACK
before the SYN, or in my current case, delivering the final ACK before the
FINACK.
If argus can tag flows with missing packets, the user can detect SPAN
problems (dropped packets outside the machine or if the machine itself
can't keep up with the received bandwidth). Also, the column for
retransmissions/out-of-order in the 'proto' field could reflect just that.
Currently the manual says drop OR retransmission.
I don't know if there is an API so that argus is always informed if the
driver/kernel dropped any packets. If so, perhaps there should be
different tags, like "dropped externally" and "dropped internally".
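For the "dropped internally" case, some counters already exist outside argus. On Linux, /proc/net/dev reports a per-interface receive "drop" counter, and libpcap's pcap_stats() reports ps_drop (packets dropped because the capture buffer was full) and ps_ifdrop (packets dropped by the interface or its driver). The Python sketch below only parses /proc/net/dev text; the function name rx_drops is hypothetical, and which counters a given NIC driver actually increments varies.

```python
def rx_drops(proc_net_dev_text: str) -> dict:
    """Return {interface: receive 'drop' counter} parsed from /proc/net/dev text."""
    drops = {}
    # The first two lines are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        iface, counters = line.split(":", 1)
        fields = counters.split()
        # Receive columns: bytes packets errs drop fifo frame compressed multicast
        drops[iface.strip()] = int(fields[3])
    return drops
```

Calling rx_drops(open("/proc/net/dev").read()) twice, some seconds apart, shows whether the counter for the sniffing interface is growing.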
Also, Carter and I had a mail thread going regarding the possibility to
detect pure duplicates, another common faulty SPAN setup where every
packet is copied twice to the destination port.
Currently argus is tagging duplicate packets as retransmissions. This is
not correct; they are not retransmissions but simply the result of a faulty
SPAN configuration in the network.
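One way to separate a SPAN duplicate from a genuine retransmission is timing plus content: a duplicated SPAN copy is byte-for-byte identical and arrives almost immediately, whereas a TCP retransmission is typically sent at least a round-trip time later, after a timeout or duplicate ACKs. The sketch below assumes that rule; the class name and the 1 ms window are hypothetical. SPAN setups that copy both ingress and egress of a routed port may alter TTL or MAC addresses, which this byte-exact check would miss.

```python
import hashlib
from collections import OrderedDict


class DuplicateDetector:
    """Flag byte-identical packets seen again within `window` seconds (hypothetical)."""

    def __init__(self, window: float = 0.001):
        self.window = window
        self._recent = OrderedDict()   # digest -> first-seen timestamp, oldest first

    def is_duplicate(self, timestamp: float, packet: bytes) -> bool:
        # Expire digests older than the window (assumes non-decreasing timestamps).
        while self._recent:
            oldest_ts = next(iter(self._recent.values()))
            if timestamp - oldest_ts <= self.window:
                break
            self._recent.popitem(last=False)
        digest = hashlib.sha1(packet).digest()
        if digest in self._recent:
            return True                # identical copy inside the window: SPAN duplicate
        self._recent[digest] = timestamp
        return False                   # first copy, or a much later resend
```

Packets for which is_duplicate() returns False would still go through normal sequence analysis, so real retransmissions keep being counted as such.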
If argus were to have both packet drop detection and duplicate
detection, both would be categorized as SPAN issues.
Therefore I think you should add yet another column to the 'proto' field
in ra output, since SPAN issues have nothing to do with the current columns:
argus records themselves
protocol encapsulation
icmp events
retransmission/out-of-order
window closure/supression
ecn
fragmentation
IP options
Perhaps the out-of-order tags should be moved from their current column to
this new SPAN column, since they too have more to do with the external
environment?
You could then have:

New SPAN column in ra output:
 * - Both Src and Dst have duplicated packets
 s - Src sees duplicated packets
 d - Dst sees duplicated packets
 % - Both flow directions are missing packet(s)
 x - Src->Dst flow missing packet(s)
 y - Dst->Src flow missing packet(s)
 & - Both Src and Dst packets out of order
 i - Src packets out of order
 r - Dst packets out of order
 ? - Two or more of the designators above (like s AND r)

Existing column in ra output:
 * - Both Src and Dst retransmission
 s - Src retransmissions
 d - Dst retransmissions
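The proposed SPAN column could be derived from six per-flow booleans. A minimal Python sketch, assuming that "?" means designators from more than one of the three categories are set (so s AND d still gives "*") and that a flow with nothing to report prints a blank; the function and parameter names are hypothetical.

```python
def span_designator(src_dup=False, dst_dup=False,
                    src_missing=False, dst_missing=False,
                    src_ooo=False, dst_ooo=False) -> str:
    """Return the proposed SPAN-column character for one flow (hypothetical)."""
    # For each category: (src flag, dst flag) and its (both, src-only, dst-only) chars.
    categories = [
        ((src_dup, dst_dup), ("*", "s", "d")),
        ((src_missing, dst_missing), ("%", "x", "y")),
        ((src_ooo, dst_ooo), ("&", "i", "r")),
    ]
    chars = []
    for (src, dst), (both, src_only, dst_only) in categories:
        if src and dst:
            chars.append(both)
        elif src:
            chars.append(src_only)
        elif dst:
            chars.append(dst_only)
    if not chars:
        return " "      # assumption: nothing to report is printed as a blank
    if len(chars) > 1:
        return "?"      # designators from more than one category, e.g. s AND r
    return chars[0]
```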
Any thoughts?
/Elof
On Wed, 26 Oct 2011, Carter Bullard wrote:
> I would give it a try nonetheless. As I said, argus does differentiate between TCP sequence number loss, out-of-order packets and retransmission, so if the TCP flow doesn't have retransmissions, but the sensor doesn't see all the packets, argus will report loss. The hard part is understanding that the loss is due to sensor loss rather than data path loss.
>
> Because argus provides TCP sequence numbers, you could see whether the total bytes observed differ from the bytes successfully transmitted.
>
> I think you can make a first-step guess with the existing tools. The argus data has most of what you need to figure it out, but I'm not sure you can do it with just the fields we print out.
>
> Carter
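Carter's comparison can be sketched for one direction of a TCP flow: the sequence span from the first to the last observed byte approximates what was transmitted, and the union of the observed segment ranges is what the sensor actually saw. Merging overlapping ranges means retransmitted bytes are not counted twice. This is a hypothetical helper, not something the ra tools print; it assumes no sequence-number wraparound and cannot see bytes lost before the first or after the last observed segment.

```python
def missing_bytes(segments):
    """Estimate bytes the sensor never saw in one TCP direction (hypothetical).

    segments: iterable of (seq, payload_len) for data-carrying segments.
    """
    intervals = sorted((seq, seq + length) for seq, length in segments if length > 0)
    if not intervals:
        return 0
    span_start = intervals[0][0]
    span_end = max(end for _, end in intervals)
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end:            # overlapping or adjacent: extend current run
            cur_end = max(cur_end, end)
        else:                           # gap: close the current run
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
    covered += cur_end - cur_start
    # Transmitted span minus bytes actually observed.
    return (span_end - span_start) - covered
```

For example, segments (1000, 100), (1100, 100), a retransmitted (1100, 100) and (1300, 100) give 100 missing bytes: the retransmission is not double-counted and the 1200-1300 range was never seen.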
>
> On Oct 26, 2011, at 7:19 AM, elof2 at sentor.se wrote:
>
>>
>> Hi Carter!
>>
>> 1.
>> Hmmm, the manual says that the *loss fields count packet loss OR the amount of retransmissions.
>> Since I'm only interested in detecting drops, I don't see how this helps me.
>>
>> I know it is hard to detect drops when you don't have the original data to compare with, but it should be possible to get a rough drop estimate by analysing e.g. tcp counters. The easiest way would be to look at a tcp stream and note every gap there is. Gaps = packet loss.
>>
>> If argus could distinguish between Loss and Retransmission, one could more easily see the amount of "SPAN drops" and amount of retransmissions on the wire.
>>
>> (PS. From a discussion a year or so ago: in a perfect world, the retransmission counter should also be split in two, one counter for tcp retransmissions and one counter for duplicate packets, i.e. when the sniffer gets two copies of the exact same packet. The latter case is not really retransmission but rather a faulty SPAN setup.)
>>
>>
>> Oh well, I know all of this is quite obscure, hard to fix and would require too many changes to the already existing *loss fields and the protocol flags field, so I don't expect argus to handle things as perfectly as I would like.
>>
>> Therefore my original question remains:
>> Is there a command-line tool that shows me a rough estimate of drops in the sniffed traffic?
>>
>>
>>
>> 2.
>> I found yet another typo in the manual. It says:
>> psloss percent source pkts retransmitted or dropped.
>> pdloss percent destination pkts retransmitted or dropped.
>> It should be:
>> sploss
>> dploss
>>
>>
>> 3.
>> I'm just curious... How do the *loss counters in argus work?
>> If I have a single packet, a TCP SYN, this is registered by ra -Zb as:
>>
>> spkts  dpkts  sloss  dloss  state
>>     1      0      1      0  S_
>>
>> Why is sloss=1 when only one packet exist?
>>
>> /Elof
>>
>>
>>
>> On Tue, 25 Oct 2011, Carter Bullard wrote:
>>> You can print % loss for a number of flow types (TCP, RTP, ESP), but if you aggregate all the flow records to try to get a single loss ratio for the whole "wire", then because of the way aggregation is done, we may not retain loss if the merged protocols don't all have loss metrics.
>>> This is a hard detection problem, but you should be able to detect large %loss situations with existing tools. If you print loss as a percent:
>>> racluster -r file -m proto -s stime dur sploss dploss - tcp
>>> Do you get anything that looks useful?
>>>
>>> Carter
>>> Carter Bullard, QoSient, LLC
>>> 150 E. 57th Street Suite 12D
>>> New York, New York 10022
>>> +1 212 588-9133 Phone
>>> +1 212 588-9134 Fax
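The aggregation caveat in Carter's message can be illustrated with a hypothetical helper that combines per-flow loss into one figure. Flows whose protocol carries no loss metric are marked with None and left out of both numerator and denominator rather than being counted as lossless. Loss is expressed here as lost / (observed + lost), which is an assumption and not necessarily how argus computes sploss/dploss.

```python
def aggregate_loss_percent(flows):
    """Combine per-flow (observed_pkts, lost_pkts) into one loss percentage (hypothetical).

    lost_pkts is None for flows whose protocol has no loss metric; those flows
    are skipped instead of being treated as lossless.
    """
    observed = lost = 0
    for flow_observed, flow_lost in flows:
        if flow_lost is None:
            continue
        observed += flow_observed
        lost += flow_lost
    if observed + lost == 0:
        return None                    # nothing with a loss metric to report on
    return 100.0 * lost / (observed + lost)
```

With a TCP flow of 90 observed and 10 lost packets plus a 100-packet flow without a loss metric, this gives 10.0; counting the second flow as lossless (0 instead of None) would dilute the figure to 5.0.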
>>>
>>> On Oct 25, 2011, at 10:17 AM, elof2 at sentor.se wrote:
>>>
>>>> Hi Carter and list!
>>>>
>>>> Is there any way to easily detect loss in SPAN-traffic?
>>>>
>>>> If I mirror two 1 Gbps full-duplex ports to a 1 Gbps SPAN port, in theory the switch could try to copy 4 Gbps onto it, resulting in dropped packets.
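The worst case in this example is simple arithmetic: two full-duplex 1 Gbps ports can each carry 1 Gbps in both directions, so up to 4 Gbps may be offered to a 1 Gbps SPAN port. The helper below is hypothetical and ignores switch buffering and burstiness; it computes the oversubscription ratio and the fraction that could not be forwarded if that load were sustained.

```python
def span_oversubscription(port_speeds_gbps, span_speed_gbps, full_duplex=True):
    """Worst-case load offered to a SPAN port (hypothetical helper)."""
    directions = 2 if full_duplex else 1
    offered = sum(port_speeds_gbps) * directions      # both directions mirrored
    ratio = offered / span_speed_gbps
    # Fraction that could not be forwarded if the offered load were sustained.
    max_drop_fraction = 0.0 if ratio <= 1 else 1 - 1 / ratio
    return offered, ratio, max_drop_fraction
```

span_oversubscription([1, 1], 1) returns (4, 4.0, 0.75): four times the SPAN capacity, so up to 75% of the mirrored packets could be dropped under sustained full load.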
>>>>
>>>> The sniffer machine receiving the mirrored traffic could be heavily loaded and drop packets.
>>>>
>>>> In protocols such as TCP, these drops are detectable, due to gaps in the sequence counters.
>>>>
>>>> Generating a pcap file, scp:ing it to a machine running Wireshark and then looking at the expert info is such a hassle. I'm looking for a command-line tool that shows me when packets are missing (by printing a * for every missed packet) or gives me an estimated ratio of drops per minute.
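A per-minute report of the kind asked for here could be built on top of any gap counter. In this sketch the input is a list of (unix_time, observed_pkts, missing_pkts) samples, where missing_pkts is an estimate, for example missing bytes divided by a typical segment size; both the input format and the function name are assumptions.

```python
from collections import defaultdict


def drops_per_minute(samples):
    """Bucket (unix_time, observed_pkts, missing_pkts) samples per minute (hypothetical).

    Returns {minute_start: (missing_pkts, drop_ratio)}.
    """
    buckets = defaultdict(lambda: [0, 0])       # minute -> [observed, missing]
    for ts, observed, missing in samples:
        minute = int(ts // 60) * 60
        buckets[minute][0] += observed
        buckets[minute][1] += missing
    report = {}
    for minute in sorted(buckets):
        observed, missing = buckets[minute]
        total = observed + missing
        report[minute] = (missing, missing / total if total else 0.0)
    return report
```

Printing "*" * missing next to each minute would give the star display described in the message, and the ratio gives the drops-per-minute estimate.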
>>>>
>>>> Is there such a tool?
>>>>
>>>>
>>>>
>>>>
>>>> I'm guessing that argus can't help me, since it doesn't distinguish between loss and retransmissions in the 'flgs' field:
>>>> * - Both Src and Dst loss/retransmission
>>>> s - Src loss/retransmissions
>>>> d - Dst loss/retransmissions
>>>>
>>>> /Elof
>>>>
>>>>
>>>
>>
>