Duplicate packets
Carter Bullard
carter at qosient.com
Wed Oct 16 12:17:27 EDT 2013
Hey /Elof et al.,
So I've got the simplest approach for TCP duplicate tracking (back
to back non-zero ipid's equal for the same flow and direction,
regardless of time) tucked into argus-3.0.7.5, BUT it needs some
serious testing. This approach doesn't appear to break anything,
as far as I can tell, but getting any type of packet capture to
test would be very helpful !!!!
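
As a rough illustration of the rule described above (a sketch only, not the argus-3.0.7.5 code; the class name `DupTracker`, the `observe` method and the flow-key tuple are hypothetical): remember the last IP ID seen per flow and direction, and count a dup when the very next packet carries the same non-zero IP ID, regardless of time.

```python
from collections import defaultdict


class DupTracker:
    """Count back-to-back packets with equal non-zero IP IDs, per flow and direction."""

    def __init__(self):
        self.last_ipid = {}            # (flow_key, direction) -> last IP ID seen
        self.dups = defaultdict(int)   # (flow_key, direction) -> dup counter

    def observe(self, flow_key, direction, ipid):
        """Record one packet; return True if it is counted as a duplicate."""
        slot = (flow_key, direction)
        # An IP ID of zero is never counted, since many stacks send zero everywhere.
        is_dup = ipid != 0 and self.last_ipid.get(slot) == ipid
        if is_dup:
            self.dups[slot] += 1
        self.last_ipid[slot] = ipid
        return is_dup
```

Keeping one counter per direction mirrors the "single dup counter in each direction" idea; anything beyond back-to-back repeats is outside what this sketch covers.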
With argus's record strategy, I can add new metrics to existing DSRs,
but with care. I found that I could track a single dup counter in
each direction without breaking filters etc… so let's give this a try
for the moment. The client support for reporting dups will come in
the next step.
This strategy works for a simple case, BUT, the general case will still
need to wait for argus-3.0.9 development.
Any testing takers ??????
Hope all is most excellent !!!!
Carter
On Oct 10, 2013, at 4:46 AM, elof2 at sentor.se wrote:
>
> Hi Carter.
>
> To sum up this thread...
>
> All I really wanted was for 'ra' not to mark SPAN-dupes as retransmissions, but to mark them as duplicates instead.
> That's all.
>
> You have convinced me that this testing is quite expensive, so I opt-out of the discussion and say that you can leave things as-is.
> I'll continue to work around it like I do today:
> My systems generate warnings for massive retransmission amounts.
> I'll have a look and see that it is not retransmissions but SPAN duplicates.
> I ask my client to fix the SPAN.
> He forwards the request to his outsourcing partner.
> The partner doesn't understand, or says it is not possible to fix.
> I disable the high-retransmissions-test completely since this sensor will continue to have double traffic 24/7, day in and day out for years.
>
> Had 'ra' been able to flag dupes as dupes and retransmissions as retransmissions, I could have two tests, and only disable the dupe-test if my client can't fix his SPAN.
>
>
> Thanks for all your attention and thoughts regarding this.
>
> PS: I wrote microsecond, not millisecond. As you say, in a millisecond, lots of stuff can happen. The ideal approach is probably to use the RTT value, if such a value can be calculated, and if not, default to e.g. 1 microsecond or whatever is small enough to only include SPAN dupes or other forms of traffic cloning and not accidentally include retransmissions.
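
A minimal sketch of the test proposed in the PS above, assuming raw frame bytes and nanosecond timestamps are available for consecutive packets (the function name, parameters and the 1000 ns default are illustrative, not anything argus or 'ra' provides): the next packet is a dupe only if it is byte-for-byte identical and arrives within the window, where the window is the RTT when known and 1 microsecond otherwise.

```python
def is_exact_dupe(prev_frame, prev_ts_ns, frame, ts_ns, rtt_ns=None, default_window_ns=1000):
    """Return True if `frame` is an identical copy of `prev_frame` seen within the window.

    prev_frame, frame: raw frame bytes of two consecutive packets.
    prev_ts_ns, ts_ns: capture timestamps in integer nanoseconds.
    rtt_ns:            flow RTT estimate in nanoseconds, or None if it cannot be calculated.
    """
    window_ns = rtt_ns if rtt_ns is not None else default_window_ns
    return frame == prev_frame and (ts_ns - prev_ts_ns) < window_ns
```

Integer nanoseconds are used so that a 1 microsecond window is not lost to floating-point rounding on epoch-sized timestamps.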
>
> /Elof
>
>
> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>> Retransmissions have a new IP-id, since they are new packets.
>>> Po != Pr
>>
>> Hmmmmm, well, yes and no. Many kernels set ipid to zero, except when
>> there are fragments. So ipid can't be used in a general algorithm.
>> Now, we can use it if it's there, but what do you do when it's not ????
>>
>> So the problem of generic 5-tuple flow modelers is that, by definition,
>> you only have L3/L4 identifiers to identify network activity.
>> Which means that for the purposes of a flow monitor, the IP header
>> and Transport headers are the only thing in the packet.
>>
>> Argus is/has always been different, because we've identified that you need
>> more information to understand what is really going on.
>>
>> The WAN guys have recognized for a while that many dups are really
>> "flow collisions", where the 5-tuple is the same, but the context of
>> the packets is different. In some cases, the flows are the same flow,
>> but in some cases, they are different customers, using the same IP
>> addresses, but in different MPLS tunnels.
>>
>> Argus's 5-tuple is not what comes after L2, argus uses the uppermost
>> 5-tuple as the key, as that is the best we can do to find the
>> end-to-end flow descriptor. Argus does have, however, most of the
>> underlying tunnel identifiers available, the local L2 identifiers,
>> the next level tunnel id's, etc...if you want to add to the 5-tuple
>> key.
>>
>>>
>>>
>>> We seem to have different views on what a dupe is. I have thought of it as a 100% identical packet, same VLAN, same MPLS, same TTL, same IP-id, same L2-header, etc.
>>>
>>> The question is then what argus considers a packet. Is it the whole ethernet frame (as I think it is), or could it be just the part above the L2-header?
>>> I understand that the logic can be expensive with your definition of dupe. :-)
>>>
>>>
>>>
>>> 1. the traffic path goes past the same observation point multiple times
>>> the *same* packet goes by multiple times.
>>> This is not the same as monitoring a one-legged router where we see
>>> an incoming packet, and then see it again after it was routed. The
>>> routed packet is a "new" packet with an updated TTL, i.e. Po != Pr.
>>
>> Well TTL is different only if a router processed the packet. For tunneled
>> traffic, or switched traffic, TTL stays the same.
>>
>> So, we'll have to store a lot of data per flow, and update that data on
>> each packet, to be able to make your identical packet test. Pretty
>> expensive to test something that you shouldn't ever have to test.
>>
>>>
>>>
>>> In my world, scenario #3 is the most common one. A faulty SPAN setup that generates doubled traffic. This is easy to manually spot if *all* of the traffic is duplicated, but sometimes there's a mix, where some networks/vlans are mirrored fine (one copy in each direction) while others are duplicated. In these cases, a spot test can easily miss the bad SPAN configuration. That's my main reason why I want argus to handle dupes.
>>
>> So, let's discuss your situation, where one of the VLAN mirrors is screwed up, and
>> let's imagine that it's messed up in only one direction, and it's a Tivo DVR, so ipid's
>> are not available, and the mirror device is a switch, so no TTL changes.
>>
>> We need an algorithm that at least describes what is on
>> the wire, so the ra* clients can figure out that there is a bad VLAN
>> mirror. You don't want argus to make that call (now that would be very
>> complicated).
>>
>> The goal is to not over count flow metrics, to generate data that
>> reflects what is really going on on the wire. So we need to be able
>> to have a real correction mechanism that doesn't skew the data.
>>
>> Not sure that duplicate packets in the 1 millisecond time frame is good
>> enough. RTT can be shorter than 1 mSec in small workgroups, and so
>> legitimate retransmissions may trip up something. RTT is good, in this
>> case, as it gives you a real value for the test.
>>
>>
>> Carter
>>
>>
>>>
>>> /Elof
>>>
>>>
>>> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>>> There is a lot more to this than your response would indicate.
>>>>
>>>> There isn't anything in a packet that distinguishes a
>>>> retransmission from the original.
>>>>
>>>> Same can apply to dups, but
>>>> generally dups are different. They have different L2 identifiers,
>>>> or they are (or can be) in different VLANs, or they are in
>>>> different MPLS or GRE tunnels, etc...
>>>>
>>>> The content of a packet retransmission is identical in every
>>>> way to the original packet. As long as the network treats the
>>>> original and the retransmission in the same way (path, priority),
>>>> the Po will be identical to Pr. Po == Pr where Po is the
>>>> original packet and Pr is the retransmitted packet.
>>>>
>>>> Retransmissions occur only because the original sender decides to
>>>> send a packet again. For protocols like TCP, this requires a full
>>>> round-trip time to occur before the sender can realize that the
>>>> packet didn't get to the far side. So the time between the original
>>>> packet and the retransmission must be greater than the round-trip time
>>>> of the network connection.
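
The timing argument above suggests a simple test, sketched here under the assumption that an RTT estimate for the flow is available (the function name and the labels are hypothetical, not argus code): a repeated segment that arrives sooner than one round-trip time after the original cannot be the sender reacting to loss, so it is treated as a dup.

```python
def classify_repeat(delta_seconds, rtt_seconds):
    """Classify a repeated TCP segment by its delay after the original.

    delta_seconds: time between the original packet and the repeat.
    rtt_seconds:   round-trip time estimate for the flow, or None if unknown.
    """
    if rtt_seconds is None:
        return "unknown"   # no RTT, so timing alone cannot decide
    if delta_seconds < rtt_seconds:
        return "dup"       # too soon for the sender to have noticed a loss
    return "retrans"
```

For example, a repeat seen 1 microsecond after the original on a flow with a 0.5 millisecond RTT comes out as "dup", while a repeat seen 200 milliseconds later on a 50 millisecond path comes out as "retrans".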
>>>>
>>>> Dups, however, generally appear due to 3 reasons.
>>>>
>>>> 1. the traffic path goes past the same observation point multiple times
>>>> the same packet goes by multiple times.
>>>>
>>>> 2. the network duplicates a packet, say for reliability or multicasting..
>>>> two or more copies of the same packet exist in the network at the same time
>>>>
>>>> 3. the collection infrastructure generates multiple copies of a single packet
>>>> one packet in the network, but port mirroring generates multiple copies
>>>>
>>>> In some situations, it's easy to distinguish the dups, especially in case 1.
>>>> The IP time to live field may have changed if a router is involved, or
>>>> new source and/or destination ethernet addresses are in the header, or the
>>>> packet is on the same wire twice, but in different services, like VLANs
>>>> or tunnels. Argus can discriminate these types of duplicates, through
>>>> modification of the flow keys (5-TUPLE+L2+VLAN+MPLS).
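
To make the flow-key point concrete, here is a small sketch of a 5-tuple key with optional L2, VLAN and MPLS discriminators. The `FlowKey` type and its field names are hypothetical and are not how argus represents its keys.

```python
from typing import NamedTuple, Optional, Tuple


class FlowKey(NamedTuple):
    """5-tuple plus optional lower-layer identifiers used to separate duplicates."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int
    # Optional discriminators; None means "not part of the key".
    l2: Optional[Tuple[str, str]] = None   # (source MAC, destination MAC)
    vlan: Optional[int] = None
    mpls: Optional[int] = None


# The same 5-tuple seen in two VLANs: one flow by 5-tuple, two flows once VLAN is keyed.
a = FlowKey("10.0.0.1", "10.0.0.2", 6, 1234, 80, vlan=10)
b = FlowKey("10.0.0.1", "10.0.0.2", 6, 1234, 80, vlan=20)
same_five_tuple = a[:5] == b[:5]
same_extended_key = a == b
```

With only the 5-tuple as key the two copies collapse into one flow and look like repeats; with the VLAN in the key they become separate flow records.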
>>>>
>>>> Well anyway, this is just the start of the description. It can be much
>>>> more complicated than this.
>>>>
>>>>
>>>> Now with regard to gaps…. Gaps are where argus doesn't see all the packets
>>>> in a flow. This happens when there is loss in the collection system,
>>>> packet was on the wire, but it didn't get to argus for some reason,
>>>> OR when there is striping, or load balancing and your argus only sees
>>>> 50%, 33%, or 25% of the packets in a flow. TCP indicates that there
>>>> were 10000 bytes transferred, but you only observed 5000 bytes.
>>>>
>>>> This is important, and we get it for free, because we're trying to
>>>> figure out the loss rate.
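
The byte arithmetic in the gap example can be written down as follows. This is a sketch with hypothetical function names, assuming the span of TCP sequence space covered by the flow and the count of unique payload bytes actually observed are both known.

```python
def gap_bytes(seq_span_bytes, observed_bytes):
    """Bytes the TCP sequence space says were transferred but the sensor never saw."""
    return max(seq_span_bytes - observed_bytes, 0)


def gap_fraction(seq_span_bytes, observed_bytes):
    """Fraction of the transferred bytes that were not observed."""
    if seq_span_bytes <= 0:
        return 0.0
    return gap_bytes(seq_span_bytes, observed_bytes) / seq_span_bytes
```

With the numbers above, `gap_bytes(10000, 5000)` is 5000 and `gap_fraction(10000, 5000)` is 0.5, which is what a sensor seeing one leg of a two-way load-balanced path would report.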
>>>>
>>>> Carter
>>>>
>>>>
>>>> On Oct 9, 2013, at 10:20 AM, elof2 at sentor.se wrote:
>>>>
>>>>> On Tue, 1 Oct 2013, Carter Bullard wrote:
>>>>>
>>>>>> Well, hmmmmmmm…. Everyone else wants to do de-duping of the packet stream.
>>>>>> Why would you want to be different from everyone else ????? ;O)
>>>>>
>>>>> I'm curious. What exactly is it that everyone wants (or doesn't want)?
>>>>>
>>>>>
>>>>>
>>>>>> The strategy is to differentiate loss, retrans and dups, and report
>>>>>> them as independent metrics, with loss being observable loss, retrans being observable duplicates, and dups (for TCP) being retrans arriving in less than an RTT.
>>>>>
>>>>> I don't fully agree about the dups.
>>>>> A dupe is, in my opinion, an *exact* copy of the original packet.
>>>>> A retransmission is not a dupe, it is a new packet, crafted because the original supposedly got lost.
>>>>>
>>>>> Therefore, the logic need not be so expensive. If the very next packet is identical to the last packet and it was received within a microsecond from the last packet then it is a dupe.
>>>>>
>>>>> Taking the RTT into consideration seems a bit excessive for the simple task of dupe recognition. Also, an RTT is not always possible to calculate if the flow only consists of a single SYN, or is just unidirectional UDP traffic, etc, but the packets of the flow are still duplicated.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> We still will need to derive gaps, which are lost packets that were not retransmitted.
>>>>>
>>>>> Oooh, are you talking about distinguishing between external loss and internal loss?
>>>>> When argus sees a gap in tcp sequence numbers you know there has been a drop, but not where it occurred.
>>>>> If argus then sees a tcp retransmission for that gap, we know the drop was external, otherwise it was probably internal.
>>>>> That kind of logic seems expensive. If it *is* really expensive, I would say don't do it. Only do the first gap detection (as you already do today) and leave the task of understanding where the drops occur to the user.
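
For what it is worth, the external/internal distinction described above could be sketched like this. The function and its labels are hypothetical; it assumes one direction of one flow, segments given as (seq, length) in arrival order, and no sequence number wraparound, and it cannot tell a retransmission from simple reordering.

```python
def classify_gaps(segments, first_seq=0):
    """Map each sequence gap (start, end) to "external" or "internal".

    A gap is opened when a segment starts beyond the next expected sequence
    number. If a later segment starts inside that gap, the sender resent the
    data, so the drop is taken to have happened in the network ("external").
    Gaps never filled are taken to be losses in the collection path ("internal").
    """
    expected = first_seq
    gaps = {}
    for seq, length in segments:
        if seq > expected:
            gaps[(expected, seq)] = "internal"   # assumed internal until data shows up
        elif seq < expected:
            for start, end in gaps:
                if start <= seq < end:
                    gaps[(start, end)] = "external"
        expected = max(expected, seq + length)
    return gaps
```

The per-flow state is just the list of open gaps, which hints at the cost being discussed: it grows with every hole the sensor sees.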
>>>>>
>>>>> /Elof
>>>>>
>>>>>
>>>>
>>