Duplicate packets

Thu Oct 10 04:46:40 EDT 2013

Hi Carter.

To sum up this thread...

All I really wanted was for 'ra' not to mark SPAN dupes as
retransmissions, but to mark them as duplicates instead.
That's all.

You have convinced me that this testing is quite expensive, so I'll opt out
of the discussion and say that you can leave things as-is.
I'll continue to work around it like I do today:
My systems generate warnings for massive retransmission amounts.
I have a look and see that they are not retransmissions but SPAN
duplicates.
I ask my client to fix the SPAN.
He forwards the request to his outsourcing partner.
The partner doesn't understand, or says it is not possible to fix.
I disable the high-retransmissions test completely, since this sensor will
continue to have double traffic 24/7, day in and day out, for years.

Had 'ra' been able to flag dupes as dupes and retransmissions as 
retransmissions, I could have two tests, and only disable the dupe-test if 
my client can't fix his SPAN.
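
To illustrate what I mean by two tests, here is a rough sketch. It assumes
the per-sensor counts of retransmissions and duplicates are already
available as separate numbers; the names (SensorCounts, check_sensor) and
the 5% limits are made up for the example and are not anything 'ra' outputs
today.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SensorCounts:
    """Hypothetical per-sensor totals over some reporting interval."""
    packets: int
    retransmissions: int
    duplicates: int


def check_sensor(counts: SensorCounts,
                 retrans_limit: float = 0.05,   # assumed threshold
                 dupe_limit: float = 0.05,      # assumed threshold
                 dupe_test_enabled: bool = True) -> List[str]:
    """Return warnings, testing retransmissions and duplicates separately."""
    warnings: List[str] = []
    if counts.packets == 0:
        return warnings
    if counts.retransmissions / counts.packets > retrans_limit:
        warnings.append("high retransmissions")
    # Only this test is switched off for a sensor with a known-bad SPAN.
    if dupe_test_enabled and counts.duplicates / counts.packets > dupe_limit:
        warnings.append("high duplicates")
    return warnings
```

With dupe_test_enabled=False on the sensor behind the broken SPAN, the
retransmission test keeps working there.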

Thanks for all your attention and thoughts regarding this.

PS: I wrote microsecond, not millisecond. As you say, in a millisecond,
lots of stuff can happen. The ideal approach is probably to use the RTT
value if such a value can be calculated, and if not, default to e.g. 1
microsecond or whatever is small enough to only include SPAN dupes or
other forms of traffic cloning and not accidentally include retransmissions.
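
As a rough sketch of that rule (not how argus or 'ra' does it): given the
time between a packet and an identical earlier packet in the same flow,
compare it to the flow's RTT when one is known, otherwise to a small
fallback window. The function name and the 1 microsecond default are
assumptions for the example.

```python
from typing import Optional

FALLBACK_WINDOW_US = 1.0  # assumed default when no RTT can be calculated


def classify_repeat(delta_us: float, rtt_us: Optional[float] = None) -> str:
    """Classify a packet that is identical to an earlier one in its flow.

    delta_us: microseconds since the identical earlier packet was seen.
    rtt_us:   measured round-trip time of the flow, or None if unknown
              (e.g. a lone SYN or unidirectional UDP).
    """
    if rtt_us is not None and rtt_us > 0:
        window_us = rtt_us
    else:
        window_us = FALLBACK_WINDOW_US
    # A sender cannot retransmit before a round trip has passed, so a
    # repeat inside the window is treated as a copy made by the network
    # or by the collection infrastructure.
    return "duplicate" if delta_us < window_us else "retransmission"
```

For example, classify_repeat(250.0, rtt_us=800.0) counts the repeat as a
duplicate, while classify_repeat(1200.0, rtt_us=800.0) counts it as a
retransmission.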

/Elof

On Wed, 9 Oct 2013, Carter Bullard wrote:
>> Retransmissions have a new IP-id, since they are new packets.
>> Po != Pr
>
> Hmmmmm, well, yes and no.  Many kernels set ipid to zero, except when
> there are fragments.  So ipid can't be used in a general algorithm.
> Now, we can use it if it's there, but what do you do when it's not ????
>
> So the problem of generic 5-tuple flow modelers is that, by definition,
> you only have L3/L4 identifiers to identify network activity.
> Which means that for the purposes of a flow monitor, the IP header
> and Transport headers are the only thing in the packet.
>
> Argus is/has always been different, because we've identified that you need
> more information to understand what is really going on.
>
> The WAN guys have recognized for a while that many dups are really
> "flow collisions", where the 5-tuple is the same, but the context of
> the packets is different.  In some cases, the flows are the same flow,
> but in some cases, they are different customers, using the same IP
> addresses, but in different MPLS tunnels.
>
> Argus's 5-tuple is not what comes after L2, argus uses the uppermost
> 5-tuple as the key, as that is the best we can do to find the
> end-to-end flow descriptor.  Argus does have, however, most of the
> underlying tunnel identifiers available, the local L2 identifiers,
> the next level tunnel id's, etc...if you want to add to the 5-tuple
> key.
>
>>
>>
>> We seem to have different views on what a dupe is. I have thought of it as a 100% identical packet, same VLAN, same MPLS, same TTL, same IP-id, same L2-header, etc.
>>
>> The question is then what argus considers a packet. Is it the whole ethernet frame (as I think it is), or could it be just the part above the L2-header?
>> I understand that the logic can be expensive with your definition of dupe. :-)
>>
>>
>>
>> 1. the traffic path goes past the same observation point multiple times
>>     the *same* packet goes by multiple times.
>>     This is not the same as monitoring a one-legged router where we see
>>     an incoming packet, and then see it again after it was routed. The
>>     routed packet is a "new" packet with an updated TTL, i.e. Po != Pr.
>
> Well TTL is different only if a router processed the packet.  For tunneled
> traffic, or switched traffic, TTL stays the same.
>
> So, we'll have to store a lot of data per flow, and update that data on
> each packet, to be able to make your identical packet test.  Pretty
> expensive to test something that you shouldn't ever have to test.
>
>>
>>
>> In my world, scenario #3 is the most common one: a faulty SPAN setup that generates doubled traffic. This is easy to spot manually if *all* of the traffic is duplicated, but sometimes there's a mix, where some networks/vlans are mirrored fine (one copy in each direction) while others are duplicated. In these cases, a spot test can easily miss the bad SPAN configuration. That's my main reason for wanting argus to handle dupes.
>
> So, let's discuss your situation, where one of the VLAN mirrors is screwed up, and
> let's imagine that it's messed up in only one direction, and it's a Tivo DVR, so ipids
> are not available, and the mirror device is a switch, so no TTL changes.
>
> We need an algorithm that at least describes what is on
> the wire, so the ra* clients can figure out that there is a bad VLAN
> mirror.  You don't want argus to make that call (now that would be very
> complicated).
>
> The goal is to not over count flow metrics, to generate data that
> reflects what is really going on on the wire.  So we need to be able
> to have a real correction mechanism that doesn't skew the data.
>
> Not sure that duplicate packets in the 1 millisecond time frame is good
> enough.  RTT can be shorter than 1 mSec in small workgroups, and so
> legitimate retransmissions may trip up something.  RTT is good, in this
> case, as it gives you a real value for the test.
>
>
> Carter
>
>
>>
>> /Elof
>>
>>
>> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>> There is a lot more to this than your response would indicate.
>>>
>>> There isn't anything in a packet that distinguishes a
>>> retransmission from the original.
>>>
>>> Same can apply to dups, but
>>> generally dups are different.  They have different L2 identifiers,
>>> or they are (or can be) in different VLANs, or they are in
>>> different MPLS or GRE tunnels, etc...
>>>
>>> The content of a packet retransmission is identical in every
>>> way to the original packet.  As long as the network treats the
>>> original and the retransmission in the same way (path, priority),
>>> the Po will be identical to Pr.  Po == Pr where Po is the
>>> original packet and Pr is the retransmitted packet.
>>>
>>> Retransmissions occur only because the original sender decides to
>>> send a packet again.  For protocols like TCP, this requires a full
>>> round-trip time to occur before the sender can realize that the
>>> packet didn't get to the far side.  So the time between the original
>>> packet and the retransmission must be greater than the round-trip time
>>> of the network connection.
>>>
>>> Dups, however, generally appear due to 3 reasons.
>>>
>>>  1. the traffic path goes past the same observation point multiple times
>>>       the same packet goes by multiple times.
>>>
>>>  2. the network duplicates a packet, e.g. for reliability or multicasting.
>>>       two or more copies of the same packet exist in the network at the same time
>>>
>>>  3. the collection infrastructure generates multiple copies of a single packet
>>>       one packet in the network, but port mirroring generates multiple copies
>>>
>>> In some situations, it's easy to distinguish the dups, especially in case 1.
>>> The IP time to live field may have changed if a router is involved, or
>>> new source and/or destination ethernet addresses are in the header, or the
>>> packet is on the same wire twice, but in different services, like VLANs
>>> or tunnels.  Argus can discriminate these types of duplicates, through
>>> modification of the flow keys (5-TUPLE+L2+VLAN+MPLS).
>>>
>>> Well anyway, this is just the start of the description.  It can be much
>>> more complicated than this.
>>>
>>>
>>> Now with regard to gaps…. Gaps are where argus doesn't see all the packets
>>> in a flow.  This happens when there is loss in the collection system
>>> (the packet was on the wire, but it didn't get to argus for some reason),
>>> OR when there is striping, or load balancing, and your argus only sees
>>> 50%, 33%, or 25% of the packets in a flow.  TCP indicates that there
>>> were 10000 bytes transferred, but you only observed 5000 bytes.
>>>
>>> This is important, and we get it for free, because we're trying to
>>> figure out the loss rate.
>>>
>>> Carter
>>>
>>>
>>> On Oct 9, 2013, at 10:20 AM, elof2 at sentor.se wrote:
>>>
>>>> On Tue, 1 Oct 2013, Carter Bullard wrote:
>>>>
>>>>> Well, hmmmmmmm…. Everyone else wants to do de-duping of the packet stream.
>>>>> Why would you want to be different from everyone else ?????   ;O)
>>>>
>>>> I'm curious. What exactly is it that everyone wants (or doesn't want)?
>>>>
>>>>
>>>>
>>>>> The strategy is to differentiate loss, retrans and dups, and report
>>>>> them as independent metrics, with loss being observable loss, retrans being observable duplicates, and dups (for TCP) being retrans arriving in less than an RTT.
>>>>
>>>> I don't fully agree about the dups.
>>>> A dupe is, in my opinion, an *exact* copy of the original packet.
>>>> A retransmission is not a dupe, it is a new packet, crafted because the original supposedly got lost.
>>>>
>>>> Therefore, the logic need not be so expensive. If the very next packet is identical to the last packet and it was received within a microsecond of the last packet, then it is a dupe.
>>>>
>>>> Taking the RTT into consideration seems a bit excessive for the simple task of dupe recognition. Also, an RTT is not always possible to calculate, e.g. if the flow only consists of a single SYN or is just unidirectional UDP traffic, yet the packets of the flow are still duplicated.
>>>>
>>>>
>>>>
>>>>
>>>>> We still will need to derive gaps, which are lost packets that were not
>>>>> retransmitted.
>>>>
>>>> Oooh, are you talking about distinguishing between external loss and internal loss?
>>>> When argus sees a gap in tcp sequence numbers you know there has been a drop, but not where it occurred.
>>>> If argus then sees a tcp retransmission for that gap, we know the drop was external, otherwise it was probably internal.
>>>> That kind of logic seems expensive. If it *is* really expensive, I would say don't do it. Only do the first gap detection (as you already do today) and leave the task of understanding where the drops occur to the user.
>>>>
>>>> /Elof
>>>>
>>>>
>>>
>
>