Duplicate packets

Carter Bullard carter at qosient.com
Thu Oct 10 10:26:08 EDT 2013


I forgot to mention, again, that I need packet captures to help
develop a good solution.  So if you have packet dumps of the
situation you want argus to detect, could you send some this
way ?????

Carter




On Oct 10, 2013, at 8:31 AM, Carter Bullard <carter at qosient.com> wrote:

> Hey /Elof,
> Well, don't give up so quickly ...  To sum up my side of this thread, in my original response I said the Telstra guy(s) were also interested in a solution to this issue, and I said I would work on it in the next round.
> 
> I think the issue I see is a little more complicated than the one you see, in the general case, but I think we agree on a strategy.  The approach Telstra and I had discussed is the same as yours: discriminate duplicates from retransmissions using arrival times.  The threshold, however, wasn't going to be hardcoded to 1 mSec or 1 uSec; we were going to use the RTT, actually 1/2 RTT.  We'll also use the TCP sequence number, instead of the ipid, for TCP traffic, and sequence numbers, when we have them, for non-TCP traffic such as ESP.  This will allow us to detect dups that are interleaved with other packets of the flow.  We'll use the ipid if it's not zero, but that will only allow us to detect dups that arrive back-to-back.
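The plan above can be sketched in a few lines of Python. This is a hypothetical illustration, not argus code: the class `SeqDupDetector`, the per-flow dictionary keyed on TCP sequence number, and the choice to compare each copy with the most recent earlier copy are all assumptions. It also assumes only data-bearing segments are fed in, since pure ACKs reuse the same sequence number.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SeqDupDetector:
    """Per-direction dup/retransmission classifier for one TCP flow (hypothetical sketch).

    A segment whose sequence number was already seen counts as a duplicate if it
    arrives within half an RTT of the most recent copy, otherwise as a
    retransmission. Keying on the sequence number, rather than on the previous
    packet, also catches duplicates interleaved with other segments.
    Assumes only data-bearing segments are observed (pure ACKs share a seq).
    """

    rtt: float  # round-trip time in seconds, assumed to be estimated elsewhere
    last_seen: Dict[int, float] = field(default_factory=dict)  # seq -> latest arrival
    dups: int = 0
    retrans: int = 0

    def observe(self, seq: int, ts: float) -> str:
        prev = self.last_seen.get(seq)
        # Remember the newest copy, so a mirrored copy of a retransmission is
        # compared against the retransmission, not against the original.
        self.last_seen[seq] = ts
        if prev is None:
            return "new"
        if ts - prev < self.rtt / 2:
            self.dups += 1
            return "dup"
        self.retrans += 1
        return "retrans"


# A real monitor would also age out old entries; this sketch never does.
d = SeqDupDetector(rtt=0.010)
print([d.observe(seq, ts) for seq, ts in [
    (1000, 0.0000),    # original
    (2000, 0.0001),    # next segment
    (1000, 0.0002),    # interleaved mirror copy of seq 1000
    (1000, 0.0300),    # sender retransmits seq 1000
    (1000, 0.030001),  # mirror copy of the retransmission
]])
# prints ['new', 'new', 'dup', 'retrans', 'dup']
```

Comparing against the most recent copy, rather than the first one, is what lets a mirrored copy of a retransmission still be counted as a dup.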
> 
> Now we'll have to create a new DSR and modify a few existing ones for this, so it may push us to move to argus-4.0.0 in the next round.
> 
> We should start testing just after the argus-3.0.8 release.
> 
> So if that is agreeable, keep up the campaign to make it happen !!!
> Carter
> 
> 
> 
>> On Oct 10, 2013, at 4:46 AM, elof2 at sentor.se wrote:
>> 
>> 
>> Hi Carter.
>> 
>> To sum up this thread...
>> 
>> All I really wanted was for 'ra' not to mark SPAN dupes as retransmissions, but to mark them as duplicates instead.
>> That's all.
>> 
>> You have convinced me that this testing is quite expensive, so I'll opt out of the discussion and say that you can leave things as-is.
>> I'll continue to work around it like I do today:
>> My systems generate warnings for massive retransmission amounts.
>> I have a look and see that it is not retransmissions but SPAN duplicates.
>> I ask my client to fix the SPAN.
>> He forwards the request to his outsourcing partner.
>> The partner doesn't understand, or says it is not possible to fix.
>> I disable the high-retransmission test completely, since this sensor will continue to see doubled traffic 24/7, day in and day out, for years.
>> 
>> Had 'ra' been able to flag dupes as dupes and retransmissions as retransmissions, I could have two tests, and only disable the dupe-test if my client can't fix his SPAN.
>> 
>> 
>> Thanks for all your attention and thoughts regarding this.
>> 
>> PS: I wrote microsecond, not millisecond. As you say, in a millisecond, lots of stuff can happen. The ideal approach is probably to use the RTT value if such a value can be calculated, and if not, to default to e.g. 1 microsecond, or whatever is small enough to include only SPAN dupes or other forms of traffic cloning and not accidentally include retransmissions.
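The fallback rule in this PS can be combined with the half-RTT window from Carter's reply. A minimal sketch; `dup_window` and `DEFAULT_DUP_WINDOW` are hypothetical names, and the 1 microsecond default is simply the value proposed in this message:

```python
from typing import Optional

# Fallback window when no RTT can be computed (a lone SYN, one-way UDP, ...).
# 1 microsecond is the value suggested in this thread; it is an assumption here.
DEFAULT_DUP_WINDOW = 1e-6


def dup_window(rtt: Optional[float], default: float = DEFAULT_DUP_WINDOW) -> float:
    """Seconds within which a repeated packet is treated as a duplicate."""
    if rtt is not None and rtt > 0:
        return rtt / 2  # half an RTT, as in the plan described in Carter's reply
    return default


print(dup_window(0.004))  # RTT known: prints 0.002
print(dup_window(None))   # no RTT: prints 1e-06
```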
>> 
>> /Elof
>> 
>> 
>> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>>> Retransmissions have a new IP-id, since they are new packets.
>>>> Po != Pr
>>> 
>>> Hmmmmm, well, yes and no.  Many kernels set ipid to zero, except when
>>> there are fragments.  So ipid can't be used in a general algorithm.
>>> Now, we can use it if it's there, but what do you do when it's not ????
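A sketch of the opportunistic ipid test described here, assuming a caller that tracks the previous packet per flow direction; the function name and tuple layout are hypothetical:

```python
from typing import Optional, Tuple


def ipid_back_to_back_dup(prev: Optional[Tuple[int, float]], ipid: int,
                          ts: float, window: float) -> bool:
    """Hypothetical IP-ID check, usable only when the sender fills in the IP ID.

    prev is (ipid, arrival_time) of the previous packet in the same flow direction.
    """
    if prev is None or ipid == 0:
        # A zero IP ID carries no per-packet identity, so it cannot tell a copy
        # from a new packet; other evidence (e.g. sequence numbers) is needed.
        return False
    prev_ipid, prev_ts = prev
    # Only the immediately preceding packet is compared, so interleaved copies
    # are missed: this is the "back-to-back only" limitation.
    return ipid == prev_ipid and ts - prev_ts < window
```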
>>> 
>>> So the problem of generic 5-tuple flow modelers is that, by definition,
>>> you only have L3/L4 identifiers to identify network activity.
>>> Which means that for the purposes of a flow monitor, the IP header
>>> and Transport headers are the only thing in the packet.
>>> 
>>> Argus is/has always been different, because we've identified that you need
>>> more information to understand what is really going on.
>>> 
>>> The WAN guys have recognized for a while that many dups are really
>>> "flow collisions", where the 5-tuple is the same, but the context of
>>> the packets is different.  In some cases, the flows are the same flow,
>>> but in some cases, they are different customers, using the same IP
>>> addresses, but in different MPLS tunnels.
>>> 
>>> Argus's 5-tuple is not what comes right after L2; argus uses the uppermost
>>> 5-tuple as the key, as that is the best we can do to find the
>>> end-to-end flow descriptor.  Argus does, however, have most of the
>>> underlying tunnel identifiers available (the local L2 identifiers,
>>> the next-level tunnel IDs, etc.) if you want to add them to the 5-tuple
>>> key.
>>> 
>>>> 
>>>> 
>>>> We seem to have different views on what a dupe is. I have thought of it as a 100% identical packet: same VLAN, same MPLS, same TTL, same IP-id, same L2 header, etc.
>>>> 
>>>> The question, then, is what argus considers a packet. Is it the whole Ethernet frame (as I think it is), or could it be just the part above the L2 header?
>>>> I understand that the logic can be expensive with your definition of dupe. :-)
>>>> 
>>>> 
>>>> 
>>>> 1. the traffic path goes past the same observation point multiple times
>>>>   the *same* packet goes by multiple times.
>>>>   This is not the same as monitoring a one-legged router where we see
>>>>   an incoming packet, and then see it again after it was routed. The
>>>>   routed packet is a "new" packet with an updated TTL, i.e. Po != Pr.
>>> 
>>> Well TTL is different only if a router processed the packet.  For tunneled
>>> traffic, or switched traffic, TTL stays the same.
>>> 
>>> So, we'll have to store a lot of data per flow, and update that data on
>>> each packet, to be able to make your identical packet test.  Pretty
>>> expensive to test something that you shouldn't ever have to test.
>>> 
>>>> 
>>>> 
>>>> In my world, scenario #3 is the most common one: a faulty SPAN setup that generates doubled traffic. This is easy to spot manually if *all* of the traffic is duplicated, but sometimes there's a mix, where some networks/VLANs are mirrored fine (one copy in each direction) while others are duplicated. In these cases, a spot test can easily miss the bad SPAN configuration. That's my main reason for wanting argus to handle dupes.
>>> 
>>> So, let's discuss your situation, where one of the VLAN mirrors is screwed up, and
>>> let's imagine that it's messed up in only one direction, and it's a TiVo DVR, so ipids
>>> are not available, and the mirror device is a switch, so there are no TTL changes.
>>> 
>>> We need an algorithm that at least describes what is on
>>> the wire, so the ra* clients can figure out that there is a bad VLAN
>>> mirror.  You don't want argus to make that call (now that would be very
>>> complicated).
>>> 
>>> The goal is to not overcount flow metrics, and to generate data that
>>> reflects what is really going on on the wire.  So we need to be able
>>> to have a real correction mechanism that doesn't skew the data.
>>> 
>>> I'm not sure that treating repeats within a 1 millisecond time frame as dups is good
>>> enough.  RTT can be shorter than 1 mSec in small workgroups, so
>>> legitimate retransmissions may be misclassified as dups.  RTT is good in this
>>> case, as it gives you a real value for the test.
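To make the concern concrete, take a hypothetical LAN with an RTT of 0.4 ms and a retransmission that the monitor sees $\Delta t = 0.5$ ms after the original (illustrative numbers, not measurements). A fixed window misfiles it, while a half-RTT window does not:

$$
\begin{aligned}
\text{fixed window } \tau = 1\ \text{ms}: &\quad \Delta t = 0.5\ \text{ms} < \tau \;\Rightarrow\; \text{retransmission miscounted as a dup} \\
\text{RTT-based window } \tau = \tfrac{1}{2}\,\mathrm{RTT} = 0.2\ \text{ms}: &\quad \Delta t = 0.5\ \text{ms} \ge \tau \;\Rightarrow\; \text{counted as a retransmission}
\end{aligned}
$$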
>>> 
>>> 
>>> Carter
>>> 
>>> 
>>>> 
>>>> /Elof
>>>> 
>>>> 
>>>>> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>>>> There is a lot more to this than your response would indicate.
>>>>> 
>>>>> There isn't anything in a packet that distinguishes a
>>>>> retransmission from the original.
>>>>> 
>>>>> The same can apply to dups, but
>>>>> generally dups are different.  They have different L2 identifiers,
>>>>> or they are (or can be) in different VLANs, or they are in
>>>>> different MPLS or GRE tunnels, etc...
>>>>> 
>>>>> The content of a packet retransmission is identical in every
>>>>> way to the original packet.  As long as the network treats the
>>>>> original and the retransmission in the same way (path, priority),
>>>>> the Po will be identical to Pr.  Po == Pr where Po is the
>>>>> original packet and Pr is the retransmitted packet.
>>>>> 
>>>>> Retransmissions occur only because the original sender decides to
>>>>> send a packet again.  For protocols like TCP, this requires a full
>>>>> round-trip time to occur before the sender can realize that the
>>>>> packet didn't get to the far side.  So the time between the original
>>>>> packet and the retransmission must be greater than the round-trip time
>>>>> of the network connection.
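Stated as an inequality (notation ours, not argus's): let $t_o$ and $t_r$ be the times at which the monitor sees $P_o$ and $P_r$. If both copies follow the same path, the sender cannot retransmit until roughly one round trip has passed, so a window $\tau$ below the RTT separates the two cases:

$$
\Delta t_{\text{retrans}} = t_r - t_o \gtrsim \mathrm{RTT},
\qquad
\text{class}(\Delta t) =
\begin{cases}
\text{duplicate}, & \Delta t < \tau \\
\text{retransmission}, & \Delta t \ge \tau
\end{cases}
\qquad \text{with } \tau = \tfrac{1}{2}\,\mathrm{RTT}.
$$

Choosing $\tau$ well below the RTT leaves a margin for estimation error, while copies made by a port mirror typically arrive within microseconds of each other.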
>>>>> 
>>>>> Dups, however, generally appear for 3 reasons.
>>>>> 
>>>>> 1. the traffic path goes past the same observation point multiple times
>>>>>     the same packet goes by multiple times.
>>>>> 
>>>>> 2. the network duplicates a packet, for reliability or multicasting, so
>>>>>     two or more copies of the same packet exist in the network at the same time
>>>>> 
>>>>> 3. the collection infrastructure generates multiple copies of a single packet
>>>>>     one packet in the network, but port mirroring generates multiple copies
>>>>> 
>>>>> In some situations, it's easy to distinguish the dups, especially in case 1.
>>>>> The IP time-to-live field may have changed if a router is involved,
>>>>> new source and/or destination Ethernet addresses may be in the header, or the
>>>>> packet may be on the same wire twice, but in different services, like VLANs
>>>>> or tunnels.  Argus can discriminate these types of duplicates through
>>>>> modification of the flow keys (5-TUPLE+L2+VLAN+MPLS).
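A minimal sketch of such an extended key, assuming Python dataclasses as the representation; the field names are illustrative and are not argus's internal structures:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FlowKey:
    """5-tuple plus lower-layer context; field names are illustrative, not argus's."""
    proto: int
    src: str
    sport: int
    dst: str
    dport: int
    vlan: Optional[int] = None      # 802.1Q VLAN ID, if tagged
    mpls: Tuple[int, ...] = ()      # MPLS label stack, outermost first
    src_mac: Optional[str] = None
    dst_mac: Optional[str] = None


# The same TCP segment mirrored from two VLANs: identical 5-tuple, different keys.
a = FlowKey(6, "10.0.0.1", 40000, "10.0.0.2", 80, vlan=10)
b = FlowKey(6, "10.0.0.1", 40000, "10.0.0.2", 80, vlan=20)
print(a == b, len({a, b}))  # prints False 2
```

With the extra fields in the key, the two mirrored copies land in separate flows instead of being counted as repeats within one flow.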
>>>>> 
>>>>> Well anyway, this is just the start of the description.  It can be much
>>>>> more complicated than this.
>>>>> 
>>>>> 
>>>>> Now, with regard to gaps: gaps are where argus doesn't see all the packets
>>>>> in a flow.  This happens when there is loss in the collection system
>>>>> (the packet was on the wire, but it didn't get to argus for some reason),
>>>>> OR when there is striping or load balancing, and your argus only sees
>>>>> 50%, 33%, or 25% of the packets in a flow.  For example, TCP indicates that there
>>>>> were 10000 bytes transferred, but you only observed 5000 bytes.
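The byte accounting in that example can be sketched as follows. The helper `gap_bytes` is hypothetical; it assumes the monitor records (sequence number, payload length) for each data segment and uses the peer's highest ACK, when available, to learn how far the stream really got:

```python
from typing import Iterable, Optional, Tuple


def gap_bytes(segments: Iterable[Tuple[int, int]],
              acked_end: Optional[int] = None) -> Tuple[int, int, int]:
    """Return (bytes_implied, bytes_observed, bytes_missing) for one TCP direction.

    segments: (seq, payload_len) pairs the monitor saw, repeats allowed.
    acked_end: highest ACK number seen from the other side, if any.
    Sequence wraparound and SYN/FIN sequence consumption are ignored.
    """
    segs = sorted(segments)
    if not segs:
        return 0, 0, 0
    start = segs[0][0]
    end = max(seq + length for seq, length in segs)
    if acked_end is not None:
        end = max(end, acked_end)
    observed = 0
    cur_start, cur_end = segs[0][0], segs[0][0] + segs[0][1]
    for seq, length in segs[1:]:
        if seq > cur_end:  # a hole: close the current covered run
            observed += cur_end - cur_start
            cur_start, cur_end = seq, seq + length
        else:  # overlap or repeat (dup/retrans): extend, never double count
            cur_end = max(cur_end, seq + length)
    observed += cur_end - cur_start
    implied = end - start
    return implied, observed, implied - observed


# 10000 bytes acknowledged, but only every other 1000-byte segment seen.
print(gap_bytes([(s, 1000) for s in range(0, 10000, 2000)], acked_end=10000))
# prints (10000, 5000, 5000)
```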
>>>>> 
>>>>> This is important, and we get it for free, because we're trying to
>>>>> figure out the loss rate.
>>>>> 
>>>>> Carter
>>>>> 
>>>>> 
>>>>>> On Oct 9, 2013, at 10:20 AM, elof2 at sentor.se wrote:
>>>>>> 
>>>>>>> On Tue, 1 Oct 2013, Carter Bullard wrote:
>>>>>>> 
>>>>>>> Well, hmmmmmmm…. Everyone else wants to do de-duping of the packet stream.
>>>>>>> Why would you want to be different from everyone else ?????   ;O)
>>>>>> 
>>>>>> I'm curious. What exactly is it that everyone wants (or not wants)?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> The strategy is to differentiate loss, retrans and dups, and report
>>>>>>> them as independent metrics, with loss being observable loss, retrans being observable duplicates, and dups (for TCP) being retrans arriving in less than an RTT.
>>>>>> 
>>>>>> I don't fully agree about the dups.
>>>>>> A dupe is, in my opinion, an *exact* copy of the original packet.
>>>>>> A retransmission is not a dupe, it is a new packet, crafted because the original supposedly got lost.
>>>>>> 
>>>>>> Therefore, the logic need not be so expensive. If the very next packet is identical to the last packet and it was received within a microsecond of the last packet, then it is a dupe.
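The back-to-back test proposed here could look roughly like this. It is a hypothetical sketch: the class name, the per-flow digest, and the use of SHA-1 (only as a compact fingerprint, not for security) are our choices, and "the very next packet" is interpreted as the next packet of the same flow:

```python
import hashlib
from typing import Dict, Hashable, Tuple


class BackToBackDupCheck:
    """A frame is a dupe if it is byte-identical to the previous frame of the
    same flow and arrives within `window` seconds of it (hypothetical sketch)."""

    def __init__(self, window: float = 1e-6) -> None:
        self.window = window
        # flow key -> (digest of last frame, its arrival time); storing a digest
        # instead of the whole frame keeps per-flow state small.
        self.last: Dict[Hashable, Tuple[bytes, float]] = {}

    def observe(self, flow: Hashable, frame: bytes, ts: float) -> bool:
        digest = hashlib.sha1(frame).digest()
        prev = self.last.get(flow)
        self.last[flow] = (digest, ts)
        return prev is not None and prev[0] == digest and ts - prev[1] < self.window
```

Because the whole frame is hashed, copies that differ in VLAN tag, L2 addresses, or TTL are not flagged, which matches this definition of a dupe but not the broader cases discussed elsewhere in the thread.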
>>>>>> 
>>>>>> Taking the RTT into consideration seems a bit excessive for the simple task of dupe recognition. Also, an RTT is not always possible to calculate, e.g. if the flow consists of only a single SYN, or is just unidirectional UDP traffic, yet the packets of the flow are still duplicated.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> We still will need to derive gaps, which are lost packets that were not retransmitted.
>>>>>> 
>>>>>> Oooh, are you talking about distinguishing between external loss and internal loss?
>>>>>> When argus sees a gap in TCP sequence numbers, you know there has been a drop, but not where it occurred.
>>>>>> If argus then sees a TCP retransmission for that gap, we know the drop was external; otherwise it was probably internal.
>>>>>> That kind of logic seems expensive. If it *is* really expensive, I would say don't do it. Only do the first gap detection (as you already do today) and leave the task of understanding where the drops occur to the user.
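The external/internal heuristic in this message can be sketched as below; `split_gaps` and the half-open range representation are hypothetical, and a real implementation would also have to handle partial fills and sequence wraparound:

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # half-open [start, end) of TCP sequence space


def split_gaps(gaps: Iterable[Range],
               later: Iterable[Range]) -> Tuple[List[Range], List[Range]]:
    """Split sequence gaps into (external, internal) lists.

    A gap later covered by a retransmitted segment means the receiver missed the
    data too, so the drop presumably happened upstream of the monitor (external).
    A gap never covered was presumably delivered, so the monitor's own capture
    missed it (internal). Only full coverage by a single segment is checked.
    """
    later = list(later)
    external: List[Range] = []
    internal: List[Range] = []
    for g_start, g_end in gaps:
        filled = any(s <= g_start and e >= g_end for s, e in later)
        if filled:
            external.append((g_start, g_end))
        else:
            internal.append((g_start, g_end))
    return external, internal
```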
>>>>>> 
>>>>>> /Elof
>>> 
> 



More information about the argus mailing list