Duplicate packets

elof2 at sentor.se elof2 at sentor.se
Wed Jun 17 10:41:53 EDT 2015


Hi Carter (and list).

I'm kicking life into this old thread.

So, what is the current status of this?
Can I do anything to help?

/Elof



On Wed, 16 Oct 2013, Carter Bullard wrote:

> Hey /Elof et al.,
>
> So I've got the simplest approach for TCP duplicate tracking (back
> to back non-zero ipid's equal for the same flow and direction,
> regardless of time) tucked into argus-3.0.7.5, BUT it needs some
> serious testing.  This approach doesn't appear to break anything,
> as far as I can tell, but getting any type of packet capture to
> test would be very helpful !!!!
>
> With argus's record strategy, I can add new metrics to existing DSRs,
> but with care.  I found that I could track a single dup counter in
> each direction without breaking filters etc…  so let's give this a try
> for the moment.  The client support for reporting dups will come in
> the next step.
>
> This strategy works for a simple case, BUT, the general case will still
> need to wait for argus-3.0.9 development.
>
> Any testing takers ??????
>
> Hope all is most excellent !!!!
>
> Carter
>
>
>
> On Oct 10, 2013, at 4:46 AM, elof2 at sentor.se wrote:
>
>>
>> Hi Carter.
>>
>> To sum up this thread...
>>
>> All I really wanted was for 'ra' not to mark SPAN dupes as retransmissions, but to mark them as duplicates instead.
>> That's all.
>>
>> You have convinced me that this testing is quite expensive, so I opt out of the discussion and say that you can leave things as-is.
>> I'll continue to work around it like I do today:
>> My systems generate warnings for massive retransmission amounts.
>> I'll have a look and see that it is not retransmissions but SPAN duplicates.
>> I ask my client to fix the SPAN.
>> He forwards the request to his outsourcing partner.
>> The partner doesn't understand, or says it is not possible to fix.
>> I disable the high-retransmissions-test completely since this sensor will continue to have double traffic 24/7, day in and day out for years.
>>
>> Had 'ra' been able to flag dupes as dupes and retransmissions as retransmissions, I could have two tests, and only disable the dupe-test if my client can't fix his SPAN.
>>
>>
>> Thanks for all your attention and thoughts regarding this.
>>
>> PS: I wrote microsecond, not millisecond. As you say, in a millisecond, lots of stuff can happen. The ideal approach is probably to use the RTT value if such a value can be calculated, and if not, default to e.g. 1 microsecond or whatever is small enough to only include SPAN dupes or other forms of traffic cloning and not accidentally include retransmissions.
>>
>> /Elof
>>
>>
>> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>>> Retransmissions have a new IP-id, since they are new packets.
>>>> Po != Pr
>>>
>>> Hmmmmm, well, yes and no.  Many kernels set ipid to zero, except when
>>> there are fragments.  So ipid can't be used in a general algorithm.
>>> Now, we can use it if it's there, but what do you do when it's not ????
>>>
>>> So the problem of generic 5-tuple flow modelers is that, by definition,
>>> you only have L3/L4 identifiers to identify network activity.
>>> Which means that for the purposes of a flow monitor, the IP header
>>> and Transport headers are the only thing in the packet.
>>>
>>> Argus is/has always been different, because we've identified that you need
>>> more information to understand what is really going on.
>>>
>>> The WAN guys have recognized for a while that many dups are really
>>> "flow collisions", where the 5-tuple is the same, but the context of
>>> the packets is different.  In some cases, the flows are the same flow,
>>> but in some cases, they are different customers, using the same IP
>>> addresses, but in different MPLS tunnels.
>>>
>>> Argus's 5-tuple is not what comes after L2, argus uses the uppermost
>>> 5-tuple as the key, as that is the best we can do to find the
>>> end-to-end flow descriptor.  Argus does have, however, most of the
>>> underlying tunnel identifiers available, the local L2 identifiers,
>>> the next level tunnel id's, etc...if you want to add to the 5-tuple
>>> key.
>>>
>>>>
>>>>
>>>> We seem to have different views on what a dupe is. I have thought of it as a 100% identical packet, same VLAN, same MPLS, same TTL, same IP-id, same L2-header, etc.
>>>>
>>>> The question is then what argus considers a packet. Is it the whole Ethernet frame (as I think it is), or could it be just the part above the L2-header?
>>>> I understand that the logic can be expensive with your definition of dupe. :-)
>>>>
>>>>
>>>>
>>>> 1. the traffic path goes past the same observation point multiple times
>>>>    the *same* packet goes by multiple times.
>>>>    This is not the same as monitoring a one-legged router where we see
>>>>    an incoming packet, and then see it again after it was routed. The
>>>>    routed packet is a "new" packet with an updated TTL, i.e. Po != Pr.
>>>
>>> Well TTL is different only if a router processed the packet.  For tunneled
>>> traffic, or switched traffic, TTL stays the same.
>>>
>>> So, we'll have to store a lot of data per flow, and update that data on
>>> each packet, to be able to make your identical packet test.  Pretty
>>> expensive to test something that you shouldn't ever have to test.
>>>
>>>>
>>>>
>>>> In my world, scenario #3 is the most common one: a faulty SPAN setup that generates doubled traffic. This is easy to spot manually if *all* of the traffic is duplicated, but sometimes there's a mix, where some networks/VLANs are mirrored fine (one copy in each direction) while others are duplicated. In these cases, a spot test can easily miss the bad SPAN configuration. That's my main reason for wanting argus to handle dupes.
>>>
>>> So, let's discuss your situation, where one of the VLAN mirrors is screwed up, and
>>> let's imagine that it's messed up in only one direction, and it's a Tivo DVR, so ipid's
>>> are not available, and the mirror device is a switch, so no TTL changes.
>>>
>>> We need an algorithm that at least describes what is on
>>> the wire, so the ra* clients can figure out that there is a bad VLAN
>>> mirror.  You don't want argus to make that call (now that would be very
>>> complicated).
>>>
>>> The goal is to not over count flow metrics, to generate data that
>>> reflects what is really going on on the wire.  So we need to be able
>>> to have a real correction mechanism that doesn't skew the data.
>>>
>>> Not sure that duplicate packets in the 1 millisecond time frame is good
>>> enough.  RTT can be shorter than 1 mSec in small workgroups, and so
>>> legitimate retransmissions may trip up something.  RTT is good, in this
>>> case, as it gives you a real value for the test.
>>>
>>>
>>> Carter
>>>
>>>
>>>>
>>>> /Elof
>>>>
>>>>
>>>> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>>>> There is a lot more to this than your response would indicate.
>>>>>
>>>>> There isn't anything in a packet that distinguishes a
>>>>> retransmission from the original.
>>>>>
>>>>> Same can apply to dups, but
>>>>> generally dups are different.  They have different L2 identifiers,
>>>>> or they are (or can be) in different VLANs, or they are in
>>>>> different MPLS or GRE tunnels, etc...
>>>>>
>>>>> The content of a packet retransmission is identical in every
>>>>> way to the original packet.  As long as the network treats the
>>>>> original and the retransmission in the same way (path, priority),
>>>>> the Po will be identical to Pr.  Po == Pr where Po is the
>>>>> original packet and Pr is the retransmitted packet.
>>>>>
>>>>> Retransmissions occur only because the original sender decides to
>>>>> send a packet again.  For protocols like TCP, this requires a full
>>>>> round-trip time to occur before the sender can realize that the
>>>>> packet didn't get to the far side.  So the time between the original
>>>>> packet and the retransmission must be greater than the round-trip time
>>>>> of the network connection.
>>>>>
>>>>> Dups, however, generally appear due to 3 reasons.
>>>>>
>>>>> 1. the traffic path goes past the same observation point multiple times
>>>>>      the same packet goes by multiple times.
>>>>>
>>>>> 2. the network duplicates a packet, say for reliability or multicasting
>>>>>      two or more copies of the same packet exist in the network at the same time
>>>>>
>>>>> 3. the collection infrastructure generates multiple copies of a single packet
>>>>>      one packet in the network, but port mirroring generates multiple copies
>>>>>
>>>>> In some situations, it's easy to distinguish the dups, especially in case 1.
>>>>> The IP time to live field may have changed if a router is involved, or
>>>>> new source and/or destination ethernet addresses are in the header, or the
>>>>> packet is on the same wire twice, but in different services, like VLANs
>>>>> or tunnels.  Argus can discriminate these types of duplicates, through
>>>>> modification of the flow keys (5-TUPLE+L2+VLAN+MPLS).
>>>>>
>>>>> Well anyway, this is just the start of the description.  It can be much
>>>>> more complicated than this.
>>>>>
>>>>>
>>>>> Now with regard to gaps…. Gaps are where argus doesn't see all the packets
>>>>> in a flow.  This happens when there is loss in the collection system,
>>>>> packet was on the wire, but it didn't get to argus for some reason,
>>>>> OR when there is striping, or load balancing and your argus only sees
>>>>> 50%, 33%, or 25% of the packets in a flow.  TCP indicates that there
>>>>> were 10000 bytes transferred, but you only observed 5000 bytes.
>>>>>
>>>>> This is important, and we get it for free, because we're trying to
>>>>> figure out the loss rate.
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>> On Oct 9, 2013, at 10:20 AM, elof2 at sentor.se wrote:
>>>>>
>>>>>> On Tue, 1 Oct 2013, Carter Bullard wrote:
>>>>>>
>>>>>>> Well, hmmmmmmm…. Everyone else wants to do de-duping of the packet stream.
>>>>>>> Why would you want to be different from everyone else ?????   ;O)
>>>>>>
>>>>>> I'm curious. What exactly is it that everyone wants (or doesn't want)?
>>>>>>
>>>>>>
>>>>>>
>>>>>>> The strategy is to differentiate loss, retrans and dups, and report
>>>>>>> them as independent metrics, with loss being observable loss, retrans being observable duplicates, and dups (for TCP) being retrans arriving in less than an RTT.
>>>>>>
>>>>>> I don't fully agree about the dups.
>>>>>> A dupe is, in my opinion, an *exact* copy of the original packet.
>>>>>> A retransmission is not a dupe, it is a new packet, crafted because the original supposedly got lost.
>>>>>>
>>>>>> Therefore, the logic need not be so expensive. If the very next packet is identical to the last packet and it was received within a microsecond from the last packet then it is a dupe.
>>>>>>
>>>>>> Taking the RTT into consideration seems a bit excessive for the simple task of dupe recognition. Also, an RTT is not always possible to calculate, e.g. if the flow only consists of a single SYN or is just unidirectional UDP traffic, yet the packets of the flow are still duplicated.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> We still will need to derive gaps, which are lost packets that were not
>>>>>>> retransmitted.
>>>>>>
>>>>>> Oooh, are you talking about distinguishing between external loss and internal loss?
>>>>>> When argus sees a gap in TCP sequence numbers you know there has been a drop, but not where it occurred.
>>>>>> If argus then sees a TCP retransmission for that gap, we know the drop was external, otherwise it was probably internal.
>>>>>> That kind of logic seems expensive. If it *is* really expensive, I would say don't do it. Only do the first gap detection (as you already do today) and leave the task of understanding where the drops occur to the user.
>>>>>>
>>>>>> /Elof
>>>>>>
>>>>>>
>>>>>
>>>
>
>
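
For anyone who wants to experiment with the simple TCP test Carter describes in the 16 Oct 2013 message (back-to-back equal non-zero IP IDs for the same flow and direction, regardless of time), here is a minimal Python sketch. It is not the argus implementation. The names DupTracker, observe and dup_count are hypothetical, and the flow key is assumed to be a directional 5-tuple, so the reverse direction is tracked under its own key.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Directional key (assumption): src ip, src port, dst ip, dst port, protocol.
FlowKey = Tuple[str, int, str, int, int]


@dataclass
class DirectionState:
    last_ipid: int = 0  # 0 means "no usable IP ID seen on the previous packet"
    dups: int = 0       # single duplicate counter for this direction


class DupTracker:
    """Counts back-to-back packets that carry the same non-zero IP ID."""

    def __init__(self) -> None:
        self._state: Dict[FlowKey, DirectionState] = {}

    def observe(self, key: FlowKey, ipid: int) -> bool:
        """Record one packet; return True if it looks like a duplicate."""
        state = self._state.setdefault(key, DirectionState())
        # An IP ID of zero carries no information (many kernels set it to
        # zero), so zero never matches anything.
        is_dup = ipid != 0 and ipid == state.last_ipid
        if is_dup:
            state.dups += 1
        state.last_ipid = ipid
        return is_dup

    def dup_count(self, key: FlowKey) -> int:
        state = self._state.get(key)
        return state.dups if state is not None else 0
```

Feeding it the IP ID of each packet, keyed by direction, gives one counter per direction, which matches the "single dup counter in each direction" mentioned in that message. Flows whose packets all carry IP ID zero are never flagged, which is the limitation Carter points out on 9 Oct 2013.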

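The time-based distinction discussed in the thread (a repeated packet arriving in less than one RTT is a dup, otherwise a retransmission, with a small default window such as 1 microsecond when no RTT can be calculated) can be sketched as below. This is only an illustration under those assumptions. The function name classify_repeat and the constant DEFAULT_WINDOW_S are hypothetical, and the input is assumed to be a packet already known to repeat an earlier one in the same flow and direction.

```python
from typing import Optional

# Fallback window taken from the thread's "e.g. 1 microsecond" suggestion.
DEFAULT_WINDOW_S = 1e-6


def classify_repeat(delta_s: float, rtt_s: Optional[float] = None) -> str:
    """Classify a repeated packet as 'dup' or 'retrans'.

    delta_s: seconds between the original packet and the repeat.
    rtt_s:   measured round-trip time for the flow, or None if it cannot
             be calculated (single SYN, unidirectional UDP, ...).
    """
    window = rtt_s if rtt_s is not None else DEFAULT_WINDOW_S
    # A sender cannot decide to retransmit before a round trip has passed,
    # so anything repeated sooner is treated as a duplicate.
    return "dup" if delta_s < window else "retrans"
```

With a measured RTT of 10 ms, a repeat seen 2 ms after the original is classified as a dup and one seen 20 ms later as a retransmission. Without an RTT, only repeats closer than the default window count as dups.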
