Duplicate packets

Wed Jun 17 11:27:43 EDT 2015

Hi Carter!

Sounds great!
I'll generate and send you an obfuscated pcap tomorrow as it is time to 
leave the office now.

/Elof

On Wed, 17 Jun 2015, Carter Bullard wrote:

> Seems like the feature is in argus-3.0.8.1 ... At least if memory serves and the email is accurate.
> I'm on a plane right now, so will check later today, but should be functional ???
> Do you have any packet captures to test the algorithm ???
>
> Carter
>
>> On Jun 17, 2015, at 10:41 AM, elof2 at sentor.se wrote:
>>
>>
>> Hi Carter (and list).
>>
>> I'm kicking life into this old thread.
>>
>> So, what is the current status of this?
>> Can I do anything to help?
>>
>> /Elof
>>
>>
>>
>>> On Wed, 16 Oct 2013, Carter Bullard wrote:
>>>
>>> Hey /Elof et al.,
>>>
>>> So I've got the simplest approach for TCP duplicate tracking (back
>>> to back non-zero ipid's equal for the same flow and direction,
>>> regardless of time) tucked into argus-3.0.7.5, BUT it needs some
>>> serious testing.  This approach doesn't appear to break anything,
>>> as far as I can tell, but getting any type of packet capture to
>>> test would be very helpful !!!!
>>>
>>> With argus's record strategy, I can add new metrics to existing DSRs,
>>> but with care.  I found that I could track a single dup counter in
>>> each direction without breaking filters etc…  so let's give this a try
>>> for the moment.  The client support for reporting dups will come in
>>> the next step.
>>>
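A minimal sketch of the simple case described above, assuming a flow is identified by some hashable key and a direction label. The class and method names are hypothetical and are not taken from the argus source; this only illustrates "back to back non-zero ipid's equal for the same flow and direction, regardless of time", with one dup counter per direction.

```python
from collections import defaultdict


class DupTracker:
    """Count back-to-back packets that repeat a non-zero IP ID.

    State is kept per (flow, direction): the last IP ID seen and a
    single duplicate counter. Arrival time is deliberately ignored.
    """

    def __init__(self):
        self.last_ipid = {}           # (flow, direction) -> last IP ID seen
        self.dups = defaultdict(int)  # (flow, direction) -> duplicate count

    def observe(self, flow, direction, ipid):
        """Record one packet; return True if it is counted as a duplicate."""
        key = (flow, direction)
        # An IP ID of zero carries no information, so it never matches.
        is_dup = ipid != 0 and self.last_ipid.get(key) == ipid
        if is_dup:
            self.dups[key] += 1
        self.last_ipid[key] = ipid
        return is_dup
```

Feeding the IP IDs 100, 100, 101 for one flow and direction would count one dup; a run of zero IP IDs would count none, which is the limitation of the simple case.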
>>> This strategy works for a simple case, BUT, the general case will still
>>> need to wait for argus-3.0.9 development.
>>>
>>> Any testing takers ??????
>>>
>>> Hope all is most excellent !!!!
>>>
>>> Carter
>>>
>>>
>>>
>>>> On Oct 10, 2013, at 4:46 AM, elof2 at sentor.se wrote:
>>>>
>>>>
>>>> Hi Carter.
>>>>
>>>> To sum up this thread...
>>>>
>>>> All I really wanted is for 'ra' not to mark SPAN-dupes as retransmissions, but instead to mark them as duplicates.
>>>> That's all.
>>>>
>>>> You have convinced me that this testing is quite expensive, so I opt-out of the discussion and say that you can leave things as-is.
>>>> I'll continue to work around it like I do today:
>>>> My systems generate warnings for massive retransmission amounts.
>>>> I'll have a look and see that it is not retransmissions but SPAN duplicates.
>>>> I ask my client to fix the SPAN.
>>>> He forwards the request to his outsourcing partner.
>>>> The partner doesn't understand, or says it is not possible to fix.
>>>> I disable the high-retransmissions-test completely since this sensor will continue to have double traffic 24/7, day in and day out for years.
>>>>
>>>> Had 'ra' been able to flag dupes as dupes and retransmissions as retransmissions, I could have two tests, and only disable the dupe-test if my client can't fix his SPAN.
>>>>
>>>>
>>>> Thanks for all your attention and thoughts regarding this.
>>>>
>>>> PS: I wrote microsecond, not millisecond. As you say, in a millisecond, lots of stuff can happen. The ideal approach is probably to use the RTT value, if such a value can be calculated, and if not, to default to e.g. 1 microsecond or whatever is small enough to only include SPAN dupes or other forms of traffic cloning and not accidentally include retransmissions.
>>>>
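The threshold idea in the PS can be sketched roughly as follows. This assumes the packet has already been recognized as a repeat of an earlier one, and that times are in seconds; the function name and the 1 microsecond default are illustrative only.

```python
def classify_repeat(delta_s, rtt_s=None, fallback_s=1e-6):
    """Label a repeated packet as 'dup' or 'retrans' from its arrival gap.

    delta_s    -- time between the original packet and the repeat
    rtt_s      -- measured round-trip time for the flow, if known
    fallback_s -- threshold used when no RTT is available (assumed 1 us)
    """
    threshold = rtt_s if rtt_s is not None else fallback_s
    # A sender cannot decide to retransmit in less than one round trip,
    # so anything faster is treated as a copy made by the network or SPAN.
    return "dup" if delta_s < threshold else "retrans"
```

With a 20 ms RTT, a repeat arriving after 0.5 ms would be labeled a dup and one arriving after 250 ms a retransmission. Without an RTT, only repeats closer than the fallback value count as dups.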
>>>> /Elof
>>>>
>>>>
>>>> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>>>>> Retransmissions have a new IP-id, since they are new packets.
>>>>>> Po != Pr
>>>>>
>>>>> Hmmmmm, well, yes and no.  Many kernels set ipid to zero, except when
>>>>> there are fragments.  So ipid can't be used in a general algorithm.
>>>>> Now, we can use it if it's there, but what do you do when it's not ????
>>>>>
>>>>> So the problem of generic 5-tuple flow modelers is that, by definition,
>>>>> you only have L3/L4 identifiers to identify network activity.
>>>>> Which means that for the purposes of a flow monitor, the IP header
>>>>> and Transport headers are the only thing in the packet.
>>>>>
>>>>> Argus is/has always been different, because we've identified that you need
>>>>> more information to understand what is really going on.
>>>>>
>>>>> The WAN guys have recognized for a while that many dups are really
>>>>> "flow collisions", where the 5-tuple is the same, but the context of
>>>>> the packets is different.  In some cases, the flows are the same flow,
>>>>> but in some cases, they are different customers, using the same IP
>>>>> addresses, but in different MPLS tunnels.
>>>>>
>>>>> Argus's 5-tuple is not what comes after L2, argus uses the uppermost
>>>>> 5-tuple as the key, as that is the best we can do to find the
>>>>> end-to-end flow descriptor.  Argus does have, however, most of the
>>>>> underlying tunnel identifiers available, the local L2 identifiers,
>>>>> the next level tunnel id's, etc...if you want to add to the 5-tuple
>>>>> key.
>>>>>
>>>>>>
>>>>>>
>>>>>> We seem to have different views on what a dupe is. I have thought of it as a 100% identical packet, same VLAN, same MPLS, same TTL, same IP-id, same L2-header, etc.
>>>>>>
>>>>>> The question is then what argus considers a packet. Is it the whole ethernet frame (as I think it is), or could it be just the part above the L2-header?
>>>>>> I understand that the logic can be expensive with your definition of dupe. :-)
>>>>>>
>>>>>>
>>>>>>
>>>>>> 1. the traffic path goes past the same observation point multiple times
>>>>>>   the *same* packet goes by multiple times.
>>>>>>   This is not the same as monitoring a one-legged router where we see
>>>>>>   an incoming packet, and then see it again after it was routed. The
>>>>>>   routed packet is a "new" packet with an updated TTL, i.e. Po != Pr.
>>>>>
>>>>> Well TTL is different only if a router processed the packet.  For tunneled
>>>>> traffic, or switched traffic, TTL stays the same.
>>>>>
>>>>> So, we'll have to store a lot of data per flow, and update that data on
>>>>> each packet, to be able to make your identical packet test.  Pretty
>>>>> expensive to test something that you shouldn't ever have to test.
>>>>>
>>>>>>
>>>>>>
>>>>>> In my world, scenario #3 is the most common one. Faulty SPAN setup that generates doubled traffic. This is easy to manually spot if *all* of the traffic is duplicated, but sometimes there's a mix, where some networks/vlans are mirrored fine (one copy in each direction) while others are duplicated. In these cases, a spot test can easily miss the bad SPAN configuration. That's my main reason why I want argus to handle dupes.
>>>>>
>>>>> So, let's discuss your situation, where one of the VLAN mirrors is screwed up, and
>>>>> let's imagine that it's messed up in only one direction, and it's a Tivo DVR, so ipid's
>>>>> are not available, and the mirror device is a switch, so no TTL changes.
>>>>>
>>>>> We need an algorithm that at least describes what is on
>>>>> the wire, so the ra* clients can figure out that there is a bad VLAN
>>>>> mirror.  You don't want argus to make that call (now that would be very
>>>>> complicated).
>>>>>
>>>>> The goal is to not over count flow metrics, to generate data that
>>>>> reflects what is really going on on the wire.  So we need to be able
>>>>> to have a real correction mechanism that doesn't skew the data.
>>>>>
>>>>> Not sure that duplicate packets in the 1 millisecond time frame is good
>>>>> enough.  RTT can be shorter than 1 mSec in small workgroups, and so
>>>>> legitimate retransmissions may trip up something.  RTT is good, in this
>>>>> case, as it gives you a real value for the test.
>>>>>
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>>>
>>>>>> /Elof
>>>>>>
>>>>>>
>>>>>>> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>>>>>> There is a lot more to this than your response would indicate.
>>>>>>>
>>>>>>> There isn't anything in a packet that distinguishes a
>>>>>>> retransmission from the original.
>>>>>>>
>>>>>>> Same can apply to dups, but
>>>>>>> generally dups are different.  They have different L2 identifiers,
>>>>>>> or they are (or can be) in different VLANs, or they are in
>>>>>>> different MPLS or GRE tunnels, etc...
>>>>>>>
>>>>>>> The content of a packet retransmission is identical in every
>>>>>>> way to the original packet.  As long as the network treats the
>>>>>>> original and the retransmission in the same way (path, priority),
>>>>>>> the Po will be identical to Pr.  Po == Pr where Po is the
>>>>>>> original packet and Pr is the retransmitted packet.
>>>>>>>
>>>>>>> Retransmissions occur only because the original sender decides to
>>>>>>> send a packet again.  For protocols like TCP, this requires a full
>>>>>>> round-trip time to occur before the sender can realize that the
>>>>>>> packet didn't get to the far side.  So the time between the original
>>>>>>> packet and the retransmission must be greater than the round-trip time
>>>>>>> of the network connection.
>>>>>>>
>>>>>>> Dups, however, generally appear due to 3 reasons.
>>>>>>>
>>>>>>> 1. the traffic path goes past the same observation point multiple times
>>>>>>>     the same packet goes by multiple times.
>>>>>>>
>>>>>>> 2. the network duplicates a packet, say for reliability or multicasting.
>>>>>>>     two or more copies of the same packet exist in the network at the same time
>>>>>>>
>>>>>>> 3. the collection infrastructure generates multiple copies of a single packet
>>>>>>>     one packet in the network, but port mirroring generates multiple copies
>>>>>>>
>>>>>>> In some situations, it's easy to distinguish the dups, especially in case 1.
>>>>>>> The IP time to live field may have changed if a router is involved, or
>>>>>>> new source and/or destination ethernet addresses are in the header, or the
>>>>>>> packet is on the same wire twice, but in different services, like VLANs
>>>>>>> or tunnels.  Argus can discriminate these types of duplicates, through
>>>>>>> modification of the flow keys (5-TUPLE+L2+VLAN+MPLS).
>>>>>>>
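As a rough illustration of how extending the flow key separates this kind of duplicate, here is a small sketch. The packet is modeled as a plain dict and the field names ("src", "vlan", and so on) are hypothetical; this is not how argus itself represents packets or keys.

```python
def flow_key(pkt, extra=()):
    """Build a flow key from the 5-tuple plus optional extra identifiers.

    pkt   -- dict with 'src', 'sport', 'dst', 'dport', 'proto' and,
             optionally, lower-layer fields such as 'vlan' or 'mpls'
    extra -- names of additional fields to append to the key
    """
    five_tuple = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])
    # Missing optional fields become None so the key length stays fixed.
    return five_tuple + tuple(pkt.get(name) for name in extra)
```

Two copies of the same packet seen in VLAN 10 and VLAN 20 share a key when only the 5-tuple is used, and get distinct keys once "vlan" is added through `extra`, so they are tracked as separate flows rather than as repeats within one flow.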
>>>>>>> Well anyway, this is just the start of the description.  It can be much
>>>>>>> more complicated than this.
>>>>>>>
>>>>>>>
>>>>>>> Now with regard to gaps…. Gaps are where argus doesn't see all the packets
>>>>>>> in a flow.  This happens when there is loss in the collection system,
>>>>>>> packet was on the wire, but it didn't get to argus for some reason,
>>>>>>> OR when there is striping, or load balancing and your argus only sees
>>>>>>> 50%, 33%, or 25% of the packets in a flow.  TCP indicates that there
>>>>>>> were 10000 bytes transferred, but you only observed 5000 bytes.
>>>>>>>
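The arithmetic behind the 10000 versus 5000 byte example can be sketched like this. It assumes the gap is estimated from the TCP sequence range in one direction, that the flow carries less than 4 GiB, and it ignores the sequence numbers consumed by SYN and FIN; the function name is hypothetical.

```python
def gap_bytes(first_seq, next_seq, observed_bytes):
    """Estimate bytes that TCP says were sent but the sensor never saw.

    first_seq      -- sequence number of the first payload byte
    next_seq       -- sequence number following the last payload byte
    observed_bytes -- unique payload bytes actually captured
    """
    # Sequence numbers are 32-bit and wrap, so subtract modulo 2**32.
    expected = (next_seq - first_seq) % 2**32
    return max(expected - observed_bytes, 0)
```

For a sequence range covering 10000 bytes with only 5000 bytes observed, the estimate is a 5000 byte gap, i.e. the sensor saw half of the flow.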
>>>>>>> This is important, and we get it for free, because we're trying to
>>>>>>> figure out the loss rate.
>>>>>>>
>>>>>>> Carter
>>>>>>>
>>>>>>>
>>>>>>>> On Oct 9, 2013, at 10:20 AM, elof2 at sentor.se wrote:
>>>>>>>>
>>>>>>>>> On Tue, 1 Oct 2013, Carter Bullard wrote:
>>>>>>>>>
>>>>>>>>> Well, hmmmmmmm…. Everyone else wants to do de-duping of the packet stream.
>>>>>>>>> Why would you want to be different from everyone else ?????   ;O)
>>>>>>>>
>>>>>>>> I'm curious. What exactly is it that everyone wants (or doesn't want)?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> The strategy is to differentiate loss, retrans and dups, and report
>>>>>>>>> them as independent metrics, with loss being observable loss, retrans being observable duplicates, and dups (for TCP) being retrans arriving in less than an RTT.
>>>>>>>>
>>>>>>>> I don't fully agree about the dups.
>>>>>>>> A dupe is, in my opinion, an *exact* copy of the original packet.
>>>>>>>> A retransmission is not a dupe, it is a new packet, crafted because the original supposedly got lost.
>>>>>>>>
>>>>>>>> Therefore, the logic need not be so expensive. If the very next packet is identical to the last packet and it was received within a microsecond from the last packet then it is a dupe.
>>>>>>>>
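The test proposed above (the very next packet is byte-for-byte identical and arrives within a microsecond) could look roughly like this. It assumes packets arrive as (timestamp in nanoseconds, raw frame bytes) pairs in capture order; the function name and input shape are illustrative, and it looks at the capture as a whole rather than per flow.

```python
def count_span_dups(packets, window_ns=1000):
    """Count packets that exactly repeat the immediately preceding frame.

    packets   -- iterable of (timestamp_ns, frame_bytes) in capture order
    window_ns -- maximum spacing for a repeat to count (1000 ns = 1 us)
    """
    dups = 0
    prev = None
    for ts, frame in packets:
        # Only the previous packet is remembered, so the state is tiny.
        if prev is not None and frame == prev[1] and ts - prev[0] <= window_ns:
            dups += 1
        prev = (ts, frame)
    return dups
```

Because only the previous frame is kept, an unrelated packet arriving between the original and its copy would hide the duplicate; that is the price of keeping the check cheap.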
>>>>>>>> Taking the RTT into consideration seems a bit excessive for the simple task of dupe recognition. Also, an RTT is not always possible to calculate if the flow only consists of a single SYN, or is just unidirectional UDP traffic, etc, but the packets of the flow are still duplicated.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> We still will need to derive gaps, which are lost packets that were not
>>>>>>>>> retransmitted.
>>>>>>>>
>>>>>>>> Oooh, are you talking about distinguishing between external loss and internal loss?
>>>>>>>> When argus sees a gap in tcp sequence numbers you know there has been a drop, but not where it occurred.
>>>>>>>> If argus then sees a tcp retransmission for that gap, we know the drop was external, otherwise it was probably internal.
>>>>>>>> That kind of logic seems expensive. If it *is* really expensive, I would say don't do it. Only do the first gap detection (as you already do today) and leave the task of understanding where the drops occur to the user.
>>>>>>>>
>>>>>>>> /Elof
>>>
>