Duplicate packets

Wed Oct 16 12:17:27 EDT 2013

Hey /Elof et al.,

So I've got the simplest approach for TCP duplicate tracking (back-to-back
packets with equal, non-zero ipids in the same flow and direction,
regardless of time) tucked into argus-3.0.7.5, BUT it needs some
serious testing.  This approach doesn't appear to break anything,
as far as I can tell, but getting any type of packet capture to
test with would be very helpful !!!!
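
For anyone who wants to reason about the test before trying it, here is a
minimal sketch of the rule described above. It is not the argus code, just an
illustration in Python; the names IpidDupTracker, DirectionState and observe
are made up for this example, and the flow key is assumed to be a plain tuple.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DirectionState:
    last_ipid: int = 0  # 0 means "no usable ipid seen yet"
    dups: int = 0       # single duplicate counter for this direction


class IpidDupTracker:
    """Hypothetical per-flow tracker; not the argus implementation."""

    def __init__(self):
        # Key is (flow, direction), so each direction has its own state.
        self.state = defaultdict(DirectionState)

    def observe(self, flow, direction, ipid):
        """Return True if this packet is counted as a duplicate.

        A packet is a dup when its ipid is non-zero and equal to the ipid
        of the previous packet in the same flow and direction. Time is
        deliberately ignored.
        """
        st = self.state[(flow, direction)]
        is_dup = ipid != 0 and ipid == st.last_ipid
        if is_dup:
            st.dups += 1
        st.last_ipid = ipid
        return is_dup
```

Feeding ipids 100, 100, 101, 0, 0 for one direction of a flow counts only the
second packet as a dup; the back-to-back zeros are ignored because a zero ipid
carries no information.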

With argus's record strategy, I can add new metrics to existing DSRs,
but with care.  I found that I could track a single dup counter in
each direction without breaking filters etc., so let's give this a try
for the moment.  The client support for reporting dups will come in
the next step.

This strategy works for a simple case, BUT, the general case will still
need to wait for argus-3.0.9 development.

Any testing takers ??????

Hope all is most excellent !!!!

Carter

On Oct 10, 2013, at 4:46 AM, elof2 at sentor.se wrote:

> 
> Hi Carter.
> 
> To sum up this thread...
> 
> All I really wanted is for 'ra' not to mark SPAN-dupes as retransmissions, but to mark them as duplicates instead.
> That's all.
> 
> You have convinced me that this testing is quite expensive, so I opt-out of the discussion and say that you can leave things as-is.
> I'll continue to work around it like I do today:
> My systems generate warnings for massive retransmission amounts.
> I'll have a look and see that it is not retransmissions but SPAN duplicates.
> I ask my client to fix the SPAN.
> He forwards the request to his outsourcing partner.
> The partner doesn't understand, or says it is not possible to fix.
> I disable the high-retransmissions-test completely since this sensor will continue to have double traffic 24/7, day in and day out for years.
> 
> Had 'ra' been able to flag dupes as dupes and retransmissions as retransmissions, I could have two tests, and only disable the dupe-test if my client can't fix his SPAN.
> 
> 
> Thanks for all your attention and thoughts regarding this.
> 
> PS: I wrote microsecond, not millisecond. As you say, in a millisecond, lots of stuff can happen. The ideal approach is probably to use the RTT value, if such value can be calculated and if not default to e.g. 1 microsecond or whatever is small enough to only include SPAN dupes or other forms of traffic cloning and not accidentally include retransmissions.
> 
> /Elof
> 
> 
> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>> Retransmissions have a new IP-id, since they are new packets.
>>> Po != Pr
>> 
>> Hmmmmm, well, yes and no.  Many kernels set ipid to zero, except when
>> there are fragments.  So ipid can't be used in a general algorithm.
>> Now, we can use it if it's there, but what do you do when it's not ????
>> 
>> So the problem of generic 5-tuple flow modelers is that, by definition,
>> you only have L3/L4 identifiers to identify network activity.
>> Which means that for the purposes of a flow monitor, the IP header
>> and Transport headers are the only thing in the packet.
>> 
>> Argus is/has always been different, because we've identified that you need
>> more information to understand what is really going on.
>> 
>> The WAN guys have recognized for a while that many dups are really
>> "flow collisions", where the 5-tuple is the same, but the context of
>> the packets is different.  In some cases, the flows are the same flow,
>> but in some cases, they are different customers, using the same IP
>> addresses, but in different MPLS tunnels.
>> 
>> Argus's 5-tuple is not what comes after L2, argus uses the uppermost
>> 5-tuple as the key, as that is the best we can do to find the
>> end-to-end flow descriptor.  Argus does have, however, most of the
>> underlying tunnel identifiers available, the local L2 identifiers,
>> the next level tunnel id's, etc...if you want to add to the 5-tuple
>> key.
>> 
>>> 
>>> 
>>> We seem to have different views on what a dupe is. I have thought of it as a 100% identical packet, same VLAN, same MPLS, same TTL, same IP-id, same L2-header, etc.
>>> 
>>> The question is then what argus considers a packet. Is it the whole ethernet frame (as I think it is), or could it be just the part above the L2-header?
>>> I understand that the logic can be expensive with your definition of dupe. :-)
>>> 
>>> 
>>> 
>>> 1. the traffic path goes past the same observation point multiple times
>>>    the *same* packet goes by multiple times.
>>>    This is not the same as monitoring a one-legged router where we see
>>>    an incoming packet, and then see it again after it was routed. The
>>>    routed packet is a "new" packet with an updated TTL, i.e. Po != Pr.
>> 
>> Well TTL is different only if a router processed the packet.  For tunneled
>> traffic, or switched traffic, TTL stays the same.
>> 
>> So, we'll have to store a lot of data per flow, and update that data on
>> each packet, to be able to make your identical packet test.  Pretty
>> expensive to test something that you shouldn't ever have to test.
>> 
>>> 
>>> 
>>> In my world, scenario #3 is the most common one: a faulty SPAN setup that generates doubled traffic. This is easy to spot manually if *all* of the traffic is duplicated, but sometimes there's a mix, where some networks/vlans are mirrored fine (one copy in each direction) while others are duplicated. In these cases, a spot test can easily miss the bad SPAN configuration. That's my main reason for wanting argus to handle dupes.
>> 
>> So, let's discuss your situation, where one of the VLAN mirrors is screwed up, and
>> let's imagine that it's messed up in only one direction, and it's a Tivo DVR, so ipids
>> are not available, and the mirror device is a switch, so no TTL changes.
>> 
>> We need an algorithm that at least describes what is on
>> the wire, so the ra* clients can figure out that there is a bad VLAN
>> mirror.  You don't want argus to make that call (now that would be very
>> complicated).
>> 
>> The goal is to not over count flow metrics, to generate data that
>> reflects what is really going on on the wire.  So we need to be able
>> to have a real correction mechanism that doesn't skew the data.
>> 
>> Not sure that duplicate packets in the 1 millisecond time frame is good
>> enough.  RTT can be shorter than 1 mSec in small workgroups, and so
>> legitimate retransmissions may trip up something.  RTT is good, in this
>> case, as it gives you a real value for the test.
>> 
>> 
>> Carter
>> 
>> 
>>> 
>>> /Elof
>>> 
>>> 
>>> On Wed, 9 Oct 2013, Carter Bullard wrote:
>>>> There is a lot more to this than your response would indicate.
>>>> 
>>>> There isn't anything in a packet that distinguishes a
>>>> retransmission from the original.
>>>> 
>>>> Same can apply to dups, but
>>>> generally dups are different.  They have different L2 identifiers,
>>>> or they are (or can be) in different VLANs, or they are in
>>>> different MPLS or GRE tunnels, etc...
>>>> 
>>>> The content of a packet retransmission is identical in every
>>>> way to the original packet.  As long as the network treats the
>>>> original and the retransmission in the same way (path, priority),
>>>> the Po will be identical to Pr.  Po == Pr where Po is the
>>>> original packet and Pr is the retransmitted packet.
>>>> 
>>>> Retransmissions occur only because the original sender decides to
>>>> send a packet again.  For protocols like TCP, this requires a full
>>>> round-trip time to occur before the sender can realize that the
>>>> packet didn't get to the far side.  So the time between the original
>>>> packet and the retransmission must be greater than the round-trip time
>>>> of the network connection.
>>>> 
>>>> Dups, however, generally appear due to 3 reasons.
>>>> 
>>>> 1. the traffic path goes past the same observation point multiple times
>>>>      the same packet goes by multiple times.
>>>> 
>>>> 2. the network duplicates a packet, say for reliability or multicasting;
>>>>      two or more copies of the same packet exist in the network at the same time
>>>> 
>>>> 3. the collection infrastructure generates multiple copies of a single packet
>>>>      one packet in the network, but port mirroring generates multiple copies
>>>> 
>>>> In some situations, it's easy to distinguish the dups, especially in case 1.
>>>> The IP time to live field may have changed if a router is involved, or
>>>> new source and/or destination ethernet addresses are in the header, or the
>>>> packet is on the same wire twice, but in different services, like VLANs
>>>> or tunnels.  Argus can discriminate these types of duplicates, through
>>>> modification of the flow keys (5-TUPLE+L2+VLAN+MPLS).
>>>> 
>>>> Well anyway, this is just the start of the description.  It can be much
>>>> more complicated than this.
>>>> 
>>>> 
>>>> Now with regard to gaps…. Gaps are where argus doesn't see all the packets
>>>> in a flow.  This happens when there is loss in the collection system,
>>>> packet was on the wire, but it didn't get to argus for some reason,
>>>> OR when there is striping, or load balancing and your argus only sees
>>>> 50%, 33%, or 25% of the packets in a flow.  TCP indicates that there
>>>> were 10000 bytes transferred, but you only observed 5000 bytes.
>>>> 
>>>> This is important, and we get it for free, because we're trying to
>>>> figure out the loss rate.
>>>> 
>>>> Carter
>>>> 
>>>> 
>>>> On Oct 9, 2013, at 10:20 AM, elof2 at sentor.se wrote:
>>>> 
>>>>> On Tue, 1 Oct 2013, Carter Bullard wrote:
>>>>> 
>>>>>> Well, hmmmmmmm…. Everyone else wants to do de-duping of the packet stream.
>>>>>> Why would you want to be different from everyone else ?????   ;O)
>>>>> 
>>>>> I'm curious. What exactly is it that everyone wants (or doesn't want)?
>>>>> 
>>>>> 
>>>>> 
>>>>>> The strategy is to differentiate loss, retrans and dups, and report
>>>>>> them as independent metrics, with loss being observable loss, retrans being observable duplicates, and dups (for TCP) being retrans arriving in less than an RTT.
>>>>> 
>>>>> I don't fully agree about the dups.
>>>>> A dupe is, in my opinion, an *exact* copy of the original packet.
>>>>> A retransmission is not a dupe, it is a new packet, crafted because the original supposedly got lost.
>>>>> 
>>>>> Therefore, the logic need not be so expensive. If the very next packet is identical to the last packet and it was received within a microsecond from the last packet then it is a dupe.
>>>>> 
>>>>> Taking the RTT into consideration seems a bit excessive for the simple task of dupe recognition. Also, an RTT is not always possible to calculate if the flow only consists of a single SYN, or is just unidirectional UDP traffic, etc., but the packets of the flow are still duplicated.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> We still will need to derive gaps, which are lost packets that were not
>>>>>> retransmitted.
>>>>> 
>>>>> Oooh, are you talking about distinguishing between external loss and internal loss?
>>>>> When argus sees a gap in tcp sequence numbers you know there has been a drop, but not where it occurred.
>>>>> If argus then sees a tcp retransmission for that gap, we know the drop was external, otherwise it was probably internal.
>>>>> That kind of logic seems expensive. If it *is* really expensive, I would say don't do it. Only do the first gap detection (as you already do today) and leave the task of understanding where the drops occur to the user.
>>>>> 
>>>>> /Elof
>>>>> 
>>>>> 
>>>> 
>> 
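
For reference, the time-based test discussed in the quoted thread (a repeated
TCP segment arriving in less than an RTT is a dup, otherwise a retransmission,
with a small fixed window when no RTT can be calculated) could be sketched
roughly like this. This is only an illustration, not anything in argus; the
function name classify_repeat and its parameters are hypothetical, and the
1 microsecond fallback is just the example value from the thread.

```python
DEFAULT_WINDOW = 1e-6  # seconds; example fallback when no RTT is available


def classify_repeat(delta, rtt=None, default_window=DEFAULT_WINDOW):
    """Classify a repeated TCP segment seen `delta` seconds after the original.

    rtt is the measured round-trip time in seconds, or None when it cannot
    be calculated (single SYN, unidirectional traffic, ...). A sender needs
    at least one RTT to decide to retransmit, so anything faster than that
    is treated as a duplicate.
    """
    window = rtt if rtt is not None else default_window
    return "dup" if delta < window else "retrans"
```

With a measured RTT of 20 mSec, a repeat 0.5 mSec after the original is
classed as a dup; with no RTT and the default window, the same repeat is
classed as a retransmission, which shows how much the result depends on
having a real RTT value.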
