Another vote for packet drop detection

Carter Bullard carter at qosient.com
Wed Feb 29 12:17:01 EST 2012


Hey Charles,
Excellent, glad to hear that it's working at some level.  I've also gotten some immediate
wins from the 'g' field in my own use, so I'm very glad that the group got this going.

The 'g' flag has a number of conditionals around it that I didn't put in the 'sgap' and 'dgap'
metric printing.  I'll put that logic into the client pass, which is going out tonight.

No filtering yet on sgap and dgap.  I think I put the keywords into the compiler, but if I did,
there isn't any actual logic there yet, so you won't get anything gratifying.  I was waiting
for some validation that we were going down the right path.  It may take a few more
turns, but I'll try to have filtering in soon.
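
Until filtering lands, the same selection can be done by post-processing the printed
columns.  A minimal Python sketch, assuming the output layout of the
"ra -s flgs +sbytes +dbytes +sgap +dgap" command in Charles's examples below, with the
four numeric columns last on each line.  The function name rows_with_gap is hypothetical
and is not part of the argus clients:

```python
def rows_with_gap(lines, min_gap=1):
    """Yield (flags, sbytes, dbytes, sgap, dgap) for rows with sgap or dgap >= min_gap."""
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        try:
            # Assumption: the last four columns are sbytes, dbytes, sgap, dgap.
            sbytes, dbytes, sgap, dgap = (int(f) for f in fields[-4:])
        except ValueError:
            continue  # column header or other non-data line
        if sgap >= min_gap or dgap >= min_gap:
            # Whatever precedes the numeric columns is the flags field.
            yield " ".join(fields[:-4]), sbytes, dbytes, sgap, dgap


sample = [
    " e s              1405         2912        0        0",
    " e                3920        14286        0        0",
    " e s              3419         5847        0     4848",
]
for row in rows_with_gap(sample):
    print(row)
```

Passing sys.stdin instead of sample would let this sit at the end of an ra pipeline.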

There are now two issues that I need to work out.  The first is how to deal with aggregation.
If you aggregate independent TCP flows together, we may currently report a gap in the sequence
numbers in the new flow record.  That sounds logical, but it is not correct for our purposes.  I need
to get the gap values such that, as I aggregate, the gap counters remain somewhat valid.
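
To illustrate the aggregation problem with made-up numbers, here is a minimal Python
sketch.  It assumes each flow record carries its first and next-expected sequence numbers
plus the observed byte count, and it assumes that summing per-flow gap counters is one
acceptable way to keep the aggregate meaningful.  The FlowRecord type and the functions
are hypothetical and are not argus code:

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit


@dataclass
class FlowRecord:
    first_seq: int   # sequence number of the first observed payload byte
    next_seq: int    # sequence number following the last observed payload byte
    bytes_seen: int  # payload bytes actually observed


def gap(rec):
    """Per-flow gap: sequence range minus observed bytes (wraparound tolerated)."""
    span = (rec.next_seq - rec.first_seq) % SEQ_MOD
    return max(span - rec.bytes_seen, 0)


def naive_merged_gap(a, b):
    """Incorrect approach: treat two independent flows as one sequence space.

    Wraparound is ignored here; the point is only to show the spurious result.
    """
    first = min(a.first_seq, b.first_seq)
    nxt = max(a.next_seq, b.next_seq)
    return max((nxt - first) - (a.bytes_seen + b.bytes_seen), 0)


def summed_gap(a, b):
    """Assumed alternative: carry the per-flow gap counters and add them."""
    return gap(a) + gap(b)


a = FlowRecord(first_seq=1000, next_seq=6000, bytes_seen=5000)       # nothing missing
b = FlowRecord(first_seq=900000, next_seq=904000, bytes_seen=3000)   # 1000 bytes missing

print(naive_merged_gap(a, b))  # large, meaningless value
print(summed_gap(a, b))        # per-flow gaps added together
```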

The second is how to report when this flow status record saw packets that filled a gap
reported in a previous flow status record.  We do this now in loss reporting.  We have
correction logic in argus that tells us that we saw packets in this status record that were 
reported as loss in the previous status record.  We report the metric as negative loss.  For
the gaps, we should be able to do the same thing, but I need to work out some of
the logic (not sure I have all the data yet).
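
The correction for gaps might look something like the following.  This is a rough Python
sketch under assumptions, not the logic in argus: segments are (seq, length) pairs, sequence
wraparound and segments straddling the previous high-water mark are ignored, and any segment
that lands entirely below the high-water mark is treated as filling an earlier gap (a true
retransmission would be misattributed while a gap is outstanding).  GapTracker is a
hypothetical name:

```python
class GapTracker:
    """Track gap bytes across status intervals and report a signed value."""

    def __init__(self, initial_seq):
        self.next_expected = initial_seq  # high-water mark of sequence space reported so far
        self.outstanding = 0              # gap bytes reported earlier and not yet filled

    def status_report(self, segments):
        """Return the signed gap for one status interval of (seq, length) segments."""
        start = self.next_expected
        end = start
        new_bytes = 0
        filled = 0
        for seq, length in segments:
            if seq + length <= start:
                filled += length          # lands in territory a previous report covered
            else:
                new_bytes += length
                end = max(end, seq + length)
        filled = min(filled, self.outstanding)   # cannot fill more than was reported
        new_gap = max((end - start) - new_bytes, 0)
        self.outstanding += new_gap - filled
        self.next_expected = end
        return new_gap - filled           # negative when late packets fill an old gap


tracker = GapTracker(initial_seq=0)
first = tracker.status_report([(0, 100), (200, 100)])     # bytes 100..199 not seen
second = tracker.status_report([(100, 100), (300, 100)])  # the missing bytes arrive late
print(first, second)
```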

Carter


On Feb 29, 2012, at 9:59 AM, Charles Smutz wrote:

> 
> Carter,
> 
> I've been very pleased with the packet loss (or rather gap) detection. In my experience, it works very well. It is now possible to infer problems with network monitor visibility (even tapping issues, etc. outside the monitor) with a pretty high degree of confidence. Furthermore, you have some ability to quantify that loss. Even though you can only detect some of the packets that were transferred through the network but not seen by the network monitor, you can extrapolate if you want (assuming the loss is randomly distributed). Lastly, that you can do this without any special effort on the collection end is just awesome. This opens the door for continuous monitoring of this facet of network monitor visibility at little cost.
> 
> A couple notes:
> -I've found the "g" flag to be reliable, but I've found the sgap and dgap to have erroneous values (very infrequently). Not a big issue, but wanted to point it out. See example below.
> -The logic for flagging gaps seems solid--in my evaluation the vast majority of the flows flagged with "g" do involve a packet that was transferred through the network, but not seen by the monitor. I've found a very few where I believe the "g" flag is erroneous, but those have been relatively few and I'll admit determining ground truth in most situations can be pretty difficult.
> -To date (3.0.5.34), I can't seem to get filtering based on sgap/dgap to work. This isn't a big issue for me, and may well be my fault, but I wanted to point it out.
> 
> Thanks,
> 
> Charles Smutz
> 
> Examples taken from test traces here using argus defaults:
> https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=6081
> 
> Example showing erroneous gap values (note that flag is not g):
> $ ra -nn -s flgs +sbytes +dbytes +sgap +dgap -r test-B_trace-1_network-side-packet-loss.argus | head -n 7
> e s              1405         2912        0        0
> e                3920        14286        0        0
> e s              3419         5847        0     4848
> e d               148         3599        0        0
> e d               549         3216        0        0
> e                 330          909        0        0
> e                 330         1054        0        0
> 
> 
> Example showing my inability to filter:
> $ ra -nn -s flgs +sbytes +dbytes +sgap +dgap -r test-B_trace-1_network-side-packet-loss.argus -- "dst bytes gt 0" | head -n 7
> e s              1405         2912        0        0
> e                3920        14286        0        0
> e s              3419         5847        0     4848
> e d               148         3599        0        0
> e d               549         3216        0        0
> e                 330          909        0        0
> e                 330         1054        0        0
> $ ra -nn -s flgs +sbytes +dbytes +sgap +dgap -r test-B_trace-1_network-side-packet-loss.argus -- "dst gap gt 0" | head -n 7
> ra[8308]: 09:46:25.470929 dst gap gt 0 filter syntax error
> $ ra -nn -s flgs +sbytes +dbytes +sgap +dgap -r test-B_trace-1_network-side-packet-loss.argus -- "dst gaps gt 0" | head -n 7
> ra[8311]: 09:46:28.365700 dst gaps gt 0 filter syntax error
> 
> 
> 
> 
> 
> 
> On 2/1/2012 10:45 PM, argus-info-request at lists.andrew.cmu.edu wrote:
>> Today's Topics:
>> 
>>    1. Re:  Another vote for packet drop detection (Carter Bullard)
>>    2.  argus-3.0.5.10 and argus-clients-3.0.5.31 (Carter Bullard)
>> 
>> 
>> ----------------------------------------------------------------------
>> 
>> Message: 1
>> Date: Wed, 1 Feb 2012 15:22:37 -0500
>> From: Carter Bullard<carter at qosient.com>
>> Subject: Re: [ARGUS] Another vote for packet drop detection
>> To: Argus<argus-info at lists.andrew.cmu.edu>
>> Message-ID:<AB1A8208-C06B-4DF3-8689-6568ADB1ADCA at qosient.com>
>> Content-Type: text/plain; charset="windows-1252"
>> 
>> Gentle people,
>> As I test the algorithms for gap detection, a few things are coming up, causing
>> me to simplify the strategy a bit.
>> 
>> There are two situations for TCP, and for any other connection-oriented transport
>> protocol: when we are seeing both sides of the conversation, and when we
>> are not.  When we are seeing both sides, we can correlate received ACKs with
>> transmitted sequence numbers, and tick a status bit saying we've observed gaps
>> if there are discrepancies.  This will be very sensitive, but the condition itself
>> is not enough to know how much data was missed.
>> 
>> When we are seeing only one side, we can report gaps in the sequence numbers,
>> just by knowing the sequence number range, and comparing that against the
>> observed bytes.  As long as we account for sequence numbers out of order
>> this works very well.  However, when there are retransmissions, this method
>> breaks down.  Our observed bytes within a sequence number range can't
>> really be compared to give a realistic value, unless we are tracking retransmitted
>> bytes.  Currently argus does not track this metric; it tracks numbers of retransmitted
>> packets.  I'll change this for argus-3.1.0.
>> 
>> As a result, we will use the uni-directional algorithm to report gaps as unobserved
>> bytes, when there is no evidence of retransmissions.  I'm going to put a ' g ' in the
>> status flags field in the same column (#4) as the loss indicators,  when there are gaps.
>> I won't worry about the indication from argus, until argus-3.1.0 when we can change
>> the "retrans" metrics and their reporting.
>> 
>> Fields are called 'sgap' and 'dgap', and the column labels are "SrcGap" and "DstGap".
>> There is support for printing, including xml, graphing, etc., but there is no support
>> yet for filtering or sorting.  That is important, so I'll put that in before
>> I release the code.  Should be today or tomorrow.
>> 
>> The other protocol that we should implement this for is RTP.  Because RTCP, the
>> reverse control channel for RTP, contains loss values, we can compare the two
>> to realize that we are not seeing all the RTP traffic, but RTCP is not reporting loss.
>> This would be our indication of gaps in the RTP stream.  I'll work on that much later.
>> 
>> Carter
>> ------------------------------
>> 
>> Message: 2
>> Date: Wed, 1 Feb 2012 22:45:31 -0500
>> From: Carter Bullard<carter at qosient.com>
>> Subject: [ARGUS] argus-3.0.5.10 and argus-clients-3.0.5.31
>> To: Argus<argus-info at lists.andrew.cmu.edu>
>> Message-ID:<A08D4D55-2D66-403F-BCD2-C3364BD39483 at qosient.com>
>> Content-Type: text/plain; charset="windows-1252"
>> 
>> Gentle people,
>> New argus and argus-client software is now available on the server:
>> 
>>    http://qosient.com/argus/dev/argus-latest.tar.gz
>>    http://qosient.com/argus/dev/argus-clients-latest.tar.gz
>> 
>> These packages have a number of bug fixes and new features, as discussed
>> on the mailing list.  Argus implements new support for gap tracking and
>> reporting, and fixes a few bugs in loss reporting, and TCP byte accountability
>> for when flows exceed 4GB.  Clients now can report on "sgap" and "dgap"
>> metrics.  I still need to implement gap filtering.  For argus data earlier than
>> argus-3.0.5.30, gap reporting is not supported.   ra* programs should do
>> the right thing.  I tested with argus-3.0.5.30, argus-3.0.4 and argus-2.x data.
>> 
>> There are a few bugs not discussed on the mailing list that have also
>> been fixed, in particular printing user buffers of 300+ bytes.
>> This would cause a core dump.
>> 
>> Manpages, and usage output when you run a program with the -h option
>> should be up to date now.  New manpages are available, and most have
>> been updated.
>> 
>> Please take a look at these packages, and provide feedback if you
>> notice anything amiss.
>> 
>> Thanks for all the support,
>> 
>> Carter
>> 
>> 
>> 
>> ------------------------------
>> 
>> 
> 
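
The uni-directional estimate described in the quoted Feb 1 message (sequence number range
compared against observed bytes, reported only when there is no evidence of retransmission)
can be sketched as follows.  This is a minimal Python sketch under stated assumptions, not
the argus implementation: the function name and the (seq, length) segment representation
are hypothetical, initial_seq is assumed to be the lowest sequence number in the interval,
and retransmission is detected here simply as overlap between observed segments.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit


def unobserved_bytes(segments, initial_seq):
    """Return gap bytes for one direction, or None when segments overlap.

    segments: iterable of (seq, length) payload segments, in any order.
    """
    # Offsets relative to initial_seq tolerate 32-bit wraparound; sorting
    # accounts for segments that arrived out of order.
    spans = sorted(
        ((seq - initial_seq) % SEQ_MOD, length)
        for seq, length in segments
        if length > 0
    )
    if not spans:
        return 0
    observed = 0
    prev_end = spans[0][0]
    for start, length in spans:
        if start < prev_end:
            return None  # overlap suggests retransmission: the comparison is unreliable
        observed += length
        prev_end = start + length
    seq_range = prev_end - spans[0][0]
    return seq_range - observed


# Out-of-order arrival with bytes 1200..1299 never seen.
print(unobserved_bytes([(1000, 100), (1300, 100), (1100, 100)], initial_seq=1000))
# The same segment seen twice: no value is reported.
print(unobserved_bytes([(1000, 100), (1000, 100)], initial_seq=1000))
```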
