Question about loss and retransmission reporting

Balas, Edward G ebalas at iu.edu
Tue Dec 11 11:52:31 EST 2018


Carter,

I've dug a bit deeper, reviewing the original pcap that caught my eye.  The concern / confusion at this point is that most other utilities report similar values (~12,097), whereas Argus is reporting a value that's ~500% larger.


1.  I ran tshark and 'manually' counted sequence numbers that appear multiple times when tcp.len is greater than 1, i.e. not an ack

accuracy-test]#  tshark -r no-vlan.pcap -nn -Y 'tcp.srcport==35940 and tcp.len > 1' -T fields -e ip.src -e ip.dst -e tcp.dstport  -e tcp.seq  | sort | uniq -dc | wc -l
12087 

(this is unique instances not total count, though each sequence number showed up twice in spot checking)
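The distinction between "unique instances" and "total count" can be made explicit with a small helper. This is only a sketch: the function name `count_repeats` and the sample lines are hypothetical, and it assumes one key per input line, as in the tshark field output above. It prints the number of distinct keys seen more than once (what `sort | uniq -dc | wc -l` gives) followed by the number of extra copies beyond the first.

```shell
#!/bin/sh
# count_repeats (hypothetical helper): read one key per line on stdin and print
#   <distinct keys seen more than once> <extra copies beyond the first>
count_repeats() {
    awk '
        { seen[$0]++ }                      # tally every input line
        END {
            distinct = 0; extra = 0
            for (k in seen) {
                if (seen[k] > 1) {
                    distinct++              # key repeated at least once
                    extra += seen[k] - 1    # copies beyond the first
                }
            }
            print distinct, extra
        }
    '
}

# Made-up sample: seq 1449 appears twice, seq 2897 three times.
count_repeats <<'EOF'
10.0.0.1 10.0.0.2 5201 1
10.0.0.1 10.0.0.2 5201 1449
10.0.0.1 10.0.0.2 5201 1449
10.0.0.1 10.0.0.2 5201 2897
10.0.0.1 10.0.0.2 5201 2897
10.0.0.1 10.0.0.2 5201 2897
EOF
# prints: 2 3
```

With the function defined, the same tshark pipeline can be ended with `| count_repeats` in place of `| sort | uniq -dc | wc -l`. If every repeated sequence number really shows up exactly twice, the two printed numbers will be equal.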


2. Next I looked at what tshark's own tcp analysis would report.
tshark -r no-vlan.pcap -nn -Y 'tcp.port==35940 and tcp.analysis.retransmission' -T fields -e ip.src -e ip.dst -e tcp.dstport  -e tcp.seq  | wc -l 
13610

- this number is different, to be sure, but within the same order of magnitude

3.  Next we ran this pcap through tcptrace, and it also reported 12087

4. Finally, argus / racluster is showing the following:

accuracy-test]# racluster -n -r no-vlan.argus   -M dsrs=+metric,+asn,+net,+agr,+ipattr  -s +trans, +cause, +retrans,+loss,+appbytes, dur
         StartTime      Flgs  Proto            SrcAddr  Sport   Dir            DstAddr  Dport  TotPkts   TotBytes State  Trans   Cause          Retrans Loss  TotAppByte        Dur 
   18:32:30.061403  e s         tcp       xxx.xxx.123.97.35940     ->      xxx.xxx.106.139.5201     309477 2089885325   RST      3   Start       0      66189 2068372573  10.294667

- it should be noted that the metrics for total app bytes, and total packets are consistent with other utilities and all were generated from the same pcap file.
- ra is unable to return retrans values even though in the source code loss is calculated from retrans?



The code path that triggered the 66,189 reported retransmissions is as follows:

ArgusTcp.c


                     if (ArgusThisTCPsrc->win) {
                        int dipid;
                        if (*tipid && ((dipid = (ipid - *tipid)) < 0) && (dipid > -5000)) {
                           ArgusThisTCPsrc->status |= ARGUS_OUTOFORDER;
                        } else
                        if (ArgusThisTCPsrc->winbytes > ((maxseq - 1) - ArgusThisTCPdst->ack)) {
                           ArgusThisTCPsrc->retrans++;
                           ArgusThisTCPsrc->status |= ARGUS_PKTS_RETRANS;
                           ArgusThisTCPsrc->winbytes -= len;
                           printf("The code path we are hitting is here\n");
                           model->ArgusInProtocol = 0;

                        } else {


I'm not super familiar with the source; could you share a bit about what's going on in this situation?  Also, any insights as to why loss is showing values but retransmissions is always 0?
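For anyone who wants to experiment with the comparison that guards the `retrans++` branch in the excerpt, here is a minimal sketch that re-evaluates just that inequality. The function name `hits_retrans_branch` and all sample values are made up for illustration. It uses plain signed shell arithmetic, so it ignores sequence-number wraparound and whatever C integer types ArgusTcp.c actually uses for these fields.

```shell
#!/bin/sh
# hits_retrans_branch WINBYTES MAXSEQ ACK   (hypothetical helper)
# Succeeds when  winbytes > ((maxseq - 1) - ack),  the comparison
# copied from the ArgusTcp.c excerpt. Nothing else from argus is modeled.
hits_retrans_branch() {
    [ "$1" -gt $(( ($2 - 1) - $3 )) ]
}

# Made-up values: (10001 - 1) - 7104 = 2896
if hits_retrans_branch 4344 10001 7104; then
    echo "4344 > 2896: retrans branch"
fi
if ! hits_retrans_branch 2896 10001 7104; then
    echo "2896 is not > 2896: falls through to the else branch"
fi
```

Feeding it the winbytes, maxseq and ack values printed from a debug build (next to the existing printf) would show how far apart the two sides of the comparison are on each packet that takes this path.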



Thanks Much!




> On Dec 7, 2018, at 1:45 PM, Balas, Edward G <ebalas at iu.edu> wrote:
> 
> 
> 
>> On Dec 7, 2018, at 1:21 PM, carter at qosient.com wrote:
>> 
>> Hey Edward,
>> It's always important to indicate which version you're running, so we can deal with known bugs etc …
> 
> Sorry about that… I hadn't gotten to the point of thinking there was a bug …
> 3.0.8.2 is what I'm running now
> 
>> 
>> Argus tracks loss in 3 basic conditions, connection oriented protocols such as TCP,  connection-less protocols that have sequence numbers, such as RTP, UDT, and IPSEC, and strict request / response protocols where you should see the same out as back.
>> 
>> Argus has a complete TCP state machine so that it can identify requests for missing packets, retransmissions and out of order packets.  But argus is designed to recognize loss regardless of where it is along the path.  As a result, the algorithm is a little complex, mainly because TCP is reliable and regardless of the loss rate you should always see at least one copy of all the packets.  Because argus is a bi-directional flow monitor, argus can do things like look for requests for retransmission as an indication of loss.  It can infer that observing multiple packets is an indication of loss (you don’t retransmit unless there was loss), and it needs to do this in the event of stripping and asymmetric routing.
> 
> Yeah, we are definitely interested in the asymmetric situation; for now we are validating using a point of observation where traffic appears symmetric.
> 
>> 
>> Because loss can occur before and/or after argus sees the packet stream, argus will use retransmissions and retransmission requests from the far side as an indication of loss.  If the far side requests more than once, we assume that the packet was lost more than once, or that the retransmission request was lost.  This is a possible source for argus saying there is more loss than other tools.
> 
> Ah ok that is in part what I was curious about, so if one were to look at the individual packets in a flow and count the number of times any sequence number shows up more than once, they should presumably come up with the same value as argus loss calc?
> 
> 
>> Now with that as a starting point, where is argus in relationship with the other tools, and what methods are they using to determine loss ???
> 
> Yeah, this is the classic dilemma of having too many watches and not knowing which is correct.   What we have for a reference is ultimately what iperf3 is reporting, and we are starting to look at what the kernel can tell us, as presumably the tcp implementation should have a pretty authoritative view of its own behavior, but I haven't dug into the proc filesystem etc enough to know what's available.  I'm not really sure yet what tshark's methodology is for determining retransmission; I have been presuming it is similar to what you described. I'll take a look at sequence numbers directly to see what's up and report back.
> 
>> 
>> Carter
>>          	 	
>> Carter Bullard  •  CTO
>> 150 E 57th Street, Suite 12D
>> New York, New York 10022-2795
>> Phone +1.212.588.9133 • Mobile +1.917.497.9494
>> 
>> 
>> 
>>> On Dec 7, 2018, at 12:28 PM, Balas, Edward G <ebalas at iu.edu> wrote:
>>> 
>>> Hey all,
>>> 
>>> I've run into an issue I'm struggling to understand, and thus far googling has failed to right me.  I am trying to use Argus to track retransmissions / loss in flows and I am getting values that are inconsistent with other tools, including the sending application.  As I recurse into the various rabbit holes contributing to this on our end, I was wondering if someone could guide me on the following:
>>> 
>>> 1.  Within ra etc there is the ability to report loss and retrans.  When I look at the documentation, loss seems to imply it contains both retransmissions and dropped packets. If I'm looking at a TCP flow, is it correct to assume there will be no drops and thus loss is synonymous with retransmission?
>>> 
>>> 2.  I am able to get ra and racluster to report loss values for my flows, however retrans is always 0. Is there a special -M or other option, or an argus option, I need to use to see retrans?  I'm making the possibly bad assumption that because I can see loss values the tunings of argus are sufficient.
>>> 
>>> 3.  The Loss numbers are always higher than what I am seeing with other applications, is there a document or place in the code I should go look at that describes how this is calculated?
>>> 
>>> Motivating these questions is the following small test:
>>> -------------------------------------------------------
>>> I transferred a file to my test host while doing full snaplen packet capture, and then compared argus with tshark reports of loss and retransmission. 
>>> 
>>> accuracy-test2]#  argus -JA -r raw.pcap  -w raw.argus
>>> 
>>> accuracy-test2]#  racluster -n -r raw.argus  -s stime,dur,pkts,retrans,loss,appbytes,cause -- port 51170
>>>         StartTime        Dur  TotPkts Retrans       Loss TotAppByte   Cause 
>>>   16:47:58.176366  15.206044    21513       0         24   20195392   Start
>>> 
>>> 
>>> 
>>> 
>>>>> Total packets between tshark and argus agree:
>>> 
>>> accuracy-test2]#  tshark -r raw.pcap -nn -Y 'tcp.port==51170 '  | wc -l 
>>> 21513
>>> 
>>>>> Retransmissions / loss do not agree between tshark and argus:
>>> 
>>> accuracy-test2]#  tshark -r raw.pcap -nn -Y 'tcp.port==51170 and tcp.analysis.retransmission '  | wc -l 
>>> 17
>>> 
>>> 17 vs 24
>>> 
>>> 
>>> Was curious if folks had insights they could share in these regards?
>>> 
>>> Thanks,
>>> 
>>> Edward Balas
>>> ebalas at iu.edu
>> 
> 
