Regarding the gap detection bugs

elof2 at sentor.se elof2 at sentor.se
Tue Jun 30 06:30:52 EDT 2015


> So, to refresh my memory, we’re reporting gaps when there is TCP
> port reuse.  Old TCP is done, new TCP uses same ports and we
> don’t adjust the gap reporting ???

Yes, see issue 1 below.

(also see our previous emails debugging this. E.g. the thread "Errors in 
gap detection" where I run argus -S 0.0000001 to follow all the changes in 
state and tcp seq tracking.)


You got the pcap, containing a single tcp connection. If you replay this 
tcp connection two or more times, ra suddently says there are 2994 
src-bytes missing.





PS. When you review and update the gaps source code, also see issues 2 and 
3 below.

Perhaps these should be split into separate mail threads?
This thread currently contain:

- Part 1: Weird gap values (Reset the tcp cache upon reuse of TCP ports)
- Part 2: Incorrect gaps detected (Delay flush with corrected values)
- Part 3: Gap values are off by -1 byte

/Elof


> ---------- Forwarded message ----------
> From: elof2 at sentor.se
> To: Argus Development <argus-info at lists.andrew.cmu.edu>
> Date: Tue, 26 May 2015 21:34:34 +0200 (CEST)
> Subject: [ARGUS] Bugs in gap detection
>
>
> Hi Carter!
>
> I've just upgraded argus to the latest version on my FreeBSD-machine.
>
> Three bug reports:
>
>
>
> Issue 1)
> Problem: Argus detect gaps when there are none.
>
> ra -Zb -nr ch119_no_gap.log -s +sgap +dgap -
>      StartTime      Flgs  Proto            SrcAddr  Sport Dir DstAddr  Dport SrcPkts  DstPkts     SrcBytes     DstBytes         State SrcGap   DstGap
> 11:36:24.160011  e           tcp        10.123.45.5.25768   -> 10.23.23.10.8080 4        3         1037          715 SPA_SPA        0        0
> 11:36:48.322557  e           tcp        10.123.45.5.25768   -> 10.23.23.10.8080 38       81        3201       100099 PA_PA        0        0
> 11:37:54.333394  e g         tcp        10.123.45.5.25768  <?> 10.23.23.10.8080 8        8         7705         1809 PA_PA     2994        0
>
> So, on that last line, argus says there are 2994 src-bytes missing.
>
> 'netstat -B' (on my FreeBSD) show that the argus process has zero dropped packets.
> I ran a tcpdump simultaneously and this tcpdump process also had zero drops.
> Therefore I expect to see a few missing packets for the above flow in the pcap, as reported by argus.
>
> To my surprise, Wireshark does NOT complain about any missing packets at all!
>
> I have manually compared the three ra-lines to the pcap:
> I see the 4 srcpkts (1037 srcbytes) + 3 dstpkts (715 dstbytes) in the pcap.
> I see the 38 (3201) + 81 packets in the pcap.
> I see the 8 (7705) + 8 (1809) packets in the pcap.
> Everything looks good.
> Both argus and tcpdump saw a total of 142 packets.
>
> So... I guess there's something wrong with the gap detection in argus. :-/
>
> I'm sending you a private email, attaching both the pcap and the argus log for further debugging.
>
> note: the last two packets from the middle line are two window updates. Then there is a pause for 1 min 6 seconds before those last 8+8 packets at 11:37:54.
> Could the window updates or the >1 minute of silence fool argus into thinking there are gaps?
>
>
>
>
> Issue 2)
>
> Problem: Argus reports gaps when there are none.
> This must be because argus flush its state just before the missing data was retransmitted on the wire.
>
> Example:
> I follow a tcp connection until I see a gap...
>      StartTime      Flgs  Proto            SrcAddr  Sport Dir DstAddr  Dport SrcPkts  DstPkts     SrcBytes     DstBytes         State LastTime   SrcGap DstGap
> 11:36:05.719501  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 69       43        88147         2958 PA_A    11:36:10.862907        0        0
> 11:36:10.962535  e           tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 71       42        88894         2772 PA_A    11:36:16.111922        0        0
> 11:36:16.238263  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 71       46        88507         3084 PA_A    11:36:21.338720        0        0
> 11:36:21.450236  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 71       44        87447         2964 PA_A    11:36:26.560071        0        0
> 11:36:26.649754  e g         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 71       43        87160         2874 PA_A    11:36:31.751251      857        0
>  Ok, so there are 71 packets from 123.123.123.88:8800 in the line above. So somehere within these 71 packets there's a gap.
>
>  In my tcpdump pcap I start counting from the packet with timestamp 11:36:26.649754.
>  Between src packets 68 and 69 there's a missing packet.
>  After 69, packets 70 and 71 continue in the same high speed, where packet 71 has a timestamp of 11:36.31.751249.
>  LastTime of this line is 2us later, 11:36:31.751251 and this correspond fine with the timestamp of the 43rd packet from 10.123.1.5.
>  So far so good.
>
>  Now, in the second next packet (packet 73), a retransmission of the missing packet is received by argus.
>  So there are no *actual* gap in the tcp stream. The gap has been filled.
>  However, since argus has already flushed the 71+43 packets, packets 72 and 73 instead became packets 1 and 2 of the next line.
>  A SrcGap of 857 is therefore logged on the previous line.
>
> 11:36:31.821865  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 72       42        89425         2784 PA_A    11:36:37.012968        0        0
> 11:36:37.065894  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 72       45        87654         3030 PA_A    11:36:42.139706        0        0
> 11:36:42.220030  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 74       49        91817         3342 PA_A    11:36:47.361654        0        0
> 11:36:47.497642  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 69       43        88775         2934 PA_A    11:36:52.636345        0        0
> 11:36:52.688988  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 72       47        87834         3210 PA_A    11:36:57.840345        0        0
> 11:36:57.914371  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 70       43        88106         2934 PA_A    11:37:03.012641        0        0
> 11:37:03.143249  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 71       44        86827         2988 PA_A    11:37:08.241077        0        0
> 11:37:08.373448  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 75       48        89773         3264 PA_A    11:37:13.467642        0        0
> 11:37:13.555818  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 70       46        87370         3144 PA_A    11:37:18.679088        0        0
> 11:37:18.765883  e           tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 73       42        87613         2772 PA_A    11:37:23.814428        0        0
> 11:37:23.901961  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 71       47        89087         3222 PA_A    11:37:29.082170        0        0
> 11:37:29.258747  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 69       45        88978         3094 PA_A    11:37:34.375937        0        0
> 11:37:34.427460  e g         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 69       43        87253         2874 PA_A    11:37:39.516085       30        0
>
>  Again, there was a gap, with a retransmission just a few packets later, but too late, so it was counted in the next flow line and hence generated a 30 byte gap.
>
> 11:37:39.637529  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 72       44        87799         3000 PA_A    11:37:44.736910        0        0
> 11:37:44.831350  e s         tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 72       44        88953         2964 PA_A    11:37:49.980516        0        0
> 11:37:50.102882  e           tcp     123.123.114.88.8800   <?> 10.123.1.5.25699 76       42        89443         2772 PA_A    11:37:55.259373        0        0
>
>
>
>
> I think this is incorrect. Argus should only list *true* gaps detected in the flows. The retransmission flags show where there are retransmissions.
>
> So currently, argus reports gaps even though there physically are none. This makes the gaps field less useful.
>
> You can probably come up with a solution to this. Here's my suggestion:
> Delay the flush if there are active gaps in this unflushed flow.
> if (srcgap!=0 || dstgap!=0)
> {
> Wait up to 10 seconds for the missing packet to arrive (and hence fill the gap). Then flush the line with gap values set to zero.
> OR if the session is finished (FIN) or reset (RST), then flush the line immediately with the current gap values.
> You could probably build a third case too, in order to exit the delay loop early. Something like this:
> OR wait until N next packets for this stream has been received, where N is set to a value that covers the current sliding window filled with some typical packetsize plus a few packets more just for margin.
> That is, only wait for as long it would take the Src and Dst to retransmit the missing packet. If argus hasn't seen it by then, the packet will never appear, so stop waiting for it and flush the line with the current gap values.
> }
>
> Benefit:
> Argus show gap values you can trust and understand. I.e. if argus says zero gaps, that mean that the SPAN is good. No packet loss in any external equipment nor locally in the sniffer itself.
>
> Drawback:
> Sometimes the flushpoint can be extended by up to 10 seconds if there are gaps, instead of always be ARGUS_FLOW_STATUS_INTERVAL long.
> In my opinion, this is not a problem.
>
> (Perhaps you can build an even better delayed_flush() function than the above. One that flush all lines periodically as usual, but with corrected gap values, if the gap was filled later with a retransmission.)
>
>
>
>
> Issue 3)
>
> Problem: Argus seem to be 1 byte off when reporting the size of gaps.
>
> Example:
>
> 11:36:26.649754  e g         tcp     123.123.123.88.8800   <?> 10.123.1.5.25699 71       43        87160         2874 PA_A      857        0
>  seq analysis in Wireshark show:
>  packet 68 show Next seq=244713
>  packet 69 has seq=245571
>  245571-244713=858 bytes missing, not 857.
>
> 11:37:34.427460  e g         tcp     123.123.123.88.8800   <?> 10.123.1.5.25699 69       43        87253         2874 PA_A       30        0
>  seq 1334074-1334043=31, not 30 bytes.
>
>
> /Elof
>


More information about the argus mailing list