Regarding the gap detection bugs
elof2 at sentor.se
elof2 at sentor.se
Tue Jun 30 06:30:52 EDT 2015
> So, to refresh my memory, we’re reporting gaps when there is TCP
> port reuse. Old TCP is done, new TCP uses same ports and we
> don’t adjust the gap reporting ???
Yes, see issue 1 below.
(also see our previous emails debugging this. E.g. the thread "Errors in
gap detection" where I run argus -S 0.0000001 to follow all the changes in
state and tcp seq tracking.)
You got the pcap, containing a single tcp connection. If you replay this
tcp connection two or more times, ra suddently says there are 2994
src-bytes missing.
PS. When you review and update the gaps source code, also see issues 2 and
3 below.
Perhaps these should be split into separate mail threads?
This thread currently contain:
- Part 1: Weird gap values (Reset the tcp cache upon reuse of TCP ports)
- Part 2: Incorrect gaps detected (Delay flush with corrected values)
- Part 3: Gap values are off by -1 byte
/Elof
> ---------- Forwarded message ----------
> From: elof2 at sentor.se
> To: Argus Development <argus-info at lists.andrew.cmu.edu>
> Date: Tue, 26 May 2015 21:34:34 +0200 (CEST)
> Subject: [ARGUS] Bugs in gap detection
>
>
> Hi Carter!
>
> I've just upgraded argus to the latest version on my FreeBSD-machine.
>
> Three bug reports:
>
>
>
> Issue 1)
> Problem: Argus detect gaps when there are none.
>
> ra -Zb -nr ch119_no_gap.log -s +sgap +dgap -
> StartTime Flgs Proto SrcAddr Sport Dir DstAddr Dport SrcPkts DstPkts SrcBytes DstBytes State SrcGap DstGap
> 11:36:24.160011 e tcp 10.123.45.5.25768 -> 10.23.23.10.8080 4 3 1037 715 SPA_SPA 0 0
> 11:36:48.322557 e tcp 10.123.45.5.25768 -> 10.23.23.10.8080 38 81 3201 100099 PA_PA 0 0
> 11:37:54.333394 e g tcp 10.123.45.5.25768 <?> 10.23.23.10.8080 8 8 7705 1809 PA_PA 2994 0
>
> So, on that last line, argus says there are 2994 src-bytes missing.
>
> 'netstat -B' (on my FreeBSD) show that the argus process has zero dropped packets.
> I ran a tcpdump simultaneously and this tcpdump process also had zero drops.
> Therefore I expect to see a few missing packets for the above flow in the pcap, as reported by argus.
>
> To my surprise, Wireshark does NOT complain about any missing packets at all!
>
> I have manually compared the three ra-lines to the pcap:
> I see the 4 srcpkts (1037 srcbytes) + 3 dstpkts (715 dstbytes) in the pcap.
> I see the 38 (3201) + 81 packets in the pcap.
> I see the 8 (7705) + 8 (1809) packets in the pcap.
> Everything looks good.
> Both argus and tcpdump saw a total of 142 packets.
>
> So... I guess there's something wrong with the gap detection in argus. :-/
>
> I'm sending you a private email, attaching both the pcap and the argus log for further debugging.
>
> note: the last two packets from the middle line are two window updates. Then there is a pause for 1 min 6 seconds before those last 8+8 packets at 11:37:54.
> Could the window updates or the >1 minute of silence fool argus into thinking there are gaps?
>
>
>
>
> Issue 2)
>
> Problem: Argus reports gaps when there are none.
> This must be because argus flush its state just before the missing data was retransmitted on the wire.
>
> Example:
> I follow a tcp connection until I see a gap...
> StartTime Flgs Proto SrcAddr Sport Dir DstAddr Dport SrcPkts DstPkts SrcBytes DstBytes State LastTime SrcGap DstGap
> 11:36:05.719501 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 69 43 88147 2958 PA_A 11:36:10.862907 0 0
> 11:36:10.962535 e tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 71 42 88894 2772 PA_A 11:36:16.111922 0 0
> 11:36:16.238263 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 71 46 88507 3084 PA_A 11:36:21.338720 0 0
> 11:36:21.450236 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 71 44 87447 2964 PA_A 11:36:26.560071 0 0
> 11:36:26.649754 e g tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 71 43 87160 2874 PA_A 11:36:31.751251 857 0
> Ok, so there are 71 packets from 123.123.123.88:8800 in the line above. So somehere within these 71 packets there's a gap.
>
> In my tcpdump pcap I start counting from the packet with timestamp 11:36:26.649754.
> Between src packets 68 and 69 there's a missing packet.
> After 69, packets 70 and 71 continue in the same high speed, where packet 71 has a timestamp of 11:36.31.751249.
> LastTime of this line is 2us later, 11:36:31.751251 and this correspond fine with the timestamp of the 43rd packet from 10.123.1.5.
> So far so good.
>
> Now, in the second next packet (packet 73), a retransmission of the missing packet is received by argus.
> So there are no *actual* gap in the tcp stream. The gap has been filled.
> However, since argus has already flushed the 71+43 packets, packets 72 and 73 instead became packets 1 and 2 of the next line.
> A SrcGap of 857 is therefore logged on the previous line.
>
> 11:36:31.821865 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 72 42 89425 2784 PA_A 11:36:37.012968 0 0
> 11:36:37.065894 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 72 45 87654 3030 PA_A 11:36:42.139706 0 0
> 11:36:42.220030 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 74 49 91817 3342 PA_A 11:36:47.361654 0 0
> 11:36:47.497642 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 69 43 88775 2934 PA_A 11:36:52.636345 0 0
> 11:36:52.688988 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 72 47 87834 3210 PA_A 11:36:57.840345 0 0
> 11:36:57.914371 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 70 43 88106 2934 PA_A 11:37:03.012641 0 0
> 11:37:03.143249 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 71 44 86827 2988 PA_A 11:37:08.241077 0 0
> 11:37:08.373448 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 75 48 89773 3264 PA_A 11:37:13.467642 0 0
> 11:37:13.555818 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 70 46 87370 3144 PA_A 11:37:18.679088 0 0
> 11:37:18.765883 e tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 73 42 87613 2772 PA_A 11:37:23.814428 0 0
> 11:37:23.901961 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 71 47 89087 3222 PA_A 11:37:29.082170 0 0
> 11:37:29.258747 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 69 45 88978 3094 PA_A 11:37:34.375937 0 0
> 11:37:34.427460 e g tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 69 43 87253 2874 PA_A 11:37:39.516085 30 0
>
> Again, there was a gap, with a retransmission just a few packets later, but too late, so it was counted in the next flow line and hence generated a 30 byte gap.
>
> 11:37:39.637529 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 72 44 87799 3000 PA_A 11:37:44.736910 0 0
> 11:37:44.831350 e s tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 72 44 88953 2964 PA_A 11:37:49.980516 0 0
> 11:37:50.102882 e tcp 123.123.114.88.8800 <?> 10.123.1.5.25699 76 42 89443 2772 PA_A 11:37:55.259373 0 0
>
>
>
>
> I think this is incorrect. Argus should only list *true* gaps detected in the flows. The retransmission flags show where there are retransmissions.
>
> So currently, argus reports gaps even though there physically are none. This makes the gaps field less useful.
>
> You can probably come up with a solution to this. Here's my suggestion:
> Delay the flush if there are active gaps in this unflushed flow.
> if (srcgap!=0 || dstgap!=0)
> {
> Wait up to 10 seconds for the missing packet to arrive (and hence fill the gap). Then flush the line with gap values set to zero.
> OR if the session is finished (FIN) or reset (RST), then flush the line immediately with the current gap values.
> You could probably build a third case too, in order to exit the delay loop early. Something like this:
> OR wait until N next packets for this stream has been received, where N is set to a value that covers the current sliding window filled with some typical packetsize plus a few packets more just for margin.
> That is, only wait for as long it would take the Src and Dst to retransmit the missing packet. If argus hasn't seen it by then, the packet will never appear, so stop waiting for it and flush the line with the current gap values.
> }
>
> Benefit:
> Argus show gap values you can trust and understand. I.e. if argus says zero gaps, that mean that the SPAN is good. No packet loss in any external equipment nor locally in the sniffer itself.
>
> Drawback:
> Sometimes the flushpoint can be extended by up to 10 seconds if there are gaps, instead of always be ARGUS_FLOW_STATUS_INTERVAL long.
> In my opinion, this is not a problem.
>
> (Perhaps you can build an even better delayed_flush() function than the above. One that flush all lines periodically as usual, but with corrected gap values, if the gap was filled later with a retransmission.)
>
>
>
>
> Issue 3)
>
> Problem: Argus seem to be 1 byte off when reporting the size of gaps.
>
> Example:
>
> 11:36:26.649754 e g tcp 123.123.123.88.8800 <?> 10.123.1.5.25699 71 43 87160 2874 PA_A 857 0
> seq analysis in Wireshark show:
> packet 68 show Next seq=244713
> packet 69 has seq=245571
> 245571-244713=858 bytes missing, not 857.
>
> 11:37:34.427460 e g tcp 123.123.123.88.8800 <?> 10.123.1.5.25699 69 43 87253 2874 PA_A 30 0
> seq 1334074-1334043=31, not 30 bytes.
>
>
> /Elof
>
More information about the argus
mailing list