Errors in gap detection

Carter Bullard carter at qosient.com
Sun Oct 26 12:03:34 EDT 2014


The idle field is not something you will want to use in this case.
It is intended for tools like ratop.1, so you can see how
the idle time grows while nothing is coming in.

Since both flows are reporting the same number, my guess is that it's algorithmic,
but there is more to this.  So those flows are not the same set of packets, as
the stime is off and the base sequence numbers are not the same?
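
As a side note on the base sequence numbers: the stcpb and dtcpb values quoted below for the long-running daemon and for the replayed pcap can be compared directly. The sketch below is only arithmetic on the numbers in this thread, not a description of how argus computes gaps. The helper names are made up for illustration, and reading a trailing "*" in ra output as a truncated number is an assumption.

```python
SEQ_SPACE = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_delta(later, earlier):
    """Forward distance from `earlier` to `later`, modulo 2**32."""
    return (later - earlier) % SEQ_SPACE


def matches_field(value, shown):
    """Compare a number with a printed column.

    Assumption: a trailing '*' means the printed value was truncated.
    """
    text = str(value)
    if shown.endswith("*"):
        return text.startswith(shown[:-1])
    return text == shown


# Base sequence numbers quoted in this thread.
live = {"stcpb": 1640464994, "dtcpb": 1368438167}    # long-running daemon
replay = {"stcpb": 1651739529, "dtcpb": 1545191373}  # freshly started argus

src_offset = seq_delta(replay["stcpb"], live["stcpb"])
dst_offset = seq_delta(replay["dtcpb"], live["dtcpb"])

print(src_offset, matches_field(src_offset, "11274535"))  # 11274535 True
print(dst_offset, matches_field(dst_offset, "1767532*"))  # 176753206 True
```

The two offsets line up with the reported sgap and dgap columns (11274535 and 1767532*), which would be consistent with the gap being measured against a base sequence number that does not belong to this connection. That is a reading of the numbers, not a confirmed diagnosis.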

Can you print the duration?  What is your ARGUS_MAR_STATUS_INTERVAL?

So are any of the packets out of order?  And you aren’t aggregating any of these
records?  That could be causing some of the problems.  Print out the ‘trans’ field.

Is there no way you can share a tcpdump of just one flow that has the gaps problem?

So this email has too many issues in it.  We can talk about each independently.
Argus should not be holding onto flows; it is not state based.  And killing argus
doesn’t cause argus to flush.  It causes the kernel to flush its buffers to
disk, because we close the file.  We don’t flush the output socket to disk, as it’s very
expensive.
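
To illustrate the general effect in isolation: output that sits in a buffer only becomes visible in the file once something flushes it, and closing the file is one such event. The sketch below shows that with an ordinary buffered file in Python. It is not argus code; the function name is hypothetical, and whether argus buffers its output in exactly this way is an assumption.

```python
import os
import tempfile


def buffered_write_sizes(payload, buffer_size=65536):
    """Return (size_before_close, size_after_close) for one buffered write."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # The buffer is larger than the payload, so write() only fills it.
        f = open(path, "wb", buffering=buffer_size)
        f.write(payload)
        before = os.path.getsize(path)  # nothing has reached the file yet
        f.close()                       # closing pushes the buffered bytes out
        after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)


print(buffered_write_sizes(b"x" * 128))  # (0, 128)
```

A small record can therefore stay invisible in the output file for a long time when little or no other data arrives to fill the buffer.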


> On Oct 23, 2014, at 12:16 PM, elof2 at sentor.se wrote:
> 
> 
> Hi!
> 
> The crazy flow looks like this with idle, stcpb and dtcpb:
> 
> ra -Zb -s stime:9 flgs saddr sport dir:3 daddr dport spkts dpkts state:13 sgap dgap idle:14 stcpb dtcpb -nr gaps.argus - tcp
> 13:42:43.511666  e g          x.x.x.x.1087    ->     y.y.y.y.443          56      101     FSPA_FSPA 11274535 1767532*  1414077184.00*   1640464994   1368438167
> 13:42:43.511743  e g          x.x.x.x.1087    -> 10.10.10.10.443          56      101     FSPA_FSPA 11274535 1767532*  1414077184.00*   1640464994   1368438167
> 
> Apparently the network SPAN is set up to mirror both the outside vlan (before NAT:ing y.y.y.y to 10.10.10.10) and the inside vlan. So argus sees both flows on its sniffer interface.
> Both flows were caught in my pcap. There are no gaps in either of them. I see how x.x.x.x does a three-way handshake towards y.y.y.y. Then I see how the inside of the NAT-fw sends its SYN to 10.10.10.10. The two sessions are happening at the same time, not sequentially.
> Ra shows the crazy gap values for both flows.
> 
> 
> 
> The same two flows look like this when I replayed the full pcap (20 seconds of traffic):
> 
> 17:41:56.413382  e            x.x.x.x.1087    ->     y.y.y.y.443          56      101     FSPA_FSPA        0        0  1414079232.00*   1651739529   1545191373
> 17:41:56.413462  e            x.x.x.x.1087    -> 10.10.10.10.443          56      101     FSPA_FSPA        0        0  1414079232.00*   1651739529   1545191373
> 
> 
> I restarted argus and replayed a filtered pcap which only contained one of the flows (all 157 packets).
> out.log is only 128 bytes and growing 128 bytes every minute due to the MAR-status events.
> I've waited more than 5 minutes and the single flow is still not flushed to the file.
> I don't know if this is due to the fact that no more data at all is coming in on the sniffer interface, or if argus didn't realize that the connection has finished (the FIN packets are sent).
> Anyhow, I kill the argus daemon to force it to flush the data into out.log.
> 
> 17:52:04.561207  e            x.x.x.x.1087    -> 10.10.10.10.443          55      101     FSPA_FSPA        0        0  1414080256.00*   1651739529   1545191373
> 
> 
> So in both pcap cases, there are zero gaps reported, and the base sequence numbers are the same.
> However, the sequence numbers do NOT match the ones from the long-running argus daemon.
> 
> 
> 
> 
> Side-step:
> Regarding the non-flushing of the single flow to out.log: is that a bug, or is it working as intended?
> 
> /Elof
> 
> 
> 
> On Thu, 23 Oct 2014, Carter Bullard wrote:
> 
>> So what are the flow idle timeout values in your sensor(s) ???
>> Could it be really long, leaving older base sequence numbers
>> around, and we’re getting port reuse ???  Or it could be we’re
>> seeing sequence number rollover ???  What are the stcpb and
>> dtcpb for the flows that have crazy numbers ???
>> 
>> Carter
>> 
>> On Oct 23, 2014, at 9:41 AM, elof2 at sentor.se wrote:
>> 
>>> 
>>> Hi Carter!
>>> 
>>> Sorry, but I can't give you a pcap or argus-logfile since they contain sensitive data.
>>> 
>>> 
>>> I'm also sorry to say that this will probably be hard to debug.
>>> 
>>> 'cause when I replayed the pcap on another sensor, ra showed "0   0"
>>> gaps for this flow instead of "11274535 1767532*". This is correct, since I found no gaps in the flow in the pcap.
>>> 
>>> (the other flow I analysed show the correct "0     1367" since one packet is really missing in this flow in the pcap)
>>> 
>>> So a freshly started argus daemon seems to log correct values.
>>> 
>>> 
>>> I found another sensor with lots of crazy numbers.
>>> See attached logfile.
>>> 
>>> In there you can see that 1.2.3.4 is running a continuous web-spider towards a wiki on 2.2.2.2:80.
>>> All the GET requests generate flows with approximately 11 packets in each direction.
>>> There are zero gaps for days, and then suddenly there is a burst of crazy numbers during a period of 40 seconds. Then everything is good for two hours and then another 40 second burst.
>>> 
>>> 
>>> netstat -B show zero drops for the argus daemon.
>>> My graphs for the cpu usage, swapping, memory usage, packets per second, etc show nothing out of the ordinary. The machine is not heavily loaded. Doesn't swap. It only receives 35 Mbps of mirrored traffic.
>>> No spike or unusual activity during the 40 seconds of crazy numbers.
>>> 
>>> 
>>> I can't find any pattern or reason for the sudden burst of crazy numbers.
>>> 
>>> Other traffic flows show "0   0" gaps during the crazy periods, so not all flows are affected (not even all flows between 1.2.3.4 and 2.2.2.2, but many of them).
>>> 
>>> /Elof
>>> 
>>> 
>>> 
>>> 
>>> On Wed, 22 Oct 2014, Carter Bullard wrote:
>>> 
>>>> Can you send the pcap file?  Does argus generate the crazy
>>>> numbers with this file?
>>>> Carter
>>>> 
>>>>> On Oct 22, 2014, at 9:39 AM, elof2 at sentor.se wrote:
>>>>> 
>>>>> 
>>>>> Hi Carter!
>>>>> 
>>>>> FYI, the gap detection counters still show some wonky numbers in 3.0.8.
>>>>> 
>>>>> ra -Zb -s flgs spkts dpkts state:13 sgap dgap -nr gaps.argus - tcp | grep g
>>>>> 
>>>>> Normal and OK gaps show up like this:
>>>>>   Flgs  SrcPkts  DstPkts         State   SrcGap   DstGap
>>>>> e g            15       24         PA_PA        0      458
>>>>> e g             3        3         PA_PA        0      284
>>>>> e g             5        6         PA_PA        0      732
>>>>> e g           129      284         PA_PA        0     1367
>>>>> e g            66       94         PA_PA        0     1367
>>>>> e g             2        2         PA_PA        0      801
>>>>> 
>>>>> ...but here and there I get lines like this:
>>>>> e g             7        6     FSPA_FSPA 3051561* 8894044*
>>>>> e g             7        6     FSPA_FSPA 1142343* 8891853*
>>>>> e g             7        6     FSPA_FSPA 98000397 7385371*
>>>>> e g             6        4     FSPA_FSPA 59208794 6255514*
>>>>> e g            20       20     FSPA_FSPA 3468512*    65538
>>>>> e g            20       20     FSPA_FSPA 3468512*    65538
>>>>> e g             5        6       SPA_SPA 2562525* 9087142*
>>>>> e g             7        6     FSPA_FSPA 68629719 5826434*
>>>>> e g             7        6     FSPA_FSPA -214748* 5815425*
>>>>> e g             7        6     FSPA_FSPA -214748* 1919818*
>>>>> e g            17       24     FSPA_FSPA 3167486* 2765698*
>>>>> 
>>>>> This doesn't look as nice.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> I tcpdump:ed traffic to pcap while argus created its logfile.
>>>>> I analysed two flows showing gaps, one normal with "0     1367" gaps and one wonky with "11274535 1767532*" gaps.
>>>>> 
>>>>> Wireshark analysis of the "normal flow" show identical numbers as argus; 0 gaps from src and 1367 bytes (one packet) missing (in mid-stream) from dst.
>>>>> Good.
>>>>> 
>>>>> Wireshark analysis of a "wonky" flow shows no errors! No complaints at all from Wireshark (including its Expert Info). No "previous segment not captured" and no "ACKed unseen segment".
>>>>> Everything looks good in the pcap.
>>>>> I can't find any reason why argus creates those wonky numbers.
>>>>> 
>>>>> 
>>>>> Oh, well, I don't use the gaps fields very often, so for me this is not important. I just thought I'd let you know.
>>>>> 
>>>>> /Elof
>>>>> 
>>> <gaps.txt>
>> 



