Help needed to interpret how flows are reported. Probably IPs are being swapped.

Carter Bullard carter at qosient.com
Mon Jun 15 12:25:43 EDT 2015


Hey Sebas,
All of your observations are correct but the behavior of argus is also correct, so there are not errors or bugs, as I see it. I have embedded notes inside your email as responses.

The short answer is that for TCP, argus determines direction based on protocol state, when possible, and “first to send” when it cannot ‘know' the direction.  

The key to realizing what argus is doing is the “ ? “ in the direction field.  That “ ? “ indicates that argus doesn’t know the party that initiated the connection.  In these cases argus uses the first packet seen to determine the direction of the TCP.  When caches timeout, due to long idle times, argus throws away its knowledge of connection initiation, and thus forgets who started the connection.  When the TCP wakes up and transmits, argus doesn’t have a cache for the flow, and must report the flow with a “ ? “ in the direction field.  

To avoid this problem in argus, you need to change the “ARGUS_TCP_TIMEOUT=“ variable in your argus.conf file to a very large value, one that exceeds your largest idle period.  This is impractical because that means you will hold on to all TCP caches for a very long time, consuming memory.  The best solution is to track these flows in a ra* client program.  The best being rasqlinsert(), which is a general aggregating client, just like racluster.1, but can use a mysql database table to hold the caches of the long lived flows rather than core memory.  Something like:

   rasqlinsert -S argus.source -M cache -f racluster.conf.file -w mysql://user@localhost/db/longlivedflows <mysql://user@localhost/db/longlivedflows>


The racluster.conf.file needs to be formulated to make the timeouts useful.  There may need to be some added logic for state based database flushing, but should work very well.

Carter


> On Jun 4, 2015, at 2:25 PM, el draco <eldraco at gmail.com> wrote:
> 
> Hi list.
> I would want your opinion about an issue with argus flows. Sorry for
> the long mail.
> 
> Analyzing the pcap file
> https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-120-1/2015-04-22_capture-win4.pcap
> I found some errors on how the flows are created.
> 
> The issue can be seen with the packets coming from IP 10.0.2.104
> source port 49227 to IP 147.32.83.57 dst port 5552. In the file there
> are two conversations with this combination of ips and ports, because
> the IP 10.0.2.104 at some point  re-uses the source port 49227 to
> start a new conversation. I'm attaching an image of how wireshark sees
> these two conversations.

Port reuse will be confusing when you have status intervals, that exceed
the TCP_FIN_WAIT_STATE_2 timer, which is normally 120s.  This time is used
to decide when ports can be reused without generating a connection
collision.  While argus knows that it sees a new SYN/SYNACK/ACK volley,
it still merges the two connection together as the default 5-tuple key
that you are using can’t distinguish the two flows.

> 
> The problem appears when we analyze it with argus.
> First, lest force argus to give us the correct answer. We should
> execute it with a large ARGUS_FLOW_STATUS_INTERVAL, such as 999999.
> 
> Correct Example 1:
> argus -S 999999 -r 2015-04-22_capture-win4.pcap -w - "host 10.0.2.104
> and port 49227 and host 147.32.83.57 and port 5552"|ra -n -r - -Z b
> StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes
> 1970/01/01 02:00:17.462571,576654.812500,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,SRPA_SPA,0,0,12015,819080
> 1970/01/07 18:11:12.286232,1078.610596,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,FSPA_FSPA,0,0,2752,435776
> 
> This is consistent with what wireshark said. Even the packet count and
> byte count. The only difference is in the duration of the first flow,
> that is longer for argus and may be related with the problem I'm
> discussing.
> 
> Now the strange part comes when we use another STATUS INTERVAL, such
> as 99999 (I found it by using 3600):
> 
> Strange Example 2:
> argus -S 99999 -r 2015-04-22_capture-win4.pcap -w - "host 10.0.2.104
> and port 49227 and host 147.32.83.57 and port 5552"|ra -n -r - -Z b
> StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes
> 1970/01/01 02:00:17.462571,99992.859375,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,SPA_SPA,0,0,5110,428506
> 1970/01/02 05:47:05.320456,99218.054688,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,PA_PA,0,0,4851,268422
> 1970/01/03 09:44:35.110556,39320.703125,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_RPA,0,0,2054,122152              <- Strange
> flow
> 1970/01/07 18:11:12.286232,1078.610596,tcp,10.0.2.104,49.227,
> ->,147.32.83.57,5552,FSPA_FSPA,0,0,2752,435776
> 
> The problem here is that one of the flows reported has the source IP
> address and the destination IP address changed!. It seems that it is
> one of the REPORTED flows in the middle of the conversation that was
> printed because the STATUS INTERVAL expired. Note that the second
> _real_ flow (starting at 18:11:12.286232) was printed correctly

When the “ ? “ is in the direction field, argus is saying it doesn’t know.
So the fact that it is not the same is reasonable.  Argus uses the ARGUS_TCP_TIMEOUT
field, which by default is 60 or 120 seconds, not quite sure from memory.
So after it reports the flow after 99999 seconds, 60 seconds later the
flow cache is timed out.

As you reported, racluster.1 understands this and merges the
flow records together correctly.

> 
> Actually all the _reported_ flows in the middle have this error. For
> example a STATUS INTERVAL of 59999 produces:
> 
> Strange Example 3:
> argus -S 59999 -r 2015-04-22_capture-win4.pcap -w - "host 10.0.2.104
> and port 49227 and host 147.32.83.57 and port 5552"|ra -n -r - -Z b
> StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes
> 1970/01/01 02:00:17.462571,59936.800781,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,SPA_SPA,0,0,3286,327578
> 1970/01/01 19:03:04.777008,58765.035156,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,2862,158364           <- Changed
> 1970/01/02 11:46:17.851689,59764.582031,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,2958,163676           <- Changed
> 1970/01/03 04:46:11.011105,57224.800781,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_RPA,0,0,2909,169462         <- Changed
> 1970/01/07 18:11:12.286232,1078.610596,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,FSPA_FSPA,0,0,2752,435776
> 
> 
> Moreover, if we keep decreasing the STATUS INTERVAL, to 19999 for
> example, we can see that the _firsts_ reported flows are correctly
> printed, but then something happens and they start to be reported with
> the IP swapped. Please see the difference between the second and third
> flow in the following example.

This is correct behavior, as the direction is reported with “ ? “.
Realize you have massively idle time periods between TCP activity.

> 
> Strange example 4:
> argus -S 19999 -r 2015-04-22_capture-win4.pcap -w - "host 10.0.2.104
> and port 49227 and host 147.32.83.57 and port 5552"|ra -n -r - -Z b
> StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes
> 1970/01/01 02:00:17.462571,19988.533203,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,SPA_SPA,0,0,1285,216856
> 1970/01/01 07:33:41.002302,19212.013672,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,PA_PA,0,0,948,52456               <- Last good
> reported flow
> 1970/01/01 13:17:40.348905,19293.916016,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,1053,58266           <- First swapped
> reported flow
> 1970/01/01 19:03:04.777008,19990.382812,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,1005,55610
> 1970/01/02 00:36:30.162685,19010.492188,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,900,49800
> 1970/01/02 06:17:08.524192,19989.390625,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,1008,55776
> 1970/01/02 11:50:32.922748,19977.240234,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,996,55112
> 1970/01/02 17:47:18.393173,19997.363281,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,1005,55610
> 1970/01/02 23:20:50.759786,19970.878906,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,999,55278
> 1970/01/03 05:17:30.232128,19984.367188,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,996,55112
> 1970/01/03 10:50:49.603860,19057.427734,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_PA,0,0,999,55278
> 1970/01/03 16:32:15.983657,14859.828125,tcp,147.32.83.57,5552,
> <?>,10.0.2.104,49227,PA_RPA,0,0,821,53926
> 1970/01/07 18:11:12.286232,1078.610596,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,FSPA_FSPA,0,0,2752,435776
> 
> I confirm the error because there are NO starting connections from 147.32.83.57:
> $ tcpdump -n -s0 -r 2015-04-22_capture-win4.pcap src host 147.32.83.57
> and "tcp[tcpflags] & (tcp-syn) != 0" and "tcp[tcpflags] & (tcp-ack) ==
> 0"
> reading from file 2015-04-22_capture-win4.pcap, link-type EN10MB (Ethernet)
> (no output)
> And there are 386 connections started from 10.0.2.104:
> $ tcpdump -n -s0 -r 2015-04-22_capture-win4.pcap src host 10.0.2.104
> and "tcp[tcpflags] & (tcp-syn) != 0" and "tcp[tcpflags] & (tcp-ack) ==
> 0" | wc -l
> reading from file 2015-04-22_capture-win4.pcap, link-type EN10MB (Ethernet)
> 386
> 
> 

This is neither strange nor incorrect.  There was a 5 hour idle time in the
data, and argus timed out the idle cache.  Again racluster.1 will correct the data.

> 
> There is something even more strange. If we create a new pcap file
> with only these packets, then the problem disappears.
> 1- tcpdump -n -s0 -r 2015-04-22_capture-win4.pcap host 10.0.2.104 and
> port 49227 and host 147.32.83.57 and port 5552 -w
> test-two-connections.pcap
> 2- argus -S 99999 -r test-two-connections.pcap -w - "host 10.0.2.104
> and port 49227 and host 147.32.83.57 and port 5552"|ra -n -r - -Z b
> StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes
> 1970/01/01 02:00:17.462571,99992.859375,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,SPA_SPA,0,0,5110,428506
> 1970/01/02 05:47:05.320456,99218.054688,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,PA_PA,0,0,4851,268422
> 1970/01/03 09:44:35.110556,39320.703125,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,RPA_PA,0,0,2054,122152
> 1970/01/07 18:11:12.286232,1078.610596,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,FSPA_FSPA,0,0,2752,435776
> 
> It seems that the problem is in how the original pcap file was created
> or a combination of packets and flows. I'm attaching this second pcap
> file for you.
> 
> 
> If I push the STATUS INTERVAL to be very small (0.0001), we can find
> the first real swapped flow:
> 1970/01/01 02:10:45.954851,0.000000,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,PA_,0,0,1,56         <- Good flow
> 1970/01/01 02:10:45.954957,0.000000,tcp,10.0.2.104,49227,
> ->,147.32.83.57,5552,_A,0,0,1,54         <- Good flow
> 1970/01/01 02:14:24.224735,0.000000,tcp,147.32.83.57,5552,
> ?>,10.0.2.104,49227,PA_,0,,1,74         <- Swapped flow
> 1970/01/01 02:14:24.225052,0.000000,tcp,147.32.83.57,5552,
> ?>,10.0.2.104,49227,PA_,0,,1,72         <- Swapped flow
> 1970/01/01 02:14:24.225161,0.000000,tcp,147.32.83.57,5552,
> <?,10.0.2.104,49227,_A,0,0,1,54         <- Swapped flow
> 
> it seems here like if argus is confused about the direction of the
> flow. In the first swapped flow the direction is ?> and then even <?
> or <?>.
> 
> Just for the sake of completeness, the first packet that is swapped in
> the pcap file seems to be
> 4407 846.762164 147.32.83.57 5552 10.0.2.104 49227 TCP 74 5552→49227
> [PSH, ACK] Seq=9928 Ack=10145 Win=65535 Len=20
> Packet number 4407 in the pcap file, started 846.762164 seconds after
> THAT TCP connection was created (I put the ref time in the first SYN
> of that connection).

Change the ARGUS_TCP_TIMEOUT variable in the argus.conf file.
> 
> 
> Do you have any idea what can be happening?
> Thanks for your time
> Sebas
> pd) Argus Version 3.0.8.1
> Ra Version 3.0.8
> 
> 
> -- 
> https://pgp.mit.edu/pks/lookup?op=get&search=0x9D9A358CA10F1601
> <wireshark-conversations-2.png><test-two-connections.pcap>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20150615/724a1966/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 6837 bytes
Desc: not available
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20150615/724a1966/attachment.bin>


More information about the argus mailing list