Question regarding how to use flows in real time

Mon Jun 15 10:51:41 EDT 2015

Hi Carter, thanks for your answer.

> You should not be running argus with long status times.  You should be
> running argus with reasonable status times, say 5 seconds, and then
> aggregating the output of argus to suit your needs.
>
OK. Can you give me some specifics on why this is not a good thing
to do?

> So if you looked at argus output when the status times are reasonable, say
> 5 seconds, you will quickly see that roughly 95% of all IP flows are
> completed in around 2.5 seconds.  DNS is sub 1 second, usually, HTTP
> transactions are generally under 3 seconds, usually, with the majority
> under 1.45 seconds.  This is why we recommend a 5 second status interval.
>
I agree for normal traffic, but in my experience capturing malware, the
traffic tends to have longer flow durations. For example:

StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes,SrcBytes
1970/01/01 01:20:53.681390,77.040024,tcp,10.0.2.108,49180, ->,23.227.199.38,80,SRPA_SPA,0,0,12,3034,2474
    (duration 77 seconds) (Miuref malware)
1970/01/01 03:20:30.830490,80.464714,tcp,10.0.2.108,49339, ->,23.227.199.38,80,SRPA_SPA,0,0,12,2864,2304
    (duration 80 seconds) (Miuref malware)
1970/01/01 01:00:08.370234,3480.944336,udp,fe80::705a:530f:1701:e8a5,546, ->,ff02::1:2,547,INT,0,,63,9198,9198
    (duration 3480 seconds) (Miuref malware)
1970/01/01 01:05:01.056919,951.775024,tcp,10.0.2.107,49165, ->,46.32.233.226,8080,FSPA_FSRPA,0,0,538,614191,6684
    (duration 951 seconds)
1970/01/04 02:36:20.851627,901.359741,tcp,10.0.2.107,49575, ->,106.187.49.59,8080,FSPA_FSRPA,0,0,11,1351,759
    (duration 901 seconds) (geodo malware)

This last one is a classic Geodo Botnet Command and Control channel.
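
To make the point about long-lived flows concrete, here is a minimal Python sketch (not part of argus or its clients) that reads comma-separated records like the ones quoted above and keeps only the flows at or above a duration threshold. The function name `long_flows` and the 900-second threshold are assumptions chosen for the illustration.

```python
import csv
import io

# The five records quoted above, with the wrapped lines rejoined.
RECORDS = """\
StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes,SrcBytes
1970/01/01 01:20:53.681390,77.040024,tcp,10.0.2.108,49180, ->,23.227.199.38,80,SRPA_SPA,0,0,12,3034,2474
1970/01/01 03:20:30.830490,80.464714,tcp,10.0.2.108,49339, ->,23.227.199.38,80,SRPA_SPA,0,0,12,2864,2304
1970/01/01 01:00:08.370234,3480.944336,udp,fe80::705a:530f:1701:e8a5,546, ->,ff02::1:2,547,INT,0,,63,9198,9198
1970/01/01 01:05:01.056919,951.775024,tcp,10.0.2.107,49165, ->,46.32.233.226,8080,FSPA_FSRPA,0,0,538,614191,6684
1970/01/04 02:36:20.851627,901.359741,tcp,10.0.2.107,49575, ->,106.187.49.59,8080,FSPA_FSRPA,0,0,11,1351,759
"""


def long_flows(text, min_dur):
    """Return (SrcAddr, DstAddr, Dport, Dur) for flows lasting >= min_dur seconds."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["SrcAddr"], row["DstAddr"], row["Dport"], float(row["Dur"]))
        for row in reader
        if float(row["Dur"]) >= min_dur
    ]


# Hypothetical threshold: 900 seconds.
for flow in long_flows(RECORDS, 900.0):
    print(flow)
```

With a 5 second status interval and no aggregation, each of these flows would show up as many short records instead of one long one, which is why the duration field matters to me.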

(Captures
https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-126-1/2015-06-07_capture-win7.pcap
and
https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-127-1/2015-06-07_capture-win8.pcap
)

In the case of the Geodo capture, a quick summary of the duration of all
its flows is:
 Min.   :   0.000
 1st Qu.:   0.017
 Median :   1.553
 Mean   : 381.830
 3rd Qu.: 165.278
 Max.   :3599.897
So, although most of the durations are below 5 seconds (median 1.553), the
most important ones for me are those above the mean of 381 seconds. (I'm
attaching two histograms for better visualization.)
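
A summary like the one above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the durations are already available as a list of floats; the helper names `summarize` and `above_mean` are made up for the example, and the sample input is only the five flows quoted earlier in this message, not the full Geodo capture, so the numbers differ from the table above.

```python
import statistics


def summarize(durations):
    """Return min, quartiles, mean and max of a list of flow durations."""
    # method="inclusive" treats the data as the whole population and
    # interpolates between the observed values.
    q1, median, q3 = statistics.quantiles(durations, n=4, method="inclusive")
    return {
        "min": min(durations),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(durations),
        "q3": q3,
        "max": max(durations),
    }


def above_mean(durations):
    """Return the durations that are strictly greater than the mean."""
    mean = statistics.fmean(durations)
    return [d for d in durations if d > mean]


# Durations (seconds) of the five example flows quoted earlier.
sample = [77.040024, 80.464714, 3480.944336, 951.775024, 901.359741]

for name, value in summarize(sample).items():
    print(f"{name:>6}: {value:10.3f}")
print("above mean:", above_mean(sample))
```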

> If you collect a days worth of flow records, and then use racluster.1 to
> merge all the flow records, you will see that racluster.1 does what you
> want, so it seems that you are looking for a streaming racluster.1 like
> program.
>
Yes, this is true. A streaming racluster is what I would need.
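
To be sure we mean the same thing by "merging", here is how I picture it, as a minimal Python sketch. It only illustrates the concept and is not how racluster is implemented. The tuple layout, the `Flow` class, and the packet and byte counts in the sample are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    start: float   # time of the first status record (seconds)
    end: float     # time covered by the last status record (seconds)
    pkts: int
    nbytes: int


def merge_status_records(records):
    """Merge per-interval status records into one Flow per flow key.

    `records` is an iterable of (key, start, end, pkts, nbytes) tuples,
    where `key` identifies the flow (for example the 5-tuple).
    """
    merged = {}
    for key, start, end, pkts, nbytes in records:
        flow = merged.get(key)
        if flow is None:
            merged[key] = Flow(start, end, pkts, nbytes)
        else:
            flow.start = min(flow.start, start)
            flow.end = max(flow.end, end)
            flow.pkts += pkts
            flow.nbytes += nbytes
    return merged


# Made-up example: one flow reported in three 5-second slices.
key = ("tcp", "10.0.2.107", 49165, "46.32.233.226", 8080)
slices = [
    (key, 0.0, 5.0, 4, 300),
    (key, 5.0, 10.0, 3, 200),
    (key, 10.0, 12.5, 2, 100),
]
flow = merge_status_records(slices)[key]
print(flow.end - flow.start, flow.pkts, flow.nbytes)
```

This needs the whole input before it can print anything, which is exactly the part I would like to avoid with a streaming version.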

To help me understand, I ran these tests with a pcap file:
A: Traffic -> argus with 5s status report -> racluster
B: Traffic -> argus with 5s status report -> ra
C: Traffic -> argus with 3600s status report -> racluster
D: Traffic -> argus with 3600s status report -> ra

A and C are exactly the same, as you would expect.
D is very similar to A and C. Not quite exact, but very close
(only some ARP and IPv6 UDP traffic was different).
B of course is totally different, as we expected.

I'm using D now, so I can move to racluster without problems.

> The problem with holding flows and not reporting on them until they close,
> is that many long lived flows don’t ever close, by definition
>
I understand. That is why I suggested that a streaming mode should have
protocol timeouts and also a status report time.

> , and some, when they do close, don’t close properly.
>
Ok. But if we still have a status time, they are going to be reported
eventually, right?

> Then of course there are the flows that have activity after they close
> which is somewhat frustrating from a flow activity sensors perspective.
>
If they are technically closed, I imagine they may be considered as new
flows?

>   Holding a flow cache is expensive, computationally, so what is the point
> of tracking flows that never terminate ????
>
I agree. The flows I want to track do terminate, but after a long time.
Anyway, I now only track them for 3600 s and then I report them. I'm OK
with that.

> Of course, long lived flows can be stateful, such as TCP or stateless,
> such as IPSec tunnels.  There is no completion indications for an IPSec
> tunnel, they can last for months/years at a time, and then they just
> timeout.  Some TCP connections don’t close, sometimes because the end
> system faults, powers off or reboots, or the network eats the closing
> packets, as is seen in some poor stateful NAT’ing equipment.
>
Yes. Do you think that having a status time for these cases goes
against the streaming idea?

> So what do you want to do about flows that don't close, in your proposal
> ???
>
For the streaming racluster, I would add a status report time. The idea of
streaming is not "to have no status time", but instead to "report the
finished flows as soon as possible" and then report the rest when the
status time expires.

> Hold onto their caches forever ????  That gets really expensive really
> fast.
>
Agreed. We should not cache them forever.

> And of course your algorithm is subject to manipulation by intruders that
> know that your sensing is state based.
>
This is true, and a very important limitation of most detection methods.
Intruders are going to adapt to almost any algorithm if they can. That is
why our detection algorithm for C&C servers is based on two features: first,
the periodicity of the patterns, and second, the similarity between
connections. The first means that to evade it, intruders have to give up
the periodicity, and then it is _possible_ that they lose some
synchronicity, which is important for their simultaneous operations such as
DDoS. The second means that if the bots do not all connect similarly,
then again they may lose synchronicity. Of course, with time they can
bypass this as well. The only answer to that is: we as defenders should
keep adapting and changing our strategy too.
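
To give a rough idea of what "periodicity" can mean in practice, here is a small Python sketch. It is a hypothetical illustration only and not our actual detection algorithm: it measures how regular the gaps between connection start times are, using the coefficient of variation (standard deviation divided by mean). The function name `gap_variation` and both timestamp lists are made up for the example.

```python
import statistics


def gap_variation(start_times):
    """Coefficient of variation of the gaps between consecutive start times.

    A value close to 0 means the connections are evenly spaced (periodic);
    larger values mean irregular spacing.
    """
    times = sorted(start_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three start times")
    mean_gap = statistics.fmean(gaps)
    if mean_gap == 0:
        raise ValueError("all start times are identical")
    return statistics.pstdev(gaps) / mean_gap


# Made-up start times in seconds: one connection every 900 s ...
periodic = [0.0, 900.0, 1800.0, 2700.0, 3600.0]
# ... versus the same number of connections at irregular times.
irregular = [0.0, 30.0, 1000.0, 1100.0, 3600.0]

print(gap_variation(periodic))
print(gap_variation(irregular))
```

A bot that randomizes its timing to push this kind of value up is the adaptation I describe above.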

> They will start what are called half-open connections, and your algorithm
> will just run out of memory, after a few billion packets.
>
Also true. I'm avoiding it with the 3600s status report. Even without going
to that extreme, a normal TCP connection can stay open for days without any
problem, and it should also be reported at some point.

> If you would like for me to add state conditions for flushing records in
> racluster.1, I’ll put that in argus-clients-3.0.9.
>
Tell me if I'm wrong about this:
When argus reports the flows (for example every 5 secs), it sends the flows
to a client every "status time". But the flows themselves could have
finished already.
That is why some flows have a duration of only 1.5 seconds. However, we
don't see them until the 5 seconds have passed.
So the proposal of a streaming racluster would be to report the flows, if
we can, as soon as they finish. And if the flows did not finish, we report
them at the status time, as now.
That would allow us to have:
- A status report time that catches all the strange non-ending flows.
- A quick print of the flows that are finished.
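
The two points above could be sketched like this in Python. This is only my idea of the behaviour, under the assumption that each incoming record tells us whether the flow has closed. It is not an existing racluster feature, and the class name `StreamingAggregator`, its methods, and the sample values are all hypothetical.

```python
class StreamingAggregator:
    """Report closed flows immediately and the rest at the status time."""

    def __init__(self, status_interval):
        self.status_interval = status_interval  # seconds
        self.cache = {}  # flow key -> accumulated counters

    def update(self, key, now, pkts, nbytes, closed):
        """Feed one record; return the list of flows to report right away."""
        entry = self.cache.setdefault(
            key, {"first": now, "last": now, "pkts": 0, "bytes": 0}
        )
        entry["last"] = now
        entry["pkts"] += pkts
        entry["bytes"] += nbytes
        if closed:
            return [self._emit(key, "closed")]
        return []

    def tick(self, now):
        """Report every flow that has been cached for a full status interval."""
        due = [
            key for key, entry in self.cache.items()
            if now - entry["first"] >= self.status_interval
        ]
        return [self._emit(key, "status") for key in due]

    def _emit(self, key, reason):
        # Removing the entry bounds how long any flow stays in the cache.
        # If the flow continues, it simply starts a new entry.
        entry = self.cache.pop(key)
        duration = entry["last"] - entry["first"]
        return (key, reason, duration, entry["pkts"], entry["bytes"])


agg = StreamingAggregator(status_interval=3600)
agg.update("short-flow", 0.0, 3, 300, closed=False)
print(agg.update("short-flow", 1.5, 2, 200, closed=True))  # reported at once
agg.update("never-ending", 10.0, 1, 60, closed=False)
print(agg.tick(3000.0))  # nothing due yet
print(agg.tick(3610.0))  # the non-ending flow is reported by the status timer
```

Because every entry leaves the cache either when the flow closes or when the status time is reached, nothing is held forever.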

Regards and thanks for your time Carter
Sebas
Attachments:
histo2geodo-between0and20secs.png (image/png, 21545 bytes)
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20150615/b8a085af/attachment.png>
histo1geodo-between-0-3000secs.png (image/png, 22719 bytes)
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20150615/b8a085af/attachment.png>