Integrating argus data mining into an IDS

Tue Jul 2 14:43:33 EDT 2013

Hello all,

I just (finally) finished reading through the APT1 thread and the threads
that I consider offshoots of it.  If you haven't read them, I spent a short
amount of time compiling them into an easy-to-read, 38-page PDF, available
at:
http://mbrownnyc.files.wordpress.com/2013/06/carter_takes_on_mandiants_apt1_v2.pdf

I came out of it very much wanting to integrate argus data into an IDS.

Hopefully, this puts my last few mail threads into context.

I have a small set of questions that I believe would be useful to this
system.

1) Has this saddr+daddr pair been seen before?

If yes, what is "the nature" of the previous traffic:

2) What protocol?

3) What time of the day (stime, ltime)? How many flows? Are these
outliers?**

4) How many sbytes and dbytes? Are these outliers?

5) What is the abratio (consumer versus producer)? Is this an outlier? [see
http://thread.gmane.org/gmane.network.argus/9397/focus=9400]

6) What are the flow durations (dur, mean, stddev)? Is this an outlier?

7) What are the packet sizes (pkts)? Is this an outlier?

8) What is the country code of destination and source? Is this an outlier?
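
To make questions 1 and 2 concrete, here is a minimal sketch of a "seen
before" lookup. The PairHistory class and its method names are my own
placeholders, not anything argus provides, and it assumes flows have already
been parsed into saddr, daddr and proto strings (for example from ra output).

```python
from collections import defaultdict


class PairHistory:
    """Remembers which (saddr, daddr) pairs were seen, and with which protocols."""

    def __init__(self):
        # (saddr, daddr) -> set of protocol names seen for that pair
        self._protos = defaultdict(set)

    def observe(self, saddr, daddr, proto):
        """Record one flow and return True if the pair had been seen before."""
        key = (saddr, daddr)
        seen_before = key in self._protos
        self._protos[key].add(proto)
        return seen_before

    def protocols(self, saddr, daddr):
        """Return a copy of the protocols seen for this pair (empty if never seen)."""
        return set(self._protos.get((saddr, daddr), ()))


history = PairHistory()
first = history.observe("10.0.0.5", "192.0.2.10", "tcp")   # new pair
second = history.observe("10.0.0.5", "192.0.2.10", "udp")  # known pair, new protocol
```

An in-memory dict is only for illustration; a real deployment would
presumably keep this in a database so the history survives restarts.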

** outlier == judged using a permille value from
scipy.stats.mstats.mquantiles(), likely compared against the mquantiles()
permille values for similar traffic over different periods of time (10
seconds, 1 minute, 10 minutes, 60 minutes, 4 hours, 8 hours, a day, week,
month).  I've already thrown together something quickly:
https://gist.github.com/mbrownnycnyc/5860853
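
As a rough illustration of the permille idea (this is not the code in the
gist, and it swaps mquantiles() for a plain standard-library rank so it runs
without scipy), a value can be ranked against the history for similar
traffic. The function names and the 5/995 cut-offs are assumptions chosen
for the example.

```python
from bisect import bisect_left


def permille_rank(history, value):
    """Return how many thousandths of the history lie strictly below value (0-1000)."""
    if not history:
        raise ValueError("need at least one historical observation")
    ordered = sorted(history)
    return 1000 * bisect_left(ordered, value) // len(ordered)


def is_outlier(history, value, low=5, high=995):
    """Flag values in the bottom or top few permille (cut-offs are assumptions)."""
    rank = permille_rank(history, value)
    return rank <= low or rank >= high


# Hypothetical sbytes history for one kind of traffic over one time window.
sbytes_history = list(range(1, 1001))
typical = is_outlier(sbytes_history, 500)
huge = is_outlier(sbytes_history, 10**6)
```

One history list per time window (10 seconds, 1 minute, and so on) would
allow the same value to be ranked against each period separately.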

Each answer would receive a weight...

For instance, a new saddr+daddr pair might receive a low weight of 100,
while an access to China that has never been seen before, or an upload of
100MB might receive a higher weight of 300.

When the weight of the flow rises over a given threshold, the flow is
flagged for investigation.
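
A minimal sketch of the weighting, using the example weights above. The
finding names and the threshold of 500 are placeholders I made up for
illustration.

```python
# Weights per finding; 100 and 300 come from the examples above,
# the finding names themselves are hypothetical.
WEIGHTS = {
    "new_pair": 100,      # saddr+daddr pair never seen before
    "new_country": 300,   # destination country never seen before
    "large_upload": 300,  # e.g. an upload of 100MB
}

THRESHOLD = 500  # hypothetical value; would need tuning against real traffic


def score_flow(findings):
    """Sum the weights of every finding raised for one flow."""
    return sum(WEIGHTS[name] for name in findings)


def should_flag(findings, threshold=THRESHOLD):
    """A flow is flagged once its total weight rises over the threshold."""
    return score_flow(findings) > threshold


quiet = should_flag({"new_pair"})
noisy = should_flag({"new_pair", "new_country", "large_upload"})
```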

I am also planning to incorporate other external data points, such as:

- "is there a Snort/Bro/Suricata alert that correlates?"

- "when was the last anti-malware alert?"

- "historically, what is the 'risk' of this node over N timespan (where
'risk' = how much weight has this node accrued historically)?"
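
The historical 'risk' question could be sketched as a ledger of weights per
node, summed over the last N seconds. RiskLedger and its methods are
hypothetical names, and timestamps are assumed to be epoch seconds.

```python
from collections import defaultdict


class RiskLedger:
    """Accumulates flow weights per node so risk can be summed over a time span."""

    def __init__(self):
        # node address -> list of (timestamp, weight) entries
        self._events = defaultdict(list)

    def add(self, node, timestamp, weight):
        """Record that a flow involving node accrued this weight at this time."""
        self._events[node].append((timestamp, weight))

    def risk(self, node, now, span):
        """Total weight accrued by node between now - span and now, inclusive."""
        return sum(
            weight
            for timestamp, weight in self._events.get(node, ())
            if now - span <= timestamp <= now
        )


ledger = RiskLedger()
ledger.add("10.0.0.5", 1000, 100)
ledger.add("10.0.0.5", 4000, 300)
last_hour = ledger.risk("10.0.0.5", now=4500, span=3600)
```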

Are the above metrics/questions valuable?

What other questions should I be asking?

I'm prepared for negative feedback (and I expect quite a bit, because I'm
flying by the seat of my pants), so please, fire away :)

Thanks,

Matt