Integrating argus data mining into an IDS

Tue Jul 2 18:48:50 EDT 2013

Carter,

I've been trying to "open source" whatever I'm working on, and I have already
started doing so here:
http://mbrownnyc.wordpress.com/technology-solutions/anomaly-detection-in-argus-data/

I've written some other write-ups in the past that were fairly well
received by their communities, including one on implementing Redmine and
RhodeCode
(http://mbrownnyc.wordpress.com/technology-solutions/rhodecode-and-redmine/)
and another on implementing a full Icinga setup
(http://mbrownnyc.wordpress.com/technology-solutions/reliability-monitoring-solution/).

I also started in on a ridiculously messy project with a simple title
"securing your network," which should take me the next 20-30 years to
finish:
http://mbrownnyc.wordpress.com/technology-solutions/securing-your-network/

I've also posted a few other things about argus in the past (most of which
I'll need to rewrite, as my *nix and argus knowledge has improved a bit):
http://mbrownnyc.wordpress.com/tag/argus/

So for sure... free, open, and hopefully well documented!

So... where do we go from here?  Just about anywhere?  What other metrics
are worth considering, and what methods should be used to evaluate them?
I'm not sure where else to go other than grinding away at answering the
questions I mentioned previously.

Thanks,

Matt

On Tue, Jul 2, 2013 at 4:37 PM, Carter Bullard <carter at qosient.com> wrote:

> Hey Matt,
> Yep, this is pretty much a traditional approach to building your own
> anomaly detector.
> Argus data has been purposely designed to support this type of use, so,
> ...., good start.
> I think the difference between you and me is that I don't think the word
> "Intrusion" should be the first descriptor of your system.
>
> A good solution involves an event life-cycle that cares about the results
> of all the tests that you want to make.  While you can list a number of
> tests for comparison, there are others that will be more useful as you
> develop your process.
>
> Not sure that you have anything that would generate a negative comment yet.
> The hard part is developing the supervised and unsupervised baseline
> behavioral metrics needed to make the comparisons.  The statistics are not
> difficult, but assuring zero false negatives is where much of the work is
> done.  The false positives are not that hard to deal with, at least in my
> experience.
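One way to make the zero-false-negatives goal concrete, assuming you already have a labeled set of flows (a supervised baseline) that your detector has scored, is to pick the highest alert threshold that still catches every known-bad flow and then count how many false positives that threshold costs. This is a hypothetical sketch: the `LabeledScore` type and the scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabeledScore:
    """A detector score for one flow plus its ground-truth label (hypothetical)."""
    score: float
    malicious: bool


def zero_fn_threshold(samples):
    """Return (threshold, false_positives) for the highest threshold that
    flags every malicious sample (score >= threshold), i.e. zero false negatives."""
    bad_scores = [s.score for s in samples if s.malicious]
    if not bad_scores:
        raise ValueError("need at least one malicious sample to calibrate")
    # The lowest-scoring malicious flow sets the ceiling: any higher
    # threshold would miss it and produce a false negative.
    threshold = min(bad_scores)
    # Benign flows at or above that threshold are the price paid.
    false_positives = sum(
        1 for s in samples if not s.malicious and s.score >= threshold
    )
    return threshold, false_positives


if __name__ == "__main__":
    labeled = [
        LabeledScore(120, False),
        LabeledScore(450, True),
        LabeledScore(300, False),
        LabeledScore(700, True),
        LabeledScore(500, False),
    ]
    print(zero_fn_threshold(labeled))  # (450, 1)
```

The false-positive count is what you then work down, which matches the observation that false positives are the easier side to deal with.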
>
> So, what do you want to do?  Do you want to build anomaly detection systems
> with argus data to make money?  That is one thing.  Want to do it for free?
> That is another.
>
> Carter
>
>
> On Jul 2, 2013, at 2:43 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>
> Hello all,
>
> I just finished (finally) reading through the APT1 thread and the threads
> that I consider offshoots of it.  If you haven't read them, I spent a short
> amount of time compiling them into an easy-to-read, 38-page PDF, available
> here:
> http://mbrownnyc.files.wordpress.com/2013/06/carter_takes_on_mandiants_apt1_v2.pdf
>
> I came out of it very much wanting to integrate argus data into an IDS.
>
> Hopefully, this puts my last few mail threads into context.
>
> I have a small set of questions that I believe would be useful to this
> system.
>
> 1) Has this saddr+daddr pair been seen before?
> If yes, what is "the nature" of the previous traffic:
> 2) What protocol?
> 3) What time of day (stime, ltime)? How many flows? Are these
> outliers?**
> 4) How many sbytes and dbytes? Are these outliers?
> 5) What is the abratio (consumer versus producer)? Is this an outlier?
> [see http://thread.gmane.org/gmane.network.argus/9397/focus=9400]
> 6) What are the flow durations (dur, mean, stddev)? Is this an outlier?
> 7) What are the packet sizes (pkts)? Is this an outlier?
> 8) What are the country codes of the source and destination? Is this an
> outlier?
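To make questions 1 through 8 concrete, here is a minimal sketch of a per-pair baseline built from flow records that have already been exported from argus with ra (the export and parsing step is not shown). Records are assumed to be plain dicts keyed by argus-style field names (saddr, daddr, proto, stime as epoch seconds, sbytes, dbytes, dur, pkts); the `scountry`/`dcountry` keys are hypothetical stand-ins for whatever GeoIP labeling you use. The abratio is assumed here to be (sbytes - dbytes) / (sbytes + dbytes), ranging from +1 (pure producer) to -1 (pure consumer); check the linked thread for the exact definition.

```python
from collections import defaultdict
from statistics import mean, pstdev


def abratio(sbytes, dbytes):
    """Producer/consumer ratio in [-1, 1] (assumed definition, see above)."""
    total = sbytes + dbytes
    return 0.0 if total == 0 else (sbytes - dbytes) / total


class PairBaseline:
    """Per saddr+daddr history of the attributes asked about in questions 1-8."""

    def __init__(self):
        self._history = defaultdict(list)

    def add(self, flow):
        self._history[(flow["saddr"], flow["daddr"])].append(flow)

    def seen_before(self, saddr, daddr):
        # Membership test does not create an entry in the defaultdict.
        return (saddr, daddr) in self._history

    def profile(self, saddr, daddr):
        """Summarize "the nature" of previous traffic for a pair, or None."""
        flows = self._history.get((saddr, daddr))
        if not flows:
            return None
        durs = [f["dur"] for f in flows]
        return {
            "flows": len(flows),
            "protos": sorted({f["proto"] for f in flows}),
            # Hour of day in UTC, derived from stime in epoch seconds.
            "hours": sorted({int(f["stime"] // 3600 % 24) for f in flows}),
            "sbytes": [f["sbytes"] for f in flows],
            "dbytes": [f["dbytes"] for f in flows],
            "abratio": [abratio(f["sbytes"], f["dbytes"]) for f in flows],
            "dur_mean": mean(durs),
            "dur_stddev": pstdev(durs),
            "pkts": [f["pkts"] for f in flows],
            "countries": {(f.get("scountry"), f.get("dcountry")) for f in flows},
        }
```

Each list in the profile is the raw material for the "is this an outlier?" half of each question.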
>
> ** outlier == determined using a permille value from
> scipy.stats.mstats.mquantiles(), likely compared to the mquantiles()
> permille values for similar traffic over different periods of time (10
> seconds, 1 minute, 10 minutes, 60 minutes, 4 hours, 8 hours, a day, a week,
> a month).  I've already thrown together something quickly:
> https://gist.github.com/mbrownnycnyc/5860853
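The permille test can be sketched with only the standard library: `statistics.quantiles(data, n=1000)` returns the 999 permille cut points. This is a swapped-in stand-in for `scipy.stats.mstats.mquantiles()`, whose default plotting positions (alphap=0.4, betap=0.4) give slightly different values; the 999th-permille cutoff, the function names, and the 30-day "month" are assumptions, not taken from the gist. Each window gets its own cutoff, built from the history that falls inside it.

```python
from statistics import quantiles

# Window lengths in seconds, mirroring the periods listed above.
# "1mo" is approximated as 30 days (assumption).
WINDOWS = {
    "10s": 10, "1m": 60, "10m": 600, "60m": 3600, "4h": 4 * 3600,
    "8h": 8 * 3600, "1d": 86400, "1w": 7 * 86400, "1mo": 30 * 86400,
}


def permille_cutoff(values, permille=999):
    """Return the value below which `permille`/1000 of `values` fall."""
    if not 1 <= permille <= 999:
        raise ValueError("permille must be in 1..999")
    if len(values) < 2:
        return None  # not enough history to say anything
    # 999 cut points; index permille - 1 is the requested permille.
    cuts = quantiles(values, n=1000, method="inclusive")
    return cuts[permille - 1]


def outlier_windows(value, now, history, permille=999):
    """Return the names of windows in which `value` exceeds the cutoff.

    `history` is a list of (timestamp, value) pairs for similar traffic.
    """
    flagged = []
    for name, seconds in WINDOWS.items():
        recent = [v for t, v in history if now - seconds <= t < now]
        cutoff = permille_cutoff(recent, permille)
        if cutoff is not None and value > cutoff:
            flagged.append(name)
    return flagged
```

Keeping a separate cutoff per window lets a burst that is normal over a day still stand out over ten seconds, and vice versa.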
>
> Each answer would receive a weight.  For instance, a new saddr+daddr pair
> might receive a low weight of 100, while a first-ever connection to China
> or an upload of 100MB might receive a higher weight of 300.  When the total
> weight of a flow rises over a given threshold, the flow is flagged for
> investigation.
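The weighting scheme can be sketched as a list of checks, each adding its weight when it fires, with the flow flagged once the total rises over a threshold. The weights 100 and 300 come from the example above; the "China never seen before" check is generalized to any destination country not seen before. The threshold of 350, the decimal reading of 100MB, the `dcountry` key, and the `ctx` dictionary are hypothetical placeholders to be tuned.

```python
# Hypothetical threshold; a real value would come from tuning against baselines.
FLAG_THRESHOLD = 350
UPLOAD_BYTES = 100 * 1000 * 1000  # "100MB", taken here as decimal megabytes


def new_pair(flow, ctx):
    return (flow["saddr"], flow["daddr"]) not in ctx["seen_pairs"]


def new_country(flow, ctx):
    return flow["dcountry"] not in ctx["seen_countries"]


def large_upload(flow, ctx):
    return flow["sbytes"] >= UPLOAD_BYTES


# (check, weight) pairs; weights follow the examples in the message.
CHECKS = [(new_pair, 100), (new_country, 300), (large_upload, 300)]


def score_flow(flow, ctx, checks=CHECKS, threshold=FLAG_THRESHOLD):
    """Return (total_weight, fired_check_names, flagged)."""
    fired = [(check.__name__, weight) for check, weight in checks if check(flow, ctx)]
    total = sum(weight for _, weight in fired)
    return total, [name for name, _ in fired], total > threshold
```

Returning the names of the checks that fired keeps the reason for each flag visible to whoever investigates it.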
>
> I am also planning to incorporate external data points, such as:
> - "is there a Snort/Bro/Suricata alert that correlates?"
> - "when was the last anti-malware alert?"
> - "historically, what is the 'risk' of this node over N timespan (where
> 'risk' = how much weight has this node accrued historically)?"
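These external points could feed the same score. A minimal sketch, assuming Snort/Bro/Suricata and anti-malware alerts have already been normalized into (timestamp, address, source) records (parsing each sensor's format is not shown) and that every weight assigned to a flow is logged per node: correlation is a time-windowed address match, and historical risk is the weight a node has accrued over the last N seconds. The 300-second slack, the `Alert`/`WeightEvent` records, and the function names are all hypothetical.

```python
from collections import namedtuple

# Hypothetical normalized records; real alert formats differ per sensor.
Alert = namedtuple("Alert", "ts addr source")
WeightEvent = namedtuple("WeightEvent", "ts node weight")


def correlated_alerts(flow, alerts, slack=300):
    """Alerts naming either flow endpoint within `slack` seconds of the flow."""
    start, end = flow["stime"] - slack, flow["ltime"] + slack
    endpoints = {flow["saddr"], flow["daddr"]}
    return [a for a in alerts if a.addr in endpoints and start <= a.ts <= end]


def last_alert_time(node, alerts, source):
    """Timestamp of the most recent alert from `source` for `node`, or None."""
    times = [a.ts for a in alerts if a.addr == node and a.source == source]
    return max(times, default=None)


def node_risk(node, events, now, span):
    """Total weight `node` has accrued in the `span` seconds up to `now`."""
    return sum(e.weight for e in events if e.node == node and now - span <= e.ts <= now)
```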
>
> Are the above metrics/questions valuable?
>
> What other questions should I be asking?
>
> I'm prepped for negative feedback (and I expect quite a bit, because I'm
> flying by the seat of my pants), so please, fire away :)
>
> Thanks,
>
> Matt
>