What happened to anomaly detection/packet dynamics? Are there clients?

Carter Bullard carter at qosient.com
Mon Jun 4 11:37:47 EDT 2012


Hey Matt,
Oh I don't think you're outside of the argus project.  Many organizations use the rich
data in argus records to drive a lot of stuff, and your list of interesting things is not a
new list.  I would recommend that you think in terms of generic machine learning
algorithms, if you want to do something contemporary, at least that will get you going
in a direction that adapts and scales to levels of interest.  But that field has gotten
pretty thick (lots of papers),  and Wikipedia could be a good first step for you.

But you don't have to be too sophisticated to do some reasonable things. Multidimensional
ARMAs will give you something, if you're looking for threshold triggered alarms and
alerts.  This has been the bread and butter of Wall Street quant types, at least the
Charles Schwab types, and multidimensional ARMAs are complex enough to make
it fun and worth the time.  But that is just one strategy to consider.  Time series statistics
is an area that I would recommend, strongly.  Short books like Box, Jenkins and Reinsel's
"Time Series Analysis: Forecasting and Control" are good.  Mathematica has some
good time series modules and it's very easy to import argus data into them, so if
you've got Mathematica, that is a good weekend of entertainment.
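
As a rough illustration of a threshold-triggered alarm, here is a minimal Python sketch.
It is a deliberate simplification: a one-dimensional AR(1) model fit by least squares,
not a multidimensional ARMA.  The per-interval counts, the function names and the
3-sigma threshold are all assumptions made up for the example.

```python
from statistics import mean, pstdev


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    m_prev, m_next = mean(x_prev), mean(x_next)
    var = sum((p - m_prev) ** 2 for p in x_prev)
    if var == 0:
        return m_next, 0.0
    cov = sum((p - m_prev) * (n - m_next) for p, n in zip(x_prev, x_next))
    phi = cov / var
    return m_next - phi * m_prev, phi


def alarms(train, live, k=3.0):
    """Return (index, value, predicted) for each live point whose one-step
    prediction error exceeds k standard deviations of the training residuals."""
    c, phi = fit_ar1(train)
    resid = [n - (c + phi * p) for p, n in zip(train[:-1], train[1:])]
    sigma = pstdev(resid)
    out = []
    prev = train[-1]
    for i, value in enumerate(live):
        predicted = c + phi * prev
        if abs(value - predicted) > k * sigma:
            out.append((i, value, predicted))
            prev = predicted  # do not let the outlier drive the next prediction
        else:
            prev = value
    return out


# Hypothetical per-interval byte counts (assumed data, not real argus output).
baseline = [100, 104, 98, 101, 97, 103, 99, 102, 100, 96, 105, 99]
observed = [101, 98, 450, 102]
for idx, value, predicted in alarms(baseline, observed):
    print(f"interval {idx}: saw {value}, expected about {predicted:.0f}")
```

Carrying the prediction forward after an alarm is one design choice among several; it
keeps a single spike from raising a second alarm on the next, otherwise normal, interval.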

If you're asking my opinion, I would say that you'll get down the path farther with the
notion that it's not flow records that are anomalous, but the abstract objects that you
will construct and track, using flow records and other data, that will shift to an anomalous
state.  That may seem like splitting hairs, but I think it's a fundamental paradigm
change from what you're writing about.  Thinking that you'll get a flow record and that
you will change the light from green to red, or you will use a single flow record to decide
to launch a nation state counter offensive, is a pretty simple system.   Simple systems
are easy to imagine, but not necessarily simple to realize.  Even in network operations
alarm and alerting systems, it will be the missing flow record that will be the trigger.
Unfortunately most statistical strategies aren't well suited to processing missing data
(unless, of course, you know that is an important property of the system you need to build).
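
To make the "missing flow record" point concrete, here is a minimal Python sketch of an
abstract object that is tracked over time and shifts to an anomalous state when expected
records stop arriving.  The class, the service name and the 60 second silence limit are
hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedService:
    """An abstract object built from flow records: one monitored service."""
    name: str
    max_silence: float                 # seconds allowed between flow records
    last_seen: Optional[float] = None  # timestamp of the most recent record
    state: str = "unknown"

    def observe(self, timestamp: float) -> None:
        """A flow record for this service arrived."""
        self.last_seen = timestamp
        self.state = "normal"

    def check(self, now: float) -> str:
        """Called on a timer; the absence of records is what changes the state."""
        if self.last_seen is not None and now - self.last_seen > self.max_silence:
            self.state = "anomalous"
        return self.state


dns = TrackedService("dns-server", max_silence=60.0)
for ts in (0.0, 20.0, 45.0):
    dns.observe(ts)

print(dns.check(now=90.0))   # 45 s since the last record
print(dns.check(now=120.0))  # 75 s since the last record
```

Note that no single record is ever judged here; the alarm comes from the timer noticing
that the object has gone quiet.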

Argus provides a good data model for network traffic analysis, providing a lot of attributes
and metrics, much more than any other network sensor, within a data processing and
transport architecture that provides a lot of flexibility.   

But, argus is really about figuring out a better metric or better way to report on it, and
when you figure that out, you can hack argus to generate what you need.

It really is all about what it is that you want to do.  Hopefully, you'll find argus is a decent
set of free software that can help you.

Carter



On Jun 3, 2012, at 2:24 PM, Matt Brown wrote:

> Thanks for your very detailed reply.  Your examples are much appreciated, and they give tangibility to a (clearly) very complex question: What makes a flow record anomalous?
> 
> I suppose I'm less interested in things so static, and interested in finding general statistical outliers.  With that conclusion, it's clear that a question like this is outside of the scope of the argus project itself, and would lie more with the specifics of statistical analysis.  I suppose I did hope that people (you and others on the list) would be willing to share some things used for statistical analysis of argus data.
> 
> My base example of detection of anomalous flow records would be that of two papers I found in the citations of the anomaly-based IDS wikipedia article: the general idea of strict anomaly detection included in Phrack in 2000, and the use of the Mahalanobis distance to give "anomaly-weight" to flow records published by the CS department at Columbia University.
> 
> The answer to "what makes a flow anomalous?" is the question that I'm posing to the list.  How do others decide what makes a flow record anomalous?  Do you find that statistical analysis is useful?  If so, what statistical analysis?
> 
> Here is the crux of questions I've posed to a few buddies that are professional statisticians and financial quants:
> 
> "I would like to know if you can recommend anything that will help me locate outliers of data.  It is more than likely that I would take shots of the data over different lengths of time for comparison. For example, I would take the set of collected data for a month, week, day, hour, minute, calculate the base "normal" measurement, then progressively note the divergence from this normal measurement within the given time span."
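
The windowed comparison described in the quoted question above can be sketched in a few
lines of Python.  This is only one simple reading of it, assuming the "normal" measurement
is the mean of a baseline window and divergence is expressed in sample standard
deviations; the hourly flow counts are invented for the example.

```python
from statistics import mean, stdev


def divergence(baseline, current):
    """Express each current measurement as a distance from the baseline mean,
    in units of the baseline's sample standard deviation."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return [0.0 if v == mu else float("inf") for v in current]
    return [(v - mu) / sigma for v in current]


# Hypothetical hourly flow counts for one baseline window (assumed data).
hourly_flows = [120, 130, 125, 118, 131, 127, 122, 129]
latest = [126, 190]

for value, score in zip(latest, divergence(hourly_flows, latest)):
    flag = "outlier" if abs(score) > 3 else "ok"
    print(f"{value}: {score:+.1f} sigma ({flag})")
```

The same function can be run against month, week, day and hour windows by changing what
goes into the baseline list.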
> 
> Again, after pondering this idea, and having this back and forth, I do realize that the question is actually outside of the scope of this list as it is generally unrelated to what the argus project can do, itself, and more related to outlier detection in a set of data.
> 
> Carter, the questions you pose seem to be attempting to find scope.  The scope I am focusing on is "everything within a period of time."  I suppose this scope lacks pragmatism.
>  
> 
> However, I would love to hear from anyone who has attempted to discover anomalous records using statistical analysis.  Please feel free to Email me directly if you feel it would not be beneficial to the list members or this project.
> 
> 
> Thanks for your time,
> 
> Matt
> 
> 
> On Fri, Jun 1, 2012 at 9:37 AM, Carter Bullard <carter at qosient.com> wrote:
> Hey Matt,
> Anomaly detection is a very large topic, but the principal concept in argus is to provide
> a good number of generic tools, so you can build your own anomaly detectors.  So,
> we use the Unix strategy of small programs (sorting, printing, filtering, translation, conversion,
> aggregation) and piping, native file systems, curses examples, with dbm support etc……
> 
> So what anomaly do you want to detect ?  Do you want to know when a vital resource 
> successfully transfers data to an address that is outside its normal group of machines?
> Do you want to find out once a day, or in real time?
> 
> That's pretty trivial and we have multiple ways of doing it.   Use racluster() to build the
> list of acceptable addresses from an argus archive (behavioral baselining) , then use
> rafilteraddr() with that list, connected to a live argus data feed, to spit out records from
> machines outside the group.  Pipe that to ra() with a decent filter, like " icmp or app bytes
> gt 0 ", and you should get a list of machines outside the acceptable list that got responses
> from your vital asset.
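
The baselining idea in the quoted paragraph above can be sketched in Python.  This does
not call racluster(), rafilteraddr() or ra(); it only mimics the logic on plain
dictionaries.  The field names (saddr, daddr, appbytes) and the addresses are assumptions
made for the example.

```python
def peer_of(rec, asset):
    """Return the address on the other end of the record, or None."""
    if rec["saddr"] == asset:
        return rec["daddr"]
    if rec["daddr"] == asset:
        return rec["saddr"]
    return None


def build_baseline(archive, asset):
    """Behavioral baseline: every peer the asset has exchanged records with."""
    return {p for p in (peer_of(r, asset) for r in archive) if p is not None}


def outside_baseline(live, asset, baseline):
    """Yield live records involving a peer outside the baseline, keeping only
    records where application bytes were actually transferred."""
    for rec in live:
        peer = peer_of(rec, asset)
        if peer is not None and peer not in baseline and rec["appbytes"] > 0:
            yield rec


ASSET = "192.0.2.10"  # hypothetical vital asset
archive = [
    {"saddr": "192.0.2.10", "daddr": "198.51.100.5", "appbytes": 800},
    {"saddr": "198.51.100.7", "daddr": "192.0.2.10", "appbytes": 120},
]
live = [
    {"saddr": "198.51.100.5", "daddr": "192.0.2.10", "appbytes": 300},    # known peer
    {"saddr": "203.0.113.9", "daddr": "192.0.2.10", "appbytes": 0},       # new, no data
    {"saddr": "192.0.2.10", "daddr": "203.0.113.44", "appbytes": 5000},   # new, data moved
]

baseline = build_baseline(archive, ASSET)
alerts = list(outside_baseline(live, ASSET, baseline))
for rec in alerts:
    print("outside baseline:", peer_of(rec, ASSET), rec["appbytes"], "bytes")
```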
> 
> Or use rasqlinsert() to build a read only database of the historical IP addresses, or
> you can use rasqlinsert() to dynamically insert IP addresses into a table of acceptable IPs.
> Then use a modified rasqlinsert() with the "-M cache" option, to tell you when it would
> want to INSERT a record, which is your alarm / alert that something outside the acceptable
> group tried to touch your machine.  You can do that for IP's, ethernets, combinations of
> both, or whatever.  Of course you would have to put some decent conditionals on this so you
> would get a decent alarm, but it's pretty straightforward.
> 
> But anomaly detection is different from analysis.  Anomaly detection is detection of
> a condition outside of a normal state.  Analysis is not so specific.  I have to ask,
> what do you want to analyze?
> 
> Secondary impact of DDoS on end system availability?  Well that is just racluster().
> The presence of DDoS, well that isn't very interesting, but " what was the population
> of IP's that were using this system, 20 minutes prior to the DDoS? "  Well that is 
> rasql() (if you were doing the IP address strategy above) with a time filter, piped
> into racluster() with a decent filter to pick out flows that are transferring data.
> Pipe that into racluster() to formulate the /24 CIDR network addresses, and you
> should have a list of remote class C ip addresses that were good, which you can
> blow into your firewall for a little while.
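
The last step of the quoted pipeline above, collapsing known-good addresses into /24
networks, can be sketched with Python's standard ipaddress module.  This stands in for
the racluster() aggregation rather than reproducing it, and the input addresses are
invented for the example.

```python
import ipaddress
from collections import Counter


def to_slash24(addr):
    """Map a single IPv4 address to its enclosing /24 network."""
    return ipaddress.ip_network(f"{addr}/24", strict=False)


def good_networks(addresses):
    """Count how many known-good addresses fall into each /24, sorted by network."""
    counts = Counter(to_slash24(a) for a in addresses)
    return sorted(counts.items())


# Hypothetical addresses seen transferring data before the event (assumed data).
seen = ["198.51.100.5", "198.51.100.77", "203.0.113.9"]
for net, count in good_networks(seen):
    print(net, count)
```

The resulting network list is what would be handed to a firewall as a temporary
allow-list; how that is done depends on the firewall and is not shown here.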
> 
> I suspect that there are hundreds of these things, especially if there are hundreds of sites.
> 
> But the important question to you is,  What do you want to do?
> 
> Many people want the project to tell them what to do, but in this space, IMHO, that
> isn't really successful.  You need to have an idea of what is important to your specific
> site, and then you find tools that can help you get there.
> 
> So, what do you want to do?
> 
> Carter
> 
> On May 31, 2012, at 10:51 AM, Matt Brown wrote:
> 
>> Thanks for replying Carter.
>> 
>> I suppose I can't easily find any strategies.  Meaning, I've read a few papers on anomaly detection algorithms, and would love to use them.  However, I lack the deep mathematical and development skills to implement them.
>> 
>> So, I suppose I am actually asking where are the analysis tools?  Is there a repo qosient keeps?
>> 
>> Thanks again for the reply,
>> 
>> Matt Brown
>> 
>> On May 31, 2012 10:13 AM, "Carter Bullard" <carter at qosient.com> wrote:
>> Hey Matt,
>> Most people do their own thing.  We have lots of examples of things to do,
>> scan detection, access policy monitoring, covert channel detection, discovery detection,
>> asset inventory assessments, behavioral baselining, and with events, you have the
>> basic data for user / flow attribution etc……
>> 
>> So I think it's happening.  What do you expect to see that you aren't seeing?
>> 
>> Carter
>> 
>> On May 30, 2012, at 8:12 PM, Matt Brown wrote:
>> 
>>> Hello all,
>>> 
>>> After some research, it's quite obvious that argus output can be used as input for anomaly detection.
>>> 
>>> Carter was involved in a presentation at flocon 2012 that mentions a few cases of analysis: http://www.cert.org/flocon/2012/presentations/bullard-gerth-implementing-packet-dynamic-awareness-argus.pdf
>>> 
>>> I also see that argus is mentioned in another presentation at cmu: http://www.andrew.cmu.edu/user/gnychis/imcfp04-nychis-slides.pdf
>>> 
>>> 
>>> Whatever happened to this?  Are there any plans to write a client that can perform some simple anomaly or other analysis?
>>> 
>>> 
>>> Thanks,
>>> 
>>> Matt
>> 
> 
> 
