What happened to anomaly detection/packet dynamics? Are there clients?

Matt Brown matthewbrown at gmail.com
Sun Jun 3 14:24:04 EDT 2012


Thanks for your very detailed reply.  Your examples are much appreciated,
and they give tangibility to a (clearly) very complex question: What makes
a flow record anomalous?

I suppose I'm less interested in anything so static, and more interested in
finding general statistical outliers.  With that conclusion, it's clear
that a question like this is outside the scope of the argus project
itself and lies more with the specifics of statistical analysis.  I
suppose I did hope that people (you and others on the list) would be
willing to share some of the things they use for statistical analysis of argus data.

My base example of detecting anomalous flow records comes from two papers I
found in the citations of the Wikipedia article on anomaly-based IDS
<http://en.wikipedia.org/wiki/Anomaly-based_intrusion_detection_system>:
the general idea of strict anomaly detection included in Phrack in 2000
<http://artofhacking.com/files/phrack/phrack56/P56-11.TXT>, and the use of
the Mahalanobis distance to give an "anomaly weight" to flow records,
published by the CS department at Columbia University
<http://sneakers.cs.columbia.edu/ids/publications/RAID4.PDF>.
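For concreteness, here is a minimal sketch of the Mahalanobis-distance idea
from the Columbia paper, assuming just two per-flow features (total bytes and
packet count); the feature choice and the numbers are hypothetical, and a real
analysis would pull these fields from ra() output.

```python
from statistics import mean

def mahalanobis2(baseline, x):
    """Mahalanobis distance of a 2-feature point x from a baseline set.

    baseline: list of (f1, f2) tuples, e.g. (bytes, packets) per flow.
    Larger distances mean the point sits further outside the baseline's
    mean-and-covariance model, i.e. a larger "anomaly weight".
    """
    n = len(baseline)
    mx = mean(p[0] for p in baseline)
    my = mean(p[1] for p in baseline)
    # sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in baseline) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in baseline) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in baseline) / (n - 1)
    det = sxx * syy - sxy * sxy          # assumes non-degenerate features
    dx, dy = x[0] - mx, x[1] - my
    # quadratic form d' * inv(cov) * d, using the closed-form 2x2 inverse
    return ((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det) ** 0.5

# Hypothetical (bytes, packets) pairs for "normal" flows
baseline = [(1500, 10), (1600, 11), (1550, 10),
            (1480, 9), (1520, 10), (1580, 11)]
print(mahalanobis2(baseline, (90000, 600)) >
      mahalanobis2(baseline, (1510, 10)))  # → True
```

A real detector would fit the baseline on historical records and flag new
flows whose distance exceeds some chosen threshold.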

The answer to "what makes a flow anomalous?" is the question that I'm
posing to the list.  How do others decide what makes a flow record
anomalous?  Do you find that statistical analysis is useful?  If so, what
statistical analysis?

Here is the crux of the questions I've posed to a few buddies who are
professional statisticians and financial quants:

"I would like to know if you can recommend anything that will help me
locate outliers of data.  It is more than likely that I would take shots of
the data over different lengths of time for comparison. For example, I
would take the set of collected data for a month, week, day, hour, minute,
calculate the base "normal" measurement, then progressively note the
divergence from this normal measurement within the given time span."

Again, after pondering this idea and having this back and forth, I do
realize that the question is actually outside the scope of this list, as
it is generally unrelated to what the argus project itself can do and
more related to outlier detection in a set of data.

Carter, the questions you pose seem to be attempting to find scope.  The
scope I am focusing on is "everything within a period of time."  I suppose
this scope lacks pragmatism.


However, I would love to hear from anyone who has attempted to discover
anomalous records using statistical analysis.  Please feel free to email me
directly if you feel it would not be beneficial to the list members or this
project.


Thanks for your time,

Matt


On Fri, Jun 1, 2012 at 9:37 AM, Carter Bullard <carter at qosient.com> wrote:

> Hey Matt,
> Anomaly detection is a very large topic, but the principal concept in
> argus is to provide a good number of generic tools so you can build your
> own anomaly detectors.  So we use the Unix strategy of small programs
> (sorting, printing, filtering, translation, conversion, aggregation) and
> piping, along with native file systems, curses examples, dbm support, etc.
>
>  So what anomaly do you want to detect?  Do you want to know when a vital
> resource successfully transfers data to an address that is outside its
> normal group of machines?  Do you want to find out once a day, or in real
> time?
>
> That's pretty trivial, and we have multiple ways of doing it.  Use
> racluster() to build the list of acceptable addresses from an argus
> archive (behavioral baselining), then use rafilteraddr() with that list,
> connected to a live argus data feed, to spit out records from machines
> outside the group.  Pipe that to ra() with a decent filter, like "icmp or
> app bytes gt 0", and you should get a list of machines outside the
> acceptable list that got responses from your vital asset.
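For illustration, the logic of that pipeline can be sketched in Python over
plain dictionaries instead of the actual argus tools; the record field names
here are hypothetical.

```python
# Baseline step (what racluster() derives from the archive): the set of
# addresses this asset normally talks to.  Records are hypothetical dicts.
archive = [
    {"saddr": "10.0.0.5", "appbytes": 120},
    {"saddr": "10.0.0.9", "appbytes": 0},
    {"saddr": "10.0.0.7", "appbytes": 3000},
]
acceptable = {rec["saddr"] for rec in archive}

# Live step (the rafilteraddr() pass plus the "app bytes gt 0" filter):
# outsiders that actually exchanged payload with the asset.
live = [
    {"saddr": "10.0.0.5", "appbytes": 50},
    {"saddr": "203.0.113.8", "appbytes": 900},  # outside the baseline
    {"saddr": "203.0.113.9", "appbytes": 0},    # outside, but no payload
]
alerts = [r for r in live
          if r["saddr"] not in acceptable and r["appbytes"] > 0]
print([r["saddr"] for r in alerts])  # → ['203.0.113.8']
```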
>
> Or use rasqlinsert() to build a read-only database of the historical IP
> addresses, or you can use rasqlinsert() to dynamically insert IP addresses
> into a table of acceptable IPs.  Then use a modified rasqlinsert() with
> the "-M cache" option to tell you when it would want to INSERT a record,
> which is your alarm/alert that something outside the acceptable group
> tried to touch your machine.  You can do that for IPs, ethernets,
> combinations of both, or whatever.  Of course, you would have to put some
> decent conditionals on this so you would get a decent alarm, but it's
> pretty straightforward.
>
> But anomaly detection is different from analysis.  Anomaly detection is
> detection of
> a condition outside of a normal state.  Analysis is not so specific.  I
> have to ask,
> what do you want to analyze?
>
> Secondary impact of DDoS on end-system availability?  Well, that is just
> racluster().  The presence of DDoS, well, that isn't very interesting, but
> "what was the population of IPs that were using this system 20 minutes
> prior to the DDoS?"  Well, that is rasql() (if you were doing the IP
> address strategy above) with a time filter, piped into racluster() with a
> decent filter to pick out flows that are transferring data.  Pipe that
> into racluster() to formulate the /24 CIDR network addresses, and you
> should have a list of remote class C IP addresses that were good, which
> you can blow into your firewall for a little while.
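For illustration, those steps (time filter, keep flows that moved data,
aggregate sources to /24s) can be sketched in Python over plain dictionaries;
the field names and timestamps are hypothetical, not actual argus output.

```python
import ipaddress

# Hypothetical flow records: start time (epoch seconds), source, payload bytes
flows = [
    {"stime": 955,  "saddr": "198.51.100.23", "appbytes": 4000},
    {"stime": 990,  "saddr": "198.51.100.77", "appbytes": 1200},
    {"stime": 998,  "saddr": "203.0.113.5",   "appbytes": 0},    # no payload
    {"stime": 1300, "saddr": "192.0.2.9",     "appbytes": 800},  # during attack
]
attack_start = 1000

# Time filter: the 20 minutes before the DDoS; keep only data movers
window = [f for f in flows
          if attack_start - 1200 <= f["stime"] < attack_start
          and f["appbytes"] > 0]

# Aggregate the surviving sources to /24 network addresses
nets = sorted({str(ipaddress.ip_network(f["saddr"] + "/24", strict=False))
               for f in window})
print(nets)  # → ['198.51.100.0/24']
```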
>
> I suspect that there are hundreds of these things, especially if there are
> hundreds of sites.
>
> But the important question to you is,  What do you want to do?
>
> Many people want the project to tell them what to do, but in this space,
> IMHO, that
> isn't really successful.  You need to have an idea of what is important to
> your specific
> site, and then you find tools that can help you get there.
>
> So, what do you want to do?
>
> Carter
>
> On May 31, 2012, at 10:51 AM, Matt Brown wrote:
>
> Thanks for replying Carter.
>
> I suppose I can't easily find any strategies.  Meaning, I've read a few
> papers on anomaly detection algorithms and would love to use them.
> However, I lack the deep mathematical and development skills to implement
> them.
>
> So, I suppose I am actually asking: where are the analysis tools?  Is
> there a repo that QoSient keeps?
>
> Thanks again for the reply,
>
> Matt Brown
> On May 31, 2012 10:13 AM, "Carter Bullard" <carter at qosient.com> wrote:
>
>> Hey Matt,
>> Most people do their own thing.  We have lots of examples of things to do:
>> scan detection, access policy monitoring, covert channel detection,
>> discovery detection, asset inventory assessments, behavioral baselining,
>> and with events, you have the basic data for user/flow attribution, etc.
>>
>> So I think it's happening.  What do you expect to see that you aren't
>> seeing?
>>
>> Carter
>>
>> On May 30, 2012, at 8:12 PM, Matt Brown wrote:
>>
>> Hello all,
>>
>> After some research, it's quite obvious that argus output can be used as
>> input for anomaly detection.
>>
>> Carter was involved in a presentation at flocon 2012 that mentions a few
>> cases of analysis:
>> http://www.cert.org/flocon/2012/presentations/bullard-gerth-implementing-packet-dynamic-awareness-argus.pdf
>>
>> I also see that argus is mentioned in another presentation at cmu:
>> http://www.andrew.cmu.edu/user/gnychis/imcfp04-nychis-slides.pdf
>>
>>
>> Whatever happened to this?  Are there any plans to write a client that
>> can perform some simple anomaly detection or other analysis?
>>
>>
>> Thanks,
>>
>>  Matt
>>
>>
>>
>

