Collecting multiple types of information at once

Wed Aug 29 19:39:27 EDT 2012

Hey Martijn,
One thing to consider is the " cont " directive that you can use in racluster.conf.  It may do some of what you are interested in.

Each line in racluster.conf, which is composed of an argus client filter (similar to, but not, BPF), an aggregation strategy, and status and idle timers, is managed by an ArgusAggregator; each one is a separate, independent aggregation context.  The " cont " directive tells racluster to pass the record on to the next line of the configuration ( continue ), so a single record can reach multiple aggregation contexts.
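
To make the fall-through behaviour concrete, here is a rough Python model of it.  This is not Argus code and not racluster.conf syntax: the Rule class, the record fields (saddr, dport, bytes) and the dispatch function are all made up for illustration, and it assumes that without " cont " a record stops at the first line whose filter matches.

```python
from collections import defaultdict

class Rule:
    """Hypothetical stand-in for one racluster.conf line."""
    def __init__(self, name, matches, key, cont=False):
        self.name = name
        self.matches = matches          # filter predicate for this line
        self.key = key                  # aggregation key (the "model")
        self.cont = cont                # True if the line carries "cont"
        self.totals = defaultdict(int)  # this line's own aggregation context

def dispatch(record, rules):
    """Offer the record to each rule in order.

    Assumption: a matching rule without cont ends processing for the record.
    """
    for rule in rules:
        if rule.matches(record):
            rule.totals[rule.key(record)] += record["bytes"]
            if not rule.cont:
                break

# Made-up flow records, reduced to the fields the sketch needs.
records = [
    {"saddr": "10.0.0.1", "dport": 80, "bytes": 100},
    {"saddr": "10.0.0.2", "dport": 53, "bytes": 40},
    {"saddr": "10.0.0.1", "dport": 53, "bytes": 60},
]

rules = [
    # First line aggregates by host and continues to the next line.
    Rule("by_host", lambda r: True, lambda r: r["saddr"], cont=True),
    # Second line aggregates the same records by destination port.
    Rule("by_port", lambda r: True, lambda r: r["dport"]),
]

for rec in records:
    dispatch(rec, rules)
```

Because the first rule is marked cont, every record lands in both contexts, so one pass over the data yields both the per-host and the per-port byte counts.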

This may not do everything you want, but it may be a good starting point for us to get there.

Carter

, Carter Bullard <carter at qosient.com> wrote:

> Hey Martijn,
> Sorry for the delayed response.  There are a number of things you can do
> to parallelize the processing, but we may need to add something for what
> you are proposing.
> 
> radium does read files, with the ' -R <dir> ' or ' -r <file> ' options, but distributing
> data to specific targets is not straightforward.
> 
> I like to have radium write a stream of data to multicast addresses, using the
> argus-udp transport, and have a lot of clients on many machines reading
> those multicast addresses.  A bit of staging work needs to be done, but 
> it does work.
> 
> The radium approach that John mentioned could work for realtime replay
> of stored data, using the " -M realtime " option.  This essentially causes
> radium to write out data at the realtime rate at which it was collected, so
> radium would write out 24 hours' worth of data in 24 hours.  This is good for
> debugging and a lot of simulations and analytics, but probably not useful
> for your problem.
> 
> OK, I suspect that the best thing to do would be to have a ra* program that
> dispatches a single flow record to a set of configured analytics, and when it hits
> EOF, it closes each analytic in order, and you collect the output.  Looks like a
> decent MISD (multiple instruction, single data) parallel problem, which would be
> very useful.
> 
> This is actually really easy to write, given the current state of the client libraries,
> and the threads support that is inherent now in all clients.
> 
> If we can come up with a configuration scheme, which can go into a radium.conf
> file, I can probably put something together really quickly.  Or at least talk about
> how to structure such a thing on the mailing list.
> 
> Interested?
> 
> Carter
> 
> 
> 
> On Aug 28, 2012, at 5:11 AM, Martijn van Oosterhout <kleptog at gmail.com> wrote:
> 
>> Hoi,
>> 
>> We currently have a situation where we'd like to collect multiple bits
>> of information from a file at once, for example:
>> 
>> - total data
>> - top ten hosts by bandwidth
>> - top ten ports by bandwidth
>> 
>> Additionally, we would want to do all of these with different BPF
>> filters (for example, restricted to only one LAN).
>> 
>> Currently you can do all this with racluster and rasort but it
>> requires going through the data files multiple times. We have a
>> sort-of solution which involves one ra, lots of tee processes and
>> running many raclusters in parallel. But this is not scalable. I was
>> wondering if there was a more efficient way.
>> 
>> I've thought of some possibilities, like being able to tag streams
>> based on BPF. Basically, a filter that checks if a record matches a
>> BPF, if so it replicates it with a special marker. Then you'd just
>> include that marker in your racluster key and you get all your answers
>> at once.
>> 
>> Perhaps this is already possible?
>> 
>> Another possibility is an ra tool where you could embed something like
>> a lua script so you could write your own aggregations more easily.
>> 
>> Any other ideas?
>> -- 
>> Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/
>> 
>
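
For discussion, the dispatcher idea in the quoted message (hand each flow record to a set of configured analytics, then close each one at EOF and collect the output) could look roughly like the Python sketch below.  It is only an illustration of the shape of the thing, not an ra* program: the class names, the record fields (saddr, dport, bytes) and the "where" predicate standing in for a per-analytic filter are all assumptions.

```python
from collections import Counter

class TotalBytes:
    """Hypothetical analytic: total data seen."""
    def __init__(self, where=lambda rec: True):
        self.where = where
        self.total = 0

    def update(self, rec):
        if self.where(rec):
            self.total += rec["bytes"]

    def close(self):
        return self.total

class TopN:
    """Hypothetical analytic: top N values of one field by bytes."""
    def __init__(self, field, n=10, where=lambda rec: True):
        self.field = field
        self.n = n
        self.where = where
        self.counts = Counter()

    def update(self, rec):
        if self.where(rec):
            self.counts[rec[self.field]] += rec["bytes"]

    def close(self):
        return self.counts.most_common(self.n)

def run_once(records, analytics):
    """Read the records a single time, feeding every analytic, then close each in order."""
    for rec in records:
        for analytic in analytics.values():
            analytic.update(rec)
    return {name: analytic.close() for name, analytic in analytics.items()}

# Made-up records; a real tool would read these from an argus data file.
records = [
    {"saddr": "10.0.0.1", "dport": 80, "bytes": 100},
    {"saddr": "192.168.1.5", "dport": 53, "bytes": 40},
    {"saddr": "10.0.0.1", "dport": 443, "bytes": 60},
]

results = run_once(records, {
    "total": TotalBytes(),
    "top_hosts": TopN("saddr"),
    "top_ports": TopN("dport"),
    # Same analytic restricted to one (made-up) LAN prefix.
    "lan_total": TotalBytes(where=lambda r: r["saddr"].startswith("10.")),
})
```

Each analytic keeps its own state, so adding another report means adding one more entry to the dictionary rather than another pass over the data or another tee process.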