Collecting multiple types of information at once

Wed Aug 29 13:58:01 EDT 2012

Hey Martijn,
Sorry for the delayed response.  There are a number of things you can do
to parallelize the processing, but we may need to add something for what
you are proposing.

radium does read files, with the ' -R <dir> ' or ' -r <file> ' options, but
distributing data to specific targets is not straightforward.

I like to have radium write a stream of data to multicast addresses, using the
argus-udp transport, and have a lot of clients, on many machines, read
those multicast addresses.  A bit of staging work needs to be done, but
it does work.

The radium approach that John mentioned could work for realtime replay
of stored data, using the " -M realtime " option.  This essentially causes
radium to write out data at the realtime rate at which it was collected, so
radium would write out 24 hours' worth of data in 24 hours.  This is good for
debugging and a lot of simulations and analytics, but probably not useful
for your problem.

OK, I suspect that the best thing to do would be to have a ra* program that
dispatches each flow record to a set of configured analytics and, when it hits
EOF, closes each analytic in order so you can collect the output.  Looks like a
decent MISD (multiple instruction, single data) parallel problem, which would be
very useful.
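
To make the idea concrete, here is a rough sketch of that kind of dispatcher.
It is plain Python rather than C against the client libraries, and everything in
it is an assumption for illustration only: the record fields (saddr, dport,
bytes), the Total and TopN classes, and the predicate functions standing in for
per-analytic filters are all hypothetical, not argus client API.

```python
from collections import defaultdict


class Total:
    """Hypothetical analytic: sums bytes over every record it is given."""

    def __init__(self):
        self.bytes = 0

    def process(self, rec):
        self.bytes += rec["bytes"]

    def close(self):
        return self.bytes


class TopN:
    """Hypothetical analytic: ranks the values of one field by total bytes."""

    def __init__(self, field, n=10):
        self.field, self.n = field, n
        self.totals = defaultdict(int)

    def process(self, rec):
        self.totals[rec[self.field]] += rec["bytes"]

    def close(self):
        # Largest byte count first; ties broken by the key for stable output.
        ranked = sorted(self.totals.items(), key=lambda kv: (-kv[1], str(kv[0])))
        return ranked[: self.n]


def dispatch(records, analytics):
    """Single pass: hand each record to every analytic whose filter accepts it."""
    for rec in records:
        for accept, analytic in analytics.values():
            if accept(rec):
                analytic.process(rec)
    # EOF: close each analytic in configured order and collect its output.
    return {name: analytic.close() for name, (accept, analytic) in analytics.items()}


def run_example():
    # Made-up records standing in for flow records read from a file.
    records = [
        {"saddr": "10.0.0.1", "dport": 80, "bytes": 500},
        {"saddr": "10.0.0.2", "dport": 443, "bytes": 300},
        {"saddr": "192.168.1.5", "dport": 80, "bytes": 200},
        {"saddr": "10.0.0.1", "dport": 443, "bytes": 100},
    ]
    in_lan = lambda r: r["saddr"].startswith("10.0.0.")  # stand-in for a filter
    analytics = {
        "total": (lambda r: True, Total()),
        "top_hosts": (lambda r: True, TopN("saddr")),
        "lan_top_ports": (in_lan, TopN("dport")),
    }
    return dispatch(records, analytics)
```

The data is read once, each analytic keeps its own state, and a per-analytic
filter covers the "restricted to only one LAN" case without a second pass.  A
real version would presumably run the analytics in their own threads.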

This is actually really easy to write, given the current state of the client libraries,
and the threads support that is inherent now in all clients.

If we can come up with a configuration scheme, which can go into a radium.conf
file, I can probably put something together really quickly.  Or at least talk about
how to structure such a thing on the mailing list.

Interested?

Carter

On Aug 28, 2012, at 5:11 AM, Martijn van Oosterhout <kleptog at gmail.com> wrote:

> Hoi,
> 
> We currently have a situation where we'd like to collect multiple bits
> of information from a file at once, for example:
> 
> - total data
> - top ten hosts by bandwidth
> - top ten ports by bandwidth
> 
> Additionally, we would want to do all of these with different BPF
> filters (for example, restricted to only one LAN).
> 
> Currently you can do all this with racluster and rasort but it
> requires going through the data files multiple times. We have a
> sort-of solution which involves one ra, lots of tee processes and
> running many raclusters in parallel. But this is not scalable. I was
> wondering if there was a more efficient way.
> 
> I've thought of some possibilities, like being able to tag streams
> based on BPF. Basically, a filter that checks if a record matches a
> BPF, if so it replicates it with a special marker. Then you'd just
> include that marker in your racluster key and you get all your answers
> at once.
> 
> Perhaps this is already possible?
> 
> Another possibility is an ra tool where you could embed something like
> a lua script so you could write your own aggregations easier.
> 
> Any other ideas?
> -- 
> Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/
> 
