racluster request

Denton, Rick rick.denton at cybertrust.com
Thu Oct 26 19:31:56 EDT 2006


> 
> Well, there is no doubt that ragator() was a good name, so we 
> maybe able to keep it going with this little project.  
> 

:)

> OK, so getting some notion as to what we're contemplating.  I 
> think in terms of streams and pipelines.  I can imagine a way 
> to specify to ragator() that it set up many streams, and I 
> can also think about branching streams, where at some point 
> in a pipeline, we decide to split the stream into branches.  
> Once we set up the flow of records, then we can specify where 
> along the stream aggregation should occur, and what the rules 
> should be.  Could this type of system help?
> 

i think so.. though i'm still trying to pin down what you're referring
to as a stream and a pipeline..

it may just be a case of

filter=".." {
    filter=".." model=".."
    filter=".." model=".."
    filter=".." {
        ..
        ..
    }
}

the outermost top-level filters should catch things quite loosely, to
reduce the number of checks that have to be performed on each flow....
iptables springs to mind.. it speeds up processing by not pushing every
packet through the entire ruleset, like nearly all other commercial
packet filters still do :(
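a quick sketch of that pruning idea, in python. the tree, record fields,
and names here are all invented for illustration (nothing to do with
racluster internals) -- the point is just that a record only gets tested
against inner rules when the enclosing loose filter matches, so whole
subtrees get skipped:

```python
def match_tree(record, node):
    """Return the names of leaf rules whose whole filter chain matches."""
    if not node["filter"](record):
        return []                      # prune: skip the entire subtree
    children = node.get("children", [])
    if not children:
        return [node["name"]]          # leaf rule matched
    hits = []
    for child in children:
        hits.extend(match_tree(record, child))
    return hits

# Example tree: outer filter catches broadly, inner rules refine.
tree = {
    "name": "top",
    "filter": lambda r: r["proto"] == "tcp",
    "children": [
        {"name": "web",  "filter": lambda r: r["dport"] == 80},
        {"name": "mail", "filter": lambda r: r["dport"] == 25},
    ],
}

print(match_tree({"proto": "tcp", "dport": 80}, tree))   # ['web']
print(match_tree({"proto": "udp", "dport": 80}, tree))   # [] (pruned at top)
```

the udp record never touches the inner rules at all, which is the whole
win over a flat first-match ruleset..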

is this still flexible enough to avoid the need for real flow control
directives?

does it need further control.. with named aggregate tables and 'jump'
rules like iptables, which could jump to another table, terminate
processing, or let a flow fall back (through the stack) to the top-level
table and continue on, so it can pass through multiple aggregate trees
(unlike iptables, where you can't have two eventual outcomes for a
packet :))

this should probably allow for tagging of the matched rule somehow, so
you know exactly where in the tree an output aggregate came from.. not
entirely unlike the flow id from the old ragator (which was never really
used).. the old ragator didn't need this, as it took a first-match-only
approach.
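one way to picture the tagging: label each output with the path of the
rule that produced it, so even when a record matches several branches
(multi-match, unlike iptables first-match) you can tell the outputs
apart. again everything here is made up for illustration:

```python
def collect(record, node, path=()):
    """Yield (tag, record) pairs for every leaf rule that matches.
    One record may tag several branches -- multi-match semantics."""
    if not node["filter"](record):
        return
    here = path + (node["name"],)
    if "children" in node:
        for child in node["children"]:
            yield from collect(record, child, here)
    else:
        yield ("/".join(here), record)

tree = {
    "name": "ip",
    "filter": lambda r: True,
    "children": [
        {"name": "net_a", "filter": lambda r: r["saddr"].startswith("10.")},
        {"name": "web",   "filter": lambda r: r["dport"] == 80},
    ],
}

rec = {"saddr": "10.1.2.3", "dport": 80}
print([tag for tag, _ in collect(rec, tree)])
# ['ip/net_a', 'ip/web'] -- both branches claim the record
```

the tag is exactly the "where in the tree did this come from" breadcrumb
that the old flow id would have given you..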

this matters right now with racluster, to avoid re-running the data when
you have two disparate network ranges you want to aggregate together
(which, believe it or not, i sometimes want to ;)).. for example..

filter="src net ( 10.0.0.0 mask 255.0.0.0 or 192.168.0.0 mask 255.255.0.0 )" model="proto dport"
filter="src net ( 172.16.0.0 mask 255.240.0.0 or 1.2.3.0 mask 255.255.255.0 )" model="proto dport"

you can't tell which aggregate is which, because the saddr/daddr fields
can't distinguish them.... this would allow less pre-filtering of the
data :)

this forms my initial filtering stage.. before counting.. if this could
all be done in ragator/racluster with the multiple aggregates..
especially with selectable output files.. then i can reduce my entire
processing down to a single step ;)

the above is a bit useless for dumping to files, tho..

filter="src net ( 10.0.0.0 mask 255.0.0.0 or 192.168.0.0 mask 255.255.0.0 )"

is going to create a lot of flows with saddr/daddr both 0.0.0.0 :(

so.. with the above tree...

filter="net ( 10.0.0.0 mask 255.0.0.0 or 192.168.0.0 mask 255.255.0.0 )" file="group_a_networks.argus" {
    filter="src net ( 10.0.0.0 mask 255.0.0.0 )" model="saddr/8 proto dport"
    filter="dst net ( 10.0.0.0 mask 255.0.0.0 )" model="daddr/8 proto dport"
    filter="net 192.168.0.0 mask 255.255.0.0" file="192.168.0.0.argus" {
        filter="net 192.168.25.0 mask 255.255.255.0" file="192.168.25.0" {
            filter="dst net 192.168.25.0" model="daddr/24 proto dport" file="192.168.25_dst_flows.argus"
            filter="src net 192.168.25.0" model="saddr/24 proto dport" file="192.168.25_src_flows.argus"
        }
    }
}

filter="net ( 10.0.0.0 mask 255.255.255.0 or 1.2.3.0 mask 255.255.255.0 )" file="group_b_networks.argus" {
    .....
}

this gives a very flexible mechanism.... but no means to terminate
processing of a flow.. which, if you were pipelining, you wouldn't be
able to do that far down anyway, because the flow is already in the next
stream too (not having been filtered out further up).. so i think this
would match your pipeline/stream idea?

aggregates are formed wherever there is a 'model' line.. it may help to
rename 'model' to 'aggregate' at this point, for clarity in the tree..
so a file= on an aggregate outputs the aggregate, and a file= on a
filter= without a model= outputs the unaggregated matching data to a
file..
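those two semantics -- model= forms an aggregate, file= on a bare
filter= dumps raw matches -- could be pinned down with a toy traversal
like this. the counting stands in for real argus aggregation and every
name is hypothetical:

```python
from collections import Counter

def walk(records, node, outputs):
    """Recursively apply a filter tree: children only ever see the
    records their parent's filter passed, so non-matching flows never
    traverse a subtree's rules."""
    matched = [r for r in records if node["filter"](r)]
    if "model" in node and "file" in node:
        # a model= line forms an aggregate; here we just count flows
        # per model key as a stand-in for e.g. "saddr/8 proto dport"
        outputs.setdefault(node["file"], Counter()).update(
            r[node["model"]] for r in matched)
    elif "file" in node:
        # file= on a bare filter= dumps the unaggregated matches
        outputs.setdefault(node["file"], []).extend(matched)
    for child in node.get("children", []):
        walk(matched, child, outputs)   # children see the matched subset

tree = {
    "filter": lambda r: r["saddr"].startswith("10."),
    "file": "group_a.argus",
    "children": [
        {"filter": lambda r: r["dport"] == 80,
         "model": "dport", "file": "web.agg"},
    ],
}

flows = [
    {"saddr": "10.0.0.1", "dport": 80},
    {"saddr": "10.0.0.2", "dport": 80},
    {"saddr": "172.16.0.1", "dport": 80},   # pruned at the top filter
]
out = {}
walk(flows, tree, out)
print(len(out["group_a.argus"]))   # 2 raw matches dumped
print(out["web.agg"][80])          # 2 flows aggregated on dport
```

note the 172.16 flow never reaches the inner rule at all -- same pruning
as before, plus per-node output files..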

this has just merged ra(1), racount(1), and (sort of) radium(1) into one
tool.. i dunno if that's a good thing or not :)

sub-flows are handled in the tree, so non-matching flows don't traverse
all the rules (which would be slow)..
and flows with matching attributes can also be grouped multiple times at
any point in the tree..

this also allows data matching a 'filter' at any point in the tree to be
output as-is.. which is also handy..

this would just be a bitch to code :) although the filter handling gets
abstracted out and then it just becomes a job for recursion, which is
always good :)

the config file language is also pretty trivial, so building a parser
for it should be relatively easy too..
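for instance, a throwaway recursive-descent parser for the sketched
grammar (filter="..", optional model=/file= pairs, and { } nesting)
fits in a few lines of python. this is just a proof that the grammar is
trivial, not the real racluster config syntax:

```python
import re

# key="value" pairs, braces, or whitespace
TOKEN = re.compile(r'(\w+)="([^"]*)"|([{}])|\s+')

def tokenize(text):
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"bad input at offset {pos}")
        if m.group(1):
            out.append(("kv", m.group(1), m.group(2)))
        elif m.group(3):
            out.append((m.group(3),))
        pos = m.end()
    return out

def parse(tokens):
    """Parse a token stream into a list of rule dicts, recursing on '{'."""
    rules, current = [], None
    while tokens:
        tok = tokens.pop(0)
        if tok[0] == "kv":
            if tok[1] == "filter":        # each filter= starts a new rule
                current = {"filter": tok[2]}
                rules.append(current)
            else:                         # model=/file= attach to it
                current[tok[1]] = tok[2]
        elif tok[0] == "{":
            current["children"] = parse(tokens)   # recurse into the block
        elif tok[0] == "}":
            return rules                  # end of this nesting level
    return rules

cfg = 'filter="tcp" file="a.argus" { filter="port 80" model="dport" }'
print(parse(tokenize(cfg)))
```

the '{' case is the whole recursion the previous paragraph promised..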
