Processing speed of ra utilities

Tue Sep 2 09:29:05 EDT 2003

Hey Geoff,
   All ra* programs are stream processors, at least from the
perspective of the academic relational database community,
and they have to work sequentially on each datum in order to
do their thing.  I'm not sure what you mean by 'suppress' the
data, but to get things like total bytes without processing
every record sequentially each time, you have to aggregate
the data first, and then you have to decide which identifiers
you want to keep and which ones you can throw away.
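
To make the aggregation idea concrete, here is a minimal sketch in
Python. It is not Argus code and does not read Argus files: the
record layout (source address, destination address, bytes) and the
function name aggregate_by_src_net are assumptions made up for
illustration. It shows one sequential pass that keeps only the
source network as the identifier and throws the destination away.

```python
from collections import defaultdict
from ipaddress import ip_network


def aggregate_by_src_net(records, prefix_len=24):
    """One sequential pass: fold each record into a per-network total."""
    totals = defaultdict(lambda: {"records": 0, "bytes": 0})
    for saddr, _daddr, nbytes in records:
        # Keep only the source network as the identifier; the
        # destination address is deliberately discarded.
        net = ip_network(f"{saddr}/{prefix_len}", strict=False)
        entry = totals[str(net)]
        entry["records"] += 1
        entry["bytes"] += nbytes
    return dict(totals)


# Hypothetical flow records: (source address, destination address, bytes).
sample = [
    ("192.0.2.10", "198.51.100.1", 500),
    ("192.0.2.77", "198.51.100.9", 1500),
    ("203.0.113.5", "198.51.100.1", 250),
]
totals = aggregate_by_src_net(sample)
```

Once the totals exist, any number of per-network questions can be
answered from them without reading the original records again, at
the cost of holding one entry per network in memory.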

ragator() is our example of ra aggregation, and so you
definitely should give it a try first, as it will do what
you indicated in your first mail.

If you do start writing your own ra* clients, don't hesitate
to send mail!

Carter

> -----Original Message-----
> From: Geoff Powell [mailto:geoff at lanrex.net.au]
> Sent: Monday, September 01, 2003 11:33 PM
> To: Carter Bullard
> Cc: argus-info at lists.andrew.cmu.edu
> Subject: RE: Processing speed of ra utilities
>
>
> G'day Carter
>
> I think I understand what you are saying about only doing one thing
> per pass of the argus flows. If I understand correctly, it would be
> ideal if racount suppressed the data before counting, and if it
> allowed the user to specify multiple nets (but then I guess memory
> usage becomes an issue).
>
> Perhaps even a ragator utility that was able to suppress a complete
> data file (which could be written to disk), then different racount
> commands could be used on that file.
>
> I've done a bit of C programming before, so I'll have a look at the
> existing ra utility source code and see if I can make sense of it.
>
> Thanks for the info
>
> Regards,
> Geoff
>
> On Mon, 1 Sep 2003, Carter Bullard wrote:
>
> > Hey Geoff,
> >    Looking at your samples, ragator() can definitely
> > do some bulk processing for you.  With a simple
> > ragator.conf such as:
> >
> >  Flow  100 ip  *   *   *   *    *     200  0   0
> >  Model 200 ip  255.255.255.0  0.0.0.0   no no no
> >
> > you can generate stats for all the source class-c nets.
> >
> >  Flow  100 ip  *   *   *   *    *     200  0   0
> >  Model 200 ip  0.0.0.0 255.255.255.0    no no no
> >
> > will get you all the dst class-c net stats, in one
> > pass of the data.  If you want the data sorted by
> > network, just pipe the output through rasort(),
> > and with our first example ragator.conf file, you won't
> > be interested in the dst addr, so give this a try:
> >
> > ragator -f ragator.conf -r large-file.out -w - | \
> >      rasort -M saddr -s -dir -s -daddr
> >
> >    All the ra* programs process argus data files
> > sequentially, reading each record and doing whatever
> > processing they are designed to perform, and yes
> > one performance bottleneck with the simple samples
> > provided in the argus-clients distribution is that
> > they really only do one thing for each pass
> > of the data.
> >
> >    The ra* programs are really intended as examples
> > and if you want to speed things up, you should
> > write your own ra* program to process the data in
> > the most efficient way.
> >
> > Carter
> >
> >
> > > -----Original Message-----
> > > From: owner-argus-info at lists.andrew.cmu.edu
> > > [mailto:owner-argus-info at lists.andrew.cmu.edu] On Behalf Of
> > > Geoff Powell
> > > Sent: Monday, September 01, 2003 9:06 PM
> > > To: argus-info at lists.andrew.cmu.edu
> > > Subject: Processing speed of ra utilities
> > >
> > >
> > > Hi all,
> > >
> > > I'm using scripts to do a lot of similar processes on the same
> > > argus data file (which is quite large), and I'm wondering if
> > > anyone knows of a way I can speed up the process, and reduce the
> > > time it takes for ra utilities to produce results.
> > >
> > > Some examples of the commands I'm doing:
> > > racount -n -r large-file.out - src net c.class.ip.1/32
> > > racount -n -r large-file.out - dst net c.class.ip.2/32
> > > racount -n -r large-file.out - src net c.class.ip.3/32
> > > ...all the way to 254/32.
> > >
> > > After that I might look at specific ports/IP protocols for each
> > > IP address in the c class.
> > >
> > > I'm guessing racount has to process each transaction? When the
> > > argus data file size is 50MB+, even though the computer doing the
> > > processing is reasonably fast (dual Xeon 1.5GHz with 2GB of RAM),
> > > each racount command usually takes around 30sec-1min.
> > >
> > > Is there a way I can speed up the process, like running multiple
> > > racounts, using ragator or another application?
> > >
> > > Thanks for any help
> > >
> > > Regards,
> > > Geoff (geoff at lanrex.net.au)
> > >
>
>