updates for argus-2.x compatibility and database support

Tue Feb 24 10:32:49 EST 2009

Hey Mark (et al),
All of the work uses MySQL.  I did not do the right thing to make it
possible to swap out DBs on the fly, but I would like to get to that
level of support eventually.

Let me take the time now to describe part of the database support.

There are 2 primary goals.  The first is to provide a generic set of
tools to insert and retrieve argus data from flow data schemas.   The
second is to provide some fast lookups to make using flow data
easier.  The first goal seems to be well served with this release.
The second requires a bit of complexity but it works very well.

For goal #1 we have two tools.  rasqlinsert() maintains insertion of
argus data into the database.  This tool is basically ratop() but
instead of writing data to a curses screen, it writes it to a database
schema.  Using the same syntax you would use to create the screen
look and feel (-s option), you specify a database schema.  Using
the same syntax as for specifying flow keys (-m option), you specify
a key strategy, and you have the option of embedding binary
argus records into the schema (this is really important) rather
than just ascii text based columns.  This strategy allows us to schedule
writes to the database, and to do a bunch of processing before we
decide to commit a flow record to the database.

You can append to existing tables, generate databases and
tables on the fly (say a daily table, with monthly databases), you
can drop existing tables and recreate them, and you can use
the database table as a flow cache.  I use this feature to hold
my real-time situational awareness data.
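
As a rough illustration of the "daily table, monthly database" idea,
here is a minimal Python sketch that derives both names from a record
timestamp.  The function name and the naming patterns ("argus_%Y_%m",
"flows_%Y_%m_%d") are hypothetical; they are not taken from
rasqlinsert().

```python
import time


def target_for(epoch_seconds,
               db_pattern="argus_%Y_%m",          # hypothetical monthly database name
               table_pattern="flows_%Y_%m_%d"):   # hypothetical daily table name
    """Return the (database, table) names a record with this timestamp belongs to."""
    t = time.gmtime(epoch_seconds)
    return time.strftime(db_pattern, t), time.strftime(table_pattern, t)


# 1235489569 is Tue Feb 24 15:32:49 UTC 2009
database, table = target_for(1235489569)
print(database, table)
```

A writer that compares the current target with the previous one can
tell when it has crossed a day or month boundary and needs a new table
or database.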

I am old school, in that I think the same tool that produces data
should also be able to consume it, and so rasqlinsert() can attach
to the table that another rasqlinsert() is managing, and provide you
with a ratop() look at the contents.

The second tool is rasql(), which simply attaches to the database
and reads the binary argus records that are in the schema.  This
has the exact same syntax as ra(), so it should be familiar.
The big addition is that you have a "-M sql='select where string'"
option that lets the database do a lot of the work for you.

So the idea is that you use ascii fields to provide argus attribute
'visibility' to the database for doing its magic (sorting, select/joins,
etc.), but you have the full binary argus record in the row, so
that you can get to all the metrics, status fields, etc.
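
To make the "ascii columns plus binary record" idea concrete, here is
a minimal Python sketch.  It is only an illustration under several
assumptions: sqlite3 stands in for MySQL, the table and column names
are made up, and the record column holds placeholder bytes rather
than real argus records.  It is not how rasql() is implemented.

```python
import sqlite3

# sqlite3 stands in for MySQL here; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE flows (
           srcid  TEXT,     -- ascii columns the database can sort and select on
           saddr  TEXT,
           pkts   INTEGER,
           bytes  INTEGER,
           record BLOB      -- stand-in for the full binary argus record
       )"""
)

rows = [
    ("probe1", "10.0.0.1", 12, 900, b"\x01binary-record-a"),
    ("probe1", "10.0.0.2", 400, 52000, b"\x02binary-record-b"),
    ("probe2", "10.0.0.3", 7, 420, b"\x03binary-record-c"),
]
conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?)", rows)


def fetch_records(where):
    """Let the database filter on the ascii columns, return only the binary records.

    The where string is pasted into the query, in the spirit of a
    user-supplied select string, so it must come from a trusted user.
    """
    query = "SELECT record FROM flows WHERE " + where + " ORDER BY bytes DESC"
    return [row[0] for row in conn.execute(query)]


selected = fetch_records("srcid = 'probe1' AND pkts > 10")
print(selected)
```

The database does the selecting and sorting on the visible columns,
and the reader only ever decodes the binary records that come back.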

We use this to track all the IP addresses seen in a day, from
say all the border routers.  This table is within 5 seconds of
real-time, so it's a useful first view of what's going on (it's not the
only one, but it's a good start).  We have a simple schema that
has probe, start time, last time, mac address, IP address, pkts in,
pkts out, bytes in, bytes out, aggregate argus record.  The command
is something like this:

    rasqlinsert -S radium -M rmon cache nodrop  -m srcid smac saddr \
         -s stime dur srcid smac saddr pkts bytes -P database:tablename -d

This attaches to radium to get its data feed, it tracks singular
"RMON" style objects, it uses the database as a cache, and it
doesn't drop the table if it already exists when it starts up again.  The
database key is "Probe id, Mac Address, IP Address".  rasqlinsert()
is an aggregator, so it will do the right thing to ensure that there
is only one record for each key (very important for databases ;o).
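
The "one record per key" behaviour can be sketched in a few lines of
Python.  This is a simplified illustration, not the aggregation logic
of rasqlinsert(): the Flow class, its field names, and the merge rules
(sum the counters, keep the earliest start and latest end) are my
assumptions.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    # Hypothetical, much reduced flow record
    srcid: str
    smac: str
    saddr: str
    stime: float
    ltime: float
    pkts: int
    byte_count: int


def aggregate(flows):
    """Merge flows that share (srcid, smac, saddr) into one record per key."""
    cache = {}
    for f in flows:
        key = (f.srcid, f.smac, f.saddr)
        cur = cache.get(key)
        if cur is None:
            # First record for this key: store a copy so the input is not modified
            cache[key] = Flow(f.srcid, f.smac, f.saddr,
                              f.stime, f.ltime, f.pkts, f.byte_count)
        else:
            # Same key seen again: widen the time range and sum the counters
            cur.stime = min(cur.stime, f.stime)
            cur.ltime = max(cur.ltime, f.ltime)
            cur.pkts += f.pkts
            cur.byte_count += f.byte_count
    return cache


flows = [
    Flow("probe1", "00:11:22:33:44:55", "10.0.0.1", 100.0, 105.0, 10, 800),
    Flow("probe1", "00:11:22:33:44:55", "10.0.0.1", 103.0, 110.0, 5, 400),
    Flow("probe2", "00:11:22:33:44:66", "10.0.0.2", 101.0, 102.0, 1, 60),
]
cache = aggregate(flows)
for key, flow in cache.items():
    print(key, flow.pkts, flow.byte_count)
```

Because the cache is keyed the same way as the table, each write to
the database is an insert or an update of exactly one row.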

The db schema will consist of the fields specified by the "-s field"
option, and these ascii fields will be available to MySQL to do its thing.
You need to make sure that the keys are in the field list, and it will
complain if they are not.  rasqlinsert() knows the correct types and
field widths for each of the 140+ fields we can generate, so there
is some work there.
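
Here is a small Python sketch of the "keys must be in the field list"
check, together with building a table definition from the field list.
The column types and widths in FIELD_TYPES are guesses for
illustration; the real types that rasqlinsert() uses are not given in
this message, and the "record" column name is also made up.

```python
# Hypothetical column types; the real tool's types and widths are not shown here.
FIELD_TYPES = {
    "stime": "DOUBLE",
    "dur": "DOUBLE",
    "srcid": "VARCHAR(64)",
    "smac": "VARCHAR(24)",
    "saddr": "VARCHAR(64)",
    "pkts": "BIGINT",
    "bytes": "BIGINT",
}


def build_create_table(table, fields, keys):
    """Build a CREATE TABLE statement, complaining if a key is not a printed field."""
    missing = [k for k in keys if k not in fields]
    if missing:
        raise ValueError("key fields missing from field list: " + ", ".join(missing))
    cols = [f"{name} {FIELD_TYPES[name]}" for name in fields]
    cols.append("record BLOB")  # made-up column for the binary argus record
    cols.append("PRIMARY KEY (" + ", ".join(keys) + ")")
    return f"CREATE TABLE {table} (" + ", ".join(cols) + ")"


fields = ["stime", "dur", "srcid", "smac", "saddr", "pkts", "bytes"]
keys = ["srcid", "smac", "saddr"]
statement = build_create_table("tablename", fields, keys)
print(statement)
```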

It will use "database" and "tablename",  creating them if needed,
and it will run as a daemon.  All tables right now are ISAM; if
someone can explain the need for another table type, I will support
anything that makes sense.

If you run rasqlinsert like this:
    rasqlinsert -P database:tablename

You will get a ratop() screen that feeds from the database table, so
you can see the contents and have a real-time display if you want.

Database host, user and password are all stored in the ~/.rarc file,
and can also be provided on the command line.  Like the program mysql(),
if you put a "-p" on the command line, it will ask you for a password.

Remember these are all example programs, so you are encouraged
to write your own.

For goal #2, I have a system that indexes a traditional argus
archive (srcid/year/month/day/argus.files) based on seconds,
so that you can fetch primitive data for any arbitrary time
quite quickly.  There are a number of tools here, and we need a
complex schema to answer questions like "where is the archive?",
"what format is it using (%Y/%m/file?, %Y/%d/file?,  file?)", "if the
data is remote, where should I cache it on my end?" and "is the file
compressed?"
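
To give a feel for what a seconds-based index does, here is a minimal
Python sketch.  It is not the indexing system described above: the
archive_path() and SecondIndex names, the "%Y/%m/%d" default layout,
and the (second, filename, byte offset) entries are all assumptions
made up for the example.

```python
import bisect
import time


def archive_path(root, srcid, epoch_seconds, pattern="%Y/%m/%d"):
    """Map a timestamp to an archive directory using a strftime-style layout."""
    day = time.strftime(pattern, time.gmtime(epoch_seconds))
    return f"{root}/{srcid}/{day}"


class SecondIndex:
    """Sorted index from a start second to (filename, byte offset)."""

    def __init__(self, entries):
        # entries: iterable of (start_second, filename, byte_offset)
        self.entries = sorted(entries)
        self.seconds = [e[0] for e in self.entries]

    def locate(self, second):
        """Return (filename, offset) of the last entry starting at or before second."""
        i = bisect.bisect_right(self.seconds, second) - 1
        if i < 0:
            return None  # the requested time is before the start of the archive
        return self.entries[i][1:]


index = SecondIndex([
    (1235489400, "argus.15.30.00", 0),
    (1235489460, "argus.15.30.00", 48212),
    (1235489700, "argus.15.35.00", 0),
])
print(archive_path("/archive", "probe1", 1235489569))
print(index.locate(1235489569))
```

With an index like this, a reader can seek straight to the right
offset in the right file instead of scanning a whole day of data.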

So, I'll have the code for this in the release, but it will take some
time and dialog before we can all use it intelligently.

OK, if this starts to generate questions, please, fire away.
The more dialog on the email list, the better this will go, I think.

Hope all is most excellent,

Carter

On Feb 24, 2009, at 8:03 AM, Mark Bartlett wrote:

> Good stuff Carter!!!!  What DB is supported?  Postgres?  MySQL?
>
> Thanks.
>
> mark
>
> On Tue, Feb 24, 2009 at 1:47 AM, Carter Bullard <carter at qosient.com> wrote:
>> Gentle people,
>> I am working on a major release of the clients this week and I should
>> have a package hopefully by Thurs/Fri (if nothing gets in the way).
>>
>> The primary function is to get general bug fixes into the main release.
>> And backward compatibility was the bug of the week, last week, so I'm
>> working on that.
>>
>> Many "standard" programs will have a number of tweaks to fix bugs that
>> have come up, that have not hit the mailing list.  While it will be a
>> lot of changes, these programs have been stable for quite some time, so
>> I'm hoping that we won't have a lot of little problems.  Testing will
>> need to be done, however.
>>
>> rabins(), rasplit() and rastream() have all had a lot of work done to
>> support aggregation units smaller than 1 second, so that you can specify
>> bin sizes down to a uSec.   This is important in our high performance
>> stream analysis work.  Maybe not for everyone, but the code is doing
>> much better with these changes.
>>
>> And we will have support for flow labeling in radium(), where you can
>> slip ascii metadata into the records to "pump up" the semantics.  This
>> is really cool, and will take some discussions on the list to use it to
>> the fullest.
>>
>> This major version release of the clients will have a lot of new
>> undocumented programs, but I will try to start describing them on the
>> mailing list this week.
>> They cover two primary areas, user data analysis and database support.
>> It may be possible that I only have one of these ready, but I'm working
>> on both.
>>
>> The database support causes one major change.  We will need to print
>> "sport" and "dport" values for ICMP flows.  This is to guarantee that
>> all flow records will have a unique flow key, so we won't have trouble
>> stuffing ICMP flows into an indexed database table of argus records.
>>
>> I seem to be in my office this week, which is a real surprise, so
>> hopefully I can make some progress.
>>
>> A new release of argus will follow a month later, with support for
>> packet size and interpacket arrival histogram reporting, as well as a new
>> ArgusEvent feature, where we can collect SNMP, /proc, and lsof() data
>> and send them in the argus data stream.
>>
>> This is primarily to tag flows with the applications that generated them.
>>
>> Carter Bullard
>> CEO/President
>> QoSient, LLC
>> 150 E 57th Street Suite 12D
>> New York, New York  10022
>>
>> +1 212 588-9133 Phone
>> +1 212 588-9134 Fax
>>
>>
>>
>>
>
