argus-clients database support

Sun Mar 15 22:53:20 EDT 2009

Gentle people,
I'd like to describe the database support.  This should be a good
introduction, and I hope to get a lot of dialogue on this topic before
we're done with the complete set of database functions.

There are 2 basic goals in the database support that we have in the
argus-clients-3.0.2 codeset.

    1. Flexible argus record database schema support.
    2. Support for near-realtime access to MySQL based argus data.

For this email, I'll write about #1.

Flexible argus record database schema support

The approach is to provide simple extensions to the basic argus
record processing tools, to allow insertion of argus based data into
MySQL database tables.

There are several strategies that we've implemented in the tools.
The basic idea is to provide database backend storage for the
traditional ra* program set.  You shouldn't have to know what data
type is appropriate for a particular argus metric or attribute, and
you shouldn't have to worry about the specifics of a schema or its
key strategy.  Just tell the rasql* program which fields you're
interested in (using the "-s" option), with a set of key fields
(using the "-m" option), and the rasql* program knows how to do the
right thing: creating the table if needed, specifying the key, and
then aggregating the data, if needed, to ensure that the key is
unique within the database.

If you just want to load primitive argus data into a database, then
you don't want any keys specified for the data, because many records
share the same attribute values (lots of argus records have the same
src address, for example) and a key would force them to be merged.

If, on the other hand, you want to keep a database of all the IP
addresses your site has ever seen, you will need to process
incoming argus data to extract the IP addresses, and then
UPDATE entries where the address is already in the database
with accumulated metrics.  The rasql* programs are designed
to support both of these types of database strategies.
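
To make the two strategies concrete, here is a minimal sketch of the
difference between appending every record and keeping one accumulated
row per key.  It is not the rasql* implementation: it uses Python's
built-in SQLite as a stand-in for MySQL, and the table names ("raw",
"hosts") and the sample flows are made up for illustration.

```python
import sqlite3

# Hypothetical flow summaries: (saddr, pkts, bytes). Not real argus output.
FLOWS = [
    ("10.0.0.1", 10, 1200),
    ("10.0.0.2", 4, 300),
    ("10.0.0.1", 6, 800),
]


def load_without_keys(conn, flows):
    """'-m none' style: every record becomes its own row."""
    conn.execute("CREATE TABLE raw (saddr TEXT, pkts INTEGER, bytes INTEGER)")
    conn.executemany("INSERT INTO raw VALUES (?, ?, ?)", flows)


def load_with_key(conn, flows):
    """Keyed style: one row per saddr, metrics accumulated on repeat sightings."""
    conn.execute(
        "CREATE TABLE hosts (saddr TEXT PRIMARY KEY, pkts INTEGER, bytes INTEGER)"
    )
    for saddr, pkts, nbytes in flows:
        cur = conn.execute(
            "UPDATE hosts SET pkts = pkts + ?, bytes = bytes + ? WHERE saddr = ?",
            (pkts, nbytes, saddr),
        )
        if cur.rowcount == 0:  # address not seen before, so insert it
            conn.execute("INSERT INTO hosts VALUES (?, ?, ?)", (saddr, pkts, nbytes))


conn = sqlite3.connect(":memory:")
load_without_keys(conn, FLOWS)
load_with_key(conn, FLOWS)
raw_rows = conn.execute("SELECT COUNT(*) FROM raw").fetchone()[0]
host_rows = conn.execute(
    "SELECT saddr, pkts, bytes FROM hosts ORDER BY saddr"
).fetchall()
print(raw_rows)
print(host_rows)
```

The unkeyed table ends up with one row per input record, while the
keyed table holds a single row per address with the packet and byte
counts summed.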

The design is for you to define your database schema using the
"-s field field field...." option, either on the command line or in
a .rarc file.  Each field specified will be a column in the resulting
MySQL table schema.  Fields that are "exposed" as attributes in the
database can be sorted on, selected on, or used in whatever other
operation the database supports.  So if you expect to select data
based on time, addresses, pkt count, etc., then these fields should
be in your schema.

Keys for the schema are specified using the "-m field ...." option.
If you don't specify a key, the default keys (srcid, saddr, daddr,
proto, sport, dport) are used.  This is consistent with racluster(),
rabins() and ratop().  If you don't want a key, you specify
"-m none", and the table schema will not have any key fields
defined.

If a field is specified as a key, the field also has to be specified
in the "-s" option.  It is easy to forget a field, say the "srcid"
field, and the rasql* programs will complain if key fields are not
also in the "-s" options.
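
The key rules above can be sketched as a small check.  This is only an
illustration of the rules as described in this email, not code from
the rasql* programs; the function name resolve_keys and the way the
"-m" value is passed in are assumptions.

```python
# Default keys as listed in this email.
DEFAULT_KEYS = ["srcid", "saddr", "daddr", "proto", "sport", "dport"]


def resolve_keys(m_option, s_fields):
    """Return the key fields implied by a '-m' value, checked against '-s'.

    m_option: None (option not given), ["none"], or a list of field names.
    s_fields: the list of fields given with '-s'.
    """
    if m_option is None:
        keys = list(DEFAULT_KEYS)      # no '-m': fall back to the defaults
    elif m_option == ["none"]:
        return []                      # '-m none': table has no key fields
    else:
        keys = list(m_option)
    missing = [k for k in keys if k not in s_fields]
    if missing:
        raise ValueError("key fields missing from -s: " + ", ".join(missing))
    return keys
```

With s_fields of "stime saddr daddr proto sport dport pkts bytes", a
"-m saddr" passes, "-m none" yields no keys, and leaving "-m" off
fails because the default key "srcid" is not in the "-s" list.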

rasql* programs support an additional field for the "-s" option which
is always included.  This is the "record" field.  rasql* programs want
to insert the complete binary argus record into the database.
The reason for inserting the actual binary record is that an argus
record can have as many as 145 metrics and attributes for the 15
basic flow types that argus can generate.  By inserting the binary
record itself, we can access all of these metrics without exposing
them as MySQL attributes in the database schema.  There will be a lot
of discussion about this as time goes on.
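
As a rough picture of why the "record" column is useful, here is a
small sketch that stores a packed binary blob next to a couple of
exposed columns, selects on an exposed column, and then decodes
metrics that were never given columns of their own.  The packed
format, the metric names (ttl, loss) and the table name are made up;
this is not the argus record layout, and SQLite stands in for MySQL.

```python
import sqlite3
import struct


def pack_record(ttl, loss):
    """Stand-in for a binary record: two unsigned shorts, NOT argus format."""
    return struct.pack("!HH", ttl, loss)


def unpack_record(blob):
    """Recover the (ttl, loss) pair from the stand-in record."""
    return struct.unpack("!HH", blob)


conn = sqlite3.connect(":memory:")
# Only saddr and pkts are exposed; everything else rides along in 'record'.
conn.execute("CREATE TABLE flows (saddr TEXT, pkts INTEGER, record BLOB)")
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?)",
    [
        ("1.2.3.4", 10, pack_record(64, 0)),
        ("5.6.7.8", 3, pack_record(128, 2)),
    ],
)

# Select on an exposed column, then decode the unexposed metrics.
rows = conn.execute(
    "SELECT record FROM flows WHERE saddr = ?", ("1.2.3.4",)
).fetchall()
decoded = [unpack_record(blob) for (blob,) in rows]
print(decoded)
```

The schema stays small, yet anything carried in the blob is still
available to a program that knows how to read it.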

When you run a rasqlinsert() program, you will want to tell it if you
want to drop the target table if it already exists, and depending
on whether there are keys or not, whether the database table
will be used as a persistent cache or not.  I'll describe all of this
in the next email.

OK, so what do we have?  There are two basic programs, rasqlinsert(),
which is focused on inserting argus data into MySQL databases, and
rasql(), which is focused on reading the binary argus data that may be
in MySQL database tables.

rasqlinsert() is basically ratop(), a general purpose argus aggregator,
that writes its output to a MySQL database.  If you haven't used ratop(),
now is the time to take it for a test run.  ratop() supports definition
of keys, and it allows you to specify printed columns, and thus a
database schema.  As data comes in, ratop() aggregates data and ensures
that only one entry exists for each key in its schema, printing the
resulting "row" to its "curses" based output system.  The output
subsystem is a "scheduled" system, in that we can control the rate at
which we write data to the screen.  All of these features are used to
manage data in a MySQL database, exactly as ratop() manages its output
screen!!!!

Because rasqlinsert() is based on ratop(), it can also write its output
to a curses screen if it's not running as a daemon.  The idea is that
you may want to see what it's doing while you're playing with your
schema and your view.  If you know what you're doing, and you just want
rasqlinsert() to do what you want, you will definitely want to run it
as a daemon (using the "-d" option).

For a few simple examples:

If you want a daemon that writes argus records from an argus data
source stream to a 'table' in a 'database' without any modifications:

    rasqlinsert -d -S argus.source -w mysql://user@host/database/table -m none

   "-d" run as a daemon.
   "-S" read data continuously from the argus.source.
   "-w" write the data to this mysql URI.
   "-m" specify no keys.

This will create the mysql 'database' if needed, and the 'table' if
needed.  If the table already exists, it will drop the table before it
starts.  In this case, all the fields are either defined in the
~/.rarc file, or they are the default fields:

       "stime flgs proto saddr sport dir daddr dport pkts bytes state" + "record"

The important option here is the "-m none".  This says we don't have
any keys, so just append records to the specified table.

This example will run presumably forever, adding data to an ever
growing table.
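
For anyone scripting around these tools, the "-w" target above breaks
down into user, host, database and table.  The sketch below pulls
those parts out with Python's standard library.  It is only an
illustration of the URI layout shown in this email, not how
rasqlinsert() parses it, and the function name parse_mysql_uri is made
up.

```python
from urllib.parse import urlsplit


def parse_mysql_uri(uri):
    """Split mysql://user@host/database/table into its four parts."""
    parts = urlsplit(uri)
    if parts.scheme != "mysql":
        raise ValueError("expected a mysql:// URI")
    # The path is expected to be exactly /database/table.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2:
        raise ValueError("expected /database/table in the path")
    database, table = segments
    return {
        "user": parts.username,
        "host": parts.hostname,
        "database": database,
        "table": table,
    }


target = parse_mysql_uri("mysql://user@host/database/table")
print(target)
```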

If this was all the support we provided, you would be able to use
mysql() to process the data based on the "exposed" attributes, but you
wouldn't be able to do much with the argus "record" BLOB that would be
in the database.  Many people will be happy to not include the argus
record in the database table.  To do this, simply add this:

      "-s -record"

to the rasqlinsert() commandline.

rasql() reads the binary BLOB out of the database table, so a simple
run would look like this:

    rasql -r mysql://user@localhost/database/table

This will read every record out of every row of the database table.
rasql() supports the "-M sql='select statement'" option.  Assuming
that there is, say, an saddr field, you could write this command:

    rasql -r mysql://user@localhost/database/table -M sql="saddr = 1.2.3.4"

OK, this email has gotten pretty long, so hopefully it generates some
interest and questions.

Hope all is most excellent, and give the new code a try!!!!

Carter