new argus database support scripts and methods

Fri Jan 28 15:47:13 EST 2011

Has anybody done any experimentation with non-relational databases?

I mean, all of my own previous experience has been with file-based slices,
because that was always good enough (for what I needed), but if you want to
do richer analyses, perhaps a graph DB or something for Hadoop might work
better.

Mostly just curious..
Mark.

-----Original Message-----
From: argus-info-bounces+poepping=cmu.edu at lists.andrew.cmu.edu
[mailto:argus-info-bounces+poepping=cmu.edu at lists.andrew.cmu.edu] On Behalf
Of Carter Bullard
Sent: Friday, January 28, 2011 10:57 AM
To: Argus
Subject: [ARGUS] new argus database support scripts and methods

Gentle people,
I am adding a new ./support/Database directory for scripts that make using
the MySQL database a bit easier, and have added the first one,
MySQL.Archive.sh, for the argus-3.0.4 release.  It's a simple bash script to
move argus data from the database to a native file system archive.  The
notion is that argus data is stored in the database as it arrives, and it's
held there for some retention time (days).  After that time, we migrate it
to a native file system, and index it for time.

I migrate my RDBMS-based data out of my daily database tables after 30 days,
to a standard native file system archive:
   /path/to/archive/primitive/$srcid/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S

where I retain the data for 1 - 2 years, depending on what is going on.

I have this script run by cron every night around 1am.  It uses rasql() to
read all the data from the specific candidate table, and pipes its output to
rasplit().  After it's done, it time indexes the data and then drops the
original MySQL table.
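
As a rough sketch of the first half of such a nightly job (this is not the
MySQL.Archive.sh shipped in ./support/Database), the candidate table name can
be derived from the retention period and the read-and-split pipeline printed
as a dry run.  The function names, the user/host/db placeholders, the archive
root, the 5 minute split size, and the exact rasql/rasplit option spellings
are all assumptions for illustration; the time indexing and table drop steps
are left out.  It needs GNU date.

```shell
#!/bin/bash
# Hypothetical sketch only; not the shipped MySQL.Archive.sh.
# Requires GNU date. All names below are placeholders.

RETENTION_DAYS=30
DB_URL="mysql://user@host/db"          # placeholder, as in the examples
ARCHIVE="/path/to/archive/primitive"   # placeholder archive root

# Name of the daily table that is RETENTION_DAYS old relative to a
# reference date (YYYY-MM-DD); defaults to today (UTC).
candidate_table() {
    local ref="${1:-$(date -u +%Y-%m-%d)}"
    date -u -d "$ref $RETENTION_DAYS days ago" +table_%Y_%m_%d
}

# Print (do not run) the read-and-split step for one table.
# The option spellings are assumptions; check the man pages first.
archive_command() {
    local table="$1"
    printf '%s\n' "rasql -r $DB_URL/$table -w - | rasplit -r - -M time 5m -w '$ARCHIVE/\$srcid/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S'"
}

archive_command "$(candidate_table "$@")"
```

Run with a reference date of 2011-01-28, candidate_table prints
table_2010_12_29, which is the table a 30 day retention policy would archive
that night.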

Because of the time index, rasql(), when given the right options, can find
the data, whether it's in the database or the native file system.  If data
is inserted this way:

   rasqlinsert -m none -M time 1d -w mysql://user@host/db/table_%Y_%m_%d

you can find the data this way:

   rasql -M time 1d -r mysql://user@host/db/table_%Y_%m_%d -t time-range

Examples for me, because I have a 30 day RDBMS retention policy, are:

   rasql -M time 1d -r mysql://user@host/db/table_%Y_%m_%d -t -5m
      this will fetch all the records in a date-based table for the last 5
      minutes

   rasql -M time 1d -r mysql://user@host/db/table_%Y_%m_%d -t -20d+1d
      this will fetch all the records for an entire day, 20 days ago.
      This may be a lot of records, and all should come from the RDBMS.

   rasql -M time 1d -r mysql://user@host/db/table_%Y_%m_%d -t -1M+2d
      this will fetch all the records for two days, one month ago.
      Depending on the month, data may come from the RDBMS and the native
      file system.

When rasql() is used in this way, it knows to look for database table names
first to search for records in a given time range, and then it goes to the
time index to see if there are records in the native file system that match.
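
One way to picture the table-name half of that lookup is to expand the
table_%Y_%m_%d pattern once per day in the requested range.  The sketch below
only illustrates that idea; it is not how rasql() is implemented, the
function name tables_for_range is made up, and it needs GNU date.

```shell
#!/bin/bash
# Hypothetical illustration; not rasql's actual implementation.
# Requires GNU date.

# Print one daily table name per day, starting at START (YYYY-MM-DD)
# and covering DAYS consecutive days.
tables_for_range() {
    local start="$1" days="$2" i
    for ((i = 0; i < days; i++)); do
        date -u -d "$start +$i days" +table_%Y_%m_%d
    done
}

# Two days starting 2010-12-28
tables_for_range 2010-12-28 2
```

Any of those names that no longer exist as tables would be days that have
already been migrated, which is where the time index comes in.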

So, rasql(), when given the right command line options, can find data
regardless of archive strategy.  Pretty cool, but I'm sure it has issues, so
give all this a whirl, and if you have any problems, send email to the list.

Hope all is most excellent, and thanks for all the support !!!!!!

Carter