argus archive support in argus-3.0, rastream()

Tue Sep 18 20:26:47 EDT 2007

Gentle people,
I'm working up the general documentation, and archive support in
argus-3.0 is a big topic.  I'd like to get some feedback on the
features and make important changes if needed.

We have two fundamental program families for creating and accessing
data in an archive: rastream() and radium().

rastream() is, from an academic perspective, a stream block processor.
It provides block oriented semantics to an infinite stream of data.
While rastream() is overly complicated, as it provides features for
buffering/delaying data, splitting data, sorting it, and regenerating
"framed" streams, it has what I've come to understand as the minimum
feature set to generate a working archive from an infinite stream of
argus data.

There are two design features that make rastream() the program of
choice.  First, it takes in a partially time sorted stream of argus
records, as you might see from any number of argi, and outputs a time
sorted set of argus data.  The second feature is that it spawns
processes against argus data files that it closes.  The end result is
that rastream() takes in a stream of data and leaves behind files that
are framed in time and processed by whatever scripts you provide.

The concept is pretty simple.  It functions like rasplit.1, splitting
data based on time into a well structured filesystem.  It needs to
wait some period of time after a time boundary passes, and then it
closes the file and runs a script against it.

So let's say we're time splitting data on the hour, and we receive a
record from a remote argus that starts at 12:00:01.  If the input
stream were time sorted, we would know that we could close the 11-12
file and process it to our heart's content, as no more records are
going to be added to the file.  But if the data is not time sorted,
the next record could have started at 11:59:59, and should be included
in the 11-12 file.  Argus does not guarantee time ordered output, but
it's not bad.

With argus, the amount of "out-of-order" is well understood.  The
'wobble' can't be any more than the ARGUS_FAR_STATUS_INTERVAL.
This means that if you have an ArgusFarStatusInterval of 5 seconds,
you know that the argus records will be out-of-order only within any
5 second window.

So for rastream() to work, it needs to hold the data for somewhere
between 2-3x the interval, so 10-15 seconds.  Just to be sure, I
choose 15s.

    rastream -B 15s
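
As a rough sketch of the arithmetic above, the hold time can be derived
from the status interval.  The helper name hold_time and its multiplier
argument are made up for illustration; they are not part of argus.

```shell
#!/bin/sh
# Hypothetical helper, not part of argus: derive a value for rastream's
# -B option from the argus status interval (in seconds) and a safety
# multiplier.
hold_time() {
    interval="$1"
    factor="${2:-3}"    # default to 3x, the upper end of the 2-3x range
    echo "$((interval * factor))s"
}

# A 5 second ARGUS_FAR_STATUS_INTERVAL gives the 15s chosen above.
echo "rastream -B $(hold_time 5)"
```

Passing 2 as the second argument gives the lower end of the range, 10s
for a 5 second interval.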

So rastream() will hold records for 15 seconds, and then split the data
into the outputs.  This allows rastream() to provide a sorted output of
records, AND, now the output records can be the trigger for closing
and processing the files.  If we get to the 12:00:01 record, we now know
that the 11-12 file is done.
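
To make the hourly boundary concrete, here is a small illustration of
how a record's start time maps to the file it belongs in.  This is only
the idea, not how rastream() implements it internally; the function
name hour_bucket is invented, and times are plain seconds since the
start of the day.

```shell
#!/bin/sh
# Illustration only: map a record's start time (in seconds) to the
# start of the hourly file it belongs in.
hour_bucket() {
    echo "$(( $1 - $1 % 3600 ))"
}

noon=43200                  # 12:00:00, counted from midnight
late=$((noon - 1))          # a record starting at 11:59:59
early=$((noon + 1))         # a record starting at 12:00:01

# The two records land in different hourly files, so the 11-12 file
# can only be closed once no more 11:59:59-style stragglers can arrive.
echo "11:59:59 -> bucket $(hour_bucket "$late")"
echo "12:00:01 -> bucket $(hour_bucket "$early")"
```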

rastream() uses the "-f filename" option to specify the script that it
will run against the file that it closes.  rastream() will call the
script with the command line "-r /full/path/name/to/the/file", so your
scripts need to parse the "-r" option, get the filename, and then do
whatever.  Here is a sample script that compresses the file.  If you
have your own programs to process data, then replace or add them as
you wish.

rastream -S argus -B 15s -w /archive/$srcid//%Y/%m/%d/ntam.%Y.%m.%d.%H.%M.%S -f /usr/local/bin/rastreamshell

The contents of the script are:
----------begin /usr/local/bin/rastreamshell----------
#!/bin/sh
#
#  Argus Client Software.  Tools to read, analyze and manage Gargoyle data.
#  Copyright (C) 2000-2007 QoSient, LLC.
#  All Rights Reserved
#
# Script  called by rastream, to process files.
#
# Since this is being called from rastream(), it will be passed "-r"
# followed by a single parameter, the filename.
#
# Carter Bullard <carter at qosient.com>
#
PATH="/usr/local/bin:$PATH"; export PATH
package="argus-clients"
version="3.0.0"

OPTIONS="$*"
FILES=
DATE=`date`;
while  test $# != 0
do
     case "$1" in
     -r) shift; FILES="$1"; break;;
     esac
     shift
done

# Quote the filename so paths containing spaces are handled correctly.
gzip "$FILES" >> /tmp/ntais.out 2>&1
exit 0
----------end /usr/local/bin/rastreamshell----------

So, at the end of it all, rastream() connects to something, probably a
local radium() process, splits data into an archive filesystem, and
processes the data in place.  With a script that periodically removes
data that is no longer needed, you now have a real archive that you
can use.
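
A minimal sketch of such a cleanup script might look like this.  The
function name prune_archive, the /archive path in the example, and the
90 day retention period are all assumptions for illustration; pick
whatever fits your site.

```shell
#!/bin/sh
# Hypothetical cleanup helper: remove archive files older than a
# retention period.  Directory and default retention are assumptions.
prune_archive() {
    archive_dir="$1"
    keep_days="${2:-90}"
    # -mtime +N matches files last modified more than N days ago.
    find "$archive_dir" -type f -mtime +"$keep_days" -exec rm -f {} +
}

# Example, e.g. run once a day from cron:
#   prune_archive /archive 90
```

This goes by file modification time, so it assumes nothing touches the
archived files after rastream()'s script has processed them.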

Well that should do it for now.  Any comments?

Carter