argus archive support in argus-3.0, rastream()
Carter Bullard
carter at qosient.com
Tue Sep 18 20:26:47 EDT 2007
Gentle people,
I'm working up the general documentation, and archive support in argus-3.0 is a big topic. I'd like to get some feedback on the features and make important changes if needed.
We have two fundamental program families for creating and accessing data in an archive: rastream() and radium().
rastream() is, from an academic perspective, a stream block processor. It provides block-oriented semantics to an infinite stream of data. While rastream() is overly complicated, as it provides features for buffering/delaying data, splitting data, sorting it, and regenerating "framed" streams, it has what I've come to understand as the minimum feature set to generate a working archive from an infinite stream of argus data.
There are two design features that make rastream() the program of choice. First, it takes in a partially time-sorted stream of argus records, as you might see from any number of argi, and outputs time-sorted argus data. Second, it spawns processes against the argus data files that it closes. The end result is that rastream() takes in a stream of data and leaves behind files that are framed in time and processed by whatever scripts you provide.
The concept is pretty simple. It functions like rasplit.1, splitting data by time into a well-structured filesystem. After a time boundary passes, it needs to wait some period of time, and then it closes the file and runs a script against it.
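Deciding which file a record belongs to comes down to mapping its start time onto a fixed-width time bin. The sketch below is a minimal illustration, not rastream() code: it assumes start times are epoch seconds and that bins are aligned to multiples of the width (true for hourly bins counted from the epoch in UTC). The function names bin_start and bin_end are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of argus: map a record start time
# (epoch seconds) onto the fixed-width time bin that holds it.
# Assumes bins align to multiples of WIDTH, which holds for hourly
# bins counted from the epoch in UTC.

bin_start() {   # usage: bin_start TIME WIDTH
    echo $(( $1 - $1 % $2 ))
}

bin_end() {     # usage: bin_end TIME WIDTH; first second of the next bin
    echo $(( $(bin_start "$1" "$2") + $2 ))
}
```

With hourly bins (WIDTH=3600), a record at 11:59:59 (43199 seconds after midnight) falls in the bin starting at 39600 (11:00), while one at 12:00:01 (43201) falls in the bin starting at 43200 (12:00).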
So let's say we're time splitting data on the hour, and we receive a record from a remote argus that starts at 12:00:01. If the input stream were time sorted, we would know that we could close the 11-12 file and process it to our heart's content, as no more records are going to be added to it. But if the data is not time sorted, the next record could have started at 11:59:59, and it should be included in the 11-12 file. Argus does not guarantee time-ordered output, but it's not bad.
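The failure mode is easy to reproduce. This hypothetical sketch (naive_split is an illustrative name, not an argus tool) closes the current bin as soon as a record from a later bin arrives, as if the input were time sorted, and reports any record that shows up after its bin was already closed. Start times are assumed to be epoch seconds, one per line.

```shell
#!/bin/sh
# Hypothetical illustration: close the current bin as soon as a record
# from a later bin arrives, as if the input were time sorted.
# Reads start times (epoch seconds), one per line, and prints any
# record that arrives after its bin was already closed.

naive_split() {   # usage: naive_split WIDTH < times
    width=$1
    current=""
    while read -r ts; do
        bin=$(( ts - ts % width ))
        if [ -z "$current" ] || [ "$bin" -gt "$current" ]; then
            current=$bin        # the previous bin is now "closed"
        elif [ "$bin" -lt "$current" ]; then
            echo "late $ts"     # belongs to a bin we already closed
        fi
    done
}
```

Feeding it 11:59:50, 12:00:01 and 11:59:59 (43190, 43201, 43199) with hourly bins reports the 11:59:59 record as late: it would have been left out of the 11-12 file.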
With argus, the amount of "out-of-order" is well understood. The 'wobble' can't be any more than the ARGUS_FAR_STATUS_INTERVAL. This means that if you have an ArgusFarStatusInterval of 5 seconds, you know that the argus records will be out of order only within any 5 second window.
So for rastream() to work, it needs to hold the data for somewhere between 2 and 3 times the interval, in this case 10-15 seconds. Just to be sure, I chose 15s.
rastream -B 15s
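The hold can be pictured as a small sort buffer. The sketch below is a simplification under stated assumptions, not how rastream() is implemented: each record is reduced to its start time in epoch seconds, and the newest start time seen stands in for the clock. A record is released, in sorted order, once the newest time is at least HOLD seconds past it; hold_sort is a hypothetical name.

```shell
#!/bin/sh
# Minimal sketch of a hold buffer. Input: record start times (epoch
# seconds), one per line, only partially sorted. A record is released
# once the newest time seen is at least HOLD seconds past it, so
# released records come out sorted as long as the input wobble is
# smaller than HOLD. Assumption: newest record time approximates "now".

hold_sort() {   # usage: hold_sort HOLD < times
    hold=$1
    buf=""      # newline-separated pending timestamps
    newest=0
    while read -r ts; do
        [ "$ts" -gt "$newest" ] && newest=$ts
        buf="$buf$ts
"
        cutoff=$((newest - hold))
        keep=""
        # Release everything at or before the cutoff, in sorted order.
        for t in $(printf '%s' "$buf" | sort -n); do
            if [ "$t" -le "$cutoff" ]; then
                echo "$t"
            else
                keep="$keep$t
"
            fi
        done
        buf=$keep
    done
    # End of input: flush whatever is still held, in order.
    printf '%s' "$buf" | sort -n
}
```

With an ArgusFarStatusInterval of 5 seconds, the 3x rule gives `hold_sort $((3 * 5))`, a 15 second hold. A record that is more out of order than the hold would be released immediately and break the ordering, which is why the hold has to exceed the wobble.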
So rastream() will hold records for 15 seconds, and then split the data into the outputs. This allows rastream() to provide a sorted output of records, AND the output records can now be the trigger for closing and processing the files. When the 12:00:01 record comes out of the buffer, we know that the 11-12 file is done.
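Once the records come out sorted, the close decision is simple: the first record at or past the end of the current bin means that bin is complete. A hypothetical sketch (close_bins is an illustrative name) that reads sorted start times in epoch seconds and prints one "close" line per finished bin, at the point where a program like rastream() would run its -f script:

```shell
#!/bin/sh
# Hypothetical sketch: with sorted input, the first record in a new bin
# signals that the previous bin is complete. Reads sorted start times
# (epoch seconds) and prints "close BIN_START" for each finished bin;
# the last bin is closed at end of input.

close_bins() {    # usage: close_bins WIDTH < sorted_times
    width=$1
    current=""
    while read -r ts; do
        bin=$(( ts - ts % width ))
        if [ -n "$current" ] && [ "$bin" -ne "$current" ]; then
            # This is where the -f script would be run on the closed file.
            echo "close $current"
        fi
        current=$bin
    done
    if [ -n "$current" ]; then
        echo "close $current"
    fi
}
```

With hourly bins, the 12:00:01 record (43201) is what closes the 11:00 bin (39600).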
rastream() uses the "-f filename" option to specify the script that it will run against each file that it closes. rastream() calls the script with the command line "-r /full/path/name/to/the/file", so your scripts need to parse the "-r" option, get the filename, and then do whatever they need to. Here is a sample script that compresses the file. If you have your own programs to process data, then replace or add them as you wish.
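For scripts of your own, the "-r" option can also be parsed with the standard getopts builtin instead of a hand-written loop. This is a minimal sketch; get_file is a hypothetical helper name, and the path in the usage line below is only an example.

```shell
#!/bin/sh
# Sketch: parse the "-r filename" argument with POSIX getopts.
# getopts itself prints a diagnostic for unknown options; we ignore
# them and keep going. get_file is a hypothetical helper name.

get_file() {
    file=""
    OPTIND=1
    while getopts r: opt; do
        case $opt in
            r) file=$OPTARG ;;
            *) echo "ignoring unknown option" >&2 ;;
        esac
    done
    echo "$file"
}
```

Inside a script, `FILE=$(get_file "$@")` would then hold the path of the file that was just closed.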
rastream -S argus -B 15s -w '/archive/$srcid/%Y/%m/%d/ntam.%Y.%m.%d.%H.%M.%S' -f /usr/local/bin/rastreamshell
The contents of the script are:
----------begin /usr/local/bin/rastreamshell----------
#!/bin/sh
#
# Argus Client Software. Tools to read, analyze and manage Gargoyle data.
# Copyright (C) 2000-2007 QoSient, LLC.
# All Rights Reserved
#
# Script called by rastream, to process files.
#
# Since this is being called from rastream(), it will be invoked as
# "rastreamshell -r /full/path/name/to/the/file".
#
# Carter Bullard <carter at qosient.com>
#
PATH="/usr/local/bin:$PATH"; export PATH
package="argus-clients"
version="3.0.0"
OPTIONS="$*"
FILES=
DATE=`date`;
while test $# != 0
do
case "$1" in
-r) shift; FILES="$1"; break;;
esac
shift
done
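# Without "-r filename", gzip would read stdin and hang, so bail out
test -n "$FILES" || exit 1
# Compress the closed file in place, logging any errors
gzip "$FILES" >> /tmp/ntais.out 2>&1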
exit 0
----------end /usr/local/bin/rastreamshell----------
So, at the end of it all, rastream() connects to something, probably a local radium() process, splits the data into an archive filesystem, and processes the data in place.
With a script that periodically removes data that is no longer needed, you now have a real archive that you can use.
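A removal script can be as small as a single find command. The sketch below is one possible approach, not something shipped with argus: it assumes retention is decided by file modification time, and prune_archive, the archive root, and the retention period are all hypothetical choices you would adapt (for example, running it nightly from cron).

```shell
#!/bin/sh
# Hypothetical retention sketch: delete archive files whose modification
# time is more than DAYS days old. The archive root and retention period
# are assumptions; schedule it from cron if desired.

prune_archive() {   # usage: prune_archive ARCHIVE_ROOT DAYS
    find "$1" -type f -mtime +"$2" -exec rm -f {} +
}
```

For example, `prune_archive /archive 90` would keep roughly the last 90 days of files. Empty day directories are left behind; cleaning those up is a separate step.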
Well that should do it for now. Any comments?
Carter