rasql, rasqlinsert, rasqltimeindex
John Gerth
gerth at graphics.stanford.edu
Tue May 28 16:40:30 EDT 2013
This is a very nice example of how to use the rasql* tools which I'd like to
see completely fleshed out as it reveals your innovative data management approach.
The piece I'm not grokking yet is how to leverage the rasqltimeindex tables
against a full argus archive. The man page for 'rasql' talks about retrieving
argus records from a DB, but rasqltimeindex is all about indexing records
from argus files, which implies that the data is left in the files and not
duplicated in the database (a Very Good Thing). Perhaps I'll understand
if you augment the example to show how to list all the flows for 69.74.153.46.
John Gerth gerth at graphics.stanford.edu Gates 378 (650) 725-3273 fax 725-6949
On 5/28/2013 8:10 AM, Carter Bullard wrote:
> Hey Russell,
> Yes, a lot is going on, but I'm sure you'll catch up very quickly!!!!
>
> About merging flows from multiple sensors using radium.
>
> This is a great thing to do, as it consolidates your data into a single stream, so
> 1) a single tool can get a bigger look at what is going on
> 2) you can develop a single repository that is being updated in near realtime
> 3) you can have multiple points to access live data streams (probe and radii)
>
> To make this work, the key is that all your argus data sources MUST have
> unique source IDs, so you can discriminate the source of the data, either
> when ingesting the data or when you do your analytics.
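Before merging sensors, it can help to check that no two of them claim the same source ID. A minimal sketch, assuming you have already collected one "sensor srcid" pair per line (for example from the ARGUS_MONITOR_ID each sensor's argus.conf sets); dup_srcids is a hypothetical helper, not part of argus or argus-clients.

```shell
#!/bin/sh
# Hypothetical check (not part of argus): read "sensor srcid" pairs on stdin,
# one per line, and print every source ID claimed by more than one sensor,
# followed by the sensors that claim it.
dup_srcids() {
    awk 'NF >= 2 { n[$2]++; who[$2] = who[$2] " " $1 }
         END { for (id in n) if (n[id] > 1) print id ":" who[id] }'
}

printf '%s\n' 'edge1 10.0.0.1' 'edge2 10.0.0.2' 'core 10.0.0.1' | dup_srcids
# prints: 10.0.0.1: edge1 core
```

Empty output means every sensor in the list has a distinct source ID.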
>
> I use rastream(), which is working very well for me, to split the single stream
> into 5-minute files, separated by source ID, and the script that rastream runs on
> each file (rastream.sh, below) indexes the data by time. This establishes and
> maintains a single repository, which can be accessed very quickly.
>
> The radium() that provides geolocation data and process correlation for
> all my data listens on localhost port 562, so:
>
> % rastream -S localhost:562 -f /usr/local/bin/rastream.sh -B 10s -M time 5m \
> -w /Volumes/Data/Archive/QoSient/\$srcid/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S -d
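The -w argument is a path template. The backslash in \$srcid stops the shell from expanding it, so rastream receives a literal $srcid to fill in per source, and the %Y ... %S fields appear to be strftime-style conversions. A minimal sketch of which 5-minute file one record would land in, assuming files are named after the start of their 5-minute bin and that the timestamp is in the YYYY/MM/DD.HH:MM:SS.ffffff form rasql prints later in this message (same time zone as rastream); archive_path is a hypothetical helper, not part of argus-clients.

```shell
#!/bin/sh
# Hypothetical helper (not part of argus-clients): map a source ID and an
# argus-style timestamp (YYYY/MM/DD.HH:MM:SS.ffffff) to the file the
# rastream -w template above would write, ASSUMING files are named after
# the start of their 5-minute bin.
ARCHIVE=/Volumes/Data/Archive/QoSient

archive_path() {
    srcid=$1
    ts=$2
    day=${ts%%.*}                    # 2012/06/04
    clock=${ts#*.}                   # 09:04:18.107813
    clock=${clock%%.*}               # 09:04:18
    hh=${clock%%:*}                  # 09
    mm=${clock#*:}; mm=${mm%%:*}     # 04
    # Round the minute down to a multiple of 5; strip one leading 0 so
    # values like "08" are not treated as invalid octal.
    m=$(( ${mm#0} / 5 * 5 ))
    ymd=$(printf '%s' "$day" | tr / .)   # 2012.06.04
    printf '%s/%s/%s/argus.%s.%s.%02d.00\n' "$ARCHIVE" "$srcid" "$day" "$ymd" "$hh" "$m"
}

archive_path 207.237.36.98 2012/06/04.09:04:18.107813
# prints:
# /Volumes/Data/Archive/QoSient/207.237.36.98/2012/06/04/argus.2012.06.04.09.00.00
```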
>
> The rastream.sh file is:
>
> ----- Begin included file -----
>
> #!/bin/sh
> #
> # Argus Client Software. Tools to read, analyze and manage Argus data.
> # Copyright (C) 2000-2013 QoSient, LLC.
> # All Rights Reserved
> #
> # Script called by rastream, to process files.
> #
> # Since this is being called from rastream(), its only argument of
> # interest is the file name, passed as "-r <filename>".
> #
> # Carter Bullard <carter at qosient.com>
> #
>
> PATH="/usr/local/bin:$PATH"; export PATH
> package="argus-clients"
> version="3.0.6"
>
> OPTIONS="$*"
> FILES=
> while test $# != 0
> do
>     case "$1" in
>     -r) shift; FILES="$1"; break;;
>     esac
>     shift
> done
>
> if test -z "$FILES"; then
>     echo "usage: $0 -r file" >&2
>     exit 1
> fi
> rasqltimeindex -r "$FILES" -w mysql://root@localhost/ratop
> exit 0
>
>
> ----- End included file -----
>
> Very simple: just time-index the data in the files.
>
> I have a specific purpose for my repository: to help me identify new
> network assets as they appear, and to learn who they are and whom they
> have been talking to.
>
> To do this, I have multiple rasqlinsert()s attached to the same radium()
> that establish and maintain pre-processed views in the
> database. One tracks MAC address / IP address pairs:
>
> rasqlinsert -d -M time 1d -S localhost:562 -w mysql://root@localhost/ratop/etherHost_%Y_%m_%d \
> -M rmon cache -m srcid smac saddr -s ltime dur srcid smac saddr spkts dpkts sappbytes dappbytes
>
> That is my go-to data set for finding IP addresses. rasqlinsert() maintains this table
> in real time, so any address that pops up is immediately in the table, and because the
> table name is split by day (-M time 1d with the _%Y_%m_%d suffix), I end up with a table for every day.
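Because each day's data lands in its own table, scripts need to derive table names from dates. A small sketch, assuming the _%Y_%m_%d suffix expands like strftime(3) in local time (as date(1) does by default); table_for_day and today_table are hypothetical helpers, not part of argus-clients.

```shell
#!/bin/sh
# Hypothetical helpers (not part of argus-clients) for naming the daily
# tables created by the rasqlinsert commands above.

# table_for_day PREFIX YYYY/MM/DD -> PREFIX_YYYY_MM_DD
table_for_day() {
    printf '%s_%s\n' "$1" "$(printf '%s' "$2" | tr / _)"
}

# today_table PREFIX -> name of the table being written today, ASSUMING the
# suffix follows local time, as date(1) does by default.
today_table() {
    date "+$1_%Y_%m_%d"
}

table_for_day etherHost 2012/06/04   # prints: etherHost_2012_06_04
today_table ipMatrix
```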
>
> To track who is talking to whom, I've got another rasqlinsert() maintaining the IP matrix data:
>
> rasqlinsert -d -M time 1d -S localhost:562 -w mysql://root@localhost/ratop/ipMatrix_%Y_%m_%d \
> -M cache -m srcid matrix -s ltime dur srcid saddr daddr bytes - ip
>
> These two provide all the information I need. I end up with references to every IP address seen
> in the complete argus data, and I can find the data in seconds.
>
> So when I need to look up a specific IP address (here, a random address taken from one
> of my tables), I make this call:
>
> % time rasql -t -365d+365d -M time 1d -r mysql://root@localhost/ratop/etherHost_%Y_%m_%d -M sql='saddr="69.74.153.46"'
> LastTime Dur SrcId SrcMac SrcAddr SrcPkts DstPkts SAppBytes DAppBytes
> 2012/06/04.09:04:18.107813 0.000000 207.237.36.98 00:21:a0:ce:0c:d9 69.74.153.46 1 0 20 0
>
> real 0m2.753s
> user 0m0.029s
> sys 0m0.013s
>
>
> 2.7 seconds to scan an entire year's worth of data from 20 argus data sources. Then, to find
> out which addresses that address talked to, knowing that it appeared on only one day, I can
> search that day's ipMatrix table for the list of addresses:
>
> % time rasql -t 2012/06/04 -M time 1d -r mysql://root@localhost/ratop/ipMatrix_%Y_%m_%d -M sql="saddr='69.74.153.46'"
> LastTime Dur SrcId SrcAddr DstAddr TotBytes
> 2012/06/04.09:04:18.107813 0.000000 207.237.36.98 69.74.153.46 207.237.190.64 70
>
> real 0m0.110s
> user 0m0.026s
> sys 0m0.008s
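The second query reuses the day from the first result's LastTime column. A minimal sketch of pulling that day and the address out of one etherHost row, assuming the whitespace-separated column order in the header of the first query (LastTime first, SrcAddr fifth); day_and_addr is a hypothetical helper, not part of argus-clients.

```shell
#!/bin/sh
# Hypothetical helper (not part of argus-clients): print the day and the
# source address from one etherHost row, ASSUMING the column order
# LastTime Dur SrcId SrcMac SrcAddr SrcPkts DstPkts SAppBytes DAppBytes.
day_and_addr() {
    set -f              # don't glob-expand fields while splitting
    set -- $1           # split the row on whitespace into $1, $2, ...
    set +f
    # 2012/06/04.09:04:18.107813 -> 2012/06/04
    printf '%s %s\n' "${1%%.*}" "$5"
}

row='2012/06/04.09:04:18.107813 0.000000 207.237.36.98 00:21:a0:ce:0c:d9 69.74.153.46 1 0 20 0'
day_and_addr "$row"    # prints: 2012/06/04 69.74.153.46
```

The printed day is the value that goes after -t in the ipMatrix query.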
>
> Not bad.
>
>
> OK, so I suggest: get a sense of what you want to do, get all the data into a single stream,
> point a bunch of rasqlinsert()s at it, and then write some scripts that can the basic queries
> so you get your answers out as fast as you can. That will get you started.
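Canning a basic query can be as simple as a shell function. A minimal sketch, assuming a POSIX sh and reusing the rasql arguments from the etherHost lookup above verbatim; ip_lookup, valid_ipv4 and DRY_RUN are hypothetical names, not part of argus-clients. The address is validated first because it is pasted into the SQL clause.

```shell
#!/bin/sh
# Hypothetical canned query (not part of argus-clients): look up one IPv4
# address across a year of daily etherHost tables, with the same rasql
# arguments as the example above. Set DRY_RUN=1 to print instead of run.

# valid_ipv4 ADDR: succeed only for four dot-separated decimal octets
# (0-255), so nothing but digits and dots reaches the SQL clause.
valid_ipv4() {
    case $1 in
        ''|*[!0-9.]*|*..*|.*|*.) return 1 ;;
    esac
    rest=$1 n=0
    while [ -n "$rest" ]; do
        octet=${rest%%.*}
        [ ${#octet} -le 3 ] && [ "$octet" -le 255 ] || return 1
        n=$((n + 1))
        case $rest in
            *.*) rest=${rest#*.} ;;
            *) rest= ;;
        esac
    done
    [ "$n" -eq 4 ]
}

ip_lookup() {
    valid_ipv4 "$1" || { echo "not an IPv4 address: $1" >&2; return 1; }
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "rasql -t -365d+365d -M time 1d -r mysql://root@localhost/ratop/etherHost_%Y_%m_%d -M sql='saddr=\"$1\"'"
    else
        rasql -t -365d+365d -M time 1d \
            -r 'mysql://root@localhost/ratop/etherHost_%Y_%m_%d' \
            -M "sql=saddr=\"$1\""
    fi
}

# Usage: ip_lookup 69.74.153.46
```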
>
> Holler if anything doesn't work as you expect.
>
> Carter
>