racluster issue

Tue Mar 5 11:03:22 EST 2013

Hey Craig,
You are starting to run into the same issues that caused us to create rasplit() and rastream().
Flow records span whatever ARGUS_FLOW_STATUS_INTERVAL period there is, so without some
record processing, the output from your methods will have irregular start and stop times.

Now, the assumption is that you are processing argus records, where argus has a good
configuration, meaning that the ARGUS_FLOW_STATUS_INTERVAL is reasonable,
like 1-15 seconds.  With this, you should use either rabins() or rastream().
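
A quick way to sanity check that setting is to read it out of the argus
configuration file.  This is only a minimal sketch: the helper names are
made up for illustration, and it assumes the value is set on a plain
ARGUS_FLOW_STATUS_INTERVAL=N line.

```shell
# Print the last uncommented ARGUS_FLOW_STATUS_INTERVAL value found in
# the given argus.conf-style file (helper name is hypothetical).
flow_status_interval() {
    sed -n 's/^[[:space:]]*ARGUS_FLOW_STATUS_INTERVAL[[:space:]]*=[[:space:]]*"\{0,1\}\([0-9][0-9]*\)"\{0,1\}.*$/\1/p' "$1" | tail -n 1
}

# Succeed only if the value is in the 1-15 second range suggested above.
interval_is_reasonable() {
    v=$(flow_status_interval "$1")
    [ -n "$v" ] && [ "$v" -ge 1 ] && [ "$v" -le 15 ]
}
```

For example, "interval_is_reasonable /etc/argus.conf || echo check your interval",
where the path is an assumption about where your argus.conf lives.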

I think you should relax your requirement that rejects an intermediate argus data file.
If you can do that, use rastream() to output records into a file with the date in its name,
and after a brief wait time after your time boundary passes, have rastream() run a shell
script containing your commands against that data file.  You can delete the file when
you're done, so that you aren't piling up a lot of data.

You can also use radium to label your traffic so that you don't need to do it yourself in the
scripts.  But let's stay with your example:

OK, assume ARGUS_FLOW_STATUS_INTERVAL = 5 secs

   rastream -M time 5m -B 10s -S data.source -w /tmp/argus.data/argus.%Y.%m.%d.%H.%M.%S -f rastream.sh -d

This will get the data into a file structure that will be useful, and 10 seconds after each 5 min time boundary,
rastream will run the rastream.sh shell script, passing the file as the single parameter.  Use the
./support/Config/rastream.sh as a guide, and in the script have something like:

   racluster -r $1 -w - | ralabel -f ralabel.conf -F ralabel.script.conf > /ssd/argus/splunk/racluster.csv

where ralabel.script.conf has all your particulars in it, like comma separated, and the fields.
Not sure what your " -M dsrs="+metric,+agr..." is doing, I would remove that.
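
Put together, a rastream.sh along these lines might look like the sketch
below.  The racluster and ralabel command lines are the ones given above;
the OUT variable, the process_file name, the write-to-a-temp-file-then-rename
step, and the delete-only-on-success behavior are my assumptions for
illustration, not what ./support/Config/rastream.sh does.

```shell
#!/bin/sh
# Sketch of a rastream.sh: rastream passes the finished data file as $1.

# Where the CSV ends up (override with OUT=... for testing).
OUT=${OUT:-/ssd/argus/splunk/racluster.csv}

process_file() {
    data=$1
    tmp="$OUT.tmp.$$"
    # Aggregate and label into a temp file, then rename it into place so
    # a reader never sees a half-written CSV.  The argus data file is
    # removed only if the earlier steps succeeded.
    racluster -r "$data" -w - \
        | ralabel -f ralabel.conf -F ralabel.script.conf > "$tmp" \
        && mv "$tmp" "$OUT" \
        && rm -f "$data"
}

# Run only when invoked with an existing data file.
if [ -f "${1:-}" ]; then
    process_file "$1"
fi
```

The rename keeps the "overwrite one CSV every five minutes" behavior, and
removing the data file afterwards keeps /tmp/argus.data from filling up.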

This will give you a new /ssd/argus/splunk/racluster.csv 10 seconds after each 5 minute period.
Check the last write time to see that it's changed, and then feed it into whatever.
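
One way to do the last-write-time check is to keep a stamp file next to
the CSV and compare the two.  A minimal sketch; the function names and the
stamp file are hypothetical, and it relies on the shell's -nt file test,
which common shells support but POSIX does not require.

```shell
# csv_changed CSV STAMP: succeed if CSV exists and is newer than STAMP,
# or if STAMP does not exist yet.
csv_changed() {
    [ -e "$1" ] || return 1
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# mark_seen CSV STAMP: copy CSV's modification time onto STAMP, so the
# next check succeeds only after the CSV is rewritten.
mark_seen() {
    touch -r "$1" "$2"
}
```

The consumer would call csv_changed, import the file if it succeeds, and
then call mark_seen.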

rabins() is being used by most sites to generate periodic ASCII output of aggregated data.
Gloriad does this for their spinning globe.

   See http://www.gloriad.org/gloriaddrupal/

So in your example, you would have radium() do the labeling, so that you don't have to pipe
anything in your terminal analytic.  This should work:

    rabins -S data.source -M time 5m -B 10s -F ralabel.script.conf

rabins() will sit there, and then 10 seconds after each 5 minute period, like 05:00:10, it will write out
all its clustered data, starting with a START MAR and ending with a STOP MAR, which can be used
to recognize where each time period begins and ends.  So no intermediate files
of any kind.  I don't like this, necessarily, as you hold a lot of data in memory before writing out the
time period results, creating a bit of a pipeline issue.
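
If your consumer needs to know when to expect that output, the timing
above ("-M time 5m -B 10s", so 05:00:10, 05:05:10, ...) is simple
arithmetic.  A sketch with a hypothetical helper, assuming epoch seconds,
bins aligned to multiples of the interval, and NOW no smaller than the
hold time:

```shell
# next_flush NOW INTERVAL HOLD: print the epoch second of the next
# expected output, i.e. the first (k * INTERVAL + HOLD) strictly after NOW.
next_flush() {
    echo $(( ( ($1 - $3) / $2 + 1 ) * $2 + $3 ))
}
```

With a 300 second interval and a 10 second hold, a NOW that is 5 seconds
past a boundary gives that boundary plus 10, and a NOW exactly on an
output time gives the following one.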

So what do you think, which one will you use?

Carter

On Mar 4, 2013, at 11:05 PM, Craig Merchant <cmerchant at responsys.com> wrote:

> Carter,
>  
> Here’s what I’m trying to do and I may not be going about it the smartest way…  I would like racluster, rabins, or rastream to output a csv file containing five minutes of flow data, aggregated using proto, saddr, daddr, sport, and dport.  That CSV file will be imported into Splunk for analysis every five minutes.  I would prefer for the CSV file to be overwritten each time the argus client outputs five minutes of aggregated flows.  I would also prefer to avoid writing to an argus binary file as an intermediary step.
>  
> The way I’ve been doing it is to set up an entry in the crontab file that looks like:
>  
> 00,05,10,15,20,25,30,35,40,45,55 * * * * /usr/local/bin/racluster -S 10.10.10.10:561 -T 300 -p 3 -u -Z b -w - | /usr/local/bin/ralabel -r - -f /usr/local/argus/ralabel.conf -c "," -M dsrs=+metric,+agr,+psize,+cocode -n -p 3 -u -Z b -s "+0ltime,+1stime,+trans,+dur,+runtime,+mean,+stddev,+sum,+sco,+dco,+pkts,+spkts,+dpkts,+bytes,+sbytes,+dbytes,+load,+sload,+dload,+loss,+sloss,+dloss,+ploss,+sploss,+dploss,+rate,+srate,+drate,+appbytes,+sappbytes,+dappbytes,+label:200" > /ssd/argus/splunk/racluster.csv
>  
> The problem is that when I’m checking the timestamp on the racluster.csv file, it’s always on the 01,06,11,… minute.  So, it looks like even though racluster is set to connect to radium for 300 seconds, it’s writing out the results after < 120 seconds.  I also tried just running the racluster part of the above command on the command-line and it is also writing the results out before the full five minutes has elapsed.
>  
> Is there a smarter way to accomplish my goal?  If not, how can I figure out why racluster isn’t connecting for the full length of time specified in the –T flag?
>  
> Thanks.
>  
> Craig
