racluster issue

Wed Mar 6 12:39:50 EST 2013

Thanks, Carter.

What is the implication for the -d switch on ra clients?  How does running them as a daemon impact how they operate?

Is it also safe to assume that because rastream/rabins is constantly running, I should see fewer events where they have to guess at the direction of the flow because the handshake happened outside the specified time window?

Thx.

Craig

-----Original Message-----
From: Carter Bullard [mailto:carter at qosient.com] 
Sent: Wednesday, March 06, 2013 5:07 AM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [ARGUS] racluster issue

Hey Craig,
The -F option feeds the ra* program a custom rarc file.  You can specify most of your command line options in that file, such as which fields to print.  Your command line was just a bit too long for me.

So the problems with a -T approach are that you will miss records between runs, so you lose comprehensive monitoring, and, the biggest deal, that only one ra* program gets a chance at processing the primitive data.  Collect the 5 min time frame, and then you have a chance to do anything with it, including your push into Splunk.
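
To make the -F idea concrete, here is a minimal sketch of generating such a rarc file from the shell. The file name is the one used in this thread; the variable names (RA_FIELD_DELIMITER, RA_FIELD_SPECIFIER, RA_USEC_PRECISION) are assumed from the sample rarc shipped with argus-clients and should be checked against your installed version. The field list is shortened for illustration.

```shell
#!/bin/sh
# Sketch only: write a hypothetical rarc file for use with "-F".
# Variable names are assumptions; verify them against the sample
# rarc that ships with your argus-clients release.
RARC="${RARC:-./ralabel.script.conf}"

cat > "$RARC" <<'EOF'
# Comma-separated output, in place of -c "," on the command line
RA_FIELD_DELIMITER=','
# Fields to print, in place of a long -s list (shortened here)
RA_FIELD_SPECIFIER="stime ltime dur proto saddr sport daddr dport pkts bytes"
# Three digits of fractional seconds, in place of -p 3
RA_USEC_PRECISION=3
EOF

echo "wrote $RARC"
```

Any ra* client could then be pointed at the file with -F instead of repeating the options on every command line.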

Carter

On Mar 5, 2013, at 9:17 PM, Craig Merchant <cmerchant at responsys.com> wrote:

> Hey, Carter...  Thanks for a very thoughtful reply!  The only thing I'm confused about is what "-F ralabel.script.conf" does for either the ralabel or rabins command?
> 
> I'm probably inclined to follow your recommendation and go with the first approach.  Although I'm not clear what benefit it would have compared to the way I'm doing it now (once the -T bug is resolved...).
> 
> Thanks!
> 
> Craig
> 
> 
> -----Original Message-----
> From: Carter Bullard [mailto:carter at qosient.com]
> Sent: Tuesday, March 05, 2013 8:03 AM
> To: Craig Merchant
> Cc: Argus (argus-info at lists.andrew.cmu.edu)
> Subject: Re: [ARGUS] racluster issue
> 
> Hey Craig,
> You are starting to run into the same issues that caused us to create rasplit() and rastream().
> Flow records span whatever ARGUS_FLOW_STATUS_INTERVAL period there is, so
> without some record processing, the output from your methods will have irregular start and stop times.
> 
> Now, the assumption is that you are processing argus records, where argus
> has a good configuration, meaning that the ARGUS_FLOW_STATUS_INTERVAL
> is reasonable, like 1-15 seconds.  With this, you should use either rabins() or rastream().
> 
> I think you should relax your requirement that rejects an intermediate argus data file.
> If you can do that, use rastream(), to output records into a file with 
> the date in its name, and after a brief wait time after your time 
> boundary passes, have rastream() run a shell script containing your 
> commands, against that data file.  You can delete the file when you're done, so that you aren't piling up a lot of data.
> 
> You can also use radium to label your traffic so that you don't need 
> to do it yourself in the scripts.  But let's stay with your example:
> 
> OK assume an ARGUS_FLOW_STATUS_INTERVAL = 5 secs
> 
>   rastream -M time 5m -B 10s -S data.source -w /tmp/argus.data/argus.%Y.%m.%d.%H.%M.%S -f rastream.sh -d
> 
> This will get the data into a file structure that will be useful, and 
> 10 seconds after each 5 min time boundary, rastream will run the 
> rastream.sh shell script, passing the file as the single parameter.  Use the ./support/Config/rastream.sh as a guide, and in the script have something like:
> 
>   racluster -r $1 -w - | ralabel -f ralabel.conf -F ralabel.script.conf > /ssd/argus/splunk/racluster.csv
> 
> where ralabel.script.conf has all your particulars in it, like comma separated, and the fields.
> Not sure what your " -M dsrs="+metric,+agr..." is doing, I would remove that.
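
For illustration, a rastream.sh along the lines described above might look like the following. This is only a sketch, not the script shipped in ./support/Config: the racluster and ralabel invocations and the output path come from this thread, while the function name process_file, the OUT variable, and the write-to-temp-then-rename step are assumptions added here so that the CSV is never read half-written.

```shell
#!/bin/sh
# Hypothetical rastream.sh: rastream passes the finished data file as $1.
# Paths and config file names are from the thread; adjust for your site.
OUT="${OUT:-/ssd/argus/splunk/racluster.csv}"

process_file() {
    file="$1"
    tmp="$OUT.tmp.$$"
    # Aggregate the 5-minute file, label it, and print CSV per the rarc file.
    if racluster -r "$file" -w - \
        | ralabel -f ralabel.conf -F ralabel.script.conf > "$tmp"; then
        # Rename into place so the importer never sees a partial CSV.
        mv "$tmp" "$OUT"
        # Delete the intermediate argus file so data does not pile up.
        rm -f "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Only run when called with a file argument, as rastream would do.
if [ "$#" -ge 1 ]; then
    process_file "$1"
fi
```

Note that the exit status tested is that of ralabel, the last command in the pipeline; a stricter script would also check racluster.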
> 
> This will give you a new /ssd/argus/splunk/racluster.csv 10 seconds after each 5 minute period.
> Check the last write time to see that it has changed, and then feed it into whatever.
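
One way to do the last-write-time check is with a marker file, sketched below. The names CSV, STAMP, check_and_feed, and feed_csv are all hypothetical, and feed_csv is only a placeholder for the real import step. The -nt file test is not strictly POSIX but is supported by bash, dash, and ksh.

```shell
#!/bin/sh
# Sketch: feed the CSV onward only when it has changed since the last run.
CSV="${CSV:-/ssd/argus/splunk/racluster.csv}"
STAMP="${STAMP:-/tmp/racluster.csv.stamp}"

# Placeholder for the real import command.
feed_csv() {
    echo "would import $1"
}

check_and_feed() {
    # Proceed if the CSV exists and either no marker exists yet
    # or the CSV was modified more recently than the marker.
    if [ -f "$CSV" ] && { [ ! -e "$STAMP" ] || [ "$CSV" -nt "$STAMP" ]; }; then
        feed_csv "$CSV" || return 1
        # Copy the CSV's modification time onto the marker file.
        touch -r "$CSV" "$STAMP"
        return 0
    fi
    return 1
}
```

Calling check_and_feed from cron shortly after each expected write would then skip any run where the file has not been refreshed.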
> 
> rabins() is being used by most sites to generate periodic ASCII output of aggregated data.
> Gloriad does this for their spinning globe. 
> 
>   See http://www.gloriad.org/gloriaddrupal/
> 
> So in your example, you would have radium() do the labeling, so that
> you don't have to pipe anything in your terminal analytic.  This
> should work:
> 
>    rabins -S data.source -M time 5m -B 10s -F ralabel.script.conf
> 
> rabins() will sit there, and then 10 seconds after each 5 minute
> period, like 05:00:10, it will write out all its clustered data,
> starting with a START MAR and ending with a STOP MAR, which can be
> used to recognize the beginning and the end of this
> time period.  So no intermediate files of any kind.  I don't like this, necessarily, as you hold a lot of data in memory before writing out the time period results, creating a bit of a pipeline issue.
> 
> So what do you think, which one will you use ?
> 
> Carter
> 
> On Mar 4, 2013, at 11:05 PM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
>> Carter,
>> 
>> Here's what I'm trying to do and I may not be going about it the smartest way...  I would like racluster, rabins, or rastream to output a csv file containing five minutes of flow data, aggregated using proto, saddr, daddr, sport, and dport.  That CSV file will be imported into Splunk for analysis every five minutes.  I would prefer for the CSV file to be overwritten each time the argus client outputs five minutes of aggregated flows.  I would also prefer to avoid writing to an argus binary file as an intermediary step.
>> 
>> The way I've been doing it is to set up an entry in the crontab file that looks like:
>> 
>> 00,05,10,15,20,25,30,35,40,45,55 * * * * /usr/local/bin/racluster -S 10.10.10.10:561 -T 300 -p 3 -u -Z b -w - | /usr/local/bin/ralabel -r - -f /usr/local/argus/ralabel.conf -c "," -M dsrs=+metric,+agr,+psize,+cocode -n -p 3 -u -Z b -s "+0ltime,+1stime,+trans,+dur,+runtime,+mean,+stddev,+sum,+sco,+dco,+pkts,+spkts,+dpkts,+bytes,+sbytes,+dbytes,+load,+sload,+dload,+loss,+sloss,+dloss,+ploss,+sploss,+dploss,+rate,+srate,+drate,+appbytes,+sappbytes,+dappbytes,+label:200" > /ssd/argus/splunk/racluster.csv
>> 
>> The problem is that when I'm checking the timestamp on the racluster.csv file, it's always on the 01,06,11,... minute.  So, it looks like even though racluster is set to connect to radium for 300 seconds, it's writing out the results after < 120 seconds.  I also tried just running the racluster part of the above command on the command-line and it is also writing the results out before the full five minutes has elapsed.
>> 
>> Is there a smarter way to accomplish my goal?  If not, how can I figure out why racluster isn't connecting for the full length of time specified in the -T flag?
>> 
>> Thanks.
>> 
>> Craig
> 
>