Long time series

Carter Bullard carter at qosient.com
Thu Sep 27 08:08:18 EDT 2012


Hey Rafael,
Not sure what you're trying to accomplish, so it's hard to make a good recommendation.
How big are your files, and how much memory do you have?

The rabins() call you're doing isn't giving you any real data reduction.  I would
suspect that your average transaction duration is under 1 second, so "binning"
the data, and holding it all in memory before outputting it, just isn't going to do much
for you.  Can you aggregate the data with a data reduction key, instead of the
default?
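
Something along these lines, for example (just a sketch; the key fields and file
names are placeholders, so pick whatever key suits your analysis):

   racluster -m proto dport -r infile.argus -w aggregated.argus

With a key that small, everything sharing a protocol and destination port collapses
into a single record, which is far less state than the default flow key.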

Can you use rasplit() to create the hourly bins, and then run racluster() to generate
your hourly aggregates?
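
Something like this (a sketch; the output path and strftime pattern are just
placeholders for your own archive layout):

   rasplit -M time 1h -r ../data/* -w /tmp/hourly/argus.%Y.%m.%d.%H
   racluster -f racluster.conf -r /tmp/hourly/argus.2012.09.26.00 > hourly-aggregate.txt

That way each racluster() run only has to hold one hour of flows in memory.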

Because you are probably looking to work with only one metric in your output,
you can throw all the DSRs away, except for the few that matter, like the time, flow
and metric DSRs.  That will save you a massive amount of memory.

   rabins -M dsrs="flow,time,metric"
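
In your pipeline, that would look something like this (a sketch; I'm assuming the
mode options can be combined after a single -M, as in your original command, and
'some-filter' stands in for your real filter):

   rabins -M hard time 1h dsrs="flow,time,metric" -f racluster.conf -r ../data/* - some-filter > time-series.txt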

What is it that you are actually trying to do with your time series data?

I have always advocated 5-minute files, and I recommend them to you.  If you need
hourly aggregate data, it can be generated from processed 5-minute files.
As a test, I would recommend that you take one current file, use rasplit() to
generate 5-minute files, aggregate the 5-minute files with racluster(), and then
run racluster() again on the 5-minute aggregates to generate a 1-hour aggregate.
The resulting files will give you an indication of how many flows you're dealing
with, and how much memory will be required to do the job.
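
As a concrete sketch of that test (the file names, paths, and shell loop are just
placeholders for however you like to script it):

   rasplit -M time 5m -r current.argus -w /tmp/5min/argus.%Y.%m.%d.%H.%M
   for f in /tmp/5min/argus.*; do racluster -f racluster.conf -r "$f" -w "$f".agg; done
   racluster -f racluster.conf -r /tmp/5min/argus.*.agg -w /tmp/1hour-agg.argus

Comparing the sizes of the 5-minute aggregates and the 1-hour aggregate should give
you a sense of how many flows survive aggregation, and roughly how much memory a
single rabins() pass would need.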

The -R option works fine for me with rabins().  What kind of behavior are you getting?

Carter

On Sep 26, 2012, at 8:05 AM, Rafael Barbosa <rrbarbosa at gmail.com> wrote:

> Hi all,
> 
> What is the recommended way to generate large time series with rabins?
> 
> Some context. I am running:
> $> rabins -f racluster.conf -M hard time 1h -r ../data/* - some-filter > time-series.txt
> 
> And:
> $> cat racluster.conf
> filter="" status=0 idle=300
> 
> In the 'data' folder I have around 3 months of data, with each file holding roughly 40-60 min worth of traffic. However, I rapidly run out of memory and I can't afford the swapping. Is there a way to do this with argus using less memory? Or should I start generating multiple time series (e.g. 1 per day) and 'stitch' them together afterwards?
> 
> I tried setting -B 5s, but it seems to have little impact, if any.
> 
> Extra: why does rabins not accept the -R option?
> 
> Rafael Barbosa
> http://www.ewi.utwente.nl/~barbosarr/
> 
