Long time series

Carter Bullard carter at qosient.com
Mon Oct 1 21:51:24 EDT 2012


Hey Rafael,
Regarding the -R option for rabins(), I think the relative ../dir path is what causes the issue.
If you provide the full path to the directory, everything should work.  I'll have to
try to figure out what the "../" is doing to cause the problem.
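As a stopgap until the relative-path problem is understood, one way to follow this advice is to resolve the directory to an absolute path before calling rabins(). The Python sketch below is a hypothetical wrapper: the function name `build_rabins_cmd` is made up for illustration. It only reuses the `-M hard time 1h -R` arguments quoted in this thread, plus standard-library path handling.

```python
import os
from typing import List


def build_rabins_cmd(data_dir: str, bin_size: str = "1h") -> List[str]:
    """Build a rabins argv list whose -R directory is an absolute path.

    Hypothetical helper: it sidesteps the "../data/" problem reported in
    this thread by normalizing the path, so no ".." component reaches rabins.
    """
    # os.path.abspath joins the path onto the current working directory
    # and collapses "." and ".." components.
    abs_dir = os.path.abspath(data_dir)
    # Same arguments as the commands quoted in the thread, only the path differs.
    return ["rabins", "-M", "hard", "time", bin_size, "-R", abs_dir]
```

Usage would be something like `subprocess.run(build_rabins_cmd("../data/"), check=True)`. In a shell, `rabins -M hard time 1h -R "$(cd ../data && pwd)"` achieves the same thing.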

Carter

On Sep 27, 2012, at 10:53 AM, Rafael Barbosa <rrbarbosa at gmail.com> wrote:

> Hi Carter,
> 
>> How big are your files and how much memory do you have?
> 
> Around 2000 files of ~30 MB each. The traffic load is not high (local traffic only). I ran some tests on my local machine with 4 GB of RAM, but the server where I will run it now has 32 GB.
> 
>> Can you use rasplit() to create the hourly bins, and then run racluster() to generate
>> your hourly aggregates?
> 
> Yes, this seems like the way to go. For some reason I was worried about what would happen at the edges of the bins, with connections that span multiple bins. But I think rasplit() should handle that just fine.
> 
>> What is it that you are actually trying to do with your time series data?
> 
> At the moment I am simply looking at connectivity graphs: who talks to whom, and using which service? Soon, though, I will look a little deeper, including numbers of packets, bytes, and flows, so I wanted the solution to be flexible.
> 
>> The -R option works fine for me with rabins().  What kind of behavior are you getting?
> 
> There is something odd:
> 
> $> rabins -M hard time 1h -R ../data/
> rabins[11151]: 16:50:10.929848 no input files
> $> cd ..
> $> rabins -M hard time 1h -R data/
> <Runs just fine>
> 
> --
> Rafael Barbosa
> http://www.ewi.utwente.nl/~barbosarr/
> 
> 
> 
> On Thu, Sep 27, 2012 at 2:08 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey Rafael,
> I'm not sure what you're trying to accomplish, so it's hard to make a good recommendation.
> How big are your files and how much memory do you have?
> 
> The rabins() call you're doing isn't really doing any data reduction.  I would
> suspect that your average transaction duration is under 1 second, so "binning"
> the data and holding it all in memory before outputting it just isn't going to do much
> for you.  Can you aggregate the data with a data reduction key, instead of the
> default?
> 
> Can you use rasplit() to create the hourly bins, and then run racluster() to generate
> your hourly aggregates?
> 
> Because you are probably looking to work with only one metric in your output,
> you can throw all the DSRs away, except for the few that matter, like the time, flow,
> and metric DSRs.  That will save you a massive amount of memory.
> 
>    rabins -M dsrs="flow,time,metric"
> 
> What is it that you are actually trying to do with your time series data?
> 
> I have always advocated 5-minute files, and I recommend this to you.  If you need
> hourly aggregate data, it can be generated from processed 5-minute files.
> As a test, I would recommend that you take one current file, use rasplit() to
> generate 5-minute files, aggregate the 5-minute files with racluster(), and then
> run racluster() again on the 5-minute aggregates to generate a 1-hour aggregate.
> The resulting files will give you an indication of how many flows you're dealing
> with, and how much memory will be required to do the job.
> 
> The -R option works fine for me with rabins().  What kind of behavior are you getting?
> 
> Carter
> 
> On Sep 26, 2012, at 8:05 AM, Rafael Barbosa <rrbarbosa at gmail.com> wrote:
> 
>> Hi all,
>> 
>> What is the recommended way to generate large time series with rabins?
>> 
>> Some context. I am running:
>> $> rabins -f racluster.conf -M hard time 1h -r ../data/* - some-filter > time-series.txt
>> 
>> And:
>> $> cat racluster.conf
>> filter="" status=0 idle=300
>> 
>> In the 'data' folder I have around 3 months of data, each file holding roughly 40-60 min of traffic. However, I rapidly run out of memory and I can't afford the swapping. Is there a way to do this with argus using less memory? Or should I start generating multiple time series (e.g. one per day) and 'stitch' them together afterwards?
>> 
>> I tried setting -B 5s, but it seems to have little impact, if any.
>> 
>> Extra: why does rabins not accept the -R option?
>> 
>> Rafael Barbosa
>> http://www.ewi.utwente.nl/~barbosarr/
>> 
> 
> 
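Carter's advice to aggregate 5-minute files and then re-aggregate those 5-minute results into 1-hour bins works because per-key counters such as packets, bytes, and flow counts are sums, and every 5-minute bin falls inside exactly one hour. The Python sketch below illustrates only that arithmetic; it is not how racluster() is implemented. The record layout, the (saddr, daddr, dport, proto) key, and the function names are hypothetical. It also simplifies by assigning each record wholly to the bin containing its start time, rather than splitting flows that cross bin boundaries.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical flow record: (start_seconds, saddr, daddr, dport, proto, pkts, bytes)
Record = Tuple[float, str, str, int, str, int, int]
# Aggregate key: (bin_start_seconds, saddr, daddr, dport, proto)
Key = Tuple[int, str, str, int, str]
Counters = Dict[str, int]


def _new_counters() -> Counters:
    return {"pkts": 0, "bytes": 0, "flows": 0}


def aggregate(records: Iterable[Record], bin_seconds: int) -> Dict[Key, Counters]:
    """Bin raw records by start time and sum the counters per (bin, flow key)."""
    out: Dict[Key, Counters] = defaultdict(_new_counters)
    for start, saddr, daddr, dport, proto, pkts, nbytes in records:
        bin_start = int(start // bin_seconds) * bin_seconds
        c = out[(bin_start, saddr, daddr, dport, proto)]
        c["pkts"] += pkts
        c["bytes"] += nbytes
        c["flows"] += 1
    return dict(out)


def reaggregate(aggs: Dict[Key, Counters], bin_seconds: int) -> Dict[Key, Counters]:
    """Merge finer aggregates (e.g. 5 min) into coarser bins (e.g. 1 h).

    Assumes bin_seconds is a multiple of the finer bin size, so each fine bin
    lands in exactly one coarse bin.
    """
    out: Dict[Key, Counters] = defaultdict(_new_counters)
    for (bin_start, *flow_key), counters in aggs.items():
        coarse = int(bin_start // bin_seconds) * bin_seconds
        c = out[(coarse, *flow_key)]
        for name, value in counters.items():
            c[name] += value
    return dict(out)


if __name__ == "__main__":
    records = [
        (10.0, "10.0.0.1", "10.0.0.2", 80, "tcp", 5, 500),
        (400.0, "10.0.0.1", "10.0.0.2", 80, "tcp", 3, 300),
        (3700.0, "10.0.0.3", "10.0.0.2", 22, "tcp", 7, 700),
    ]
    five_min = aggregate(records, 300)
    hourly = reaggregate(five_min, 3600)
    # Two-stage result matches aggregating the raw records directly per hour.
    print(hourly == aggregate(records, 3600))
```

The hourly stage only has to hold the 5-minute aggregates, not the raw records, which is where the memory saving comes from when many records share a key.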
