Long time series
Carter Bullard
carter at qosient.com
Mon Oct 1 21:51:24 EDT 2012
Hey Rafael,
Regarding the -R option for rabins(), I think the ../dir generates some issues.
If you can provide the full path to the directory, all should work well. I'll have to
try to figure out what the " ../ " is doing to cause problems.
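
Until that is sorted out, one workaround is to resolve the relative directory to a full path before handing it to rabins(). A minimal sketch, assuming a POSIX shell; the helper name abs_dir is made up for illustration and is not part of argus:

```shell
#!/bin/sh
# abs_dir is a hypothetical helper (not an argus tool): it prints the
# absolute, symlink-free path of a directory given as a relative path.
# It prints nothing and returns non-zero if the directory does not exist.
abs_dir() {
    # Run cd inside a subshell so the caller's working directory is unchanged.
    (cd "$1" 2>/dev/null && pwd -P)
}
```

Used as rabins -M hard time 1h -R "$(abs_dir ../data)", this should hand rabins() a full path instead of ../data.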
Carter
On Sep 27, 2012, at 10:53 AM, Rafael Barbosa <rrbarbosa at gmail.com> wrote:
> Hi Carter,
>
> How big are your files and how much memory do you have ?
>
> Around 2000 files of ~30 MB each. The traffic load is not high (local traffic only). I ran some tests on my local machine with 4 GB, but the server where I will run it now has 32 GB.
>
> Can you use rasplit() to create the hourly bins, and then run racluster() to generate
> your hourly aggregates?
>
> Yes, this seems the way to go. For some reason I was afraid of what would happen at the edges of the bins, with connections that span multiple bins. But I think rasplit should handle it just fine.
>
> what is it that you are actually trying to do with your time series data?
>
> At the moment I am simply looking at connectivity graphs: who talks to whom, and using which service? But soon I will look a little deeper, including number of packets, bytes, flows. So I wanted the solution to be flexible.
>
> The -R option works fine for me with rabins(). What kind of behavior are you getting?
>
> There is something odd:
>
> $> rabins -M hard time 1h -R ../data/
> rabins[11151]: 16:50:10.929848 no input files
> $> cd ..
> $> rabins -M hard time 1h -R data/
> <Runs just fine>
>
> --
> Rafael Barbosa
> http://www.ewi.utwente.nl/~barbosarr/
>
>
>
> On Thu, Sep 27, 2012 at 2:08 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey Rafael,
> Not sure what you're trying to accomplish, so hard to make a good recommendation.
> How big are your files and how much memory do you have ?
>
> The rabins() call you're doing isn't really doing any real data reduction. I would
> suspect that your average transaction duration is under 1 second, so "binning"
> the data, holding it all in memory, before outputting it, just isn't going to do much
> for you. Can you aggregate the data with a data reduction key, instead of the
> default?
>
> Can you use rasplit() to create the hourly bins, and then run racluster() to generate
> your hourly aggregates?
>
> Because you are probably looking to work with only one metric in your output,
> you can throw all the DSRs away, except for the few that matter, like the time, flow
> and metric dsrs. That will save you a massive amount of memory.
>
> rabins -M dsrs="flow,time,metric"
>
> what is it that you are actually trying to do with your time series data?
>
> I have always advocated 5 minute files, and I recommend this to you. If you need
> hourly aggregate data, it can be generated from processed 5 minute files.
> As a test, I would recommend that you take one current file, and use rasplit() to
> generate 5 minute files, aggregate the 5 minute files with racluster(), and then
> run racluster() again on the 5 minute aggregates, to generate a 1 hour aggregate.
> The resulting files will give you an indication of how many flows you're dealing
> with, and how much memory will be required to do the job.
>
> The -R option works fine for me with rabins(). What kind of behavior are you getting?
>
> Carter
>
> On Sep 26, 2012, at 8:05 AM, Rafael Barbosa <rrbarbosa at gmail.com> wrote:
>
>> Hi all,
>>
>> What is the recommended way to generate large time series with rabins?
>>
>> Some context. I am running:
>> $> rabins -f racluster.conf -M hard time 1h -r ../data/* - some-filter > time-series.txt
>>
>> And:
>> $> cat racluster.conf
>> filter="" status=0 idle=300
>>
>> In the 'data' folder I have around 3 months of data, each file holding roughly 40-60 min worth of traffic. However, I rapidly run out of memory and I can't afford the swapping. Is there a way to do this with argus using less memory? Or should I start generating multiple time series (e.g. 1 per day) and 'stitch' them together afterwards?
>>
>> I tried setting -B 5s, but it seems to have little impact, if any.
>>
>> Extra: why does rabins not accept the -R option?
>>
>> Rafael Barbosa
>> http://www.ewi.utwente.nl/~barbosarr/
>>
>
>
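
The split-then-aggregate test suggested in the thread (rasplit() into 5 minute files, racluster() on each one, then racluster() again on the results to get a 1 hour aggregate) could be scripted roughly as follows. This is only a sketch: the file and directory names, the DRY_RUN switch and the helper functions are assumptions made for illustration. By default it just prints the commands, so they can be checked against the man pages before anything is run.

```shell
#!/bin/sh
# Sketch of the rasplit()/racluster() workflow discussed in the thread.
# Assumptions (not from the thread): helper names, output file naming,
# and the DRY_RUN switch are illustrative only.

DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands, anything else = run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"      # show the command line without executing it
    else
        "$@"
    fi
}

# Step 1: split one existing file into 5 minute files.
# $1 = input file, $2 = output directory
split_5m() {
    run rasplit -r "$1" -M time 5m -w "$2/argus.%Y.%m.%d.%H.%M.%S"
}

# Step 2: aggregate a single 5 minute file.
# $1 = input file, $2 = output file
cluster_file() {
    run racluster -r "$1" -w "$2"
}

# Step 3: aggregate several 5 minute aggregates into one hourly file.
# $1 = output file, remaining arguments = input files
cluster_hour() {
    out=$1
    shift
    run racluster -r "$@" -w "$out"
}
```

For example, split_5m one-file.arg five-min prints the rasplit command that would be run; the sizes of the files produced by the last step then give a feel for how many flows, and how much memory, the full job involves.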