ragraph and unsorted files

Carter Bullard carter at qosient.com
Tue May 3 07:27:32 EDT 2011


Hey Rafael,
Yes, I can put in a warning if data is being rejected, and I'll figure out how to not reject any data.

Thanks, and thanks for the data, that makes all the difference in fixing this stuff.

Carter

On May 3, 2011, at 4:36 AM, Rafael Barbosa <rrbarbosa at gmail.com> wrote:

> Hi,
> 
> I simply noticed the 'bug' but did not think about the why's. The problem of generating a time series in a single pass is clear. Maybe it helps if I explain how I got to the example I sent.
> 
> I need to generate statistics on a per-flow basis. For that I created a racluster.conf file that aggregates all records using a 5-minute idle timeout (filter="" status=0 idle=300). As my data consists of a few fairly large flows (lasting a few days), this problem occurs. The difference in start time between the 1st and 2nd records in my aggregated file is 4(!) days.
> 
> There are several possible solutions for my specific problem:
> - use the '-t' option to define the range.
> - keep 'status' reports in the files used to generate the time series.
> - use a 2-pass approach, as you suggested. This would also solve the problem, but I think my problem is a rare case, and it's not worth the performance impact.
> 
> The biggest problem, in my opinion, is that the user might not be aware of this issue and so assume that the graph generated from the unsorted data is correct. So it would be good to generate a warning message when a flow with a start time earlier than the first bin is found while generating the time series. What do you think of this?
> 
> Rafael Barbosa
> http://www.vf.utwente.nl/~barbosarr/
> 
> 
> 
> On Mon, May 2, 2011 at 6:47 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey Rafael,
> ragraph() is just a front end to rabins(), and so any problems will be caused by rabins().
> 
> I think this is a bug, so I'll take a look, to see what I can do.   rabins() is our time-series
> engine, so it has a lot of bells and whistles in it.  ragraph() doesn't need all the stuff that
> makes rabins() complicated, so it may be that there is a better strategy.
> 
> The reason there is some complexity to the problem is that we want the approach
> rabins() uses for bin management to be able to work with both streaming data and
> file based data.  With infinite streaming data, you need to be concerned with memory
> management, so our strategy is to have a "sliding window" type of data processing,
> for aggregation etc...   As you suggested, the problem is rabins() is not allowing for
> a large window, when processing files.
> 
> If you were to give ragraph() an explicit time range to graph, this problem
> would go away.
> 
> So, the client library supports the notion of multi-pass processing of files.
> If you look at the source code, all clients have a variable ArgusPassNum, and if
> in your own client's initialization routine, you defined that to be 2, as an example, we
> would process the input file list twice.  I could use that to simply scan the data from the
> file list on the first pass to set the time series start and stop times, and then run the data
> through again to tally the results, but the performance can be pretty bad if I do that
> as a general strategy.  But that would be faster than if we had to sort the data prior to graphing it.
> 
> I'll look to see if this is a bug, or a feature.  How wildly out of order are the records?
> 
> Carter
> 
> 
> On May 2, 2011, at 11:26 AM, Rafael Barbosa wrote:
> 
>> Hi all,
>> 
>> I ran into something today that might be considered a bug: ragraph does not handle files that are not ordered by 'stime' well. Basically it seems that ragraph uses the info of the first record to initialize the time series, so flows that are earlier in time (but later in the file) are ignored, or at least erroneously processed.
>> 
>> I uploaded the file 'ragraph-unsorted.zip' to ftp://qosient.com/incoming that contains an example.
>> 
>> An easy workaround is to make sure that the file is ordered, with rasort(), before using ragraph. E.g.:
>> rasort -m stime -r flows.argus -w sorted.argus
>> 
>> Best regards,
>> Rafael Barbosa
>> http://www.vf.utwente.nl/~barbosarr/
>> 
> 
> 
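
For readers following the thread, the behaviour discussed above can be illustrated with a small sketch. This is not the rabins() or ragraph() code; it is a simplified Python model under stated assumptions: records are (stime, value) tuples with stime in seconds, the function names bin_single_pass and bin_two_pass are hypothetical, and the single-pass version fixes the start of the series at the first record it reads, as Rafael's report suggests. It shows why a record that starts before the first bin is lost in one pass, how a warning could be raised for it, and how a first pass that only finds the earliest start time avoids the loss.

```python
import warnings


def bin_single_pass(records, bin_size):
    """Bin (stime, value) records, anchoring the series at the first record read."""
    bins = {}
    start = None
    rejected = 0
    for stime, value in records:
        if start is None:
            start = stime  # assumption: the first record fixes the series start
        if stime < start:
            rejected += 1  # earlier than the first bin, so there is no bin for it
            continue
        index = int((stime - start) // bin_size)
        bins[index] = bins.get(index, 0) + value
    if rejected:
        # the kind of warning proposed in the thread
        warnings.warn(f"{rejected} record(s) start before the first bin and were dropped")
    return start, bins, rejected


def bin_two_pass(records, bin_size):
    """Pass 1 finds the earliest start time; pass 2 tallies every record."""
    records = list(records)  # stands in for reading the input file list twice
    if not records:
        return None, {}
    start = min(stime for stime, _ in records)  # pass 1: find series start
    bins = {}
    for stime, value in records:  # pass 2: tally the results
        index = int((stime - start) // bin_size)
        bins[index] = bins.get(index, 0) + value
    return start, bins


if __name__ == "__main__":
    # Hypothetical unsorted input: the second record starts before the first.
    sample = [(1000, 5), (400, 7), (1300, 2)]
    print(bin_single_pass(sample, 300))
    print(bin_two_pass(sample, 300))
```

With the hypothetical sample above and 300-second bins, the single-pass version drops the record starting at 400 and warns about it, while the two-pass version keeps all three records at the cost of reading the input twice.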