Argus management records: how to control the frequency?

Carter Bullard via Argus-info argus-info at lists.andrew.cmu.edu
Tue Jul 5 10:44:14 EDT 2016


Hey Richard,
You control the number of management records by controlling the size of the bin.  Bins are reported on a time basis, controlled by the delay specified using the “-B <secs>” option.  Management records are inserted in the data stream to “frame” the argus data, so you know that one bin has ended and the next is starting.  To reduce the number of records per bin, you can make the bins shorter, or you can change the aggregation rule within each bin to collapse more records.
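
For example, something along these lines should give you shorter bins and a coarser aggregation (the -M time and -m flags here are just illustrative of the idea; check rabins.1 for the exact bin-size syntax and pick the aggregation fields InSight actually cares about):

   rabins -S argus://localhost:561 -B 5 -M time 1s -m saddr daddr proto -w -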

I would toss nprobe … and generate native argus data.  InSight streaming analytics rely on some fundamental properties of argus that netflow lacks.  Argus streaming data is guaranteed to be in time order, not to exceed the flow status interval, and to be generated in a timely fashion (i.e. without queuing delay).  So a receiving analytic can make assumptions about the nature of its data, and how to handle it.

Netflow is not designed for time-sensitive, streaming analytics.  Netflow can generate data where one record has a duration of 0.01 seconds, and the next record has a duration of days.  Netflow also generates data that is wildly unsorted in time.  You can get data from yesterday, then data from 5 seconds ago, then data from 6 hours earlier, … etc … If you’re trying to provide a snapshot of what’s on the wire 10 seconds ago, netflow data is unusable.

Depending on how you are running rabins.1, you will get different behavior from netflow streams.

If you are not using the “-B <secs>” option properly, which controls the time delay used before reporting the contents of the current bin, you may be getting a large multiplier on your data.  Netflow can generate data with huge durations, and out of order.  Rabins will break those records up into its bin size, and distribute the stats uniformly across those multiple bins.  Say nprobe generates a record with a 1 day duration, and rabins is working with 5 second bins: rabins will generate 17,280 records, uniformly distributing the record’s attributes and metrics into the resulting bins, some of which could be empty (i.e. zero metrics).  Rabins will normally throw away records that are before the current time minus the -B option time.  If that time is large compared to the bin size, you will get huge multiplication of records.
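
To put rough numbers on that (illustrative, assuming the record lines up with the bin boundaries):

   record duration:  1 day  =  86400 seconds
   bin size:                       5 seconds
   bins produced:    86400 / 5  =  17280

Any of those bins that fall before the current time minus the “-B <secs>” hold are normally discarded; the rest all get emitted, which is where the multiplication comes from.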

Records out of order cause huge problems for real-time streaming analytics like rabins.1.  If rabins just exported records from a time bin, say one that represented the last 5 seconds of flow data, and a record shows up with a start time of 4 hours ago, then rabins.1 will throw the record away, since it can’t go back in time.  It may be that your previous attempt with v9 records was throwing most of the data away.  Not sure, but a real possibility.

The best way to debug / develop around this type of data is to store a bunch of records, to give yourself a repeatable source of streaming flows, and then use the toolset to get a feel for how funky the data is: how much out of order is it, how long are the reported durations, etc.  Once you have a fixed set of data, you can control the variables.

I would run 

   ra -S cisco://whatever:port -w test.file.out

for a while to characterize your flow data generation behavior, then run rabins the way InSight wants to run it, so you can see what is really going on.  The “-D 4” or so option will report on bin generation, and whether it wants to throw data away or not.
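
Once you have test.file.out, something along these lines will show you what rabins does with it (the “-M time 5s” bin-size syntax is an assumption on my part; use whatever bin size InSight actually runs with):

   rabins -r test.file.out -B 15 -M time 5s -D 4 -w /dev/null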

InSight is an argus data application.  I’m not sure that it will be very gratifying to try to use InSight with netflow data.

Carter

> On Jul 5, 2016, at 12:43 AM, Richard Rothwell via Argus-info <argus-info at lists.andrew.cmu.edu> wrote:
> 
> Hi Carter,
>  
> I have changed from using Netflow 9 to Netflow 5 for the GLORIAD InSight code.
> This is causing problems because the batches of records coming out of rabins are much larger.
>  
> The InSight perl code splits the text stream coming from rabins based on the management stop records.
> Previously this approach produced about 50,000 to 80,000 records per batch depending on the rate coming out of the router.
>  
> With the change to Netflow 5 I am seeing batches that contain around 500,000 records.
> This caused the software to throw an exception since the “lines” were too large.
> I have increased the input buffer to get the software to run, but this still falls over eventually.
>  
> Is there some way to control how often management records are produced?
>  
> Regards from Richard
>  


