Why is rabins() "ramping up" counts?
Carter Bullard
carter at qosient.com
Tue Jul 30 16:32:36 EDT 2013
Hey Matt,
I'd have to see the data that generated the output to know
whether there is a problem.
The key here is the ARGUS_FLOW_STATUS_INTERVAL. If it is
very large compared with your bin size, and you have a small
number of records, then this kind of skewing can occur. But
again, I'd have to see the data.
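For illustration (a made-up value, not your actual config), the
status interval is set on the sensor in argus.conf:

   ARGUS_FLOW_STATUS_INTERVAL=60

A 60 second status interval against 5 second bins means every
status record has to be split across a dozen bins.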
Your rabins() call will cut flow records into 5 second bins,
distributing the metrics (pkts, bytes, appbytes, etc.) across
those bins, and then, when it's time to output the bins, it will
apply the aggregation strategy to all the flow records that are
in each bin.
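As a hypothetical worked example of that splitting: a status record
covering 01:00:00 - 01:01:00 with 3000 pkts would land in twelve
5 second bins, each credited about 250 pkts, i.e. a rate of 50
pkts/sec per bin. A bin that overlaps the record for only 2 of its
5 seconds is credited proportionally less, which is what the
"ramps" at the edges of your output look like.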
Your -B 5s will throw away records that precede the apparent
start time of the stream; it is only meant for reading live data.
Don't use the "-B secs" option when reading files.
That may clear up your problem.
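Concretely, your file-reading invocation would just be your same
command, minus the -B:

   rabins -M hard time 5s -r $f -m saddr -s ltime rate - port 5432 and src host 192.168.10.22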
So grab a single flow's status records, writing them out to a file,
and then run rabins() on that file to see how it carves up the flow.
You should see that it is processed correctly.
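Something along these lines should do it (the filenames here are
just placeholders), using ra() to pull one flow's status records
out to a file and then binning them:

   ra -r ~/working/yourfile -w /tmp/oneflow - src host 192.168.10.22 and port 5432
   rabins -M hard time 5s -r /tmp/oneflow -s ltime rate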
Carter
On Jul 30, 2013, at 4:19 PM, Matt Brown <matthewbrown at gmail.com> wrote:
> Hello all,
>
>
> Does rabins() "ramp up to normal" over N bins?
>
>
> I'd like to start working on calculating moving averages to help
> identify performance outliers (like "spikes" in `loss` or `rate`).
>
> For this purpose, I believe grabbing data from the output of rabins()
> would serve me well.
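> As a rough sketch of what I have in mind (a 12-bin, i.e. one
> minute, moving average over the rate column; the awk is just an
> illustration):
>
> rabins -M hard time 5s -r $f -m saddr -s ltime rate \
>     - port 5432 and src host 192.168.10.22 | \
> awk '/^[0-9]/ { i = c++ % 12; sum += $3 - buf[i]; buf[i] = $3;
>                 if (n < 12) n++; print $1, $2, sum/n }'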
>
>
> For example, if I take historic argus data and run it through the
> following rabins() invocation, I see some odd things that can only be
> noted as "ramping up":
>
>
> for f in ~/working/* ; do (
>   rabins -M hard time 5s -B 5s -r $f -m saddr -s ltime rate \
>       - port 5432 and src host 192.168.10.22
> ) >> ~/aggregated_rate ; done
>
>
> The first few and the last few records output per file do not seem
> to be reported correctly.
>
> For example, the host at 192.168.10.22 runs a Postgres DB
> replication package called Bucardo. During idle time, Bucardo sends
> heartbeat traffic, and appears to hold steady at about 47-49 packets
> per second (rate).
>
> However, I am seeing the following in my rabins() output (note:
> the presence of a field-label header marks the start of a new
> rabins() run from the for loop above):
>
> 2013-07-25 00:59:25.000000 47.400000
> 2013-07-25 00:59:30.000000 47.400000
> 2013-07-25 00:59:35.000000 48.000000
> 2013-07-25 00:59:40.000000 48.000000
> 2013-07-25 00:59:45.000000 40.600000
> 2013-07-25 00:59:50.000000 21.400000
> 2013-07-25 00:59:55.000000 15.400000
> 2013-07-25 01:00:00.000000 5.000000
> 2013-07-25 01:00:05.000000 0.000000
> LastTime Rate
> 2013-07-25 01:00:05.000000 0.200000
> 2013-07-25 01:00:10.000000 0.600000
> 2013-07-25 01:00:15.000000 0.400000
> 2013-07-25 01:00:35.000000 0.400000
> 2013-07-25 01:00:40.000000 1.000000
> 2013-07-25 01:00:45.000000 6.200000
> 2013-07-25 01:00:50.000000 25.400000
> 2013-07-25 01:00:55.000000 32.400000
> 2013-07-25 01:01:00.000000 41.800000
> 2013-07-25 01:01:05.000000 47.600000
> 2013-07-25 01:01:10.000000 48.600000
>
> [The source files were written with rastream().]
>
>
> It is well worth noting that the same sort of thing occurs if I
> start rabins() reading from the argus() socket with the following
> invocation:
> # rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime rate \
>     - port 5432 and src host 192.168.10.22
> LastTime Rate
> 2013-07-30 15:42:55.000000 1.400000
> 2013-07-30 15:43:00.000000 0.600000
> 2013-07-30 15:43:05.000000 33.800000
> 2013-07-30 15:43:10.000000 47.400000
> 2013-07-30 15:43:15.000000 58.600000
> 2013-07-30 15:43:20.000000 87.600000
> 2013-07-30 15:43:25.000000 96.200000
> 2013-07-30 15:43:30.000000 96.000000
> 2013-07-30 15:43:35.000000 134.200000
> 2013-07-30 15:43:40.000000 137.200000
> 2013-07-30 15:43:45.000000 137.400000
> 2013-07-30 15:43:50.000000 136.600000
> 2013-07-30 15:43:55.000000 139.800000
> 2013-07-30 15:44:00.000000 136.200000 <-- `rate` averages about here going forward
>
>
> It doesn't matter which field I use; the same thing occurs:
> # rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime load \
>     - port 5432 and src host 192.168.10.22
> LastTime Load
> 2013-07-30 15:50:15.000000 1461.19*
> 2013-07-30 15:50:20.000000 42524.7*
> 2013-07-30 15:50:25.000000 54329.5*
> 2013-07-30 15:50:30.000000 55244.8*
> 2013-07-30 15:50:35.000000 90164.8*
> 2013-07-30 15:50:40.000000 92539.1*
> 2013-07-30 15:50:45.000000 94827.1*
> 2013-07-30 15:50:50.000000 95292.7*
> 2013-07-30 15:50:55.000000 96286.3*
> 2013-07-30 15:51:00.000000 94857.6*
> 2013-07-30 15:51:05.000000 130699.*
> 2013-07-30 15:51:10.000000 149979.*
> 2013-07-30 15:51:15.000000 149320.*
> [killed]
> # rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime load \
>     - port 5432 and src host 192.168.2.22
> LastTime Load
> 2013-07-30 15:52:35.000000 33894.4*
> 2013-07-30 15:52:40.000000 3134.84*
> 2013-07-30 15:52:45.000000 39262.4*
> 2013-07-30 15:52:50.000000 40024.0*
> 2013-07-30 15:52:55.000000 41188.7*
> 2013-07-30 15:53:00.000000 40259.2*
> 2013-07-30 15:53:05.000000 75057.6*
> 2013-07-30 15:53:10.000000 97160.0*
> 2013-07-30 15:53:15.000000 106520.*
> 2013-07-30 15:53:20.000000 138504.*
> 2013-07-30 15:53:25.000000 153835.*
> 2013-07-30 15:53:30.000000 152892.*
> 2013-07-30 15:53:35.000000 154017.* <-- `load` averages here going forward
>
> This happens whether or not I perform field aggregation (`-m saddr`).
>
>
> Why is this happening?
>
>
> This seems like it will really skew any moving averages (for
> spotting spikes, etc.) calculated from the rabins() output.
>
>
> Thanks!
>
> Matt
>