Why is rabins() "ramping up" counts?

Matt Brown matthewbrown at gmail.com
Tue Jul 30 16:19:41 EDT 2013


Hello all,


Does rabins() "ramp up to normal" over N bins?


I'd like to start working on calculating moving averages to help
identify performance outliers (like "spikes" in `loss` or `rate`).

For this purpose, I believe grabbing data from the output of rabins()
would serve me well.
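
To make the goal concrete, here is a minimal sketch of the kind of check
I have in mind. It assumes the three-column "date time value" output
shown later in this message. The function name flag_spikes, the 12-bin
window, and the 2x threshold are hypothetical choices of mine, not
anything that argus provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of argus): print every bin whose value
# exceeds `factor` times the trailing mean of the previous `window` bins.
# Input: "date time value" lines, e.g. from `rabins ... -s ltime rate`.
flag_spikes() {
    awk -v window="${1:-12}" -v factor="${2:-2}" '
        # Ignore blank lines.
        NF == 0 { next }
        # A column header marks a new rabins run: start a fresh window.
        !/^[0-9]/ { n = 0; sum = 0; next }
        {
            value = $3 + 0
            if (n >= window) {
                mean = sum / window
                if (mean > 0 && value > factor * mean)
                    printf "%s %s %.6f (trailing mean %.6f)\n", $1, $2, value, mean
                # Drop the oldest value from the running sum.
                sum -= buf[n % window]
            }
            # Store the current value in the circular buffer.
            buf[n % window] = value
            sum += value
            n++
        }'
}
```

Usage would be something like `flag_spikes 12 2 < ~/aggregated_rate`. A
trailing mean like this is only as good as the bins feeding it, which is
why the behaviour described below matters.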


For example, if I take historic argus data and run it through the
following rabins() invocation, the output shows something I can only
describe as "ramping up":


# One rabins run per file; each run prints its own column header.
for f in ~/working/* ; do (
rabins -M hard time 5s -B 5s -r "$f" -m saddr -s ltime rate \
    - port 5432 and src host 192.168.10.22
) >> ~/aggregated_rate ; done


The first few and the last few records in each file's output do not
seem to be reported correctly.

For example, these dudes at 192.168.10.22 utilize a postgres DB
replication package called bucardo.  During idle time, bucardo sends
heartbeat info, and appears to be holding at about 47-49 packets per
second (rate).

However, I am seeing the following in the rabins() output (note that
the presence of the field label header marks the start of a new
rabins() run from the for loop above):

2013-07-25 00:59:25.000000    47.400000
2013-07-25 00:59:30.000000    47.400000
2013-07-25 00:59:35.000000    48.000000
2013-07-25 00:59:40.000000    48.000000
2013-07-25 00:59:45.000000    40.600000
2013-07-25 00:59:50.000000    21.400000
2013-07-25 00:59:55.000000    15.400000
2013-07-25 01:00:00.000000     5.000000
2013-07-25 01:00:05.000000     0.000000
                 LastTime         Rate
2013-07-25 01:00:05.000000     0.200000
2013-07-25 01:00:10.000000     0.600000
2013-07-25 01:00:15.000000     0.400000
2013-07-25 01:00:35.000000     0.400000
2013-07-25 01:00:40.000000     1.000000
2013-07-25 01:00:45.000000     6.200000
2013-07-25 01:00:50.000000    25.400000
2013-07-25 01:00:55.000000    32.400000
2013-07-25 01:01:00.000000    41.800000
2013-07-25 01:01:05.000000    47.600000
2013-07-25 01:01:10.000000    48.600000

[The source files were written with rastream().]


It is also worth noting that if I start rabins() reading from the
argus() socket with the following invocation, the same sort of thing
occurs:
# rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime rate \
    - port 5432 and src host 192.168.10.22
                 LastTime         Rate
2013-07-30 15:42:55.000000     1.400000
2013-07-30 15:43:00.000000     0.600000
2013-07-30 15:43:05.000000    33.800000
2013-07-30 15:43:10.000000    47.400000
2013-07-30 15:43:15.000000    58.600000
2013-07-30 15:43:20.000000    87.600000
2013-07-30 15:43:25.000000    96.200000
2013-07-30 15:43:30.000000    96.000000
2013-07-30 15:43:35.000000   134.200000
2013-07-30 15:43:40.000000   137.200000
2013-07-30 15:43:45.000000   137.400000
2013-07-30 15:43:50.000000   136.600000
2013-07-30 15:43:55.000000   139.800000
2013-07-30 15:44:00.000000   136.200000 <-- `rate` averages about here going forward


It doesn't matter which field I select; the same thing happens:
# rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime load \
    - port 5432 and src host 192.168.10.22
                 LastTime     Load
2013-07-30 15:50:15.000000 1461.19*
2013-07-30 15:50:20.000000 42524.7*
2013-07-30 15:50:25.000000 54329.5*
2013-07-30 15:50:30.000000 55244.8*
2013-07-30 15:50:35.000000 90164.8*
2013-07-30 15:50:40.000000 92539.1*
2013-07-30 15:50:45.000000 94827.1*
2013-07-30 15:50:50.000000 95292.7*
2013-07-30 15:50:55.000000 96286.3*
2013-07-30 15:51:00.000000 94857.6*
2013-07-30 15:51:05.000000 130699.*
2013-07-30 15:51:10.000000 149979.*
2013-07-30 15:51:15.000000 149320.*
[killed]
# rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime load \
    - port 5432 and src host 192.168.2.22
                 LastTime     Load
2013-07-30 15:52:35.000000 33894.4*
2013-07-30 15:52:40.000000 3134.84*
2013-07-30 15:52:45.000000 39262.4*
2013-07-30 15:52:50.000000 40024.0*
2013-07-30 15:52:55.000000 41188.7*
2013-07-30 15:53:00.000000 40259.2*
2013-07-30 15:53:05.000000 75057.6*
2013-07-30 15:53:10.000000 97160.0*
2013-07-30 15:53:15.000000 106520.*
2013-07-30 15:53:20.000000 138504.*
2013-07-30 15:53:25.000000 153835.*
2013-07-30 15:53:30.000000 152892.*
2013-07-30 15:53:35.000000 154017.* <-- `load` averages here going forward

This happens whether or not I perform field aggregation (`-m saddr`).


Why is this happening?


This seems like it will really screw up calculating moving averages
(figuring out spikes, etc.) from the rabins() resultant data.
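
As a stopgap while the cause is unclear, the edge bins of each run could
be discarded before any averaging. The following is only a sketch under
that assumption: trim_edges is a hypothetical helper of mine, the choice
of how many bins to drop (k) is a guess, and dropping bins leaves gaps
in the series at every file boundary.

```shell
#!/bin/sh
# Hypothetical helper (not part of argus): drop the first and last `k`
# data lines of every rabins run. A run starts at each column header.
trim_edges() {
    awk -v k="${1:-3}" '
        # Print the buffered run without its first and last k lines.
        function emit_run(   i) {
            for (i = k + 1; i <= n - k; i++) print line[i]
            n = 0
        }
        # Ignore blank lines.
        NF == 0 { next }
        # Header line: the previous run is complete.
        !/^[0-9]/ { emit_run(); next }
        # Data line: buffer it until the run ends.
        { line[++n] = $0 }
        END { emit_run() }'
}
```

For example, `trim_edges 8 < ~/aggregated_rate` would drop eight bins
(40 seconds at 5s per bin) from each end of every run. Runs shorter than
2k+1 bins produce no output at all.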


Thanks!

Matt


