Why is rabins() "ramping up" counts?
Matt Brown
matthewbrown at gmail.com
Wed Jul 31 14:29:20 EDT 2013
Hey Carter...
Thanks for replying quickly.
Hope you're ready for one of my novels...
If not, this message can be summarized in one question:
Should I be "throwing away" any data returned within the first
"ARGUS_FLOW_STATUS_INTERVAL" when using rabins(), since it appears to be
reported inaccurately?
--
/etc/argus.conf: ARGUS_FLOW_STATUS_INTERVAL=60
Does 60 seconds qualify in the "very large in comparison to 5 seconds"
category?
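As a quick sanity check (just a sketch; the only inputs are the 60-second status interval from my argus.conf and the 5-second bin size), a ramp spanning one status interval would cover 12 bins:

```python
# How many 5-second bins does one status interval span?
STATUS_INTERVAL = 60  # seconds, ARGUS_FLOW_STATUS_INTERVAL from /etc/argus.conf
BIN_SIZE = 5          # seconds, from `rabins -M hard time 5s`

bins_per_status_interval = STATUS_INTERVAL // BIN_SIZE
print(bins_per_status_interval)  # 12
```

That lines up with the roughly dozen bins of "ramp up" I show below.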
--
I definitely have a small number of _flows_ per five second interval for
this specific BPF.
Am I right to assume that rabins() with `-M hard` will take whatever flows
are occurring within each bin and treat them independently, not discarding
them in the next bins (this is what `-M nomodify` is for, right?)?
--
Here is the outcome of what you've described to help me understand rabins():
1) grab the `seq` of the `-N 1` record:
#ra -N 1 -r ~/working/2013-07-25_argus_09\:00\:00 -s seq - port 5432 and
src host 192.168.10.22
Seq
12187458
2) write the single flow record to an argus binary file:
#ra -N 1 -r ~/working/2013-07-25_argus_09\:00\:00 -w - - port 5432 and src
host 192.168.10.22 > ~/temp.argus
3) If I look at a field that is a summation (`pkts`) [not itself an
aggregate, as `rate` is], without using field aggregation (`-m`), I get the
TotPkts:
#ra -r ~/temp.argus -s seq ltime pkts - port 5432 and src host 192.168.10.22
Seq LastTime TotPkts
12187458 2013-07-25 09:59:17.698748 59326
4) If I then look at the output of rabins() run against the same `seq`,
it appears that rabins() shows `pkts` within each bin, whose sum IS equal
to the above TotPkts:
#rabins -M hard time 5s -r ~/temp.argus -s seq ltime pkts - port 5432 and
src host 192.168.10.22
...snipped output...
Cool! A summation works with a field that isn't, itself, an aggregate.
[Note the output is the same with or without `-B 5s`]
What about a field that is, itself, an aggregate (`rate`)?
#ra -r ~/temp.argus -s seq ltime rate - port 5432 and src host 192.168.10.22
Seq LastTime Rate
12187458 2013-07-25 09:59:17.698748 16.675105
#rabins -M hard time 5s -B 5s -r ~/temp.argus -s seq ltime rate - port 5432
and src host 192.168.10.22
Cool! If I average the resultant Rates, I get 16.4646067416... not exactly
the 16.675105 above, but good enough(?). [Note the output is the same with
or without `-B 5s`]
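My guess at why the average is close but not exact (a sketch with made-up numbers, not values from my capture): the first and last bins are only partially covered by the flow, so their per-bin rates come out lower and drag the average down slightly.

```python
# Illustrative only: a flow with mostly "full" 5s bins plus two
# partially-covered edge bins. The duration and bin counts here are
# hypothetical, chosen to mimic the flow record above.
total_pkts = 59326
duration = 3557.7                      # hypothetical flow duration, seconds
overall_rate = total_pkts / duration   # what ra reports as Rate

# Full bins sit near the overall rate; the two edge bins are partial.
bin_rates = [overall_rate] * 710 + [overall_rate * 0.4, overall_rate * 0.2]
avg_of_bins = sum(bin_rates) / len(bin_rates)

# The partial edge bins pull the average slightly below the overall rate.
assert avg_of_bins < overall_rate
```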
--
The (`-m`) aggregator does not cause the "ramp up"...
proof:
I do not see a difference when using an aggregator (`-m saddr`, because my
BPF considers a single src host) with rabins() if I use an aggregate field
(`rate`) against a live feed:
#timeout 60s rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -s seq ltime
saddr rate - port 5432 and src host 192.168.10.22 > ~/rabins_aggr.out &
timeout 60s rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s seq
ltime saddr rate - port 5432 and src host 192.168.10.22 >
~/rabins_aggr_saddr.out
If I manually sum the per-bin output of rabins() run without the `-m`
aggregator, the totals match the `-m saddr` values [+/- ~0.5].
So aggregation is not the cause of the "ramp up".
--
"Ramp up" is exhibited on both aggregated fields and non-aggregated fields.
proof:
# rabins -M hard time 5s -B 5s -m saddr -S 127.0.0.1:561 -s seq ltime pkts
- port 5432 and src host 192.168.10.22
Seq LastTime TotPkts
15103267 2013-07-31 13:46:35.000000 41
14983890 2013-07-31 13:46:40.000000 75
14983890 2013-07-31 13:46:45.000000 144
14983890 2013-07-31 13:46:50.000000 255
14983890 2013-07-31 13:46:55.000000 377
14983890 2013-07-31 13:47:00.000000 368
15103267 2013-07-31 13:47:05.000000 373
14983890 2013-07-31 13:47:10.000000 446
14983890 2013-07-31 13:47:15.000000 570
14983890 2013-07-31 13:47:20.000000 567
14983890 2013-07-31 13:47:25.000000 575
14983890 2013-07-31 13:47:30.000000 637
15103267 2013-07-31 13:47:35.000000 647
# rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s seq ltime saddr
rate - port 5432 and src host 192.168.10.22
Seq LastTime SrcAddr Rate
14667433 2013-07-31 13:43:45.000000 192.168.10.22 15.200000
14667433 2013-07-31 13:43:50.000000 192.168.10.22 38.600000
14667433 2013-07-31 13:43:55.000000 192.168.10.22 61.800000
14667433 2013-07-31 13:44:00.000000 192.168.10.22 61.000000
14667433 2013-07-31 13:44:05.000000 192.168.10.22 60.600000
14667433 2013-07-31 13:44:10.000000 192.168.10.22 75.200000
14667433 2013-07-31 13:44:15.000000 192.168.10.22 99.400000
14667433 2013-07-31 13:44:20.000000 192.168.10.22 99.200000
14667433 2013-07-31 13:44:25.000000 192.168.10.22 101.400000
14667433 2013-07-31 13:44:30.000000 192.168.10.22 113.400000
14667433 2013-07-31 13:44:35.000000 192.168.10.22 123.400000
14667433 2013-07-31 13:44:40.000000 192.168.10.22 130.600000
14667433 2013-07-31 13:44:45.000000 192.168.10.22 129.800000
FINAL QUESTION:
Should I simply be "throwing away" any data returned within the first
"ARGUS_FLOW_STATUS_INTERVAL" when using rabins(), since it appears to be
reported inaccurately?
SORRY ONE MORE... :)
Also, does ra() only report flows (`seq`) that have flow records reporting,
while rabins() (with `-M hard`) reports all flows that have any activity
within the bin?
Thanks,
Matt
On Jul 30, 2013, at 4:32 PM, Carter Bullard <carter at qosient.com> wrote:
Hey Matt,
Have to see the data that generated the output to know if
there is a problem.
The key here is the ARGUS_FLOW_STATUS_INTERVAL. If it is
very large in comparison to your bin size, and you
have a small number of records, then this kind of
skewing can occur. But have to see the data.
Your rabins() call will cut flow records into 5 second bins,
normally distributing the metrics (pkts, bytes, appbytes, etc…),
and then when its time to output the bins, it will apply the
aggregation strategy to all the flow records that are in
each bin.
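If I've understood the mechanism Carter describes, it can be modeled like this (a sketch assuming the metric is distributed proportionally to time across the record's span; this is my model of the behavior, not argus source code):

```python
# Model of cutting one flow record into hard 5-second bins,
# distributing pkts proportionally to how much of the record's
# time span falls in each bin.
def cut_into_bins(start, stop, pkts, bin_size=5):
    """Split a flow record's pkts proportionally into hard time bins."""
    bins = {}
    t = start
    while t < stop:
        bin_start = t - (t % bin_size)            # hard bin boundary
        seg_end = min(bin_start + bin_size, stop)
        frac = (seg_end - t) / (stop - start)     # fraction of the record here
        bins[bin_start] = bins.get(bin_start, 0.0) + pkts * frac
        t = seg_end
    return bins

# A 60s status record carrying 120 pkts -> 10 pkts in each of 12 bins.
b = cut_into_bins(0, 60, 120)
```

Under this model, bins near "now" are incomplete until every status record that overlaps them has arrived, which would explain the ramp at the start of a live read.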
Your -B 5s will throw away records that precede the apparent
start time of the stream, and is only used when reading live data.
Don't use the " -B secs" option when reading files.
That may clear up your problem.
So grab a single flow record's status records, writing them out to a file.
Then run rabins() to see how it carves up the flow record.
You should see that it processes well.
Carter
On Jul 30, 2013, at 4:19 PM, Matt Brown <matthewbrown at gmail.com> wrote:
Hello all,
Does rabins() "ramp up to normal" over N bins?
I'd like to start working on calculating moving averages to help
identify performance outliers (like "spikes" in `loss` or `rate`).
For this purpose, I believe grabbing data from the output of rabins()
would serve me well.
For example, if I take historic argus data and run it through the
following rabins() invocation, I see some odd things that can only be
noted as "ramping up":
for f in $(ls -m1 ~/working/*) ; do (
rabins -M hard time 5s -B 5s -r $f -m saddr -s ltime rate - port 5432
and src host 192.168.10.22
) >> ~/aggregated_rate ; done
The first few and the last few resulting records per file do not seem
to be reported correctly.
For example, these dudes at 192.168.10.22 utilize a postgres DB
replication package called bucardo. During idle time, bucardo sends
heartbeat info, and appears to be holding at about 47-49 packets per
second (rate).
However, I am seeing the following in my rabins() resultant data (note
the presence of a field-label header == the start of a new rabins() run
from the above for loop):
2013-07-25 00:59:25.000000 47.400000
2013-07-25 00:59:30.000000 47.400000
2013-07-25 00:59:35.000000 48.000000
2013-07-25 00:59:40.000000 48.000000
2013-07-25 00:59:45.000000 40.600000
2013-07-25 00:59:50.000000 21.400000
2013-07-25 00:59:55.000000 15.400000
2013-07-25 01:00:00.000000 5.000000
2013-07-25 01:00:05.000000 0.000000
LastTime Rate
2013-07-25 01:00:05.000000 0.200000
2013-07-25 01:00:10.000000 0.600000
2013-07-25 01:00:15.000000 0.400000
2013-07-25 01:00:35.000000 0.400000
2013-07-25 01:00:40.000000 1.000000
2013-07-25 01:00:45.000000 6.200000
2013-07-25 01:00:50.000000 25.400000
2013-07-25 01:00:55.000000 32.400000
2013-07-25 01:01:00.000000 41.800000
2013-07-25 01:01:05.000000 47.600000
2013-07-25 01:01:10.000000 48.600000
[The source files were written with rastream().]
It is well worth noting that if I start an rabins() reading from the
argus() socket with the following invocation, the same sort of thing
occurs:
# rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime rate
- port 5432 and src host 192.168.10.22
LastTime Rate
2013-07-30 15:42:55.000000 1.400000
2013-07-30 15:43:00.000000 0.600000
2013-07-30 15:43:05.000000 33.800000
2013-07-30 15:43:10.000000 47.400000
2013-07-30 15:43:15.000000 58.600000
2013-07-30 15:43:20.000000 87.600000
2013-07-30 15:43:25.000000 96.200000
2013-07-30 15:43:30.000000 96.000000
2013-07-30 15:43:35.000000 134.200000
2013-07-30 15:43:40.000000 137.200000
2013-07-30 15:43:45.000000 137.400000
2013-07-30 15:43:50.000000 136.600000
2013-07-30 15:43:55.000000 139.800000
2013-07-30 15:44:00.000000 136.200000 <-- `rate` averages about here
going forward
Whichever field I use, the same thing occurs:
# rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime load
- port 5432 and src host 192.168.10.22
LastTime Load
2013-07-30 15:50:15.000000 1461.19*
2013-07-30 15:50:20.000000 42524.7*
2013-07-30 15:50:25.000000 54329.5*
2013-07-30 15:50:30.000000 55244.8*
2013-07-30 15:50:35.000000 90164.8*
2013-07-30 15:50:40.000000 92539.1*
2013-07-30 15:50:45.000000 94827.1*
2013-07-30 15:50:50.000000 95292.7*
2013-07-30 15:50:55.000000 96286.3*
2013-07-30 15:51:00.000000 94857.6*
2013-07-30 15:51:05.000000 130699.*
2013-07-30 15:51:10.000000 149979.*
2013-07-30 15:51:15.000000 149320.*
[killed]# rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s
ltime load - port 5432 and src host 192.168.2.22
LastTime Load
2013-07-30 15:52:35.000000 33894.4*
2013-07-30 15:52:40.000000 3134.84*
2013-07-30 15:52:45.000000 39262.4*
2013-07-30 15:52:50.000000 40024.0*
2013-07-30 15:52:55.000000 41188.7*
2013-07-30 15:53:00.000000 40259.2*
2013-07-30 15:53:05.000000 75057.6*
2013-07-30 15:53:10.000000 97160.0*
2013-07-30 15:53:15.000000 106520.*
2013-07-30 15:53:20.000000 138504.*
2013-07-30 15:53:25.000000 153835.*
2013-07-30 15:53:30.000000 152892.*
2013-07-30 15:53:35.000000 154017.* <-- `load` averages here going forward
This happens whether or not I perform field aggregation (`-m saddr`).
Why is this happening?
This seems like it will really screw up calculating moving averages
(figuring out spikes, etc.) from the rabins() resultant data.
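For what it's worth, here's the kind of moving average I have in mind (a sketch; the `rates` values are illustrative, and a real run would parse rabins() output rather than hard-code them):

```python
from collections import deque

def moving_average(values, window=5):
    """Simple moving average over per-bin values, e.g. rabins rate output."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

# Illustrative per-bin rates containing one spike.
rates = [47.4, 48.0, 47.8, 190.0, 48.2, 47.6]
smoothed = moving_average(rates, window=3)
```

A spike then shows up as a large deviation of a bin's value from the smoothed series, which is exactly what the ramp-up artifact would pollute.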
Thanks!
Matt