rabins() does not "ramp up" counts !!!

Carter Bullard carter at qosient.com
Thu Aug 1 17:37:26 EDT 2013


Hey Matt,
Shorter emails please.  These tomes are very difficult for others
to track, and while you personally may be getting something out of
this, the list is intended for the group as a whole.

So, make some changes, give it a test drive, try to figure it out.
Change your status interval, and look at how argus behaves.
Share the experience, but keep the email short.

If you think that argus needs better documentation, make a
suggestion, write a paragraph, submit a new man page, contribute
to the wiki.  Just send your comments to this list.  It
gets archived, etc., but do contribute.

If you have a question, ask a specific question.  Don't write 25
pages and then tell me that I need to document how all the
fields are calculated.  Most of that is already documented; we
do have tutorials on this stuff, for Pete's sake.

And, please, just for Carter's sake, stop declaring that the
technology sucks, is inaccurate, or broken, until you actually
know that it is.  I will never say that argus is great software,
but it doesn't suck.  There may be bugs, but argus and its clients
are not inaccurate.

And, please, try better subject lines.  rabins() is not ramping
up counts.  It doesn't do it, but there are 7 messages in this
thread that ask why rabins ramps up counts.

How am I supposed to respond to a subject line like that?

With every response you send, you have created and are
perpetuating the myth that rabins() is ramping up counts.

That isn't true.

Carter


On Aug 1, 2013, at 4:39 PM, Matt Brown <matthewbrown at gmail.com> wrote:

> Hey Carter,
> 
> I appreciate the thorough responses as always.  I was five steps
> behind you, but I now understand.
> 
> Is it worth adding documentation (in the man page) on how the "calculated
> fields" are calculated?  I think that would have avoided some of my
> confusion.
> 
> I was also confused by how the specific flow I was looking at is a
> long-lived flow that encompasses the bin and, in fact, the entire
> time span contained in the source data file; meaning the duration is
> always 5 seconds regardless of whether I'm using `-M hard` or not.
> 
> 
> Here is a summary of what you described (and correct me if I'm wrong):
> 
> `-M hard`:
> - Modifies a flow's ltime and stime so that the record's duration
> equals the bin timespan (see the sketch after these questions).
> 
> `-M nomodify`:
> - doesn't modify anything about the records.  Specifically, it doesn't
> modify the time fields.
> Q: On the NSM wiki, it's stated that `-M nomodify` is used when you
> wish to have a flow only appear in a single bin.  Is this correct?
> Would it appear in the single bin at its stime?
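> 
> As a rough illustration of that reading (a sketch only, not argus source;
> the bin size and time units are assumptions), the adjustment in Python:
> 
>     BIN = 5.0  # bin size in seconds (-M time 5s)
> 
>     def bin_bounds(t, size=BIN):
>         """Start and end of the bin containing time t."""
>         start = (int(t) // int(size)) * size
>         return start, start + size
> 
>     def hard_adjust(stime, ltime, size=BIN):
>         """-M hard as described above: the per-bin record's times are
>         pinned to the bin boundaries, so dur == size even if the flow
>         was only active for part of the bin."""
>         return bin_bounds(stime, size)
> 
>     def nomodify(stime, ltime):
>         """-M nomodify as described above: times are left untouched."""
>         return stime, ltime
> 
>     # A flow active for only 1.2s inside a 5s bin:
>     print(hard_adjust(100.0, 101.2))   # (100.0, 105.0) -> dur = 5.0
>     print(nomodify(100.0, 101.2))      # (100.0, 101.2) -> dur = 1.2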
> 
> 
> 
> Given the following two points:
> 1) rate=pkts/(ltime - stime)
> 2) dur=ltime-stime
> ...here is why `-M hard` produces the apparent "ramp up" and "ramp down"
> in `rate` (see the sketch below):
> - Since the `dur` of each record is forced to equal the bin timespan,
> -- the `pkts` counted in the first and last bins were actually sent
> during only a small portion of that bin timespan (`dur`)...
> --- therefore, the `rate` calculation yields an abnormally small value,
> because the forced `dur` is longer than the time the flow was actually
> active in those bins.
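> 
> A minimal numeric sketch of that effect, using the two formulas above
> (the forced dur is an assumption based on the -M hard description):
> 
>     BIN = 5.0  # bin size in seconds
> 
>     def rate(pkts, dur):
>         return pkts / dur if dur > 0 else 0.0
> 
>     # 10 packets sent during the last 1 second of the first 5s bin:
>     print(rate(10, 1.0))   # 10.0 pkt/s over the time the flow was really active
>     print(rate(10, BIN))   #  2.0 pkt/s once dur is pinned to the bin span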
> 
> Q: Wouldn't this indicate that you cannot use a field like `rate`
> (what I called an aggregate field, but what you called a "calculated
> field") with `-M hard` and expect the first report to be reliable?
> Q: How can I effectively use `-M hard`?
> 
> 1) `-B` must always be larger than ARGUS_FLOW_STATUS_INTERVAL, since
> `-B` is a buffer time that waits for all data to arrive for a bin
> (see the sketch below).
> 2) `-B` tells rabins() how long to hold a bin open while all of the
> status records for it arrive (from the argus cache ?).  The status
> records are delivered at the interval set with
> argus.conf:ARGUS_FLOW_STATUS_INTERVAL.
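> 
> If that reading is right, the rule of thumb works out to something like
> this (my own sketch, not an argus-documented formula):
> 
>     def min_hold_secs(status_interval, slack=5.0):
>         """Smallest safe -B: one full status interval plus some slack
>         for records still in transit."""
>         return status_interval + slack
> 
>     print(min_hold_secs(60.0))   # 65.0 -> matches the "-B 65s" suggestion below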
> 
> Q: I think, by what you've stated, when used with a `-B` timespan
> value that is less than ARGUS_FLOW_STATUS_INTERVAL, rabins() would
> report no data, but it clearly reports data.  Are you trying to say
> that the data is unpredictable and therefore can't be relied upon?
> 
> 
> Clearly, my problem was a mix of misusing `-M hard` and `-B`.
> Q: I will be lowering my ARGUS_FLOW_STATUS_INTERVAL from 60 to 5
> (reportedly acceptable, as it is said to be the default).  What is the
> cost of this?
> 
> 
> Thanks again,
> 
> Matt
> 
> 
> 
> On Jul 31, 2013, at 11:23 PM, Carter Bullard <carter at qosient.com> wrote:
> 
>> Matt,
>> 
>> The reason you are having problems is the " -B 5s " option.
>> 
>> Don't use it when reading data from a file.  It's not intended
>> for this use, and while it won't hurt anything, it represents
>> a lack of understanding of the option.
>> 
>> When you use it with a live argus data source, it must be
>> greater than the ARGUS_FLOW_STATUS_INTERVAL to have its effect.
>> 
>> Let's try to understand what this option is doing.  The " -B secs "
>> option says how long you have to hold a bin in memory
>> in order to guarantee that all the data arrives for that time
>> bin.
>> 
>> With these options " -M time 5s -B 5s ", you will process
>> only 2 bins at a time: the current bin, whose range is
>> [now - (now + 5s)], and the hold buffer, which is [(now - 5s) - now].
>> 
>> With an ARGUS_FLOW_STATUS_INTERVAL=60s, you will receive data
>> whose time range could be [(now - 60s) - now].  rabins(), when
>> carving up the record into 5-second chunks, will generate at
>> most 12 records whose ranges are:
>>  1. [((now - 60s)+0*5s) - ((now - 60s)+1*5s)]
>>  2. [((now - 60s)+1*5s) - ((now - 60s)+2*5s)]
>>  3. [((now - 60s)+2*5s) - ((now - 60s)+3*5s)]
>>  ...
>>  12. [((now - 60s)+11*5s) - ((now - 60s)+12*5s)]
>> 
>> With a " -B 5s " option, only the 12th record will have a slot
>> to aggregate into.  So you will be throwing away 11/12 of
>> all your data.
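>> 
>> As a quick sketch of that arithmetic (illustrative Python, not argus
>> source), carve a 60s status record into 5s chunks and count how many
>> still have an open bin when the record arrives:
>> 
>>     STATUS = 60.0   # ARGUS_FLOW_STATUS_INTERVAL
>>     SIZE   = 5.0    # -M time 5s
>> 
>>     def chunks(now, status=STATUS, size=SIZE):
>>         """The per-bin pieces of a status record covering [now-status, now]."""
>>         start = now - status
>>         return [(start + i * size, start + (i + 1) * size)
>>                 for i in range(int(status // size))]
>> 
>>     def kept(now, hold):
>>         """Chunks whose bin still exists, i.e. ends after now - hold."""
>>         return [c for c in chunks(now) if c[1] > now - hold]
>> 
>>     now = 1000.0
>>     print(len(chunks(now)))       # 12 chunks
>>     print(len(kept(now, 5.0)))    # 1  -> with -B 5s, 11/12 of the data is lost
>>     print(len(kept(now, 65.0)))   # 12 -> with -B 65s, every chunk has a slot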
>> 
>> Increase your "-B sec" option to
>>  (max(ARGUS_FLOW_STATUS_INTERVAL) + someDelay)
>> 
>> So I would use at a minimum " -B 65s ".
>> 
>> 
>> OK, let's try to keep the email short in the future,
>> if it's feasible to do so.  One topic at a time…
>> 
>> Carter
>> 
>> 
>> 
>> On Jul 30, 2013, at 4:19 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>> 
>>> Hello all,
>>> 
>>> 
>>> Does rabins() "ramp up to normal" over N bins?
>>> 
>>> 
>>> I'd like to start working on calculating moving averages to help
>>> identify performance outliers (like "spikes" in `loss` or `rate`).
>>> 
>>> For this purpose, I believe grabbing data from the output of rabins()
>>> would serve me well.
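>>> 
>>> As a starting point, something like the following could consume the
>>> two-column rabins() output shown below (an illustrative sketch only;
>>> the window size and spike threshold are arbitrary choices of mine):
>>> 
>>>     import sys
>>>     from collections import deque
>>> 
>>>     def rows(stream):
>>>         """Yield (timestamp, value) from lines like
>>>         '2013-07-25 00:59:25.000000    47.400000'; skip header lines."""
>>>         for line in stream:
>>>             parts = line.split()
>>>             if len(parts) == 3:
>>>                 try:
>>>                     yield parts[0] + " " + parts[1], float(parts[2])
>>>                 except ValueError:
>>>                     pass
>>> 
>>>     def spikes(pairs, window=12, factor=2.0):
>>>         """Flag bins whose value exceeds `factor` times the trailing average."""
>>>         recent = deque(maxlen=window)
>>>         for ts, val in pairs:
>>>             if recent and val > factor * (sum(recent) / len(recent)):
>>>                 yield ts, val
>>>             recent.append(val)
>>> 
>>>     for ts, val in spikes(rows(sys.stdin)):
>>>         print(ts, val)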
>>> 
>>> 
>>> For example, if I take historic argus data and run it through the
>>> following rabins() invocation, I see some odd things that can only be
>>> described as "ramping up":
>>> 
>>> 
>>> for f in ~/working/* ; do
>>>   rabins -M hard time 5s -B 5s -r "$f" -m saddr -s ltime rate \
>>>     - port 5432 and src host 192.168.10.22 >> ~/aggregated_rate
>>> done
>>> 
>>> 
>>> The first few and the last few resulting records per file do not seem
>>> to be reported correctly.
>>> 
>>> For example, these dudes at 192.168.10.22 utilize a postgres DB
>>> replication package called bucardo.  During idle time, bucardo sends
>>> heartbeat info, and appears to be holding at about 47-49 packets per
>>> second (rate).
>>> 
>>> However, I am seeing the following in my rabins() resultant data (note
>>> that the presence of the field label header marks the start of a new
>>> rabins() run from the above for loop):
>>> 
>>> 2013-07-25 00:59:25.000000    47.400000
>>> 2013-07-25 00:59:30.000000    47.400000
>>> 2013-07-25 00:59:35.000000    48.000000
>>> 2013-07-25 00:59:40.000000    48.000000
>>> 2013-07-25 00:59:45.000000    40.600000
>>> 2013-07-25 00:59:50.000000    21.400000
>>> 2013-07-25 00:59:55.000000    15.400000
>>> 2013-07-25 01:00:00.000000     5.000000
>>> 2013-07-25 01:00:05.000000     0.000000
>>>               LastTime         Rate
>>> 2013-07-25 01:00:05.000000     0.200000
>>> 2013-07-25 01:00:10.000000     0.600000
>>> 2013-07-25 01:00:15.000000     0.400000
>>> 2013-07-25 01:00:35.000000     0.400000
>>> 2013-07-25 01:00:40.000000     1.000000
>>> 2013-07-25 01:00:45.000000     6.200000
>>> 2013-07-25 01:00:50.000000    25.400000
>>> 2013-07-25 01:00:55.000000    32.400000
>>> 2013-07-25 01:01:00.000000    41.800000
>>> 2013-07-25 01:01:05.000000    47.600000
>>> 2013-07-25 01:01:10.000000    48.600000
>>> 
>>> [The source files were written with rastream().]
>>> 
>>> 
>>> It is well worth noting that if I start rabins() reading from the
>>> argus socket with the following invocation, the same sort of thing
>>> occurs:
>>> # rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime rate \
>>>     - port 5432 and src host 192.168.10.22
>>>               LastTime         Rate
>>> 2013-07-30 15:42:55.000000     1.400000
>>> 2013-07-30 15:43:00.000000     0.600000
>>> 2013-07-30 15:43:05.000000    33.800000
>>> 2013-07-30 15:43:10.000000    47.400000
>>> 2013-07-30 15:43:15.000000    58.600000
>>> 2013-07-30 15:43:20.000000    87.600000
>>> 2013-07-30 15:43:25.000000    96.200000
>>> 2013-07-30 15:43:30.000000    96.000000
>>> 2013-07-30 15:43:35.000000   134.200000
>>> 2013-07-30 15:43:40.000000   137.200000
>>> 2013-07-30 15:43:45.000000   137.400000
>>> 2013-07-30 15:43:50.000000   136.600000
>>> 2013-07-30 15:43:55.000000   139.800000
>>> 2013-07-30 15:44:00.000000   136.200000 <-- `rate` averages about here going forward
>>> 
>>> 
>>> The same thing occurs regardless of which field I utilize:
>>> # rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime load \
>>>     - port 5432 and src host 192.168.10.22
>>>               LastTime     Load
>>> 2013-07-30 15:50:15.000000 1461.19*
>>> 2013-07-30 15:50:20.000000 42524.7*
>>> 2013-07-30 15:50:25.000000 54329.5*
>>> 2013-07-30 15:50:30.000000 55244.8*
>>> 2013-07-30 15:50:35.000000 90164.8*
>>> 2013-07-30 15:50:40.000000 92539.1*
>>> 2013-07-30 15:50:45.000000 94827.1*
>>> 2013-07-30 15:50:50.000000 95292.7*
>>> 2013-07-30 15:50:55.000000 96286.3*
>>> 2013-07-30 15:51:00.000000 94857.6*
>>> 2013-07-30 15:51:05.000000 130699.*
>>> 2013-07-30 15:51:10.000000 149979.*
>>> 2013-07-30 15:51:15.000000 149320.*
>>> [killed]
>>> # rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime load \
>>>     - port 5432 and src host 192.168.2.22
>>>               LastTime     Load
>>> 2013-07-30 15:52:35.000000 33894.4*
>>> 2013-07-30 15:52:40.000000 3134.84*
>>> 2013-07-30 15:52:45.000000 39262.4*
>>> 2013-07-30 15:52:50.000000 40024.0*
>>> 2013-07-30 15:52:55.000000 41188.7*
>>> 2013-07-30 15:53:00.000000 40259.2*
>>> 2013-07-30 15:53:05.000000 75057.6*
>>> 2013-07-30 15:53:10.000000 97160.0*
>>> 2013-07-30 15:53:15.000000 106520.*
>>> 2013-07-30 15:53:20.000000 138504.*
>>> 2013-07-30 15:53:25.000000 153835.*
>>> 2013-07-30 15:53:30.000000 152892.*
>>> 2013-07-30 15:53:35.000000 154017.* <-- `load` averages here going forward
>>> 
>>> This happens whether or not I perform field aggregation (`-m saddr`).
>>> 
>>> 
>>> Why is this happening?
>>> 
>>> 
>>> This seems like it will really screw up calculating moving averages
>>> (figuring out spikes, etc.) from the rabins() resultant data.
>>> 
>>> 
>>> Thanks!
>>> 
>>> Matt
>> 
> 
