rabins() does not "ramp up" counts !!!

Matt Brown matthewbrown at gmail.com
Fri Aug 2 13:46:02 EDT 2013


All set!

It was a misunderstanding of the options that got me.  I had a few
more questions in my last email, but as you indicated, testing things
myself will lead me to the answers.


Thanks,

Matt

On Aug 1, 2013, at 5:37 PM, Carter Bullard <carter at qosient.com> wrote:

> Hey Matt,
> Shorter emails please.  These tomes are very difficult for others
> to track, and while you personally may be getting something out of
> this, the list is intended for the group as a whole.
>
> So, make some changes, give it a test drive, try to figure it out.
> Change your status interval, and look at how argus behaves.
> Share the experience, but keep the email short.
>
> If you think that argus needs better documentation, make a
> suggestion, write a paragraph, submit a new man page, contribute
> to the wiki.  Just send your comments to this list.  It
> gets archived, etc… but do contribute.
>
> If you have a question, ask a specific question.  Don't write 25
> pages and then toss it back to me that I need to document how all
> the fields are calculated.  Most of that is already documented; we
> do have tutorials on this stuff, for Pete's sake.
>
> And, please, just for Carter's sake, stop declaring that the
> technology sucks, is inaccurate, or broken, until you actually
> know that it is.  I will never say that argus is great software,
> but it doesn't suck.  There may be bugs, but argus and its clients
> are not inaccurate.
>
> And, please, try better subject lines.  rabins() is not ramping
> up counts.  It doesn't do it.  But there are 7 messages in this
> thread that ask why rabins ramps up counts.
>
> How am I supposed to respond to a subject line like that ?????
>
> With every response you send, you have created and are
> perpetuating the myth that rabins() is ramping up counts.
>
> That isn't true.
>
> Carter
>
>
> On Aug 1, 2013, at 4:39 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>
>> Hey Carter,
>>
>> I appreciate the thorough responses as always.  I was five steps
>> behind you, but I now understand.
>>
>> Is it worth adding documentation (in the man page) on how the "calculated
>> fields" are calculated?  I think that would have avoided some of my
>> confusion.
>>
>> I was also confused by the fact that the specific flow I was looking
>> at is a long-lived flow that spans the bin and, in fact, the entire
>> time range of the source data file; meaning the duration is always
>> 5 seconds regardless of whether I'm using `-M hard` or not.
>>
>>
>> Here is a summary of what you described (and correct me if I'm wrong):
>>
>> `-M hard`:
>> - Modifies a flow record's stime and ltime so that its duration is set
>> to that of the bin timespan.
>>
>> `-M nomodify`:
>> - Doesn't modify anything about the records.  Specifically, doesn't
>> modify the time fields.
>> Q: On the NSM wiki, it's stated that `-M nomodify` is used when you
>> wish to have a flow only appear in a single bin.  Is this correct?
>> Would it appear in the single bin at its stime?
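>>
>> For illustration, here is how I understand the two modes would be
>> invoked (the read file name is just a placeholder, and I'm guessing
>> at the nomodify syntax by analogy with my command further down):
>>
>> # carve records on 5s bin boundaries, forcing stime/ltime to the bin edges
>> rabins -M hard time 5s -r argus.out -m saddr -s ltime rate - port 5432
>>
>> # leave record times untouched, so each record lands in a single bin
>> rabins -M nomodify time 5s -r argus.out -m saddr -s ltime rate - port 5432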
>>
>>
>>
>> Given the following two points:
>> 1) rate=pkts/(ltime - stime)
>> 2) dur=ltime-stime
>> ...the reason why `-M hard` produces the "ramp up" and "ramp down" in `rate`:
>> - Since the `dur` of each record is forced to be equal to the bin timespan,
>> -- in this case, the `pkts` counted in the first and last designated
>> bins were actually sent during only a small portion of the bin
>> timespan (`dur`)...
>> --- therefore, the `rate` calculation comes out abnormally small,
>> because the `dur` used is the full bin span rather than the real,
>> shorter duration of the flow within that bin (worked example below).
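>>
>> A quick worked example with made-up numbers (plain shell arithmetic,
>> just to illustrate the point):
>>
>> pkts=10 ; real_dur=2 ; bin_dur=5      # flow active for only 2s of a 5s bin
>> echo "real rate:    $(( pkts / real_dur )) pkts/s"   # 5 pkts/s
>> echo "-M hard rate: $(( pkts / bin_dur )) pkts/s"    # 2 pkts/s, looks "ramped down"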
>>
>> Q: Wouldn't this indicate that you can not use a field like `rate`
>> (what I called an aggregate field, but what you called a "calculated
>> field") with `-M hard` and expect the first report to be reliable?
>> Q: How can I effectively use `-M hard`?
>>
>> 1) `-B` must always be larger than ARGUS_FLOW_STATUS_INTERVAL since
>> `-B` is a buffer time that waits for all data to arrive for a bin.
>> 2) `-B` should be used to set rabins() to hang out while all the
>> status records arrive for a bin (from the argus cache ?).  The status
>> records are delivered at the interval set with
>> argus.conf:ARGUS_FLOW_STATUS_INTERVAL.
>>
>> Q: I think, by what you've stated, when used with a `-B` timespan
>> value that is less than ARGUS_FLOW_STATUS_INTERVAL, rabins() would
>> report no data, but it clearly reports data.  Are you trying to say
>> that the data is unpredictable and therefore can't be relied upon?
>>
>>
>> Clearly, my problem was a mix of misusing `-M hard` and `-B`.
>> Q: I will be lowering my ARGUS_FLOW_STATUS_INTERVAL from 60 to 5
>> (which is the default, and reportedly acceptable).  What is the cost of this?
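>>
>> If it helps anyone else, I believe that change is just this one line
>> in argus.conf (and, I assume, a restart of argus to pick it up):
>>
>> ARGUS_FLOW_STATUS_INTERVAL=5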
>>
>>
>> Thanks again,
>>
>> Matt
>>
>>
>>
>> On Jul 31, 2013, at 11:23 PM, Carter Bullard <carter at qosient.com> wrote:
>>
>>> Matt,
>>>
>>> The reason you are having problems is the " -B 5s " option.
>>>
>>> Don't use it when reading data from a file.  It's not intended
>>> for this use, and while it won't hurt anything, it represents
>>> a lack of understanding of the option.
>>>
>>> When you use it with a live argus data source, it must be
>>> greater than the ARGUS_FLOW_STATUS_INTERVAL to have its effect.
>>>
>>> Let's try to understand what this option is doing.  The " -B secs "
>>> option says how long to hold a bin in memory in order to
>>> guarantee that all the data for that time bin has arrived.
>>>
>>> With these options " -M time 5s -B 5s ", you will process
>>> only 2 bins at a time: the current bin, whose range is
>>> [now - (now + 5s)], and the hold buffer, which is [(now - 5s) - now].
>>>
>>> With an ARGUS_FLOW_STATUS_INTERVAL=60s, you will receive data
>>> whose time range could be [(now - 60s) - now].  rabins(), when
>>> carving up the record into 5 second chunks, will generate at
>>> most 12 records whose ranges are:
>>> 1. [((now - 60s)+0*5s) - ((now - 60s)+1*5s)]
>>> 2. [((now - 60s)+1*5s) - ((now - 60s)+2*5s)]
>>> 3. [((now - 60s)+2*5s) - ((now - 60s)+3*5s)]
>>> ...
>>> 12. [((now - 60s)+11*5s) - ((now - 60s)+12*5s)]
>>>
>>> With a " -B 5s " option, only the 12th record will have a slot
>>> to aggregate into.  So you will be throwing away 11/12 of
>>> all your data.
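>>>
>>> As a rough sketch of that arithmetic (plain shell; status interval
>>> and bin size as above, "now" being whatever second you run it at):
>>>
>>> # print the 12 five-second bin ranges covered by one 60s status record
>>> status=60 ; bin=5 ; now=$(date +%s) ; start=$(( now - status ))
>>> i=0
>>> while [ $i -lt $(( status / bin )) ]; do
>>>   echo "bin $(( i + 1 )): [$(( start + i * bin )) - $(( start + (i + 1) * bin ))]"
>>>   i=$(( i + 1 ))
>>> done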
>>>
>>> Increase your "-B sec" option to
>>> (max(ARGUS_FLOW_STATUS_INTERVAL) + someDelay)
>>>
>>> So I would use at a minimum " -B 65s ".
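>>>
>>> For example, the live read from Matt's email below would become
>>> something like this (same fields and filter, just a larger hold buffer):
>>>
>>> rabins -M hard time 5s -B 65s -S 127.0.0.1:561 -m saddr -s ltime rate \
>>>        - port 5432 and src host 192.168.10.22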
>>>
>>>
>>> OK, let's try to keep the emails short in the future,
>>> if it's feasible to do so.  One topic at a time…
>>>
>>> Carter
>>>
>>>
>>>
>>> On Jul 30, 2013, at 4:19 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>
>>>> Hello all,
>>>>
>>>>
>>>> Does rabins() "ramp up to normal" over N bins?
>>>>
>>>>
>>>> I'd like to start working on calculating moving averages to help
>>>> identify performance outliers (like "spikes" in `loss` or `rate`).
>>>>
>>>> For this purpose, I believe grabbing data from the output of rabins()
>>>> would serve me well.
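>>>>
>>>> As a rough sketch of what I have in mind (a hypothetical 5-point moving
>>>> average over the ltime/rate output in the ~/aggregated_rate file produced
>>>> by the loop further down; the awk skips the field-label header lines):
>>>>
>>>> awk '$NF ~ /^[0-9]/ { buf[++n % 5] = $NF
>>>>      if (n >= 5) { s = 0
>>>>        for (i in buf) s += buf[i]
>>>>        print $1, $2, s / 5 } }' ~/aggregated_rate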
>>>>
>>>>
>>>> For example, if I take historic argus data and run it through the
>>>> following rabins() invocation, I see some odd things that can only be
>>>> described as "ramping up":
>>>>
>>>>
>>>> for f in $(ls -1 ~/working/*) ; do (
>>>> rabins -M hard time 5s -B 5s -r $f -m saddr -s ltime rate \
>>>>   - port 5432 and src host 192.168.10.22
>>>> ) >> ~/aggregated_rate ; done
>>>>
>>>>
>>>> The first few and the last few resulting records per file do not
>>>> seem to be reported correctly.
>>>>
>>>> For example, these dudes at 192.168.10.22 utilize a postgres DB
>>>> replication package called bucardo.  During idle time, bucardo sends
>>>> heartbeat info, and appears to be holding at about 47-49 packets per
>>>> second (rate).
>>>>
>>>> However, I am seeing the following in my rabins() resultant data (note
>>>> that the presence of a field label header marks the start of a new
>>>> rabins() run from the for loop above):
>>>>
>>>> 2013-07-25 00:59:25.000000    47.400000
>>>> 2013-07-25 00:59:30.000000    47.400000
>>>> 2013-07-25 00:59:35.000000    48.000000
>>>> 2013-07-25 00:59:40.000000    48.000000
>>>> 2013-07-25 00:59:45.000000    40.600000
>>>> 2013-07-25 00:59:50.000000    21.400000
>>>> 2013-07-25 00:59:55.000000    15.400000
>>>> 2013-07-25 01:00:00.000000     5.000000
>>>> 2013-07-25 01:00:05.000000     0.000000
>>>>              LastTime         Rate
>>>> 2013-07-25 01:00:05.000000     0.200000
>>>> 2013-07-25 01:00:10.000000     0.600000
>>>> 2013-07-25 01:00:15.000000     0.400000
>>>> 2013-07-25 01:00:35.000000     0.400000
>>>> 2013-07-25 01:00:40.000000     1.000000
>>>> 2013-07-25 01:00:45.000000     6.200000
>>>> 2013-07-25 01:00:50.000000    25.400000
>>>> 2013-07-25 01:00:55.000000    32.400000
>>>> 2013-07-25 01:01:00.000000    41.800000
>>>> 2013-07-25 01:01:05.000000    47.600000
>>>> 2013-07-25 01:01:10.000000    48.600000
>>>>
>>>> [The source files were written with rastream().]
>>>>
>>>>
>>>> It is well worth noting that if I start a rabins() reading from the
>>>> argus socket with the following invocation, the same sort of thing
>>>> occurs:
>>>> # rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime rate \
>>>>     - port 5432 and src host 192.168.10.22
>>>>              LastTime         Rate
>>>> 2013-07-30 15:42:55.000000     1.400000
>>>> 2013-07-30 15:43:00.000000     0.600000
>>>> 2013-07-30 15:43:05.000000    33.800000
>>>> 2013-07-30 15:43:10.000000    47.400000
>>>> 2013-07-30 15:43:15.000000    58.600000
>>>> 2013-07-30 15:43:20.000000    87.600000
>>>> 2013-07-30 15:43:25.000000    96.200000
>>>> 2013-07-30 15:43:30.000000    96.000000
>>>> 2013-07-30 15:43:35.000000   134.200000
>>>> 2013-07-30 15:43:40.000000   137.200000
>>>> 2013-07-30 15:43:45.000000   137.400000
>>>> 2013-07-30 15:43:50.000000   136.600000
>>>> 2013-07-30 15:43:55.000000   139.800000
>>>> 2013-07-30 15:44:00.000000   136.200000 <-- `rate` averages about here
>>>> going forward
>>>>
>>>>
>>>> The same thing occurs regardless of which field I use:
>>>> # rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr -s ltime load \
>>>>     - port 5432 and src host 192.168.10.22
>>>>              LastTime     Load
>>>> 2013-07-30 15:50:15.000000 1461.19*
>>>> 2013-07-30 15:50:20.000000 42524.7*
>>>> 2013-07-30 15:50:25.000000 54329.5*
>>>> 2013-07-30 15:50:30.000000 55244.8*
>>>> 2013-07-30 15:50:35.000000 90164.8*
>>>> 2013-07-30 15:50:40.000000 92539.1*
>>>> 2013-07-30 15:50:45.000000 94827.1*
>>>> 2013-07-30 15:50:50.000000 95292.7*
>>>> 2013-07-30 15:50:55.000000 96286.3*
>>>> 2013-07-30 15:51:00.000000 94857.6*
>>>> 2013-07-30 15:51:05.000000 130699.*
>>>> 2013-07-30 15:51:10.000000 149979.*
>>>> 2013-07-30 15:51:15.000000 149320.*
>>>> [killed]# rabins -M hard time 5s -B 5s -S 127.0.0.1:561 -m saddr \
>>>>     -s ltime load - port 5432 and src host 192.168.2.22
>>>>              LastTime     Load
>>>> 2013-07-30 15:52:35.000000 33894.4*
>>>> 2013-07-30 15:52:40.000000 3134.84*
>>>> 2013-07-30 15:52:45.000000 39262.4*
>>>> 2013-07-30 15:52:50.000000 40024.0*
>>>> 2013-07-30 15:52:55.000000 41188.7*
>>>> 2013-07-30 15:53:00.000000 40259.2*
>>>> 2013-07-30 15:53:05.000000 75057.6*
>>>> 2013-07-30 15:53:10.000000 97160.0*
>>>> 2013-07-30 15:53:15.000000 106520.*
>>>> 2013-07-30 15:53:20.000000 138504.*
>>>> 2013-07-30 15:53:25.000000 153835.*
>>>> 2013-07-30 15:53:30.000000 152892.*
>>>> 2013-07-30 15:53:35.000000 154017.* <-- `load` averages here going forward
>>>>
>>>> This happens whether or not I perform field aggregation (`-m saddr`).
>>>>
>>>>
>>>> Why is this happening?
>>>>
>>>>
>>>> This seems like it will really screw up calculating moving averages
>>>> (figuring out spikes, etc.) from the rabins() resultant data.
>>>>
>>>>
>>>> Thanks!
>>>>
>>>> Matt
>


