racluster and trans
Carter Bullard
carter at qosient.com
Mon Jul 26 16:35:00 EDT 2010
Hey Rafael,
I found a number of bugs as a result of your report, and have fixed all that I found.
These involved rabins() and ragraph(), and so these fixes will affect your effort.
I modified how ragraph() processes "trans". We will now graph the actual
number of Concurrent Transactions per time period, rather than the AVERAGE
of the trans value. Previously we were graphing connections per second, rather
than the total number of transactions during a time interval. The new behavior
seems more appropriate; if there is a problem, please send email.
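
To make the distinction concrete, here is a toy Python sketch. It is not taken from the ragraph() source; the record layout (start time in seconds, trans value) and the name bin_trans are assumptions for illustration only. It computes both the per-bin total and the per-bin average of trans, the two quantities contrasted above.

```python
from collections import defaultdict

def bin_trans(records, bin_seconds=5):
    """Group (start_time, trans) pairs into fixed time bins.

    Returns {bin_start: (total_trans, mean_trans)}.
    Hypothetical helper; not part of argus-clients.
    """
    bins = defaultdict(list)
    for stime, trans in records:
        bin_start = int(stime // bin_seconds) * bin_seconds
        bins[bin_start].append(trans)
    return {b: (sum(v), sum(v) / len(v)) for b, v in sorted(bins.items())}

# Made-up records: three fall in the 0-5s bin, one in the 5-10s bin.
records = [(0.5, 1), (1.2, 1), (3.9, 2), (6.0, 1)]
result = bin_trans(records)
for bin_start, (total, mean) in result.items():
    print(bin_start, total, round(mean, 3))
```

In the first bin the total is 4 while the average is about 1.33, so a graph of one can look very different from a graph of the other.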
Please try the latest argus-clients on the server.
http://qosient.com/argus/dev/argus-clients-3.0.3.17.tar.gz
This will fix processing of files that have been previously aggregated, but as a
general rule you should use the "-M dsrs='-agr'" option when the files have been
pre-processed.
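
The over-counting that the '-agr' option avoids (explained in the quoted message below: rabins() copies the agr record unmodified into both halves of a record it splits) can be illustrated with a toy model. This is a sketch only, not rabins() code. The (start, end, n) tuples, the function names, and the assumption that a record without an agr dsr counts as trans = 1 are all illustrative.

```python
from collections import defaultdict

def split_record(start, end, n, bin_seconds=5):
    """Split a record at bin boundaries, copying n unmodified into each piece."""
    first = int(start // bin_seconds) * bin_seconds
    pieces = [(first, n)]
    b = first + bin_seconds
    while b < end:
        pieces.append((b, n))
        b += bin_seconds
    return pieces

def per_bin_trans(records, strip_agr=False, bin_seconds=5):
    """Sum trans per bin; strip_agr mimics removing the agr dsr on input."""
    totals = defaultdict(int)
    for start, end, n in records:
        if strip_agr:
            n = 1  # assumption: with no agr dsr, each input record counts once
        for b, piece_n in split_record(start, end, n, bin_seconds):
            totals[b] += piece_n
    return dict(sorted(totals.items()))

# Two pre-aggregated records built from 5 original flows in total.
# The second one spans the 5 second boundary.
aggregated = [(1.0, 4.0, 3), (3.0, 7.0, 2)]

copied = per_bin_trans(aggregated)                    # {0: 5, 5: 2}
stripped = per_bin_trans(aggregated, strip_agr=True)  # {0: 2, 5: 1}
print(copied, stripped)
```

With the agr values copied, the bins sum to 7 even though only 5 flows went in. With agr stripped, each bin simply counts the records that touch it.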
Thanks!!!!
Carter
On Jul 22, 2010, at 4:58 AM, Rafael Barbosa wrote:
> Hi,
>
> In the mailing list I found a similar problem to what I was observing, but now I see that after "correcting" the flow direction, the test I reported does not make sense. Indeed one of the 'saddr' becomes a 'daddr', so no problem there.
>
> After your explanation regarding the aggregation metrics (N, mean, etc) I understand the problem with aggregating/splitting flows. I will try to take it into account when getting some statistics from the files. Regarding your question:
>
> In your example, looks like you want to count the number of unique flows per srcid every
> 5 seconds?
>
> The test I did in reality was:
> racluster -r test -w test.cluster
> ragraph trans -M 5s -r test.cluster -w test.cluster.png
> ragraph trans -M 5s -r test -w test.png
>
> Comparing the graphs, I see completely different results, so I tried to reproduce the results using rabins (it's easier to send its output to this list): report the number of flows per 5s bin.
>
> The proposed solution still does not reproduce the original results:
>
> rabins -M dsrs="-agr" -m srcid -M hard time 5s -r test/test.cluster -s stime trans
> 14:37:15.000000 62
> 14:37:20.000000 57
> 14:37:25.000000 19
>
> For now I will avoid rabins/racluster in files already aggregated.
>
> Rafael
>
> On Wed, Jul 21, 2010 at 6:04 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey Rafael,
> When you think you have a bug, if you can send an argus datafile that demonstrates
> the problem, I can probably determine if it really is a bug and fix it in a short period
> of time.
>
> OK, A few things. All of the ra* aggregators have a mechanism to "correct" the
> direction of a particular flow record. Because you are tracking a direction dependent
> attribute, "saddr", you may be seeing the results of racluster() "correcting" a record's
> direction. If you don't want this type of correction, you need to specify that in a
> racluster.conf file, but generally, correcting for direction is a very good thing.
>
> See ./support/Config/racluster.conf and check out the racluster.1 manpage.
> You will want to set this variable.
>
> RACLUSTER_AUTO_CORRECTION=no
>
> Now that doesn't mean there isn't a bug, just means that we have to account for that
> possibility. Look at the actual records that racluster() generates, to see if you
> understand that output, and if there are still problems, then send email.
>
>
> OK, with respect to your racluster->rabins inconsistency. What number do you think
> you are generating? Number of unique flows per srcid per 5 seconds? You will need
> to change the call to rabins() in order to get that number.
>
> All ra* aggregators insert an ARGUS_AGR_DSR information element into the
> records. That is the structure that contains the 'N' (trans), 'mean',
> 'max', 'min', 'stddev' metrics for the aggregation. When you run aggregation twice, the
> next aggregator simply adds to any existing "agr" dsr. This is important for lots of
> reasons, but creates errors for some analytics.
>
> While rabins() is an aggregator, it also chops argus records along time lines. In your
> case, if a record spans a 5 second time boundary, rabins() will cut the argus record into
> two records, and it will distribute the metrics as best it can. For packet counts and byte
> counts this is easy: it distributes the values based on the duration of the record. But there are no
> rules for how to distribute the values in the ARGUS_AGR_DSR. What currently happens
> is we copy the AGR, unmodified, into both records. Based on the type of statistic, this
> is the right thing to do in many cases. However in your case, where you are counting,
> you will get over-counting, due to the duplication of numbers for some records.
>
> What I can do is modify the 'N' of the ARGUS_AGR_DSR statistic, to distribute the
> number of samples for the statistic. This may fix the inconsistency, and still preserve
> the value of the statistic. However, that will not generate the statistic you are actually
> interested in.
>
> In your example, looks like you want to count the number of unique flows per srcid every
> 5 seconds? You need to remove the "agr" dsr for the input data of your call to rabins().
>
> rabins -M dsrs="-agr" -m srcid -M hard time 5s -r test.cluster -s stime trans
>
> That should get you the metric you're after.
>
> Carter
>
>
>
> On Jul 21, 2010, at 11:04 AM, Rafael Barbosa wrote:
>
>> Hi,
>>
>> I have been having some problems with inconsistent output from ragraph plotting Trans. I get different graphs when comparing the results from the "original" files with the ones reduced with racluster.
>>
>> I dug a bit and found this old bug that might be related (http://thread.gmane.org/gmane.network.argus/6686/focus=6741):
>>
>> Second, it seems racluster isn't adding up the trans field correctly, here is an example
>>
>> ra -r file.argus -s saddr trans
>> 27.8.77.166 1
>> 27.8.77.166 1
>> 18.9.27.219 1
>> 18.9.27.219 1
>> 18.86.96.147 1
>> 18.86.96.147 1
>> 19.32.203.136 1
>> 19.32.203.136 1
>>
>> racluster -r file.argus -m saddr -s saddr trans
>> 19.32.203.136 4
>> 18.86.96.147 3
>> 18.9.27.219 4
>> 27.8.77.166 3
>>
>> This is what I get when I run something similar in one of my files:
>>
>> ra -r file.argus -s saddr trans | sort
>> 10.16.4.11 1
>> 10.16.4.12 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.21 1
>> 10.16.4.22 1
>> 10.16.4.53 1
>> 10.16.4.53 1
>> 10.16.4.54 1
>> 10.16.4.54 1
>> 10.16.4.55 1
>> 10.16.4.71 1
>> 10.16.4.71 1
>> 10.16.5.249 1
>> racluster -r file.argus -m saddr -s saddr trans | sort
>> 10.16.4.11 1
>> 10.16.4.12 1
>> 10.16.4.21 13
>> 10.16.4.22 1
>> 10.16.4.53 1
>> 10.16.4.54 2
>> 10.16.4.55 1
>> 10.16.4.71 2
>> 10.16.5.249 1
>>
>> The count for 10.16.4.53 should be 2. I think there is a bug in racluster when calculating trans. Here is another weird result:
>> ra -r big.file -N 100 -w test
>> racluster -r test -w test.cluster
>> rabins -m srcid -M hard time 5s -r test -s stime trans
>> 14:37:15.000000 62
>> 14:37:20.000000 72
>> 14:37:25.000000 19
>> rabins -m srcid -M hard time 5s -r test.cluster -s stime trans
>> 14:37:15.000000 81
>> 14:37:20.000000 76
>> 14:37:25.000000 36
>>
>> I get the same result if I use rasplit and later on racluster, instead of rabins.
>>
>> Thanks,
>> Rafael
>
>
>
>
> <test>
Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York 10022
+1 212 588-9133 Phone
+1 212 588-9134 Fax