racluster and trans

Thu Jul 22 12:34:35 EDT 2010

Hey Rafael,
Thanks for the data file and the description of the problem.
Yes, the problem you describe is very curious.  I'm sorry that
I didn't understand from your earlier emails how curious the
problem is.

I'm looking at this today, and hopefully will have a solution
for you tonight/tomorrow.

Sorry for any inconvenience,

Carter

On Jul 22, 2010, at 4:58 AM, Rafael Barbosa wrote:

> Hi,
> 
> In the mailing list I found a similar problem to what I was observing, but now I see that after "correcting" the flow direction, the test I reported does not make sense. Indeed one of the 'saddr' becomes a 'daddr', so no problem there.
> 
> After your explanation regarding the aggregation metrics (N, mean, etc.) I understand the problem with aggregating/splitting flows. I will try to take it into account when getting some statistics from the files. Regarding your question:
> 
> In your example, looks like you want to count the number of unique flows per srcid every
> 5 seconds?
> 
> The test I did in reality was:
> racluster -r test -w test.cluster
> ragraph trans -M 5s -r test.cluster -w test.cluster.png
> ragraph trans -M 5s -r test -w test.png
> 
> Comparing the graphs, I see completely different results, so I tried to reproduce the results using rabins (it's easier to send its output to this list): report the number of flows per 5s bin. 
> 
> The proposed solution still does not reproduce the original results:
> 
> rabins -M dsrs="-agr" -m srcid -M hard time 5s -r test/test.cluster -s stime trans
>    14:37:15.000000     62
>    14:37:20.000000     57
>    14:37:25.000000     19
> 
> For now I will avoid running rabins/racluster on files that are already aggregated.
> 
> Rafael
> 
> On Wed, Jul 21, 2010 at 6:04 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey Rafael,
> When you think you have a bug, if you can send an argus datafile that demonstrates
> the problem, I can probably determine if it really is a bug and fix it in a short period
> of time.
> 
> OK, A few things.  All of the ra* aggregators have a mechanism to "correct" the
> direction of a particular flow record.  Because you are tracking a direction dependent
> attribute, "saddr", you may be seeing the results of racluster() "correcting" a record's
> direction.  If you don't want this type of correction, you need to specify that in a
> racluster.conf file, but generally, correcting for direction is a very good thing.
> 
> See ./support/Config/racluster.conf and check out the racluster.1 manpage.
> You will want to set this variable.
> 
>    RACLUSTER_AUTO_CORRECTION=no
> 
> Now that doesn't mean there isn't a bug, just means that we have to account for that
> possibility.  Look at the actual records that racluster() generates, to see if you
> understand that output, and if there are still problems, then send email.
> 
> 
> OK, with respect to your racluster->rabins inconsistency.  What number do you think
> you are generating?  Number of unique flows per srcid per 5 seconds?  You will need
> to change the call to rabins() in order to get that number.
> 
> All ra* aggregators insert an ARGUS_AGR_DSR information element into the records.
> That is the structure that contains the 'N' (trans), 'mean', 'max', 'min', 'stddev'
> metrics for the aggregation.  When you run aggregation twice, the next aggregator
> simply adds to any existing "agr" dsr.  This is important for lots of reasons, but
> creates errors for some analytics.
> 
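To make the accumulation of 'N' concrete, here is a toy Python model. It is
not argus code: records are plain dicts, the "trans" field stands in for the
'N' kept in the "agr" dsr, and the aggregate() helper, the field names and
the sample values are all made up for illustration. It only sketches the idea
that a second aggregation pass adds up the counts it finds, and that dropping
the stored count first makes the second pass count its input records instead.

```python
from collections import defaultdict

def aggregate(records, keys):
    """Merge records that share the same key values, summing their 'trans'.

    A record without a 'trans' field counts as a single transaction.
    """
    totals = defaultdict(int)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += rec.get("trans", 1)
    return [dict(zip(keys, group), trans=n) for group, n in totals.items()]

# Four unaggregated flow records from one (hypothetical) probe.
raw = [
    {"srcid": "probe1", "saddr": "10.16.4.21"},
    {"srcid": "probe1", "saddr": "10.16.4.21"},
    {"srcid": "probe1", "saddr": "10.16.4.21"},
    {"srcid": "probe1", "saddr": "10.16.4.54"},
]

# First pass: one record per (srcid, saddr), each carrying its own count.
first = aggregate(raw, ["srcid", "saddr"])

# Second pass on the aggregated records: the stored counts are added up,
# so the result reflects the original four flow records.
second = aggregate(first, ["srcid"])

# Dropping the stored count first makes the second pass count its input
# records (two here) instead.
stripped = [{k: v for k, v in rec.items() if k != "trans"} for rec in first]
recount = aggregate(stripped, ["srcid"])

print(second)   # trans reflects the original records
print(recount)  # trans reflects the number of aggregated records
```

Which of the two numbers is "right" depends on the question being asked, and
that is the distinction this part of the thread turns on.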
> While rabins() is an aggregator, it also chops argus records along time boundaries.  In your
> case, if a record spans a 5 second time boundary, rabins() will cut the argus record into
> two records, and it will distribute the metrics as best it can.  For packet and byte counts
> this is easy: it distributes the values based on the duration of the record.  But there are no
> rules for how to distribute the values in the ARGUS_AGR_DSR.  What currently happens
> is we copy the AGR, unmodified, into both records.  Based on the type of statistic, this
> is the right thing to do in many cases.  However in your case, where you are counting,
> you will get over counting, due to the duplication of numbers for some records.
> 
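The over counting described above can be sketched with a small Python model.
Again this is not how rabins() is implemented, only an illustration under
simple assumptions: times are plain numbers, every record has a non-zero
duration, packets are spread in proportion to the time spent in each bin, and
the "trans" count is copied unchanged into every piece. The split_record()
and trans_per_bin() helpers and the sample records are hypothetical.

```python
from collections import defaultdict

def split_record(rec, bin_size):
    """Cut one record at every bin boundary it crosses."""
    pieces = []
    duration = rec["end"] - rec["start"]
    t = rec["start"]
    while t < rec["end"]:
        bin_start = (t // bin_size) * bin_size
        piece_end = min(bin_start + bin_size, rec["end"])
        share = (piece_end - t) / duration
        pieces.append({
            "bin": bin_start,
            "pkts": rec["pkts"] * share,  # distributed by duration
            "trans": rec["trans"],        # copied unmodified into each piece
        })
        t = piece_end
    return pieces

def trans_per_bin(records, bin_size):
    """Sum the 'trans' of all pieces that land in each bin."""
    totals = defaultdict(int)
    for rec in records:
        for piece in split_record(rec, bin_size):
            totals[piece["bin"]] += piece["trans"]
    return dict(totals)

records = [
    {"start": 1, "end": 4, "pkts": 6, "trans": 1},  # stays inside the first bin
    {"start": 3, "end": 7, "pkts": 8, "trans": 1},  # crosses the boundary at t=5
]

binned = trans_per_bin(records, 5)
print(binned)                # per-bin trans totals
print(sum(binned.values()))  # larger than len(records)
```

The packet total survives the split, but the sum of "trans" over the bins is
larger than the number of input records, because the record that crosses the
boundary is counted once in each bin it touches.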
> What I can do is modify the 'N' of the ARGUS_AGR_DSR statistic, to distribute the
> number of samples for the statistic.  This may fix the inconsistency, and still preserve
> the value of the statistic.  However, that will not generate the statistic you are actually
> interested in.
> 
> In your example, looks like you want to count the number of unique flows per srcid every
> 5 seconds?  You need to remove the "agr" dsr for the input data of your call to rabins().
> 
>    rabins -M dsrs="-agr"  -m srcid -M hard time 5s -r test.cluster -s stime trans
> 
> That should get you the metric you're after.
> 
> Carter
> 
> 
> 
> On Jul 21, 2010, at 11:04 AM, Rafael Barbosa wrote:
> 
>>  Hi,
>> 
>> I have been having some problems with inconsistent output from ragraph when plotting trans. I get different graphs when comparing the results from the "original" file with the ones reduced with racluster.
>> 
>> I dug a bit and found this old bug that might be related (http://thread.gmane.org/gmane.network.argus/6686/focus=6741):
>> 
>> Second, it seems racluster isn't adding up the trans field correctly, here is an example
>> 
>> ra -r file.argus -s saddr trans
>>       27.8.77.166      1
>>       27.8.77.166      1
>>       18.9.27.219      1
>>       18.9.27.219      1
>>      18.86.96.147      1
>>      18.86.96.147      1
>>     19.32.203.136      1
>>     19.32.203.136      1
>> 
>> racluster -r file.argus -m saddr -s saddr trans
>>     19.32.203.136      4
>>      18.86.96.147      3
>>       18.9.27.219      4
>>       27.8.77.166      3
>> 
>> This is what I get when I run something similar in one of my files:
>> 
>> ra -r file.argus -s saddr trans | sort
>>         10.16.4.11      1
>>         10.16.4.12      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.21      1
>>         10.16.4.22      1
>>         10.16.4.53      1
>>         10.16.4.53      1
>>         10.16.4.54      1
>>         10.16.4.54      1
>>         10.16.4.55      1
>>         10.16.4.71      1
>>         10.16.4.71      1
>>        10.16.5.249      1
>> racluster -r file.argus -m saddr -s saddr trans | sort
>>         10.16.4.11      1
>>         10.16.4.12      1
>>         10.16.4.21     13
>>         10.16.4.22      1
>>         10.16.4.53      1
>>         10.16.4.54      2
>>         10.16.4.55      1
>>         10.16.4.71      2
>>        10.16.5.249      1
>> 
>> The count for 10.16.4.53 should be 2. I think there is a bug in racluster when calculating trans. Here is another weird result:
>> ra -r big.file -N 100 -w test
>> racluster -r test -w test.cluster
>> rabins -m srcid -M hard time 5s -r test -s stime trans
>>    14:37:15.000000     62
>>    14:37:20.000000     72
>>    14:37:25.000000     19
>> rabins -m srcid -M hard time 5s -r test.cluster -s stime trans
>>    14:37:15.000000     81
>>    14:37:20.000000     76
>>    14:37:25.000000     36
>> 
>> I get the same result if I use rasplit and later on racluster, instead of rabins.
>> 
>> Thanks,
>> Rafael
> 
> 
> 
> 
> <test>
