normalized appbyte ratio
Carter Bullard
carter at qosient.com
Sat May 4 10:11:45 EDT 2013
Hey John,
The src and dst appbytes are both 64-bit unsigned ints, but on the wire, they are transmitted in the smallest int needed to handle the value.
We can derive the src / dst appbyte ratio, using these values, or we can insert the ratio as a float into the record itself, and toss the actual appbyte numbers, all for the sake of data reduction...whatever.
We do
Carter
Carter Bullard, QoSient, LLC
150 E. 57th Street Suite 12D
New York, New York 10022
+1 212 588-9133 Phone
+1 212 588-9134 Fax
On May 3, 2013, at 9:13 PM, John Gerth <gerth at graphics.stanford.edu> wrote:
> I'm a bit confused... -0.0 is required by IEEE 754 which is why I thought it appropriate as it avoids any conversion.
> Do floating point values in binary argus records use something other than IEEE 754?
>
> If -0.0 is a problem, I'd want to think some more about whether to encode 0/0 as +/-inf
>
>
> Also not sure what's meant by "rather than sending the appbyte data" as
> this is just an additional metric.
>
> John Gerth gerth at graphics.stanford.edu Gates 378 (650) 725-3273
>
> On 5/3/13 4:51 PM, Carter Bullard wrote:
>> Hey John,
>> Well (s - d) / (s + d) isn't exactly the metric I was going to implement, but it could work very well for this and a few other things.
>>
>> I don't think -0.0 is a good metric value. If we were to pass this float as a value in the flow records, rather than sending the appbyte data, we would have to pass it using the xdr libraries, which encode using IEEE 754 floating point. I'm not sure whether they can handle -0.0.
>>
>> Maybe just have -inf and +inf.
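On the -0.0 question: XDR (RFC 4506) defines its float as the 4-byte big-endian IEEE 754 single-precision encoding, so the sign bit of zero has a place on the wire. Here is a minimal sketch in Python that uses struct's big-endian packing as a stand-in for the C xdr library. That substitution is an assumption, the helper names are hypothetical, and it does not exercise the xdr routines argus actually calls.

```python
import math
import struct


def xdr_encode_float(value: float) -> bytes:
    """Pack a float as a 4-byte big-endian IEEE 754 single (the XDR float layout)."""
    return struct.pack(">f", value)


def xdr_decode_float(data: bytes) -> float:
    """Unpack a 4-byte big-endian IEEE 754 single back into a Python float."""
    return struct.unpack(">f", data)[0]


def sign_bit_set(value: float) -> bool:
    """True when the IEEE 754 sign bit is set, including for -0.0."""
    return math.copysign(1.0, value) < 0


if __name__ == "__main__":
    # Round-trip the candidate sentinel values discussed in the thread.
    for v in (-0.0, 0.0, float("-inf"), float("inf")):
        wire = xdr_encode_float(v)
        back = xdr_decode_float(wire)
        print(repr(v), wire.hex(), repr(back), sign_bit_set(back))
```

In this sketch -0.0 and the two infinities all survive the round trip, so either choice of sentinel looks representable. Whether a given xdr implementation preserves them is worth checking separately.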
>> Carter
>>
>> Carter Bullard, QoSient, LLC
>> 150 E. 57th Street Suite 12D
>> New York, New York 10022
>> +1 212 588-9133 Phone
>> +1 212 588-9134 Fax
>>
>> On May 3, 2013, at 6:43 PM, John Gerth <gerth at graphics.stanford.edu> wrote:
>>
>>> Just occurred to me that the issues with 0 can be finessed by normalizing.
>>> That is, compute the appbyte ratio as (s-d)/(s+d) with 0/0 returning -0.0
>>>
>>> The result is a number ranging from -1 (all inbound) to +1 (all outbound)
>>> which captures any asymmetry directly in the sign. The use of -0.0 is arguably
>>> a hack, but it allows the edge case to be captured in a way that is detectable
>>> if desired via signbit(), but has no computational impact.
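The normalization above can be sketched in a few lines of Python. The function names are hypothetical, and math.copysign stands in for C's signbit(), which Python's math module does not provide. This is an illustration of the proposal, not the argus implementation.

```python
import math


def normalized_appbyte_ratio(s: int, d: int) -> float:
    """Return (s - d) / (s + d), or -0.0 when both byte counts are zero."""
    total = s + d
    if total == 0:
        # 0/0 edge case: negative zero marks "no payload in either direction".
        return -0.0
    return (s - d) / total


def is_no_payload(ratio: float) -> bool:
    """Detect the -0.0 sentinel, the equivalent of checking signbit() on a zero."""
    return ratio == 0.0 and math.copysign(1.0, ratio) < 0


if __name__ == "__main__":
    for s, d in ((100, 0), (0, 100), (50, 50), (0, 0)):
        r = normalized_appbyte_ratio(s, d)
        print(s, d, r, is_no_payload(r))
```

Because -0.0 compares equal to 0.0, code that ignores the sentinel treats the 0/0 case as a balanced flow, which is the "no computational impact" property. Only callers that test the sign bit see the difference.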
>>>
>>> This formulation also permits the existence of both sabr and dabr in that they
>>> would just define the order of the subtraction although I'm not sure there's
>>> much utility in that.
>>>
>>>
>>> John Gerth gerth at graphics.stanford.edu Gates 378 (650) 725-3273 fax 725-6949
>>>
>>> On 5/3/2013 11:39 AM, Carter Bullard wrote:
>>>> Hey John,
>>>> So when dealing with the ratio ( [s | d]appbytes / [s | d]bytes) we do end
>>>> up with some issues we have to deal with. May not seem intuitive, but we
>>>> will have conditions where we end up with ( 0 / X ) and ( 0 / 0 ) as the actual
>>>> values for the metric, and ( 0 / X ) is a completely different state than ( 0 / 0 ).
>>>> While every flow record has to have at least some bytes in it, we can
>>>> easily have ( bytes == 0 ) in one of the directions. So it is a condition
>>>> we need to convey. We can return -1 for ( 0 / 0 ) to discriminate that
>>>> condition?
>>>>
>>>> In dealing with all the zeros that we may get in this new metric, a few
>>>> situations shouldn't exist. At least we know that when the denominator
>>>> of ( appbytes / bytes ) is zero, the numerator had better also be zero,
>>>> or something is definitely wrong ;O)
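The cases above can be written down as a small Python sketch. It assumes the -1 sentinel floated for ( 0 / 0 ), and the function name is hypothetical. It illustrates the case analysis only and is not how the argus clients compute the field.

```python
def appbyte_fraction(appbytes: int, total_bytes: int) -> float:
    """Return appbytes / total_bytes for one direction of a flow.

    ( 0 / X ) returns 0.0: packets were sent, but none carried payload.
    ( 0 / 0 ) returns -1.0: nothing was sent in this direction at all.
    ( X / 0 ) raises: payload bytes without any bytes is inconsistent.
    """
    if total_bytes == 0:
        if appbytes != 0:
            raise ValueError("appbytes > 0 with bytes == 0: record is inconsistent")
        return -1.0
    return appbytes / total_bytes


if __name__ == "__main__":
    for app, total in ((0, 0), (0, 120), (60, 120)):
        print(app, total, appbyte_fraction(app, total))
```

Using -1 keeps the two zero states distinct, since a valid fraction always lies between 0 and 1.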
>>>>
>>>> Carter
>>>>
>>>> On May 2, 2013, at 1:14 AM, John Gerth <gerth at graphics.stanford.edu> wrote:
>>>>
>>>>> I'm a big fan of the appbyte metric and have created and used their ratio in the past.
>>>>>
>>>>> One interesting question that comes up is what to do with the 0's. It's important because
>>>>> knowing that one or both sides didn't send any payload can be significant (not to
>>>>> mention what to do when 0 is in the denominator).
>>>>>
>>>>> /J
>>>>>
>>>>> --
>>>>> John Gerth gerth at graphics.stanford.edu Gates 378 (650) 725-3273
>>>>>
>>>>> On 5/1/13 5:23 AM, Carter Bullard wrote:
>>>>>> Hey Jesse,
>>>>>> How about we make a new field; " [ s | d ]abr " for the [ src or dst ] appbyte ratio ? I'll do that today.
>>>>>>
>>>>>> Not sure what is happening with the multiple addresses showing up. That would seem to be a bug. Can you share some data so I can try to recreate the
>>>>>> problem ?
>>>>>>
>>>>>> Carter
>>>>>>
>>>>>> On Apr 30, 2013, at 10:44 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Carter,
>>>>>>>
>>>>>>> I've been working through this example; this is a very interesting approach in that you're boiling host network patterns into a single number that
>>>>>>> you can watch over time to indicate a change in the host...This sort of distillation seems like a big win, once you're instrumented to track it! ...
>>>>>>>
>>>>>>> On that subject, I had some difficulties while trying to blindly implement the commands you gave and wanted to send back some notes and questions to
>>>>>>> the list...
>>>>>>>
>>>>>>> * The text states you need "-M rmon" in the first racluster, but the example doesn't include it; I found it should be:
>>>>>>>
>>>>>>> racluster -R argus_dir/ -M rmon -m saddr proto sport -w argus.out - 'ipv4'
>>>>>>>
>>>>>>> * I found I could calculate the ratio of sappbytes/dappbytes (and create a 'label') using awk like:
>>>>>>>
>>>>>>> awk '{if( $8 + 0 != 0) {LABEL="Balanced";RATIO=$7/$8; if ( RATIO > 1.5) {LABEL="Producer"}; if (RATIO < 0.95) {LABEL="Consumer"}; print $0,RATIO"\t"LABEL}}' ra_text_output_file
>>>>>>>
>>>>>>> However my example is based on the fields in my rarc file, and thus this method isn't very elegant...and will also miss any records that are missing
>>>>>>> a field...It would seem that this metric would be easy to calculate with the clients themselves and would give the added benefit of allowing for
>>>>>>> ralabel'ing to be used on the metric (much more portable and useful I think)...I think this is a feature request... :)
>>>>>>>
>>>>>>> * I wanted to start iterating through various test cases on my data, varying time ranges and networks that I examined. I found that I can get very
>>>>>>> 'off' results based on how I try to filter which networks I want...for instance:
>>>>>>>
>>>>>>> This example will lead to hosts showing up multiple times in the final output
>>>>>>> # /usr/local/bin/racluster -r ${HOUR}* -M rmon -m saddr proto sport -w ${TMP1} - 'ipv4 and src net 10.10.10.0/24'
>>>>>>> # /usr/local/bin/racluster -r ${TMP1} -m saddr -w - | /usr/local/bin/rasort -r - -m sappbytes -s stime dur saddr proto sport sappbytes dappbytes
>>>>>>>
>>>>>>> This example appears to be fine in the final output
>>>>>>> # /usr/local/bin/racluster -r ${HOUR}* -M rmon -m saddr proto sport -w ${TMP1} - 'ipv4 and net 10.10.10.0/24'
>>>>>>> # /usr/local/bin/racluster -r ${TMP1} -m saddr -w - | /usr/local/bin/rasort -r - -m sappbytes -s stime dur saddr proto sport sappbytes dappbytes
>>>>>>>
>>>>>>> I think I have a misunderstanding about how racluster and filters interact; can you explain why the 'src' part in the first example would cause
>>>>>>> multiple entries for individual hosts in the final output?
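For comparison with the awk one-liner above, here is a rough Python sketch of the same labeling that works on the two counters directly rather than on column positions. The 1.5 and 0.95 thresholds come from the awk example. The function name and the "Undefined" label for a zero dappbytes count are hypothetical additions (the awk version silently skips those records).

```python
from typing import Optional, Tuple

PRODUCER_MIN = 1.5   # ratio above this is labeled "Producer" (from the awk example)
CONSUMER_MAX = 0.95  # ratio below this is labeled "Consumer" (from the awk example)


def label_appbyte_ratio(sappbytes: int, dappbytes: int) -> Tuple[Optional[float], str]:
    """Return (sappbytes / dappbytes, label) for one record."""
    if dappbytes == 0:
        # Hypothetical handling: report the record instead of dropping it.
        return None, "Undefined"
    ratio = sappbytes / dappbytes
    if ratio > PRODUCER_MIN:
        label = "Producer"
    elif ratio < CONSUMER_MAX:
        label = "Consumer"
    else:
        label = "Balanced"
    return ratio, label


if __name__ == "__main__":
    for s, d in ((300, 100), (10, 100), (100, 100), (5, 0)):
        print(s, d, label_appbyte_ratio(s, d))
```

Having the clients compute the metric, as requested above, would make this kind of post-processing unnecessary.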
>>>>>>>
>>>>>>> Thank you for sharing your knowledge and experience to this community!
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jesse
>