appbyte ratio

John Gerth gerth at graphics.stanford.edu
Fri May 3 15:27:13 EDT 2013


Yes, indeed, these are the kinds of issues I ran across as well.
But isn't it also true that when you consider appbytes as opposed
to bytes, it's fairly easy to have 0/0 flows, e.g. abandoned
TCP handshakes?

Anyway, to partially address this, I created a new metric which I
use in searches all the time.  It summarizes the nature of the appbyte
exchange as follows:
      3 - appbytes non-zero for both src and dst
      2 - packets exchanged but zero appbytes in one or both dirs
      1 - packets in only one dir
      0 - malformed flow (e.g. 0 src pkts which can happen with backscatter)
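
As a rough illustration of the four values above, here is a minimal Python
sketch. It is not the code actually used for this metric: the function name
is hypothetical, the inputs are assumed to be the per-direction packet and
appbyte counts of one flow record, and treating any record with zero source
packets as malformed is one reading of the list above.

```python
def appbyte_exchange(spkts, dpkts, sappbytes, dappbytes):
    """Summarize the appbyte exchange of one flow record as 0..3.

    Hypothetical helper; arguments are assumed to be the per-direction
    packet counts and application byte counts of a single flow.
    """
    if spkts <= 0:
        # Assumption: no source packets at all means a malformed flow
        # (e.g. backscatter), whatever the destination side shows.
        return 0
    if dpkts <= 0:
        # Packets were seen in only one direction.
        return 1
    if sappbytes > 0 and dappbytes > 0:
        # Both sides sent application payload.
        return 3
    # Packets went both ways, but at least one side sent no payload,
    # e.g. an abandoned TCP handshake.
    return 2
```

Because the values are ordered, a relational test such as "metric >= 2"
selects every flow in which packets were exchanged in both directions.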

The values were chosen to be useful in relational searches. There's
another wrinkle that makes those searches more effective, in which I make
the metric signed, but that's another story.

--
John Gerth      gerth at graphics.stanford.edu  Gates 378   (650) 725-3273 fax 725-6949

On 5/3/2013 11:39 AM, Carter Bullard wrote:
> Hey John,
> So when dealing with the ratio ( [s | d]appbytes / [s | d]bytes) we do end
> up with some issues we have to deal with.  May not seem intuitive, but we
> will have conditions where we end up with ( 0 / X ) and ( 0 / 0 ) as the actual
> values for the metric, and ( 0 / X ) is a completely different state than ( 0 / 0 ).
> While every flow record has to have at least some bytes in it, we can
> easily have ( bytes == 0 ) in one of the directions.   So it is a condition
> we need to convey.  We can return -1 for ( 0 / 0 ) to discriminate that
> condition?
> 
> In dealing with all the zeros that we may get in this new metric, a few
> situations shouldn't exist.  At least we know that when the denominator
> of ( appbytes / bytes ) is zero, the numerator had better also be zero,
> or something is definitely wrong ;O)
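
The zero handling described in the quoted message can be sketched in Python
as follows. This is only an illustration, not Argus code: the function name
is hypothetical, and returning -1 for ( 0 / 0 ) is the proposal floated
above as a question, not confirmed behavior of any client.

```python
def appbyte_ratio(appbytes, total_bytes):
    """Ratio of application bytes to total bytes for one direction.

    Hypothetical helper. Returns -1.0 for the ( 0 / 0 ) case so that it
    stays distinguishable from ( 0 / X ), which returns 0.0.
    """
    if total_bytes == 0:
        if appbytes != 0:
            # A non-zero numerator over a zero denominator should never
            # occur; flag it rather than hide it.
            raise ValueError("appbytes is non-zero but bytes is zero")
        return -1.0
    return appbytes / total_bytes
```

With this convention, a direction that carried packets but no payload
yields 0.0, while a direction that carried nothing at all yields -1.0.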
> 
> Carter
> 
> On May 2, 2013, at 1:14 AM, John Gerth <gerth at graphics.stanford.edu> wrote:
> 
>> I'm a big fan of the appbyte metric and have created and used their ratio in the past.
>>
>> One interesting question that comes up is what to do with the 0's. It's important because
>> knowing that one or both sides didn't send any payload can be significant (not to
>> mention what to do when 0 is in the denominator).
>>
>> /J
>>
>> --
>> John Gerth      gerth at graphics.stanford.edu  Gates 378   (650) 725-3273
>>
>> On 5/1/13 5:23 AM, Carter Bullard wrote:
>>> Hey Jesse,
>>> How about we make a new field;  " [ s | d ]abr " for the [ src or dst ] appbyte ratio ?  I'll do that today.
>>>
>>> Not sure what is happening with the multiple addresses showing up. That would seem to be a bug.  Can you share some data so I can try to recreate
>>> the problem?
>>>
>>> Carter
>>>
>>> On Apr 30, 2013, at 10:44 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>
>>>> Hi Carter,
>>>>
>>>> I've been working through this example; this is a very interesting approach in that you're boiling host network patterns into a single number that
>>>> you can watch over time to indicate a change in the host...This sort of distillation seems like a big win, once you're instrumented to track it! ...
>>>>
>>>> On that subject, I had some difficulties while trying to blindly implement the commands you gave and wanted to send back some notes and questions to
>>>> the list...
>>>>
>>>> * The text states you need "-M rmon" in the first racluster, but the example doesn't include it; I found it should be:
>>>>
>>>> racluster -R argus_dir/ -M rmon -m saddr proto sport -w argus.out - 'ipv4'
>>>>
>>>> * I found I could calculate the ratio of sappbytes/dappbytes (and create a 'label') using awk like:
>>>>
>>>> awk '{ if ($8 + 0 != 0) { LABEL = "Balanced"; RATIO = $7 / $8;
>>>>        if (RATIO > 1.5) { LABEL = "Producer" };
>>>>        if (RATIO < 0.95) { LABEL = "Consumer" };
>>>>        print $0, RATIO "\t" LABEL } }' ra_text_output_file
>>>>
>>>> However my example is based on the fields in my rarc file, and thus this method isn't very elegant...and will also miss any records that are missing
>>>> a field...It would seem that this metric would be easy to calculate with the clients themselves and would give the added benefit of allowing for
>>>> ralabel'ing to be used on the metric (much more portable and useful I think)...I think this is a feature request... :)
>>>>
>>>> * I wanted to start iterating through various test cases on my data, varying time ranges and networks that I examined. I found that I can get very
>>>> 'off' results based on how I try to filter which networks I want...for instance:
>>>>
>>>> This example leads to hosts showing up multiple times in the final output:
>>>> # /usr/local/bin/racluster -r ${HOUR}* -M rmon -m saddr proto sport -w ${TMP1} - 'ipv4 and src net 10.10.10.0/24'
>>>> # /usr/local/bin/racluster -r ${TMP1} -m saddr -w - | /usr/local/bin/rasort -r - -m sappbytes -s stime dur saddr proto sport sappbytes dappbytes
>>>>
>>>> This example appears to be fine in the final output:
>>>> # /usr/local/bin/racluster -r ${HOUR}* -M rmon -m saddr proto sport -w ${TMP1} - 'ipv4 and net 10.10.10.0/24'
>>>> # /usr/local/bin/racluster -r ${TMP1} -m saddr -w - | /usr/local/bin/rasort -r - -m sappbytes -s stime dur saddr proto sport sappbytes dappbytes
>>>>
>>>> I think I have a misunderstanding about how racluster and filters interact; can you explain why the 'src' part in the first example would cause
>>>> multiple entries for individual hosts in the final output?
>>>>
>>>> Thank you for sharing your knowledge and experience with this community!
>>>>
>>>> Cheers,
>>>>
>>>> Jesse
>>>>
> 



