normalized appbyte ratio for producer/consumer relationship

Wed May 29 10:12:15 EDT 2013

Hey John,
Just some dialog to talk about how to use ABR ( or should it be PCR, for producer /
consumer ratio ? ).  I'm going to go through a random recent day, to see if there are
trends, or gotchas, in using the number.  So far, we're saying that if a network
behavior is near the 1.0 or -1.0 value, generally, then transformation to the
opposite side of the scale should be easily detectable and immediately relevant
to the cyber security problem set.  So if a host is a consumer, for it to all of a sudden
become a producer should be a problem.  So let's see what's up.
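
To make that rule concrete, here is a minimal Python sketch. It is not part of
argus-clients. The function name, the idea of comparing against a stored baseline,
and the 0.75 "near the edge" cutoff (borrowed from the May 7 message quoted below)
are all assumptions for illustration.

```python
def crossed_over(baseline_abr: float, current_abr: float, edge: float = 0.75) -> bool:
    """Return True when a host with a near-pure baseline shows up on the opposite side.

    baseline_abr: the abr the host normally shows (hypothetical stored value)
    current_abr:  the abr just observed
    edge:         how close to +/-1.0 counts as "near" (assumed, not an argus default)
    """
    if abs(baseline_abr) < edge:
        # Balanced hosts have no strong side to flip from, so this rule ignores them.
        return False
    # A sign change relative to the baseline means producer <-> consumer.
    return baseline_abr * current_abr < 0


print(crossed_over(-0.986, 0.40))   # consumer now producing -> True
print(crossed_over(-0.986, -0.90))  # still a consumer -> False
print(crossed_over(0.357, -0.30))   # balanced baseline, ignored -> False
```

A real deployment would probably also want a minimum byte count before trusting
either value, but that is a separate knob.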

Here is a daily report for one of my big networks looking at only the ABR metric per address.
As you know, I have a rasqlinsert() process that is maintaining a ( mac / ip address ) inventory
table that is updated in near realtime (< 5 sec off realtime), that captures the abr variable.
Here is a printout for the Monday.  Off day, US holiday.  Here are all the local IP addresses
from the perspective of the interior interfaces of my border routers:

thoth:~ carter$ rasql -M time 1d -t -2d+1d -r mysql://root@localhost/ratop/etherHost_%Y_%m_%d \
                   -w - - net 192.168.0.0/16 and srcid 192.168.0.1 and not ether src ff:ff:ff:ff:ff:ff | \
                   rasort -m abr -s ltime dur smac saddr spkts dpkts abr -p3

               LastTime        Dur             SrcMac            SrcAddr  SrcPkts  DstPkts    ABRatio 
2013/05/27.23:56:55.839  85848.281  00:21:5a:39:d7:a2      192.168.0.127     1458        0      1.000 P
2013/05/27.17:57:56.563  42787.016  74:44:01:8f:82:fc       192.168.0.47        2        0      1.000 P
2013/05/27.04:20:50.044      0.000  00:12:3f:bc:58:a4       192.168.0.70        1        0      1.000 P
2013/05/27.18:23:04.308  10689.529  00:15:17:78:08:8f       192.168.0.43      768        2      0.994 P
2013/05/27.23:50:49.394  85627.617  90:84:0d:ef:70:0b       192.168.0.34      179       24      0.954 P
2013/05/27.23:50:49.418  85627.672  90:84:0d:d2:2a:8e        192.168.0.2      160       25      0.939 P
2013/05/27.23:57:22.091  86073.227  00:0b:db:5c:e5:7c      192.168.0.164      624      254      0.426 B
2013/05/27.23:59:57.846  86389.781  00:11:d9:31:c1:11       192.168.0.33    15457    11419      0.357 B
2013/05/27.23:59:21.240  86337.547  80:71:1f:3c:c3:88        192.168.0.1     3563     3602     -0.013 B
2013/05/27.23:59:58.436  86391.156  c8:2a:14:58:7a:55       192.168.0.66    19216    15089     -0.295 B
2013/05/27.23:59:48.667  86356.648  00:23:32:2f:ac:9c       192.168.0.68   467274   640295     -0.986 C
2013/05/27.23:58:39.239  86287.312  1c:ab:a7:8b:ae:ce       192.168.0.41    49007    87315     -0.999 C

I added the P, B and C by hand for this, but ralabel() can easily provide this label.
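
A labeling rule consistent with the table above can be sketched in Python.  This is
an assumption, not the ralabel() configuration: the function name is made up, the
0.75 cutoffs come from the May 7 message quoted below, and reading B as "balanced"
is my interpretation of the tag.

```python
def label(abr: float, cutoff: float = 0.75) -> str:
    """Map an abr value to P (producer), C (consumer) or B (balanced)."""
    if abr >= cutoff:
        return "P"
    if abr <= -cutoff:
        return "C"
    return "B"


# abr values copied from the report above
for addr, value in [("192.168.0.127", 1.000), ("192.168.0.2", 0.939),
                    ("192.168.0.164", 0.426), ("192.168.0.66", -0.295),
                    ("192.168.0.41", -0.999)]:
    print(addr, label(value))
```

With these cutoffs the sketch reproduces every P, B and C in the printout.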

So, this is basically the internal IPs that my border router sees, that are mapped to real ethernet addresses.
Every flow is aggregated into a single entry per address, so you're getting everything
that this observation domain saw for these IP addresses.
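
As a rough illustration of what "a single entry per address" means for the number,
here is a Python sketch that sums application bytes per address and then applies the
abr equation from the original message quoted at the bottom of this thread, including
the -0.0 "no appbytes" marker.  The record layout and function names are hypothetical,
and it assumes each record is already oriented so that the address of interest is the
source.  Whether the clients compute the aggregate exactly this way is not confirmed here.

```python
import math
from collections import defaultdict


def abr(sappbytes: int, dappbytes: int) -> float:
    """abr = (sappbytes - dappbytes) / (sappbytes + dappbytes)."""
    total = sappbytes + dappbytes
    if total == 0:
        return -0.0  # special "no application bytes seen" marker from the thread
    return (sappbytes - dappbytes) / total


def no_appbytes(value: float) -> bool:
    """-0.0 == 0.0 compares True, so check the sign bit instead."""
    return value == 0.0 and math.copysign(1.0, value) < 0


def abr_per_address(flows):
    """flows: iterable of (saddr, sappbytes, dappbytes) tuples (hypothetical layout)."""
    totals = defaultdict(lambda: [0, 0])
    for saddr, sapp, dapp in flows:
        totals[saddr][0] += sapp
        totals[saddr][1] += dapp
    return {addr: abr(s, d) for addr, (s, d) in totals.items()}


# Made-up byte counts, purely to exercise the functions.
flows = [
    ("10.0.0.1", 900, 100),
    ("10.0.0.1", 100, 100),
    ("10.0.0.2", 0, 0),
]
result = abr_per_address(flows)
print(result["10.0.0.1"])               # (1000 - 200) / 1200
print(no_appbytes(result["10.0.0.2"]))  # no appbytes at all for this address
```

Summing first and dividing once weights each flow by its byte volume, which is
different from averaging per-flow abr values.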

So what have we got?  We've got a few pure producers, a few mixed, and a few pure
consumers.  Let's talk about them….

192.168.0.127 is one of my printers that is advertising away, saying I'm here….  We don't see
any actual traffic going to and from the printer to support the actual printing function, so it's
very nice that we see the printer as a pure 1.000 producer.  If anyone from the outside
used this printer, this number would go down, or if this printer started sending data to
its manufacturer, or started to transmit the pages it printed to a 3rd party, regardless
of protocol, this number would deviate from 1.0 and we'd have a real anomaly.  That's easy.

The traffic seen for 192.168.0.47 and 192.168.0.70 is just ARP broadcast requests, without
any responses, as we're on the wrong host to see any answers.  These packets are important,
as we can realize that these hosts are there, and get some sense of their activity / health.
But for the purposes of this study, they influence the numbers a bit.

So, let's run this again with a filter to toss the ARP packets, to see what we get.  Let's add the filter
" and ip ",  tossing the broadcast ethernet part of the filter.

thoth:~ carter$ rasql -M time 1d -t -2d+1d -r mysql://root@localhost/ratop/etherHost_%Y_%m_%d \
                   -w - - net 192.168.0.0/16 and srcid 192.168.0.1 and ip | \
                   rasort -m abr -s ltime dur smac saddr spkts dpkts abr -p3

               LastTime        Dur             SrcMac            SrcAddr  SrcPkts  DstPkts    ABRatio 
2013/05/27.23:56:55.839  85848.281  00:21:5a:39:d7:a2      192.168.0.127     1458        0      1.000 P
2013/05/27.18:23:04.308  10689.529  00:15:17:78:08:8f       192.168.0.43      768        2      0.994 P
2013/05/27.23:50:49.394  85627.617  90:84:0d:ef:70:0b       192.168.0.34      179       24      0.954 P
2013/05/27.23:50:49.418  85627.672  90:84:0d:d2:2a:8e        192.168.0.2      160       25      0.939 P
2013/05/27.23:57:22.091  86073.227  00:0b:db:5c:e5:7c      192.168.0.164      624      254      0.426 B
2013/05/27.23:59:57.846  86389.781  00:11:d9:31:c1:11       192.168.0.33    15457    11419      0.357 B
2013/05/27.23:59:58.436  86391.156  c8:2a:14:58:7a:55       192.168.0.66    19216    15089     -0.295 B
2013/05/27.23:59:48.667  86356.648  00:23:32:2f:ac:9c       192.168.0.68   467274   640295     -0.986 C
2013/05/27.23:59:57.846  85974.359  ff:ff:ff:ff:ff:ff      192.168.0.255        0     5214     -1.000 C

Looking good !!! So, now we've got more bidirectional flows, so that's a good thing.
ARPs are important, but the producer / consumer relationship for control plane traffic
will tend to be balanced.  ARP, DNS, DHCP, OSPF, BGP, SIP come up pretty balanced
most of the time…..

OK, going through the list again, looking at the other extreme.  192.168.0.255 is the
local subnet broadcast address, and so it really needs to be a pure consumer of data,
as it's illegal for that address to transmit data on an IETF standard internet.  Now
it's easy for a system to use that address on your network, so watching for a deviation
from -1.0 for this address is a good thing to look out for.  It would have a different mac
address, but the abr is still a good indicator.

Now, 192.168.0.34 is a bit of an oddball, as it's a control system, so it's really only
trying to sync its clock, using NTP, but it's also advertising availability through mDNS…..
which really makes it look like a big producer…  So it's a producer into the multicast
address space.  We should get rid of multicast for this, I suspect…., which would
convert this address into a true balanced producer / consumer, so that we would
know that it's support infrastructure.

Addresses 192.168.0.164, and 192.168.0.66 are also infrastructure systems, so
being closer to 0 is the right place for these guys.  While the notion that balanced
assets are infrastructure or control is a bit of a stretch, if there is a trend, that would
be one I would analyze.

OK, that leaves us with 192.168.0.68, which is the most active desktop.
While the packet ratio looks to be balanced, -0.156, the application bytes are
way in the direction of consumer, and that is where we want it to be, which to me
validates the metric.  We want this host to consume the Internet, not be
consumed by it.
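
For reference, the -0.156 packet figure is the same normalized form applied to the
packet counts in the report.  A minimal Python check (the function name is made up;
the appbyte counts are not in the printout, so the -0.986 abr is quoted from the
report rather than recomputed):

```python
def normalized_ratio(src: int, dst: int) -> float:
    """(src - dst) / (src + dst), the same shape as the abr equation."""
    return (src - dst) / (src + dst)


spkts, dpkts = 467274, 640295  # 192.168.0.68, from the report above
pkt_ratio = normalized_ratio(spkts, dpkts)
abr_from_report = -0.986       # ABRatio column for the same host

print(f"packets: {pkt_ratio:.3f}  appbytes: {abr_from_report:.3f}")
```

The gap between the two numbers is the point: counting packets hides the direction
of the payload, while counting application bytes shows it.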

So while QoSient world headquarters may seem small, at least to the outside world,
on Monday, it was doing what it was supposed to do !!!  Later I'll send either today's
or tomorrow's list to see how they compare.

Carter 

On May 7, 2013, at 11:31 PM, Carter Bullard <carter at qosient.com> wrote:

> Hey John,
> Here is a run to try out the abr metric for basic producer/consumer anomaly detection.
> Picked a bunch of hosts that are involved in a service that runs on a specific port.
> There is a data transfer function, and there is a command and control function.
> The abr metric picks out the producer and consumers for the data transfer, and it points to those that are involved in command and control, here you go.
> 
> thoth:backups root# racluster -M rmon -m saddr -r monthly.data -w - | \
>                     rasort -m abr -s stime dur:16 proto saddr spkts:12 dpkts:12 abr
>                  StartTime              Dur  Proto            SrcAddr      SrcPkts      DstPkts    ABRatio 
> 2013/02/05.14:03:48.304265   2022912.500000    tcp       192.168.1.31    118813516     85735688   0.999937
> 2013/02/06.16:19:37.099642   1895160.500000    tcp       192.168.2.75         3621         1899   0.997145
> 2013/02/07.12:06:09.973606     27554.181641    tcp       192.168.2.34          732          650   0.915472
> 2013/02/27.11:40:47.093087         4.957941    tcp       192.168.4.35           13           12   0.551477
> 2013/02/18.14:17:01.287529    758586.750000    tcp      192.168.7.166           23           18   0.422487
> 2013/02/06.10:10:26.473381        30.864573    tcp     192.168.12.149            7            5   0.295455
> 2013/02/13.16:45:55.793151        11.732343    tcp      192.168.7.155           15           13   0.250169
> 2013/02/06.09:31:55.244962   1228359.500000    tcp       192.168.2.54           32           35   0.241102
> 2013/02/08.10:09:14.794703   1727145.250000    tcp      192.168.0.125           21           18   0.213155
> 2013/02/06.13:17:45.550931   1819270.875000    tcp      192.168.1.138         1227         1231   0.135222
> 2013/02/15.11:28:36.104691        50.191151    tcp       192.168.2.71           75           59   0.100396
> 2013/02/07.14:00:35.770555        83.157898    tcp       192.168.0.70          894          890   0.057422
> 2013/02/12.14:19:09.720183         1.038289    tcp      192.168.1.125            7            8   0.029588
> 2013/02/21.03:07:04.628043        32.868526    tcp       192.168.0.72            7            5   0.023810
> 2013/02/11.08:39:44.024865   1478973.500000    tcp       192.168.2.45          268          157  -0.015058
> 2013/02/19.14:25:03.376258        28.092548    tcp      192.168.7.153            7            5  -0.043860
> 2013/02/06.14:05:03.101059        65.313805    tcp       192.168.2.58            6            7  -0.055469
> 2013/02/06.13:17:45.550931   1772498.375000    tcp       192.168.2.57           13           16  -0.164201
> 2013/02/06.12:20:05.543201   1913705.250000    tcp      192.168.2.138          256          317  -0.173220
> 2013/02/20.12:43:35.083127    424459.562500    tcp      192.168.2.102         1147         1160  -0.343431
> 2013/02/20.13:46:25.772704       940.071899    tcp       192.168.12.2           16           17  -0.392946
> 2013/02/07.12:55:49.369160   1822594.375000    tcp      192.168.1.110          296          400  -0.456629
> 2013/02/06.12:51:40.056876   1901223.250000    tcp      192.168.1.123          165          222  -0.601214
> 2013/02/06.09:37:40.607721   1900794.750000    tcp      192.168.1.111          681          995  -0.612489
> 2013/02/05.14:44:23.517669   1818851.000000    tcp      192.168.1.117           79          102  -0.627868
> 2013/02/05.14:05:30.070586   1889467.750000    tcp      192.168.1.127          143          175  -0.669750
> 2013/02/06.09:31:55.244962    697157.125000    tcp      192.168.1.115           51           56  -0.683094
> 2013/02/13.16:07:23.915251   1291093.750000    tcp      192.168.1.130           61           75  -0.694824
> 2013/02/07.13:33:41.019318   1557935.000000    tcp      192.168.1.113          100          133  -0.706807
> 2013/02/06.12:06:43.001669   1721456.875000    tcp      192.168.1.112          618         1010  -0.740505
> 2013/02/23.17:46:07.768913         0.588042    tcp       192.168.9.84           12           15  -0.758408
> 2013/02/06.11:31:22.894660   1906436.625000    tcp      192.168.1.122           76          101  -0.762083
> 2013/02/05.15:07:18.072928   1977283.125000    tcp      192.168.1.114           90          120  -0.781786
> 2013/02/08.09:07:55.007936   1140006.125000    tcp      192.168.1.118          112          136  -0.787711
> 2013/02/05.14:04:02.548134   2022898.250000    tcp       192.168.2.47       327011       243367  -0.795821
> 2013/02/06.14:30:10.891759   1905899.875000    tcp      192.168.1.116          343          588  -0.934336
> 2013/02/07.12:06:09.973606   1807785.625000    tcp      192.168.1.121         2579         4398  -0.990442
> 2013/02/05.14:03:48.304265   1465134.125000    tcp       192.168.2.29     85407678    118569168  -0.999999
> 
> 
> So we take some records, in this case a complete month's worth of traffic involved in a specific application,
> involving a specific subnet.  We want to know what hosts are producers and consumers for this app.
> We need to get the bi-directional flow data into a single object statistic, so we'll aggregate the data for RMON
> data processing (one object, in and out stats), and merge for the " saddr ", then just rasort() on the abr field.
> 
> We get a list from Producers to Consumers, and the guys in the middle where the abr approaches 0, and we have
> balanced communications, we see the complete spectrum of data push agents (producers) where ( ABRatio > 0.75 )
> on top, and we have the pure data sinks, where the ( ABRatio < -0.75 ), and we've got maybe command
> and control in the ( -0.5 < ABRatio < 0.5 ) range ?  Probably need to add a threshold for the amount of
> data sent and received, to weed out the announcers in the command and control network...
> 
> I'd go for that set of rules for this specific application, in this observation domain…..
> 
> Carter
> 
> 
> 
> On May 6, 2013, at 6:34 PM, John Gerth <gerth at graphics.stanford.edu> wrote:
> 
>> Yes, well an abr of 0.0, even without using -0.0 can have multiple meanings
>> since you will get 0 as long as s=d even if they are large.  The use of -0.0
>> is perhaps a bit cute, and, as you point out, since IEEE 754 requires 0.0 == -0.0
>> in all relational tests, one has to use signbit() to disambiguate in C.
>> 
>> Even so, I think your example shows that abr has good potential as a metric.
>> 
>> John Gerth      gerth at graphics.stanford.edu  Gates 378
>> 
>> On 5/6/2013 2:09 PM, Carter Bullard wrote:
>>> Hey John,
>>> OK, so there is a problem, in that IEEE floating point has ( -0.0 == 0.0 )
>>> by definition.  So, I fixed it in my compiler, but others are going to have
>>> issues when needing to discriminate between 0.0 and -0.0.
>>> 
>>> So, with the new argus-clients, you can filter for any value for abr, using
>>> any of the ra* programs.  Still have to work on graphing abr, but I'll get
>>> to that later tonight.
>>> 
>>> Here is the abr behavior for every DNS request here at QoSient World HQ
>>> for 2013, that weren't ServFail errors:
>>> 
>>> thoth:tmp carter$ rahisto -H abr 10:-1.0-1.0 -r argus*domain* -s mean stddev - src pkts 1 and dst pkts 1 and not abr 0.0
>>> N = 1009841  mean = -0.739084  stddev =  0.102829  max = -0.162791  min = -0.909605
>>>           median = -0.749129     95% = -0.653846
>>>             mode = -0.782609
>>> Class      Interval         Freq    Rel.Freq     Cum.Freq          Mean     StdDev 
>>>     1   -1.000000e+00     225379    22.3183%     22.3183%     -0.815238   0.000650
>>>     2   -8.000000e-01     738148    73.0955%     95.4137%     -0.740887   0.043837
>>>     3   -6.000000e-01      10553     1.0450%     96.4588%     -0.534672   0.048717
>>>     4   -4.000000e-01      35511     3.5165%     99.9752%     -0.283067   0.021374
>>>     5   -2.000000e-01        250     0.0248%    100.0000%     -0.162791   0.000051
>>>     6    0.000000e+00          0     0.0000%    100.0000%    
>>>     7    2.000000e-01          0     0.0000%    100.0000%    
>>>     8    4.000000e-01          0     0.0000%    100.0000%    
>>>     9    6.000000e-01          0     0.0000%    100.0000%    
>>>    10    8.000000e-01          0     0.0000%    100.0000% 
>>> 
>>> 
>>> So, anything positive would be a behavioral anomaly, from the perspective of this host.
>>> 
>>>   ra -r file.from.the.host - udp and port domain and src pkts 1 and dst pkts 1 and abr gt 0.0
>>> 
>>> and this would pick out candidates for DNS server availability errors:
>>> 
>>>   ra -r file.from.the.host - udp and port domain and src pkts 1 and dst pkts 1 and abr eq 0.0
>>> 
>>> Of course, you can do an analysis of every service, and get a rather interesting set of
>>> " what is normal " behaviors, using this simple type of metric.  For those services where
>>> the abr is always positive or always negative, seeing a shift to the other side, can
>>> indicate events that should be of interest.
>>> 
>>> Hope this is helpful,
>>> 
>>> Carter
>>> 
>>> 
>>> On May 6, 2013, at 2:41 PM, Carter Bullard <carter at qosient.com> wrote:
>>> 
>>>> Hey John,
>>>> If no appbytes, currently we return -0.0, but the library knows if there are
>>>> appbytes or not, so we can return nada, when printing out the values.
>>>> Right now, when using xml format, you won't get a value.
>>>> 
>>>> Having problems getting my compiler to tell the difference between 0.0 and -0.0,
>>>> but should hopefully have this working by this afternoon.
>>>> 
>>>> Carter
>>>> 
>>>> 
>>>> On May 6, 2013, at 2:37 PM, John Gerth <gerth at graphics.stanford.edu> wrote:
>>>> 
>>>>> Nice example. I'm looking forward to using this.
>>>>> 
>>>>> As your example shows, this metric is available for any existing argus
>>>>> files that were created containing appbyte values. I'm assuming that if
>>>>> the sensor wasn't configured to capture those, 'abr' is not available.
>>>>> 
>>>>> 
>>>>> --
>>>>> John Gerth      gerth at graphics.stanford.edu  Gates 378   (650) 725-3273 fax 725-6949
>>>>> 
>>>>> On 5/6/2013 11:19 AM, Carter Bullard wrote:
>>>>>> Hey John,
>>>>>> OK, so I've implemented " abr " as a new metric, using our normalized equation:
>>>>>> 
>>>>>> abr = (sappbytes - dappbytes)/(sappbytes + dappbytes)
>>>>>> 
>>>>>> This generates values between +1.0 and -1.0.  +1.0 means that all the app bytes
>>>>>> were from the source, indicating that the source is a pure PRODUCER, and the
>>>>>> destination is a pure CONSUMER.  You see this in FTP PUT file transfers,
>>>>>> as an example.  The sign bit reverses this relationship.
>>>>>> 
>>>>>> -0.0 denotes the special case, when there are no appbytes seen.
>>>>>> 
>>>>>> In the new argus-clients that I'll put up later today, you can print this out using:
>>>>>> 
>>>>>> ra -r argus.data -s +abr
>>>>>> 
>>>>>> You can also do operations using this metric, such as filter and generate histograms.
>>>>>> Here is a run that I did to show how this may be used in an anomaly detection
>>>>>> application.  Here is the simple frequency distribution for all the internal DNS
>>>>>> requests made to my local DNS server from a specific client, for all of 2013:
>>>>>> 
>>>>>> thoth:06 carter$ pwd
>>>>>> /Volumes/Data/Archive/QoSient/192.168.0.68/2013
>>>>>> thoth:tmp carter$ rahisto -H abr 10:-1.0-1.0 -R . -s mean stddev - udp port domain and src pkts 1 and dst pkts 1
>>>>>> N = 1027764  mean = -0.726195  stddev =  0.140532  max = 0.000000  min = -0.909605
>>>>>>         median = -0.749129     95% = -0.292517
>>>>>>           mode = -0.782609
>>>>>> Class      Interval         Freq    Rel.Freq     Cum.Freq          Mean     StdDev
>>>>>>   1   -1.000000e+00     225379    21.9291%     21.9291%     -0.815238   0.000650
>>>>>>   2   -8.000000e-01     738148    71.8208%     93.7498%     -0.740887   0.043837
>>>>>>   3   -6.000000e-01      10553     1.0268%     94.7766%     -0.534672   0.048717
>>>>>>   4   -4.000000e-01      35511     3.4552%     98.2318%     -0.283067   0.021374
>>>>>>   5   -2.000000e-01        250     0.0243%     98.2561%     -0.162791   0.000051
>>>>>>   6    0.000000e+00      17923     1.7439%    100.0000%      0.000000   0.000000
>>>>>>   7    2.000000e-01          0     0.0000%    100.0000%    
>>>>>>   8    4.000000e-01          0     0.0000%    100.0000%    
>>>>>>   9    6.000000e-01          0     0.0000%    100.0000%    
>>>>>>  10    8.000000e-01          0     0.0000%    100.0000%    
>>>>>> 
>>>>>> 
>>>>>> OK, should be very clear, that my host is a net CONSUMER of DNS data, not a net PRODUCER
>>>>>> because the " abr <= 0 ".  The corollary holds true, the local DNS service is a net PRODUCER of
>>>>>> data, and not a net CONSUMER of data, from the perspective of this particular end system.
>>>>>> So testing filters like this:
>>>>>> ra -r daily.file - abr gt 0 and port domain and src pkts 1 and dst pkts 1
>>>>>> 
>>>>>> Should reveal flows that deserve a closer look.
>>>>>> 
>>>>>> OK, there were a lot of flows where the ( abr == 0 ), which was surprising.
>>>>>> When DNS experiences a ServFail, the response is the same as the request, just with an error bit
>>>>>> set in the DNS header.  QoSient had a big issue in Jan, 2013, when 17923 DNS ServFail failures
>>>>>> occurred, so that is where the ( abr == 0 ) flows occurred.  Important to know this when evaluating
>>>>>> DNS as a channel for CONSUMER to PRODUCER conversion.
>>>>>> 
>>>>>> But for DNS health and operability, looking for flows where the ( sappbytes == dappbytes ) is
>>>>>> also a pretty interesting thing to look for.
>>>>>> 
>>>>>> Hope this is helpful,
>>>>>> 
>>>>>> Carter
>>>>>> 
>>>>>> Carter Bullard
>>>>>> CEO/President
>>>>>> QoSient, LLC
>>>>>> 150 E. 57th Street Suite 12D
>>>>>> New York, New York 10022
>>>>>> 
>>>>>> +1 212 588-9133 Phone
>>>>>> +1 212 588-9134 Fax
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
