appbyte ratio

John Gerth gerth at graphics.stanford.edu
Thu May 2 01:14:38 EDT 2013


I'm a big fan of the appbyte metrics and have created and used their ratio in the past.

One interesting question that comes up is what to do with the 0's. It's important because
knowing that one or both sides didn't send any payload can be significant (not to
mention what to do when 0 is in the denominator).
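
For instance, one could label the zero cases explicitly rather than divide; a
minimal awk sketch (the label names are just illustrative, and it assumes ra
prints saddr, sappbytes, dappbytes as the three fields):

  ra -n -r argus.out -s saddr sappbytes dappbytes - ip | \
    awk '$2 ~ /^[0-9]+$/ {                 # skip any title line
      s = $2 + 0; d = $3 + 0
      if      (s == 0 && d == 0) label = "Silent"        # no payload either way
      else if (d == 0)           label = "PureProducer"  # ratio would be Inf
      else if (s == 0)           label = "PureConsumer"  # ratio is 0
      else                       label = sprintf("%.4f", s / d)
      print $1, label
    }'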

 /J

--
John Gerth      gerth at graphics.stanford.edu  Gates 378   (650) 725-3273

On 5/1/13 5:23 AM, Carter Bullard wrote:
> Hey Jesse,
> How about we make a new field: "[ s | d ]abr" for the [ src or dst ] appbyte ratio ?  I'll do that today.
> 
> Not sure what is happening with the multiple addresses showing up. That would seem to be a bug.  Can you share some data so I can try to recreate the
> problem ?
> 
> Carter
> 
> On Apr 30, 2013, at 10:44 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
> 
>> Hi Carter,
>>
>> I've been working through this example; this is a very interesting approach in that you're boiling host network patterns into a single number that
>> you can watch over time to indicate a change in the host...This sort of distillation seems like a big win, once you're instrumented to track it! ...
>>
>> On that subject, I had some difficulties while trying to blindly implement the commands you gave and wanted to send back some notes and questions to
>> the list...
>>
>> * The text states you need "-M rmon" in the first racluster, but the example doesn't include it; I found it should be:
>>
>> racluster -R argus_dir/ -M rmon -m saddr proto sport -w argus.out - 'ipv4'
>>
>> * I found I could calculate the ratio of sappbytes/dappbytes (and create a 'label') using awk like:
>>
>> awk '{ if ($8 + 0 != 0) { LABEL = "Balanced"; RATIO = $7 / $8;
>>        if (RATIO > 1.5)  { LABEL = "Producer" };
>>        if (RATIO < 0.95) { LABEL = "Consumer" };
>>        print $0, RATIO "\t" LABEL } }' ra_text_output_file
>>
>> However my example is based on the fields in my rarc file, and thus this method isn't very elegant...and will also miss any records that are missing
>> a field...It would seem that this metric would be easy to calculate with the clients themselves and would give the added benefit of allowing for
>> ralabel'ing to be used on the metric (much more portable and useful I think)...I think this is a feature request... :)
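>>
>> In the meantime, a slightly more portable sketch that doesn't depend on rarc
>> field order (assuming your ra supports explicit "-s" field selection; the
>> thresholds mirror the ones above):
>>
>> ra -n -r argus.out -s saddr sappbytes dappbytes | \
>>   awk '$2 ~ /^[0-9]+$/ && $3 + 0 != 0 {   # numeric rows only, non-zero denominator
>>     r = $2 / $3
>>     label = (r > 1.5) ? "Producer" : (r < 0.95) ? "Consumer" : "Balanced"
>>     printf "%-18s %12.4f  %s\n", $1, r, label
>>   }'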
>>
>> * I wanted to start iterating through various test cases on my data, varying time ranges and networks that I examined. I found that I can get very
>> 'off' results based on how I try to filter which networks I want...for instance:
>>
>> This example will lead to hosts showing up multiple times in the final output
>> # /usr/local/bin/racluster -r ${HOUR}* -M rmon -m saddr proto sport -w ${TMP1} - 'ipv4 and src net 10.10.10.0/24'
>> #/usr/local/bin/racluster -r ${TMP1} -m saddr -w - | /usr/local/bin/rasort -r - -m sappbytes -s stime dur saddr proto sport sappbytes dappbytes
>>
>> This example appears to be fine in the final output
>> # /usr/local/bin/racluster -r ${HOUR}* -M rmon -m saddr proto sport -w ${TMP1} - 'ipv4 and net 10.10.10.0/24'
>> #/usr/local/bin/racluster -r ${TMP1} -m saddr -w - | /usr/local/bin/rasort -r - -m sappbytes -s stime dur saddr proto sport sappbytes dappbytes
>>
>> I think I have a misunderstanding about how racluster and filters interact; can you explain why the 'src' part in the first example would cause
>> multiple entries for individual hosts in the final output?
>>
>> Thank you for sharing your knowledge and experience with this community!
>>
>> Cheers,
>>
>> Jesse
>>
>>
>>
>> On Tue, Apr 2, 2013 at 5:09 PM, Carter Bullard <carter at qosient.com> wrote:
>>
>>     Gentle people,
>>     To continue on the Argus and APT1 discussion, I had written that the Mandiant
>>     APT1 document described two basic nodes in the APT1 attack infrastructure,
>>     the Hop Point and the Target Attack Nodes.  I'm going to continue to write about
>>     Hop Points for a few more emails, because having one of your machines acting
>>     as an APT1 Hop Point is possibly the worst thing that can happen to you in the
>>     APT1 attack life cycle.
>>
>>     I suggested that the best strategy for identifying APT1 Hop Points is to use Time
>>     Series Analysis methods, specifically Transfer Function Models, and Intervention
>>     Analysis, to realize that a node has been transformed (Identification), and to
>>     realize who, what, when, and how it was transformed (Attribution).  Now, I'm pretty
>>     sure that most people are not interested in a long discourse on how to use
>>     2nd and 3rd order differentials over different time periods, to recognize
>>     trending discontinuities.  This stuff is pretty complicated, and advanced even
>>     for complex Time Series forecasting and control methods.  But that is the
>>     kind of direction you want to go in if you want to do Machine Learning methods,
>>     or if you want to do unsupervised systems for network behavioral anomaly
>>     detection, which would be a really cool thing to have.
>>
>>     In support of this APT1 Hop Point identification process, however, there are more
>>     direct things you can look at, that don't take a lot of math, and can be done with
>>     simple, effective, reliable strategies that are easily explained and understood.  
>>     Let's look briefly at one that should be useful.
>>
>>     Most nodes that can be transformed to an APT1 Hop Point are either predominantly
>>     consumers or producers of transport network data.  User-driven machines are
>>     generally transport service consumers, little requests sent, big responses received,
>>     such as those seen in web browsing and streaming video services.  Machine-driven
>>     machines, such as DNS, Web and Database servers, are generally network
>>     transport data producers: they receive little requests and send bigger responses.
>>     Even in Peer-to-Peer networks, you see stable consumer / producer roles emerge,
>>     where your node, the one you are paying for, generally becomes a network data
>>     producer, providing services to a lot of machines you don't know.   Peer-to-peer
>>     networks do present a challenge to this kind of strategy, but not always.
>>
>>     When a node is transformed to an APT1 Hop Point service provider, and the
>>     stepping stone function is active, a node will become both a consumer and
>>     a producer of the data it is transporting.  If it moves a large amount of data,
>>     as indicated in the Mandiant report, the overall producer / consumer properties
>>     of the affected node will move from wherever they are, toward 1.0, ...
>>     a balanced transport node.
>>
>>     Our job is to try to identify when a node is transformed to an APT1 Hop Point,
>>     which means that it will go from whatever it was doing, to being a balanced
>>     producer / consumer, accepting data from an attack target, and relaying that
>>     data to the attacker.  If, historically, a node can be determined to be predominantly
>>     a producer or consumer, then detecting when it becomes a large scale
>>     balanced producer / consumer, will be pretty easy, as the deviation from its
>>     normal behavior will be pretty dramatic.
>>
>>     Now, producer / consumer metrics are not a measure of the packets (rate)
>>     or total bytes (load) sent or received on the wire by a node.  Instead, it's a
>>     measure of the transport bytes successfully received and sent.
>>
>>     Protocols like TCP generally present a balance of packets sent and received
>>     on the wire.  After the 3-packet TCP setup handshake, one side sends data,
>>     and the other sends TCP overhead ACKs, almost 1-for-1.  So we can't use
>>     packet counts to indicate producer / consumer roles on the network.
>>
>>     The total bytes on the wire have a bit more asymmetry that can reveal
>>     consumer and producer relationships,  but the noise generated by the TCP
>>     protocol overhead bytes can make the distinction a bit more difficult.
>>     There are tricks, such as PUSHing one byte at a time through a TCP
>>     connection, or reducing the allowable window size on a connection, that
>>     can make the number of transported bytes per packet very small.
>>     Reporting the data actually ACK'd, rather than the total data on the wire,
>>     makes this type of analysis possible.
>>
>>     To measure the application bytes received or sent, argus needs to be configured
>>     to generate the metric. Set ARGUS_GENERATE_APPBYTE_METRIC=yes in your
>>     /etc/argus.conf file.  Let's assume that your argus is monitoring your enterprise
>>     border interface, so that you monitor all the traffic going in and out of your site.
>>     The resulting argus data will have the information needed to determine all the
>>     producers and consumers of your enterprise, i.e. those that are bringing data in
>>     and those that are transporting data out (this is a starting point for developing
>>     formal Transfer Function Models, by the way, when you get there).
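>>
>>     For example, the relevant bit of /etc/argus.conf is just the following (the
>>     interface line is illustrative; point it at whatever sees your border traffic):
>>
>>        #  generate the ACK'd application byte counters in each flow record
>>        ARGUS_GENERATE_APPBYTE_METRIC=yes
>>
>>        #  example only -- the interface argus should monitor
>>        ARGUS_INTERFACE=eth0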
>>
>>     A simple measure of the producer / consumer role is the ratio of application
>>     data sent (produced) vs the application data received (consumed).  Using argus
>>     data, you can calculate the metric on each status record, on each aggregated
>>     flow, or on any of the various aggregations that you can perform.  So it's trivial to
>>     calculate the "sappbytes / dappbytes" ratio, whether it's an instantaneous microflow
>>     or an entire subnet's traffic aggregated into a single argus flow record.
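>>
>>     As a quick sketch of the subnet case (assuming your racluster supports
>>     CIDR aggregation keys like " saddr/24 "):
>>
>>        racluster -M rmon -m saddr/24 -r argus.out -w - - ip | \
>>           ra -r - -n -s saddr sappbytes dappbytes
>>
>>     Dividing the two counters in each output line gives the subnet-wide ratio.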
>>
>>     To start a simple analysis, let's process a day's worth of data from a single QoSient
>>     workstation and see what's up.  Let's measure the sent and received application
>>     bytes of the top IP addresses seen, to assign simple producer and consumer roles,
>>     and try to use those labels as guides, to see what the trends are, and how
>>     to interpret the data. SPOILER: In this set of data, there are no APT1
>>     Hop Points, but....
>>
>>     Let's look at the IP addresses that the QoSient node 192.168.0.68 talked to, on
>>     April Fool's Day, 2013. This node resides in the 192.168.0.0/24 network, and is
>>     a basic workstation, using shared file systems, with email, web browsing, automated
>>     software updates and cloud services.  What nodes does this node talk to, outside
>>     or inside our own network, and are they producers or consumers ?  
>>
>>     This node doesn't provide any services, so we expect all other nodes to be producers,
>>     not consumers, and we expect the node to be a consumer of network services.  Let's
>>     see, but to keep the email short, let's just look at the top 10 nodes.
>>
>>     Grabbing an entire day's worth of data from the collection archive, let's track
>>     individual IP addresses (so we'll use the " -M rmon " option), preserving the protocol
>>     and ports used by each address.  We'll use this first-pass derived data as the starting
>>     data for the actual analysis, which we'll generate to report each IP address's total
>>     src application bytes and dst application bytes sent.  We'll take that data and formulate
>>     the sappbytes/dappbytes ratio by hand for this exercise: if the ratio is > 1.5, we'll
>>     label it a Producer; if the ratio is < 0.95, we'll label it a Consumer; and between
>>     these numbers, we'll call the transport Balanced.  We'll color the output, so Consumers
>>     are in red, and Producers whose ratios are HUGE we'll color blue.
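>>
>>     (If you'd rather script that hand calculation, here is a rough awk sketch;
>>     it reads the rasort output below, where sappbytes and dappbytes are the
>>     last two fields, and arbitrarily treats ratios above 1000 as HUGE:)
>>
>>        awk '$(NF-1) ~ /^[0-9]+$/ {            # skip the title line
>>            s = $(NF-1) + 0; d = $NF + 0
>>            if (d == 0) { r = "Inf"; label = "Producer"; huge = 1 }
>>            else {
>>                r = s / d
>>                huge  = (r > 1000)
>>                label = (r > 1.5) ? "Producer" : (r < 0.95) ? "Consumer" : "Balanced"
>>            }
>>            color = (label == "Consumer") ? "\033[31m" : huge ? "\033[34m" : ""
>>            printf "%s%s %14s  %s\033[0m\n", color, $0, r, label
>>        }'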
>>
>>     Let's look at the top 10 SrcAppByte generators, to see how this might work.
>>     Here we go....
>>
>>
>>     thoth:01 carter$  racluster -R /archive/192.168.0.68/2013/04/01 -m saddr proto sport -w /tmp/argus.out - ipv4
>>     thoth:01 carter$ racluster -r /tmp/argus.out -m saddr -w - |  rasort -m sappbytes \
>>                            -s stime dur saddr proto sport sappbytes dappbytes -No10
>>                      StartTime        Dur            SrcAddr  Proto  Sport    SAppBytes    DAppBytes         Ratio
>>     2013/04/01.00:00:00.847207 86399.101*       192.168.0.66     ip            69805178      1339356       52.1185  Producer
>>     2013/04/01.15:54:08.964340  25.124109      208.59.201.94    tcp http       27104415          120   225870.1250  Producer
>>     2013/04/01.00:01:16.133367 86285.734*        66.39.3.162    tcp imaps      12816471      1012491       12.6584  Producer
>>     2013/04/01.00:00:00.847207 86399.101*       192.168.0.68     ip            11872196    120391392        0.0986  Consumer
>>     2013/04/01.17:17:37.184721 528.364441       171.67.72.17    tcp ssh         4347072        50746       85.6633  Producer
>>     2013/04/01.00:02:51.660475 85447.757*      17.172.208.43    tcp https       2103142       430417        4.8863  Producer
>>     2013/04/01.15:55:58.919139 28921.785*       192.168.0.78     ip             1399179      7702570        0.1817  Consumer
>>     2013/04/01.09:47:16.282091 43205.253*        17.154.65.1    tcp https        472376        20531       23.0079  Producer
>>     2013/04/01.00:05:42.767984 85586.210*      192.168.0.127     ip              461937            0           Inf  Producer
>>     2013/04/01.00:29:54.738518 81412.937*      173.194.43.33    tcp *            413487        18616       22.2114  Producer
>>
>>
>>     Basically, what this data is saying is that, of the top 10 addresses sending data on April Fool's Day,
>>     most are producers, just as we expected.  And the workstation itself, 192.168.0.68, is 
>>     a consumer (first line in red), with a sent/recv'd ratio of 0.0986.  We've got some really
>>     HUGE producers, which indicates purely one-way transfers, the kind we're looking for.
>>
>>     In this data we're looking for APT1 relay data candidates: large data transfers from a
>>     remote site to an internal node that are then relayed to another external node, possibly
>>     Chinese, possibly not, in real time.
>>
>>     None of the producers are sending enough data to represent a LARGE exfiltration of data,
>>     one of the definitions of being an APT1 Hop Point.  But LARGE is a relative term, so we
>>     need to analyze any potential APT1 traffic candidate.
>>
>>     From the first remote address in the list, 208.59.201.94, our largest remote producer, we
>>     received 27MB of data.  The sent / recv'd ratio of 225,870 is just what you would expect
>>     from a large transfer of data into your infrastructure, and is a good candidate for APT1
>>     style stepping stone data influx.  Even though it's using HTTP as the protocol, we should
>>     assume the transport technique to be somewhat clever, so whether it's HTTP, SSH, or
>>     a mix of protocols, is notable, but potentially insignificant.
>>
>>     For the purposes of this dialog, to identify this flow as a part of an APT1 Hop Point action,
>>     we need to find an outflow from our workstation that would transfer the data received from this
>>     remote node to another node.  In a simple APT1 Hop Point, our workstation would want to
>>     transfer the 27MB to a remote address, which we don't see in the top 10.  Whew !!!
>>     If it's a simple relay, you would expect the outgoing flow to sort closely to the flow of
>>     interest, as they would both be transporting the same amount of application data.
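>>
>>     One way to hunt for such an outflow is to go back to the unaggregated archive
>>     and filter on size (a sketch only; this assumes your ra build accepts metric
>>     filters like " src appbytes gt N ", so treat the syntax as illustrative):
>>
>>        ra -R /archive/192.168.0.68/2013/04/01 -n \
>>           -s stime saddr daddr dport sappbytes dappbytes \
>>           - 'src host 192.168.0.68 and src appbytes gt 20000000'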
>>
>>     In a slight variation of the basic APT1 Infrastructure, the Hop Point may relay the exfiltrated
>>     data to another internal node.  Our simple report indicates that the workstation doesn't
>>     transmit 27MB to any single node, either external or internal.  
>>
>>     And in the most complex relay models that could be implemented, where multiple
>>     endpoints receive portions of the exfiltrated data, our node still does not look to be
>>     an APT1 Hop Point.  By looking at the entry for 192.168.0.68, our workstation, we see
>>     that we don't actually send 27MB of total data out of the node for the whole day !!!
>>
>>                      StartTime        Dur            SrcAddr  Proto  Sport    SAppBytes    DAppBytes         Ratio
>>     2013/04/01.00:00:00.847207 86399.101*       192.168.0.68     ip            11872196    120391392        0.0986  Consumer
>>
>>     As you can see, we only sent (SAppBytes) 11.8MB total, to all our transport endpoints
>>     combined, for all of April 1, 2013.  So our candidate 27MB flow does not look to be relayed.
>>     Now in the original data, there are about 10K individual flows that could be candidates,
>>     but the aggregate analysis generated only a few hundred candidate IP addresses.
>>
>>     An automated system would iterate through all potential candidate transfers and attempt to
>>     find candidate outflows that could support the relay concept.  That would be the most elegant
>>     of analytics, and not that expensive, if you have a good aggregation model and
>>     analytic framework.
>>
>>     Now, just looking at an arbitrary day by itself, you can get some assurance that you aren't
>>     supporting an APT1 Hop Point type of relay service.  But the strength of argus based network
>>     activity auditing is that you have historical data that can support the development of
>>     hourly producer / consumer metrics for every IP address in your archive, which could
>>     be abstractly called a Transfer Function Model for all the assets in your observable
>>     domain.
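>>
>>     As a sketch of generating that hourly series, rabins can carve the archive
>>     into one-hour bins per address (assuming a rabins build that accepts
>>     " -M rmon hard time 1h "; the exact options may differ in your version):
>>
>>        rabins -R /archive/192.168.0.68/2013/04 -M rmon hard time 1h \
>>               -m saddr -w - - ip | \
>>           ra -r - -n -s stime saddr sappbytes dappbytes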
>>
>>     I have done this type of analytic for our workstation over the last 2 years, and this
>>     workstation has maintained the 0.09 sent vs received application byte ratio for
>>     almost every day.  It has never gone over 0.17.  So this would be a great candidate
>>     machine for this type of analysis.  While it may receive a lot of data from the outside,
>>     it doesn't transfer a lot of data.  And if it did, it would be very easy to know it.
>>
>>     So, all I need is the sent / recv'd ratio for all the end points in my enterprise, and
>>     if they have had stable ratios that are >> 10 or << 0.1, indicating that they are stable
>>     producers and/or consumers, then detecting a significant shift toward 1.0, a balanced
>>     consumer / producer role, is pretty easy.  If you think that the change is significant,
>>     then you can go through the original flow data, looking at sappbyte and dappbyte
>>     metrics to figure out what happened.  You're looking for new producer roles for your
>>     consumers and new consumer roles for your producers, that are contributing
>>     to the ratio moving toward 1.0.
>>
>>     It's a system that can work for the majority of the nodes in your enterprise.  For
>>     the ones that it doesn't, there are more complex analytics that can be used, but that's
>>     enough for a single piece of email.
>>
>>     Reactions, opinions, attitude and flames welcome, 
>>
>>     Hope all is most excellent,
>>
>>     Carter
>>
>>
>>
>>     On Mar 27, 2013, at 12:09 PM, Carter Bullard <carter at qosient.com> wrote:
>>
>>>     Gentle people,
>>>     To continue on the Argus and APT1 discussion, I had written that the Mandiant
>>>     APT1 document described two basic nodes in the APT1 attack infrastructure,
>>>     the Hop Point and the Target Attack Nodes.  I'm going to continue to write about
>>>     Hop Points for a few more emails, because having one of your machines acting
>>>     as an APT1 Hop Point is possibly the worst thing that can happen in the APT1
>>>     attack life cycle.
>>>
>>>     So far, I've presented that Mandiant's report gives us a lot of detail, trends, and
>>>     methods that allow us to detect overt APT1 behavior using the argus data.  Trends
>>>     such as APT1's establishment and use of well defined attack infrastructure and
>>>     the tendency to access that infrastructure directly, from well defined IP address
>>>     blocks, using specific access methods, and a good description of the attackers'
>>>     intent: exfiltration of large amounts of data.  These trends lead to a set of very
>>>     simple tests for APT1 activity, that can be tested against argus data archives
>>>     to help you realize if you've been had, or not. 
>>>
>>>     The APT1 strategies that Mandiant describes are conventional, and the attack
>>>     infrastructure itself is simple, direct, almost optimal (minimal reliable methods,
>>>     2-3 hops from attacker to target), suggesting that the infrastructure has
>>>     predictable utility, i.e. it may actually work at scale, and work well enough to
>>>     be worth the effort.  The ultimate simplicity of the realized APT1 infrastructure,
>>>     may be the result of a limit in Mandiant's detection capability ( you can only
>>>     see what you are looking for ), but there is no question that what they describe
>>>     is real.
>>>
>>>     While Mandiant is very detailed in what it does talk about, there are huge
>>>     gaps in what it doesn't talk about.  I'd like to dive deeper into APT1 Hop Point
>>>     identification, but we're lacking key information.  What kind of systems does
>>>     APT1 use for Hop Points? Linux workstations ? Windows XP machines ? 
>>>     Web Servers ?  Android devices ?  Routers ?  While we have some really
>>>     great patterns to look for, like specific SSH certificates, there are so many
>>>     things we don't have; initial penetration techniques, command and control
>>>     methods, beaconing patterns, persistent vs dynamic access.
>>>
>>>     In the absence of real detail, we'll have to develop general strategies for
>>>     detection, and if we want to have any success, we'll need to avoid 
>>>     awareness / detection system pitfalls, such as sampling, and sampling bias
>>>     (looking only at one protocol or one type of OS), and matching complexity.
>>>
>>>     One of the simple characteristics that I will try to leverage in my discussions, is
>>>     the intent of the APT1 attack, and the goal of the APT1 Hop Point; to move
>>>     a lot of data, from a remote site to another remote site.  If that really is the
>>>     singular attack goal for APT1, then with good argus data generation and
>>>     analytics, we should be able to find any node that is acting as an APT1
>>>     Hop Point, as well as any other APTx Hop Points that may exist.
>>>
>>>     The approach that I will try to describe in the next set of emails, is one based
>>>     on a Bell-LaPadula style of analysis, to find nodes that have been transformed
>>>     from being one type of network based node, to another type of network node, 
>>>     in the case of APT1, one that is supporting a demanding network based transport
>>>     service.
>>>
>>>     I'm going to use Time Series Analysis methods, specifically Transfer Function
>>>     Models, and Intervention Analysis to realize that a node is doing something
>>>     different.  The Transfer Function Models are perfect for this, as they are
>>>     generally used to describe input / output dynamic system response,
>>>     and Intervention Analysis is all about the notion that there is an event that
>>>     motivates a dynamic change in system input / output.  So I'm going to try to use
>>>     this strategy to identify a change in input / output, and then to try to find the
>>>     event that correlates with the change.
>>>
>>>     If you can imagine that there is an argus running on every node in an
>>>     infrastructure, establishing a generalized network activity audit, that goes
>>>     back quite a ways, then we should have a very rich set of data to perform
>>>     this type of analysis, either automated, or by hand.   The goal will be to
>>>     realize that a node went from being a specific type of producer / consumer,
>>>     to a different kind of producer / consumer, over some period of time.
>>>
>>>     OK, that is going to be my strategy.  Are there any other approaches that seem
>>>     appropriate?    More to come.
>>>
>>>     Hope all is most excellent,
>>>
>>>     Carter
>>>
>>
>>
>>
>>
>> -- 
>> Jesse Bowling
>>


