Argus detecting historical APT1 activity #3 cont

Jesse Bowling jessebowling at gmail.com
Tue Apr 30 22:44:25 EDT 2013


Hi Carter,

I've been working through this example; this is a very interesting approach
in that you're boiling host network patterns into a single number that you
can watch over time to indicate a change in the host...This sort of
distillation seems like a big win, once you're instrumented to track it...

On that subject, I had some difficulties while trying to blindly implement
the commands you gave and wanted to send back some notes and questions to
the list...

* The text states you need "-M rmon" in the first racluster, but the
example doesn't include it; I found it should be:

racluster -R argus_dir/ -M rmon -m saddr proto sport -w argus.out - 'ipv4'

* I found I could calculate the ratio of sappbytes/dappbytes (and create a
'label') using awk like:

awk '{ if ($8 + 0 != 0) {
         LABEL = "Balanced"; RATIO = $7 / $8
         if (RATIO > 1.5)  { LABEL = "Producer" }
         if (RATIO < 0.95) { LABEL = "Consumer" }
         print $0, RATIO "\t" LABEL
       } }' ra_text_output_file

However, my example is based on the fields in my rarc file, and thus this
method isn't very elegant...and will also miss any records that are missing
a field...It would seem that this metric would be easy to calculate with
the clients themselves and would give the added benefit of allowing for
ralabel'ing to be used on the metric (much more portable and useful I
think)...I think this is a feature request... :)
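For what it's worth, pinning the field list down makes the awk less rarc-dependent; here's a rough sketch over comma-separated "saddr,sappbytes,dappbytes" text (the ra -s/-c invocation shown in the comment is an assumption about the client build):

```shell
# label_ratio reads "saddr,sappbytes,dappbytes" lines and appends the
# sent/received ratio plus a Producer/Consumer/Balanced label.
label_ratio() {
  awk -F, '$3 + 0 != 0 {
    r = $2 / $3
    l = (r > 1.5) ? "Producer" : ((r < 0.95) ? "Consumer" : "Balanced")
    printf "%s,%.4f,%s\n", $0, r, l
  }'
}

# With an explicit field list the column positions no longer depend on the
# local .rarc, e.g. (options assumed):
#   ra -r argus.out -s saddr sappbytes dappbytes -c , | label_ratio
printf '208.59.201.94,27104415,120\n192.168.0.68,11872196,120391392\n' | label_ratio
```

Like my awk above, this still drops records with a zero dappbytes field, so one-way senders need a separate branch.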

* I wanted to start iterating through various test cases on my data,
varying time ranges and networks that I examined. I found that I can get
very 'off' results based on how I try to filter which networks I want...for
instance:

This example will lead to hosts showing up multiple times in the final
output:
# /usr/local/bin/racluster -r ${HOUR}* -M rmon -m saddr proto sport -w ${TMP1} - 'ipv4 and src net 10.10.10.0/24'
# /usr/local/bin/racluster -r ${TMP1} -m saddr -w - | /usr/local/bin/rasort -r - -m sappbytes -s stime dur saddr proto sport sappbytes dappbytes

This example appears to be fine in the final output:
# /usr/local/bin/racluster -r ${HOUR}* -M rmon -m saddr proto sport -w ${TMP1} - 'ipv4 and net 10.10.10.0/24'
# /usr/local/bin/racluster -r ${TMP1} -m saddr -w - | /usr/local/bin/rasort -r - -m sappbytes -s stime dur saddr proto sport sappbytes dappbytes

I think I have a misunderstanding about how racluster and filters interact;
can you explain why the 'src' part in the first example would cause
multiple entries for individual hosts in the final output?

Thank you for sharing your knowledge and experience with this community!

Cheers,

Jesse



On Tue, Apr 2, 2013 at 5:09 PM, Carter Bullard <carter at qosient.com> wrote:

> Gentle people,
> To continue on the Argus and APT1 discussion, I had written that the
> Mandiant
> APT1 document described two basic nodes in the APT1 attack infrastructure,
> the Hop Point and the Target Attack Nodes.  I'm going to continue to write
> about
> Hop Points for a few more emails, because, having one of your machines
> acting
> as an APT1 Hop Point, is possibly the worst thing that can happen to you
> in the
> APT1 attack life cycle.
>
> I suggested that the best strategy for identifying APT1 Hop Points is to
> use Time
> Series Analysis methods, specifically Transfer Function Models, and
> Intervention
> Analysis, to realize that a node has been transformed, (Identification)
> and to
> realize who, what, when, how it was transformed (Attribution).  Now, I'm
> pretty
> sure that most people are not interested in a long discourse on how to use
> 2nd and 3rd order differentials over different time periods, to recognize
> trending discontinuities.  This stuff is pretty complicated, and advanced
> even
> for complex Time Series forecasting and control methods.  But that is the
> kind of direction you want to go in if you want to do Machine Learning
> methods,
> or if you want to do unsupervised systems for network behavioral anomaly
> detection, which would be a really cool thing to have.
>
> In support of this APT1 Hop Point identification process, however, there
> are more
> direct things you can look at, that don't take a lot of math, and can be
> done with
> simple, effective, reliable strategies that are easily explained
> and understood.
> Let's look briefly at one that should be useful.
>
> Most nodes that can be transformed to an APT1 Hop Point are either
> predominately
> consumers or producers of transport network data.  User driven machines are
> generally transport service consumers, little requests sent, big responses
> received,
> such as those seen in web browsing and streaming video services.  Machine
> driven
> machines, such as DNS, Web and Database servers, are generally network
> transport data producers, they receive little requests and send bigger
> responses.
> Even in Peer-to-Peer networks, you see stable consumer / producer roles
> emerge,
> where your node, the one you are paying for, generally becomes a network
> data
> producer, providing services to a lot of machines you don't know.
> Peer-to-peer
> networks do present a challenge to this kind of strategy, but not always.
>
> When a node is transformed to an APT1 Hop Point service provider, and the
> stepping stone function is active, the node will become both a consumer and
> a producer of the data it is transporting.  If it moves a large amount of
> data,
> as indicated in the Mandiant report, the overall producer / consumer
> properties
> of the affected node will move from wherever they are, toward 1.0, ...
> a balanced transport node.
>
> Our job is to try to identify when a node is transformed to an APT1 Hop
> Point,
> which means that it will go from whatever it was doing, to being a balanced
> producer / consumer, accepting data from an attack target, and relaying
> that
> data to the attacker.  If, historically, a node can be determined to be
> predominately
> a producer or consumer, then detecting when it becomes a large scale
> balanced producer / consumer, will be pretty easy, as the deviation from
> its
> normal behavior will be pretty dramatic.
>
> Now, producer / consumer metrics are not a measure of the packets (rate)
> or total bytes (load) sent or received on the wire by a node.  Instead,
> it's a
> measure of the transport bytes successfully received and sent.
>
> Protocols like TCP generally present a balance of packets sent and received
> on the wire.  After the 3-packet TCP setup handshake, one side sends data,
> and the other sends TCP overhead ACKs, almost 1-for-1.  So we can't use
> packet counts to indicate producer / consumer roles on the network.
>
> The total bytes on the wire show a bit more asymmetry that can reveal
> consumer and producer relationships, but the noise generated by the TCP
> protocol overhead bytes can make the distinction a bit more difficult.
> There are tricks, such as PUSHing one byte at a time through a TCP
> connection, or reducing the allowable window size on a connection, that
> can make the number of transported bytes per packet very small.
> Reporting the actually ACK'd data, rather than the total data on the wire,
> makes this type of analysis possible.
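A toy illustration of the point (the numbers are invented): for a typical download, the on-the-wire packet counts are nearly 1-for-1, while the application-byte counts are wildly asymmetric:

```shell
awk 'BEGIN {
  spkts = 990;  dpkts = 1000              # ACKs out vs. data packets in: ~1-for-1
  sappbytes = 1200; dappbytes = 1400000   # small request out, big payload back
  printf "pkt ratio %.2f\n", spkts / dpkts          # looks balanced: useless
  printf "app ratio %.4f\n", sappbytes / dappbytes  # clearly a Consumer
}'
```

The packet ratio comes out at 0.99, while the application-byte ratio is 0.0009 — which is why the appbyte metric carries the producer / consumer signal that packet counts hide.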
>
> To measure the application bytes received or sent, argus needs to be
> configured
> to generate the metric. Set ARGUS_GENERATE_APPBYTE_METRIC=yes in your
> /etc/argus.conf file.  Let's assume that your argus is monitoring your
> enterprise
> border interface, so that you monitor all the traffic going in and out of
> your site.
> The resulting argus data will have the information needed to determine all
> the
> producers and consumers of your enterprise, i.e. those that are bringing
> data in
> and those that are transporting data out (this is a starting point for
> developing
> formal Transfer Function Models, by the way, when you get there).
>
> A simple measure of the producer / consumer role is the ratio of
> application
> data sent (produced) vs the application data received (consumed).  Using
> argus
> data, you can calculate the metric on each status record, on each
> aggregated
> flow, or on any of the various aggregations that you can perform.  So it's
> trivial to calculate the "sappbytes / dappbytes" ratio, whether it's an
> instantaneous microflow or an entire subnet's traffic aggregated into a
> single argus flow record.
>
> To start a simple analysis, let's process a day's worth of data from a
> single QoSient workstation and see what's up.  Let's measure the sent and
> received application
> bytes of the top IP addresses seen, to assign simple producer and consumer
> roles,
> and try to use those labels as guides, to see what the trends are, and how
> to interpret the data. SPOILER: In this set of data, there are no APT1
> Hop Points, but....
>
> Let's look at the IP addresses that the QoSient node 192.168.0.68 talked to,
> on April Fool's Day, 2013.  This node resides in the 192.168.0.0/24 network,
> and is
> a basic workstation, using shared file systems, with email, web
> browsing, automated
> software updates and cloud services.  What nodes does this node talk
> to, outside
> or inside our own network, and are they producers or consumers ?
>
> This node doesn't provide any services, so we expect all other nodes to be
> producers,
> not consumers, and we expect the node to be a consumer of network
> services.  Let's
> see, but to keep the email short, let's just look at the top 10 nodes.
>
> Grabbing an entire day's worth of data from the collection archive, let's
> track individual IP addresses (so we'll use the " -M rmon " option),
> preserving the protocol and ports used by each address.  We'll use this
> first-pass derived data as the starting data for the actual analysis, which
> we'll generate to report each individual IP address's total src and dst
> application bytes.  We'll take that data and formulate the
> sappbytes/dappbytes ratio by hand for this exercise: if the ratio is > 1.5
> we'll label it a Producer, if the ratio is < 0.95 we'll label it a Consumer,
> and between these numbers we'll call the transport Balanced.  We'll color
> the output, so Consumers are in red, and Producers whose ratios are HUGE
> we'll color blue.
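That coloring rule can be sketched as a quick filter over plain "addr sappbytes dappbytes" text (the ANSI escapes, the column positions, and the cutoff of 100 for a "HUGE" ratio are all assumptions):

```shell
# colorize labels each record and wraps Consumers in red, and one-way or
# very-high-ratio Producers in blue, using ANSI escape sequences.
colorize() {
  awk '{
    red = "\033[31m"; blue = "\033[34m"; off = "\033[0m"
    if ($3 == 0) { print blue $0 " Inf Producer" off; next }  # one-way senders
    r = $2 / $3
    if      (r < 0.95) print red  $0 " " r " Consumer" off
    else if (r > 100)  print blue $0 " " r " Producer" off    # HUGE producers
    else if (r > 1.5)  print      $0 " " r " Producer"
    else               print      $0 " " r " Balanced"
  }'
}
printf '192.168.0.68 11872196 120391392\n192.168.0.127 461937 0\n' | colorize
```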
>
> Let's look at the top 10 SrcAppByte generators, to see how this might work.
> Here we go....
>
>
> thoth:01 carter$ racluster -R /archive/192.168.0.68/2013/04/01 -m saddr proto sport -w /tmp/argus.out - ipv4
> thoth:01 carter$ racluster -r /tmp/argus.out -m saddr -w - | rasort -m sappbytes \
>                        -s stime dur saddr proto sport sappbytes dappbytes -No10
>                  StartTime        Dur            SrcAddr  Proto  Sport  SAppBytes    DAppBytes        Ratio
> 2013/04/01.00:00:00.847207 86399.101*       192.168.0.66     ip          69805178      1339356      52.1185  Producer
> 2013/04/01.15:54:08.964340  25.124109      208.59.201.94    tcp http     27104415          120  225870.1250  Producer
> 2013/04/01.00:01:16.133367 86285.734*        66.39.3.162    tcp imaps    12816471      1012491      12.6584  Producer
> 2013/04/01.00:00:00.847207 86399.101*       192.168.0.68     ip          11872196    120391392       0.0986  Consumer
> 2013/04/01.17:17:37.184721 528.364441       171.67.72.17    tcp ssh       4347072        50746      85.6633  Producer
> 2013/04/01.00:02:51.660475 85447.757*      17.172.208.43    tcp https     2103142       430417       4.8863  Producer
> 2013/04/01.15:55:58.919139 28921.785*       192.168.0.78     ip           1399179      7702570       0.1817  Consumer
> 2013/04/01.09:47:16.282091 43205.253*        17.154.65.1    tcp https      472376        20531      23.0079  Producer
> 2013/04/01.00:05:42.767984 85586.210*      192.168.0.127     ip            461937            0          Inf  Producer
> 2013/04/01.00:29:54.738518 81412.937*      173.194.43.33    tcp *          413487        18616      22.2114  Producer
>
>
> Basically, what this data is saying is that, of the top 10 addresses sending
> data on April Fool's Day, most are producers, just as we expected.  And the
> workstation itself, 192.168.0.68, is a consumer (the first line in red),
> with a sent/recv'd ratio of 0.0986.  We've got some really HUGE producers,
> which indicate purely one-way transfers, the kind we're looking for.
>
> In this data we're looking for APT1 relay candidates: large data transfers
> from a remote site to an internal node, that are then relayed to another
> external node, possibly Chinese, possibly not, in real time.
>
> None of the producers are sending enough data to represent a LARGE
> exfiltration of data,
> one of the definitions of being an APT1 Hop Point.  But LARGE is a
> relative term, so we
> need to analyze any potential APT1 traffic candidate.
>
> From the first remote address in the list, 208.59.201.94, our largest
> remote producer, we
> received 27MB of data.  The sent / recv'd ratio of 225,870 is just what
> you would expect
> from a large transfer of data into your infrastructure, and is a good
> candidate for APT1
> style stepping stone data influx.  Even though it's using HTTP as the
> protocol, we should
> assume the transport technique to be somewhat clever, so whether its HTTP,
> SSH, or
> a mix of protocols, is notable, but potentially insignificant.
>
> For the purposes of this dialog, to identify this flow as a part of an
> APT1 Hop Point action,
> we need to find an outflow from our workstation that would transfer the
> data received from this
> remote node to another node.  In a simple APT1 Hop Point,
> our workstation would want to
> transfer the 27MB to a remote address, which we don't see in the top 10.
>  Whew !!!
> If it's a simple relay, you would expect the outgoing flow to sort closely
> to the flow of
> interest, as they would both be transporting the same amount of
> application data.
>
> In a slight variation of the basic APT1 Infrastructure, the Hop Point may
> relay the exfiltrated
> data to another internal node.  Our simple report indicates that the
> workstation doesn't
> transmit 27MB to any single node, either external or internal.
>
> And in the most complex relay models that could be implemented, where
> multiple
> endpoints receive portions of the exfiltrated data, our node still does
> not look to be
> an APT1 Hop Point.  By looking at the entry for 192.168.0.68, our
> workstation, we see
> that we don't actually send 27MB of total data out of the node for the
> whole day !!!
>
>                  StartTime        Dur            SrcAddr  Proto  Sport  SAppBytes    DAppBytes        Ratio
> 2013/04/01.00:00:00.847207 86399.101*       192.168.0.68     ip          11872196    120391392       0.0986  Consumer
>
> As you can see, we only sent (SAppBytes) 11.8MB total, to all our
> transport endpoints
> combined, for all of April 1, 2013.  So our candidate 27MB flow does not
> look to be relayed.
> Now in the original data, there are about 10K individual flows that could
> be candidates,
> but the aggregate analysis generated only a few hundred candidate IP
> addresses.
>
> An automated system would iterate through all potential candidate
> transfers and attempt to
> find candidate outflows that could support the relay concept.  That would
> be the most elegant
> of analytics, and not that expensive, if you have a good aggregation model
> and
> analytic framework.
>
> Now, just looking at an arbitrary day, by itself, you can get some
> assurance that you aren't
> supporting an APT1 Hop Point type of relay service.  But the strength of
> argus based network
> activity auditing, is that you have historical data that can support the
> development of
> hourly producer / consumer metrics for every IP address in your archive,
> which could
> be abstractly called a Transfer Function Model for all the assets in your
> observable
> domain.
>
> I have done this type of analytic for our workstation over the last 2
> years, and this
> workstation has maintained the 0.09 sent vs received application
> byte ratio for
> almost every day.  It has never gone over 0.17.  So this would be a great
> candidate
> machine for this type of analysis.  While it may receive a lot of data
> from the outside,
> it doesn't transfer a lot of data.  And if it did, it would be very easy
> to know it.
>
> So, all I need is the sent / recv'd ratio for all the end points in my
> enterprise, and
> if they have had stable ratios that are >> 10 or << 0.1, indicating that
> they are stable
> producers and/or consumers, then detecting a significant shift toward 1.0,
> a balanced
> consumer / producer role, is pretty easy.  If you think that the change is
> significant,
> then you can go through the original flow data, looking at sappbyte and
> dappbyte
> metrics to figure out what happened.  You're looking for new producer roles
> for your
> consumers and new consumer roles for your producers, that are contributing
> to the ratio moving toward 1.0.
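That daily check can be sketched as a trivial filter over a per-host "date ratio" history (the balanced band of 0.5-2.0 is an assumption; tune its edges to the host's historical baseline):

```shell
# flag_shifts reads "date ratio" lines for one host and flags any day whose
# ratio lands in the balanced band -- a dramatic departure for a host whose
# historical ratio sits far below 0.1 or far above 10.
flag_shifts() {
  awk '$2 >= 0.5 && $2 <= 2.0 { print $1, $2, "SHIFT-toward-balanced" }'
}
printf '2013/03/30 0.0932\n2013/03/31 0.1044\n2013/04/01 0.9812\n' | flag_shifts
```

For the sample history above, only the last day is flagged: `2013/04/01 0.9812 SHIFT-toward-balanced`.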
>
> It's a system that can work for the majority of the nodes in your
> enterprise.  For
> the ones that it doesn't, there are more complex analytics that can be
> used, but that's
> enough for a single piece of email.
>
> Reactions, opinions, attitude and flames welcome,
>
> Hope all is most excellent,
>
> Carter
>
>
>
> On Mar 27, 2013, at 12:09 PM, Carter Bullard <carter at qosient.com> wrote:
>
> Gentle people,
> To continue on the Argus and APT1 discussion, I had written that the
> Mandiant
> APT1 document described two basic nodes in the APT1 attack infrastructure,
> the Hop Point and the Target Attack Nodes.  I'm going to continue to write
> about
> Hop Points for a few more emails, because, having one of your machines
> acting
> as an APT1 Hop Point, is possibly the worst thing that can happen in the
> APT1
> attack life cycle.
>
> So far, I've presented that Mandiant's report gives us a lot of detail,
> trends and
> methods, that allow us to detect overt APT1 behavior using the argus data.
>  Trends
> such as APT1's establishment and use of well defined attack infrastructure
> and
> the tendency to access that infrastructure directly, from well defined IP
> address
> blocks, using specific access methods, and a good description of the
> attackers
> intent, exfiltration of large amounts of data.  These trends lead to a set
> of very
> simple tests for APT1 activity, that can be tested against argus data
> archives
> to help you realize if you've been had, or not.
>
> The APT1 strategies that Mandiant describes are conventional, and the
> attack
> infrastructure itself is simple, direct, almost optimal (minimal reliable
> methods,
> 2-3 hops from attacker to target), suggesting that the infrastructure has
> predictable utility, i.e. it may actually work to scale, and work well
> enough to
> be worth the effort.  The ultimate simplicity of the realized APT1
> infrastructure,
> may be the result of a limit in Mandiant's detection capability ( you can
> only
> see what you are looking for ), but there is no question that what they
> describe
> is real.
>
> While Mandiant is very detailed in what it does talk about, there are huge
> gaps in what it doesn't talk about.  I'd like to dive deeper into APT1 Hop
> Point
> identification, but we're lacking key information.  What kind of systems
> does
> APT1 use for Hop Points? Linux workstations ? Windows XP machines ?
> Web Servers ?  Android devices ?  Routers ?  While we have some really
> great patterns to look for, like specific SSH certificates, there are so
> many
> things we don't have; initial penetration techniques, command and control
> methods, beaconing patterns, persistent vs dynamic access.
>
> In the absence of real detail, we'll have to develop general strategies for
> detection, and if we want to have any success, we'll need to avoid
> awareness / detection system pitfalls, such as sampling, and sampling bias
> (looking only at one protocol or one type of OS), and matching complexity.
>
> One of the simple characteristics that I will try to leverage in my
> discussions, is
> the intent of the APT1 attack, and the goal of the APT1 Hop Point; to move
> a lot of data, from a remote site to another remote site.  If that really
> is the
> singular attack goal for APT1, then with good argus data generation and
> analytics, we should be able to find any node that is acting as an APT1
> Hop Point, as well as any other APTx Hop Points that may exist.
>
> The approach that I will try to describe in the next set of emails, is one
> based
> on a Bell-LaPadula style of analysis, to find nodes that have been
> transformed
> from being one type of network based node, to another type of network
> node,
> in the case of APT1, one that is supporting a demanding network based
> transport
> service.
>
> I'm going to use Time Series Analysis
> methods, specifically Transfer Function
> Models, and Intervention Analysis to realize that a node is doing something
> different.  The Transfer Function Models, are perfect for this, as they are
> generally used to describe input / output dynamic system response,
> and Intervention Analysis is all about the notion that there is an event
> that
> motivates a dynamic change in system input / output.  So I'm going to try
> to use
> this strategy to identify a change in input / output, and then to try to
> find the
> event that correlates with the change.
>
> If you can imagine that there is an argus running on every node in an
> infrastructure, establishing a generalized network activity audit, that goes
> back quite a ways, then we should have a very rich set of data to perform
> this type of analysis, either automated, or by hand.   The goal will be to
> realize that a node went from being a specific type of producer / consumer,
> to a different kind of producer / consumer, over some period of time.
>
> OK, that is going to be my strategy.  Any other approaches that seem to be
> appropriate?    More to come.
>
> Hope all is most excellent,
>
> Carter
>
>
>


-- 
Jesse Bowling

