Argus detecting historical APT1 activity #2

Carter Bullard carter at qosient.com
Sat Mar 23 14:06:12 EDT 2013


Gentle people,
To continue the Argus and APT1 discussion: I had written that the Mandiant
APT1 document describes two basic nodes in the APT1 attack infrastructure,
the Hop Point and the Target Attack Node.  I'm going to continue to write about
Hop Points for a few more emails, because having one of your machines acting
as an APT1 Hop Point is possibly the worst thing that can happen to you in the APT1
attack life cycle.

The Hop Point is the primary attack node, from the perspective of the site
being attacked.  If APT1 were to obtain US Classified Information through one
of your corporate machines, then regardless of who is driving the attack, your
corporation is the attacking entity, and you will have to assume the
complete liability for the incident, including cost and reputation impacts, unless
you can support attribution to the actual attacker.

I mentioned last time that there were behaviors that could be leveraged
to assess whether you had been compromised by APT1 and whether nodes in your
organization were being used as Hop Points.

> Based on Mandiant's experience, there are some common trends that
> the APT1 report suggests for Hop Points.
>    1) virtually all accesses to APT1 Hop Points are from a
>         select set of IP addresses in China
>    2) initial penetration and subsequent direct accesses to
>         Hop Points use Remote Desktop Application
>    3) many late stage accesses use SSH tunneling, with
>         specific credentials.
>    4) Hop points are used to move large volumes of data.


The #1 trend was that most, if not all, accesses to Hop Points were from Chinese
IP addresses, and in the previous email I described how you can use argus data
and argus client programs to determine whether nodes in your organization are
communicating with machines in China.  Of course, interaction with Chinese
machines is not proof of APT1 activity, as I indicated, so care must be taken
when using general indicators to strengthen the evidence before you start
dismantling your organization looking for APT1 nodes.

Mandiant suggested that there would be retribution against it for the disclosure of its
discoveries.  However, based on my experience investigating large scale (global)
intrusions at the CMU-CERT in the 1990's, this is very unlikely.  Active retribution
would only strengthen the report, and reveal that the investigation has had a negative
impact on the attackers' overall strategy.  What will most likely happen is that
the adversary will change its methods, rather than stop its activity.  "Was", the
past tense of "is", is now the correct tense to use when referring to Mandiant's
description of APT1's methods.

In this email, I'd like to talk about how to detect APT1 Hop Point activity when the
accesses are not from China, and work our way to a point where you can use argus
data to detect the activity.  I'll try to do this without this email growing to 100 pages.

The key is to try to understand what a Hop Point does in its role in the APT1
infrastructure, and then to try to confidently detect that behavior.  The Mandiant
report indicated 3 other trends for Hop Points.  1) Access was done through Microsoft's
Remote Desktop Protocol.  Detecting that RDP was used at any time to access
nodes within your infrastructure will be a telling point.  2) Many accesses are
tunneled through SSH, so detection of SSH tunnels to nodes outside your normal
Community of Interest (COI) will help to strengthen a case for APT1 activity.  And
3) Hop Points are used to move large volumes of data.  Detecting that nodes
are using any method to move lots of data out of your infrastructure will be our
primary key.

This last trend, that APT1 Hop Points move lots of data out, is very important, and
deserves a great deal of discussion, which we can't do today.  Hopefully, it will
suffice to say that Hop Point data movement should not be called Exfiltration,
but simply Transport.  The data the Hop Point is moving doesn't originate from
the Hop Point's enterprise, but from someone else's enterprise.  So the data has
to come into the enterprise, and then leave.  This is critical to establishing a simple
detection criterion.  An APT1 Hop Point is really a transport node, either store and
forward, or real time transit.  If it's a real time transit node, then it's a single hop
Stepping Stone, which can be detected pretty easily.

Single hop Stepping Stones can be easy to find.  You are looking for a single transport
thread that is going from X -> A and A -> Y at the same time, with multiple similar
characteristics, such as instantaneous load, application bytes transferred, and packet dynamics.
You would like to think that content patterns would be a reliable method for matching,
like they are for tracking NAT processes, but they may not be as good as you think.
If the two threads are using TCP, as an example, the two different connections
can have wildly different network experiences, such as variations in MTU, window size,
loss rates, etc..., causing the data patterns to be more variable than expected.
But that doesn't make the effort impossible, just slightly complicated.

If you like to think of abstract objects and then find examples of them, a single
hop stepping stone is a proxied connection that is terminated and re-originated.
In some cases, it is very similar to a NAT process, but because the connections are
terminated, the two threads of transport will have different sequence numbers,
TCP options, TTLs, framing points, and possibly different MTUs, all of which
can change the number of packets and the packet sizes.  But in the end they
are two TCP connections that are moving the same application payload, so
there will be some things in common.

Argus clients can identify NAT'd traffic, where the IP addresses and the ports
have been changed (to protect the innocent ?), because argus data collects
additional attributes that are not modified, like base sequence numbers and
payload patterns.  The attributes of interest in stepping stone detection are
coincidental presence and similar application loads.  And because Hop Points
move very large amounts of data, you should look for flows that stand out,
not ones that are hiding in the weeds.
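
For the "stand out" part, a simple appbyte filter can cut the data down before you
start pairing flows up.  A rough sketch (the 1 MB threshold is an arbitrary value I
picked for illustration, not something from the Mandiant report):

   # border TCP flows that sourced more than ~1 MB of application payload
   ra -r argus.data.file -s stime saddr daddr dport sappbytes dappbytes \
         - tcp and src appbytes gt 1000000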

As an example, taking your argus data collected from your enterprise border,
the goal is to find pairs of flows that occur at the same time, with similar
appbytes transferred over that period.  rabins() can help a great deal here.
rabins() processes flow data, aggregating and sorting within single time bins.
By aggregating records within specific time periods, using the default mode
of aggregation, and sorting based on appbytes, you will consolidate like flows
together in the output.

   rabins -F rabins.rarc -r argus.data.file -M time 1m -w -  - tcp

In the rabins.rarc, the RA_SORT_ALGORITHM would be set to "dappbytes sappbytes".
Of course, your argus sensors must be configured to provide this appbyte data.
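
As a minimal sketch, the relevant rabins.rarc entry would look something like this
(just the sort directive described above; any other settings in your rarc are up to you):

   #  rabins.rarc -- sort the records in each time bin on appbytes,
   #  so flows with similar loads end up adjacent in the output
   RA_SORT_ALGORITHM="dappbytes sappbytes"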

For single hop Stepping Stones, you're looking for flows that are similar in app
byte demand, where X -> A and A -> Y.  In multi-hop stepping stones, the
similarity will be there, but it will be X -> A and B -> Y.  There is no indication
in the Mandiant report that APT1 was that clever, but you should not exclude
that.
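
To eyeball the candidate pairs, print just the fields that matter.  This is a sketch,
assuming your rabins() supports the common ra field selection options, with field
names taken from the examples later in this thread:

   rabins -F rabins.rarc -r argus.data.file -M time 1m \
          -s stime saddr daddr dport sappbytes dappbytes - tcp

Flows in the same one-minute bin with nearly equal appbyte counts will print on
adjacent lines, which is exactly the X -> A and A -> Y signature you're after.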

When the Hop Point is a bit more sophisticated, the transport will not be real-time,
it will be store and forward.  What that means is that the Hop Point will collect the
data, wait for some period of time, such as an hour or a day, and then transfer
the collected data to the mother ship.  If the transfer is limited to a
one-to-one transfer model, then completely aggregated argus data is needed to
find the transfer of the same file.  This is easy.

   racluster -r argus.data -w /tmp/argus.transaction.data

This will consolidate all the status records for a given TCP connection into
a single record.  For 2 TCP connections that move the same file, the resulting
two TCP argus records will have identical SrcAppByte or DstAppByte values,
depending on whether the two TCP connections are push or pull.  So, take
the output of the general aggregation, and look for completed TCP connections
that have equal sappbytes or dappbytes metrics.  These should represent
transport of the same files.
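
One way to make those equal values stand out, sketched here assuming the rasort()
client from the same argus-clients distribution, is to sort the aggregated records
on the appbyte metrics so matching connections land on adjacent lines:

   # aggregate, then sort so equal appbyte values sit next to each other
   racluster -r argus.data -w - | \
      rasort -m dappbytes sappbytes -s stime saddr daddr dport sappbytes dappbytes

From there, any two border-crossing connections with the same large dappbytes
(or sappbytes) value, separated by an hour or a day, are worth pulling the
primitive data for.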

Not to ignore trends #1 and #2 from above....  Detection of RDP and SSH is really
pretty trivial with argus data and argus client programs.  You can filter for the well
known protocol port numbers, and you can inspect the user data captured in argus
data to detect these protocols on other ports.  Because RDP puts a bit of a network
demand on the wire to do its thing, it's easy to find long lived, high packet count sessions
that span an enterprise border, even in the presence of encryption.  The average rate
and load needed to export displays and to propagate mouse events are pretty
distinctive, so finding these is pretty easy.

Finding RDP-based connection attempts, either from or to external nodes, is trivial.

   ra -R /your/entire/argus/archive - \
         tcp and port 3389 and not src net ( .. the list of local network addresses...)

This should pick up scans for open RDP server ports, both failed and successful,
and transactions that are actually transferring data.   Variants of this filter may be
needed to test all the possibilities, but hopefully you can see how trivial this is.
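
The SSH side of trend #2 works the same way.  A sketch, with the same placeholder
for your local address list, and a user-data check that assumes your sensors are
capturing some payload (e.g. ARGUS_CAPTURE_DATA_LEN set in argus.conf):

   # conventional SSH port crossing the border
   ra -R /your/entire/argus/archive - \
         tcp and port 22 and not src net ( .. the list of local network addresses...)

   # SSH on any port: the protocol banner starts with "SSH-"
   ra -R /your/entire/argus/archive -s stime saddr daddr dport suser:32 - tcp | grep "SSH-"

Neither of these proves tunneling by itself, but SSH sessions to hosts outside your
normal COI, found this way, are the ones to look at more closely.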

OK, enough for a Saturday, please send comments / opinions / flames whatever
to the list.  If you think we should write clients that automate some of this discovery,
then do send your request to the list !!!!

Hope all is most excellent,

Carter



On Mar 20, 2013, at 12:17 PM, Carter Bullard <carter at qosient.com> wrote:

> Gentle people,
> The first general concept in the APT1 analysis that I find important, IMHO,
> is the distinction between APT1 infrastructure nodes and terminal target
> nodes, especially in terms of methods and access.
> 
> The style of infrastructure that Mandiant describes for APT1 is conventional,
> and expected in large scale exploitations.  Even as early as 1992, when I was
> one of the principal investigators at the CMU-CERT, this type of exploitation
> architecture was very common, and expected.  Bad guys spend a lot of resources
> establishing and maintaining a persistent 1 and possibly 2 hop virtual access
> network, from which they can do the " get in, get it and get out".  
> 
> The attack infrastructure is designed to provide a level of anonymity,
> attribution resistance, and protection from offensive retaliation, as well as
> act as a sensor to provide direct awareness of incident response actions (is
> there a reaction? are they on my trail ?) .
> 
> From a forensics perspective the complexity of the attack infrastructure is
> a measure of the sophistication of the effort, with high utility, but low complexity
> being the goals for optimality.
> 
> OK, Mandiant's APT1 report describes a pretty interesting 2 layer attack
> infrastructure, with the attackers directly accessing " hop points ", which are
> then used to address and access potential attack targets.
> 
> A traditional Threat Assessment for APT1 should indicate that the defensive
> posture should be focused on avoiding being a " hop point " in the APT1
> infrastructure.  This conclusion would be based on a cost assessment:
> 1) financial liability due to negligence, 2) cost of culpability defense, 3) cost
> of recovery, and 4) loss of reputation issues.  So, I'm interested in finding out
> if any of my hosts / nodes / whatever are acting as APT1 Hop Points.
> 
> Based on Mandiant's experience, there are some common trends that
> the APT1 report suggests for Hop Points.
>    1) virtually all accesses to APT1 Hop Points are from a
>         select set of IP addresses in China
>    2) initial penetration and subsequent direct accesses to
>         Hop Points use Remote Desktop Application
>    3) many late stage accesses use SSH tunneling, with
>         specific credentials.
>    4) Hop points are used to move large volumes of data.
> 
> Each of these trends can be tested using argus data that has already
> been collected, to give one some assurance that either you are or aren't
> an APT1 Hop Point, and if you are, how long have you been had, so to
> speak.  In this section, lets use the first trend to test if our argus sensors
> have seen any potential APT1 hop point activity, by looking for the
> specific set of IP addresses, or generally, any user plane payload
> transfers to and/or from CN.
> 
> If you are using argus to establish a network activity audit, you can 
> simply scan your argus archives to see if there are any accesses from
> the IP blocks Mandiant provides in the report.  Testing for accesses
> from a large set of divergent IP addresses is best done using the utility
> rafilteraddr(), as the address tests needed for each record are pretty fast,
> much faster than the command line filters would be.
> 
> This would be a minimum Best Common Practice, at least for argus users.
> 
>    % rafilteraddr -f apt1.ipv4.addr -R /root/of/your/entire/argus/archive
> 
> The sample configuration file, apt1.ipv4.addr is now available in the
> argus-clients-3.0.7.7 distribution, as ./support/Config/apt1.ipv4.addr.
> This file currently is comprised of:
> 
> #
> #  APT1 Origin IPv4 Addresses
> #
> #  Net blocks corresponding to IP Addresses that APT1 
> #  used to access their hop points.
> #  
> #  Derived from Mandiant's " APT1 Exposing One of
> #  China's Cyber Espionage Units ".
> #
> 
> 223.166.0.0-223.167.255.255
> 58.246.0.0-58.247.255.255
> 112.64.0.0-112.65.255.255
> 139.226.0.0-139.227.255.255
> 114.80.0.0-114.95.255.255
> 
> Any bi-directional hit would be suspect, but flows that are transferring
> application data should really get our attention.
> 
> Scanning an entire archive can take some time.  For me to search 3 years
> of argus data for the complete QoSient infrastructure, a few TeraBytes of
> argus records, with a single set of address queries took 8+ hours.   But if
> you are a sophisticated argus user, you should be running rasqlinsert(),
> to maintain a real-time IP address inventory.   This has been described on
> the web site, and the mailing list.
> 
> Searching this set of databases, using a single thread, takes me 18.0 seconds
> to go through 2 years of argus data, using a command like this:
> 
>    rasql -t -2y+2y -M time 1d -r mysql://user@dbhost/db/ipAddrs_%Y_%m_%d \
>         -w - - ip | rafilteraddr -f apt1.ipv4.addr
> 
> Remember, the daily IP address tables are created using:
> 
>    rasqlinsert -M time 1d -M rmon -m srcid saddr -S argus.data \ 
>           -w mysql://user@dbhost/db/ipAddrs_%Y_%m_%d -M cache - ip
> 
> For QoSient, in the last 2 years, using these techniques, we've had 28
> interactions with IP addresses from the APT1 IP address blocks (17.9
> seconds to run that query), and only 1 of those transactions transferred
> actual data.  We test for transferred bytes, using the "appbytes gt 0" filter:
> 
> thoth:~ carter$ time rasql -M time 1d -r mysql://root@localhost/ratop/etherHost_%Y_%m_%d -t -2y+2y -w - | \
>     rafilteraddr -f /tmp/apt1* -s stime dur:12 smac saddr spkts dpkts sappbytes dappbytes \
>       - src appbytes gt 0 and dst appbytes gt 0
> 
>                  StartTime          Dur             SrcMac            SrcAddr  SrcPkts  DstPkts    SAppBytes    DAppBytes 
> 2012/12/27.11:41:39.405112     0.381737  80:71:1f:3c:c3:88     114.80.162.157        1        1          223           53 
> 
> real	0m18.006s
> user	0m23.180s
> sys	0m1.026s
> 
> So I have one interaction, in December 2012, that looks like something I need to look at.
> Because all of my argus data is time indexed, looking up the complete set of primitive argus data
> for this interaction doesn't take any time.  I take the starting time, add a few seconds, and query the primitive data; in this case it takes 0.095 seconds:
> 
> time rasql -t 2012/12/27.11:41:39+5s -w - | ra - host 114.80.162.157
>                  StartTime        Dur      Flgs  Proto            SrcAddr  Sport   Dir            DstAddr  Dport  SrcPkts  DstPkts     SrcBytes     DstBytes State 
> 2012/12/27.11:41:39.405112   0.381737  e           udp       192.168.0.66.61354    <->     114.80.162.157.domain        1        1           95          265   CON
> 
> real	0m0.095s
> user	0m0.055s
> sys	0m0.014s
> 
> 
> Hmmm, a DNS query from my interior DNS server, into the heart of evil itself?  Well, because it takes less than a second to do
> the query, lets see what the DNS query was all about....  Using radump .....
> 
> time rasql -t 2012/12/27.11:41:39+5s -w - | radump -vv -s suser:64 duser:128 - host 114.80.162.157                                                                
>   
>   s[64]="50283  [1au] A? cc00013.h.cnc.ccgslb.net. ar: . OPT UDPs"
>  d[128]="50283- q: A? cc00013.h.cnc.ccgslb.net. 0/5/6 ns: cnc.ccgslb.net. NS ns1.cnc.ccgslb.net., cnc.ccgslb.net. NS ns2.cnc.ccg"
> 
> real	0m0.086s
> user	0m0.055s
> sys	0m0.015s
> 
> OK, well that could be an issue, so why would the DNS server ask for this address?  Let's back up our query a few seconds, to see if we can find the source of this exchange.  Let's grab any transactions that started 10 minutes before this, looking for the string " ccgslb ", to see what is up?
> 
> thoth:~ carter$ time rasql -t 2012/12/27.11:32:35+10m -e ccgslb -s stime dur srcid saddr dir daddr dport pkts bytes
>                  StartTime        Dur              SrcId            SrcAddr   Dir            DstAddr  Dport  TotPkts   TotBytes 
> 2012/12/27.11:41:38.707223   0.319857       192.168.0.66       192.168.0.66   <->     121.196.255.77.domain        2        211
> 2012/12/27.11:41:39.027925   0.098657       192.168.0.66       192.168.0.66   <->       192.12.94.30.domain        2        843
> 2012/12/27.11:41:39.127181   0.277315       192.168.0.66       192.168.0.66   <->     60.217.232.120.domain        2        386
> 2012/12/27.11:41:39.405112   0.381737       192.168.0.66       192.168.0.66   <->     114.80.162.157.domain        2        360
> 2012/12/27.11:41:39.668681   0.636537       192.168.0.66       192.168.0.68   <->       192.168.0.66.domain        2        547
> 2012/12/27.11:41:39.787594   0.516914       192.168.0.66       192.168.0.66   <->    221.192.148.100.domain        2        520
> 
> real	0m0.090s
> user	0m0.054s
> sys	0m0.008s
> 
> OK, so my DNS server, which is a recursive DNS server, is looking for something that eventually returns
> what appears to be a Chinese address, so let's look at these queries using radump():
> 
> thoth:~ carter$ time rasql -t 2012/12/27.11:41:35+30s -e ccgslb -s stime suser:64 duser:128 
> 2012/12/27.11:41:38.707223 s[45]=2"...........iphone.tgbus.com.......)........                      
> d[82]=2"...........iphone.tgbus.com..................mobile.tgbus.ccgslb.net...)........
> 
> 2012/12/27.11:41:39.027925 s[52]=!............mobile.tgbus.ccgslb.net.......)........                
> d[128]=!............mobile.tgbus.ccgslb.net..................ns1...............ns6...............ns2...............ns7...............ns
> 
> 2012/12/27.11:41:39.127181 s[52]=Vq...........mobile.tgbus.ccgslb.net.......)........               
> d[128]=Vq...........mobile.tgbus.ccgslb.net..................cc00013.h.cnc...?......Q....ns1.?.?......Q....ns2.?.?......Q....ns6.?.?...
> 
> 2012/12/27.11:41:39.405112 s[53]=.k...........cc00013.h.cnc.ccgslb.net.......)........              
> d[128]=.k...........cc00013.h.cnc.ccgslb.net.............Q....ns1..........Q....ns2..........Q....ns6..........Q....ns7..........Q....n
> 
> 2012/12/27.11:41:39.668681 s[34]=(............iphone.tgbus.com.....
> d[128]=(............iphone.tgbus.com..................mobile.tgbus.ccgslb.net..............cc00013.h.cnc.;.S.......x..q.a=.S.......x..z
> 
> 2012/12/27.11:41:39.787594 s[53]=.n...........cc00013.h.cnc.ccgslb.net.......)........       
> d[128]=.n...........cc00013.h.cnc.ccgslb.net..............x..z............x...............x.....o.........x...............x............
> 
> real	0m0.087s
> user	0m0.053s
> sys	0m0.007s
> 
> 
> It appears that two machines at QoSient World Headquarters got interested in " iphone.tgbus.com ", which is
> a Chinese domain, and a Chinese DNS server returned that the best DNS server to resolve that address
> is in an APT1 address block.  Because there wasn't any additional traffic, over a period of 2 years,
> we have to conclude that it's not an issue.
> 
> So QoSient, LLC, based on this simple set of argus queries, hasn't been an active part of the Mandiant
> described APT1 attack infrastructure.   Total computational time to test this over the last 2 years,
> less than 20 seconds.
> 
> But that simple set of queries doesn't provide a complete answer.  DNS can be used by a really clever
> group as a command and control beacon, to advertise the availability of a node.  Was the DNS query
> for " iphone.tgbus.com " that eventually sent a DNS query into an APT1 address block, really innocuous ?
> In this case, yes, but I'll talk about how to come to that conclusion in more detail later.
> 
> Next we'll talk about items #2, 3 and 4, and how argus data can be used to look for
> trends as behavioral markers.
> 
> Hope you find this useful,
> 
> Carter
> 
