radark program design - cont.

Fri Oct 26 14:55:39 EDT 2007

Gentle people,
Here are some more ramblings regarding radark(), a simple scan detector
based on dark address space access detection.  While I'm writing up
these notes, I'm also changing the code, so if you see me reference a
line that is not in your version of radark(), don't fret.  I will try
to have the newest copy of radark in the development directory on the
server.
Currently I'm working with:

    ftp://qosient/com/dev/argus-3.0/radark.rc.2.pl

Notes on dark address space determination.

There are two fundamental phases in radark() operation:
    1) dark address space determination
    2) tracking and reporting on activity of IP addresses that "touch"
         the dark address space

It's broken down like this for a number of reasons, but one good
reason is that working out which part of the local address space is
dark is not trivial.  The dark address space is not just the local
address space that's not allocated or turned on.  It also includes
network and host addresses that are not reachable because
of access control policies, which functionally extend the
dark address space.

And the problem is also a bit more complicated, because there are
selective access control policies.  A is allowed to talk to B, but C is
not.  So B is in C's dark address space.  And of course, since there
are time-based access control policies, you have to worry about the
scenario where B is sometimes in A's dark address space.

With this level of complexity, it's no wonder that understanding
your dark address space may be a complicated thing.

If you, as the administrator, are an all-knowing being, then you can
compile a description of the dark address space, but most people
would like to have some help in doing that.

I think we can do this pretty easily in most educational or government
enterprise networks, or at least get a grip on the problem, because
there is an enormous amount of scanning constantly going on.  So
the dark address space is being tested pretty much constantly.
Now for some sites the rate of discovery is not very high, but most of
the time, you get something coming in and blasting the dark address
space a few times a day.

Now, in a perfect IP world, the network equipment will tell you that it
can't forward traffic to an intended destination address, by generating
ICMP Unreachable events.  Of course, not all devices do a good job
at this, and some devices are configured to not generate ICMP
unreachables, but for most sites there is enough of it going on that
this strategy is very useful.  And of course, not all ICMP unreachables
indicate access of a dark IP address, so we need to pick and choose
which events to track.

The events that are usable for this purpose are:
    UNREACH_NET, UNREACH_HOST, UNREACH_SRCFAIL,
    UNREACH_NET_UNKNOWN, UNREACH_HOST_UNKNOWN,
    UNREACH_ISOLATED, UNREACH_HOST_PROHIB,
    UNREACH_FILTER_PROHIB.
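
For reference, these names line up with the ICMP type 3 (destination
unreachable) codes in the BSD <netinet/ip_icmp.h> header.  A minimal
sketch of the selection, assuming you already have the numeric code in
hand; the function name is_dark_unreach is made up for illustration and
is not part of argus:

```shell
# Hypothetical helper, not part of argus: report whether an ICMP
# destination-unreachable code (ICMP type 3) is one of the events
# listed above.  Code numbers follow BSD <netinet/ip_icmp.h>.
is_dark_unreach() {
    case "$1" in
        0)  echo yes ;;   # UNREACH_NET
        1)  echo yes ;;   # UNREACH_HOST
        5)  echo yes ;;   # UNREACH_SRCFAIL
        6)  echo yes ;;   # UNREACH_NET_UNKNOWN
        7)  echo yes ;;   # UNREACH_HOST_UNKNOWN
        8)  echo yes ;;   # UNREACH_ISOLATED
        10) echo yes ;;   # UNREACH_HOST_PROHIB
        13) echo yes ;;   # UNREACH_FILTER_PROHIB
        *)  echo no  ;;   # e.g. 2 (protocol) and 3 (port): the host answered
    esac
}

is_dark_unreach 1    # host unreachable
is_dark_unreach 3    # port unreachable
```

Port and protocol unreachables are left out because they are normally
sent by a live host, which is the opposite of dark.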

Because argus maps ICMP packets to the flows that cause them,
you can find these events in the argus stream. All ra* programs can
filter for 'unreach' events, so this is pretty easy to do.  We do have
filters for some unreachables, but not all (as of today), so in this
example, I'm just going to grab them all, and show how we can
filter them down to get what we want.

After radark() conditions the data, it splits the output to get the
"unreachables" into a separate file, so we can process it a
couple of times.

    `racluster -w - @arglist | \
            ra -E $RADATA/racluster.out -w $RADATA/raunreach.out - unreach`;

The resulting file, $RADATA/raunreach.out contains all the data that
encountered unreachable ICMP events, and the ICMP events as well.

There are two things you can do with the file.  The first is, you can
get the list of internal IP addresses that are definitely unreachable
from where this argus is monitoring.  This will give you a great list
of dark space addresses (not the complete list, but a verifiable list
of unreachable IPs).

    racluster -m daddr -r $RADATA/raunreach.out - \
            \(not icmp\) and \(not src net $localaddr and dst net $localaddr\)

The file raunreach.out contains the flows that had ICMP Unreachables
mapped to them, as well as the actual ICMP flows themselves.  So you
need to filter out the native icmp data ("not icmp"), and because you
are really only interested in the flows that originated from the
"outside", and targeted something on the "inside", we add
"not src net $localaddr and dst net $localaddr".

If you wanted to build a history of all the unreachable addresses you
could write this out to a file, like this:

    racluster -m daddr -r $RADATA/raunreach.out -w darkaddresses.out - \
            \(not icmp\) and \(not src net $localaddr and dst net $localaddr\)

But if you wanted to build a persistent database of proven unreachable
addresses (based on active ICMP messaging)  that you want to grow
over time, you would do something like this.

Let's say you want a binary darkaddress.out file that tracks dark
addresses over time.  The idea is to periodically process new argus
data, say every hour or every day, and add the new addresses to the
persistent database.  In this case the database is just a file.  With
racluster(), you can do this in two steps.  First, you have racluster()
read the old darkaddress.out file to prime it.  With that data in its
cache, you have racluster() open the new data file, and process the
data, which automatically generates the new dark address space data.
You have racluster() write this new data to a temporary file, and then
replace the old with the new.

    racluster -m daddr -r darkaddress.out $RADATA/raunreach.out \
            -w temp.out - \
            \(not icmp\) and \(not src net $localaddr and dst net $localaddr\)
    mv temp.out darkaddress.out

By putting multiple files on the command line, racluster() will read
them in sequentially, in the order they appear (this is very important
for some operations, as one file is the cache, and the other is the new
data).
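
The same prime-then-merge pattern, sketched with plain text files
instead of argus records, in case it helps to see the shape of it.
This is only an analogy: merge_dark and the file names are made up,
and sort -u stands in for what racluster() does with its cache.

```shell
# Hypothetical plain-text analogue of the racluster merge above.
# $1 = persistent list (one address per line), $2 = newly seen addresses.
merge_dark() {
    db=$1
    new=$2
    tmp=$(mktemp) || return 1
    [ -f "$db" ] || : > "$db"       # first run: start with an empty database
    # Old entries and new entries together; sort -u drops the duplicates.
    # The database is only replaced if the merge succeeded.
    sort -u "$db" "$new" > "$tmp" && mv "$tmp" "$db"
}
```

Run from cron, something like "merge_dark darkaddress.txt new.txt"
grows the list over time without ever leaving a half-written database
behind, which is the point of the temp.out / mv step.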

So, you as the administrator will want to keep an eye on this file to
make sure that it reflects the real dark address space, but you'll find
that it does a very good job.  This persistent list of real dark
addresses can be very useful to improve radark() performance.  We'll
add this later to the actual scripts of radark(), in another
installment.

The second thing you can do with this raunreach.out data is to build a
strong list of scanners, i.e. the external addresses that access
multiple unreachable IP addresses.  To do this you first get the list
of "A -> B" pairs that had unreachable events, and then aggregate that
list to get the "A ->" records.  Then a simple filter is all you need.

So, an example would be this command:

    `racluster -M norep -m saddr daddr -r $RADATA/raunreach.out -- \
            \(not icmp\) and \(not src net $localaddr and dst net $localaddr\)`;

What are we doing here?  The "-M norep" option means we aren't
going to generate any aggregate statistics yet on this data; "no
report".  Hopefully this will become clear in the next step.  We
aggregate using the "-m saddr daddr" key, which gives us all the IP
matrix data, with the direction semantics preserved ("A -> B").

So, this command gives us the "A -> B" flows that were unreachable.
Now, if we aggregate the output using:

    racluster -m saddr

We'll get records for each external IP address that accessed an
unreachable interior address.  By doing it this way, we get in each
record an aggregation statistic that tells us how many unreachable
addresses the external address attempted to access.  With this metric
in place, we can now get an answer to the question, "What external IP
addresses attempted to access X unreachable internal addresses?"
Isn't that a reasonable question?
So putting it all together:

    `racluster -M norep -m saddr daddr -r $RADATA/raunreach.out -w - -- \
            \(not icmp\) and \(not src net $localaddr and dst net $localaddr\) | \
     racluster -m saddr -w radarkaddress.out`;
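
To see why two passes are needed, here is the same idea on plain text,
as a sketch only.  The input is assumed to be one "src dst" pair per
line (a format made up for this example, not argus output), and the
name count_dark_targets is hypothetical.  The first pass collapses
repeated pairs, the second counts what is left per source.

```shell
# Hypothetical text analogue of the two racluster passes.
# Input on stdin: "src dst" per line, possibly with repeats.
# Output: "src count", where count = number of distinct dst for that src.
count_dark_targets() {
    # sort -u keeps one line per distinct "A -> B" pair (pass 1),
    # awk counts the remaining lines per A (pass 2),
    # and the final sort just makes the output order stable.
    sort -u | awk '{ n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

printf '%s\n' \
    '1.1.1.1 10.0.0.1' \
    '1.1.1.1 10.0.0.1' \
    '1.1.1.1 10.0.0.2' \
    '2.2.2.2 10.0.0.1' | count_dark_targets
```

Without the first pass, 1.1.1.1 would be credited with three targets
instead of two, because the repeated probe of 10.0.0.1 would be counted
twice.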

This set of commands is in the radark.pl program, and we'll build on
this as we go!!!

Now to look at the output, use something like this:

     rasort -m trans -s stime dur trans saddr pkts bytes state \
           -r radarkaddress.out -- trans gt 100

Here we're picking addresses that accessed more than 100 internal
unreachable addresses (I only used this so the output wouldn't be too
long).  A good number may be 2-5, depending on how you want to track
this stuff.
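
If you want to play with the threshold outside of rasort, the same cut
can be sketched on a plain "src count" listing.  Again this is a
made-up text format and a hypothetical helper name, not argus output:

```shell
# Hypothetical helper: keep sources whose distinct-target count is
# strictly greater than $1, highest count first.
# Input on stdin: "src count" per line.
over_threshold() {
    awk -v min="$1" '$2 > min' | sort -k2,2nr
}

printf '%s\n' '1.1.1.1 2' '2.2.2.2 340' '3.3.3.3 112' | over_threshold 100
```

Dropping the argument from 100 to something like 2 is the quickest way
to see how noisy a low threshold gets on your own data.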

Against some old data I have, this is the output:

        Dur  Trans            SrcAddr  TotPkts   TotBytes State
210.420624   1599       211.222.8.44     3997     247814   INT
2494.95947    340      218.104.78.69      390      23400   INT
302.197723    221     202.99.219.206      222      13320   INT
2511.13232    147        210.51.1.47      154       9240   INT
2532.34838    112      218.22.14.104      116       6960   INT

In this case the "trans" field (number of transactions) represents the
number of destination addresses that this IP address accessed.  So the
higher, the more confidence we should have that this IP address is
scanning our address space.

Hopefully you will find this useful, and if so, I'll continue the
dialog.

Carter
