radark program design - cont.
Carter Bullard
carter at qosient.com
Fri Oct 26 14:55:39 EDT 2007
Gentle people,
Here are some more ramblings regarding radark(), a simple scan detector
based on dark address space access detection. While I'm writing up these
notes, I'm also changing the code, so if you see me reference a line
that is not in your version of radark(), don't fret. I will try to have
the newest copy of radark in the development directory on the server.
Currently I'm working with:
ftp://qosient.com/dev/argus-3.0/radark.rc.2.pl
Notes on dark address space determination.
There are two fundamental phases in radark() operation:
1) dark address space determination
2) tracking and reporting on activity of IP addresses that "touch"
the dark address space
It's broken down like this for a number of reasons, but one good
reason is that working out which part of the local address space is
dark is not trivial. The dark address space is not just the local
address space that's not allocated or turned on. It also includes
network and host addresses that are not reachable because of access
control policies, which functionally extend the dark address space.
And the problem is a bit more complicated still, because there are
selective access control policies. A is allowed to talk to B, but C is
not, so B is in C's dark address space. And of course, since there
are time-based access control policies, you have to worry about the
scenario where B is sometimes in A's dark address space.
With this level of complexity, it's no wonder that understanding
your dark address space may be a complicated thing.
If you, as the administrator, are an all-knowing being, then you can
compile a description of the dark address space yourself, but most
people would like to have some help in doing that.
I think we can do this pretty easily in most educational or government
enterprise networks, or at least get a grip on the problem easily,
because there is an enormous amount of scanning constantly going on. So
the dark address space is being tested pretty much constantly.
Now for some sites the rate of discovery is not very high, but most of
the time, you get something coming in and blasting the dark address
space a few times a day.
Now, in a perfect IP world, the network equipment will tell you that it
can't forward traffic to an intended destination address by generating
ICMP Unreachable events. Of course, not all devices do a good job
at this, and some devices are configured not to generate ICMP
unreachables, but for most sites there is enough of it going on that
this strategy is very useful. And of course, not all ICMP unreachables
indicate access of a dark IP address, so we need to pick and choose
which events to track.
The events that are usable for this purpose are:
UNREACH_NET, UNREACH_HOST, UNREACH_SRCFAIL,
UNREACH_NET_UNKNOWN, UNREACH_HOST_UNKNOWN,
UNREACH_ISOLATED, UNREACH_HOST_PROHIB,
UNREACH_FILTER_PROHIB.
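As a rough illustration of the "pick and choose" step, here is a minimal Python sketch that maps ICMP Destination Unreachable codes to the names used above and tests whether a code is in the list. The code numbers follow the BSD <netinet/ip_icmp.h> header (with the ICMP_ prefix dropped). The function name indicates_dark_space() is made up for this sketch and is not part of argus or radark.

```python
# ICMP Destination Unreachable (type 3) codes, named as in the BSD
# <netinet/ip_icmp.h> header with the ICMP_ prefix dropped.
UNREACH_CODES = {
    0: "UNREACH_NET",
    1: "UNREACH_HOST",
    2: "UNREACH_PROTOCOL",
    3: "UNREACH_PORT",
    4: "UNREACH_NEEDFRAG",
    5: "UNREACH_SRCFAIL",
    6: "UNREACH_NET_UNKNOWN",
    7: "UNREACH_HOST_UNKNOWN",
    8: "UNREACH_ISOLATED",
    9: "UNREACH_NET_PROHIB",
    10: "UNREACH_HOST_PROHIB",
    11: "UNREACH_TOSNET",
    12: "UNREACH_TOSHOST",
    13: "UNREACH_FILTER_PROHIB",
}

# The subset named in the list above as usable dark-space evidence.
DARK_SPACE_EVENTS = frozenset({
    "UNREACH_NET",
    "UNREACH_HOST",
    "UNREACH_SRCFAIL",
    "UNREACH_NET_UNKNOWN",
    "UNREACH_HOST_UNKNOWN",
    "UNREACH_ISOLATED",
    "UNREACH_HOST_PROHIB",
    "UNREACH_FILTER_PROHIB",
})


def indicates_dark_space(icmp_code: int) -> bool:
    """Return True if this unreachable code is one of the listed events.

    Hypothetical helper for illustration; unknown codes return False.
    """
    return UNREACH_CODES.get(icmp_code) in DARK_SPACE_EVENTS
```

For example, indicates_dark_space(1) (host unreachable) is true, while indicates_dark_space(3) (port unreachable) is not, since that code is not in the list.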
Because argus maps ICMP packets to the flows that cause them,
you can find these events in the argus stream. All ra* programs can
filter for 'unreach' events, so this is pretty easy to do. We do have
filters for some unreachable types, but not all (as of today), so in
this example, I'm just going to grab them all, and show how we can
filter them down to get what we want.
After radark() conditions the data, it splits the output to get the
"unreachables" into a separate file, so we can process it a
couple of times.
`racluster -w - @arglist | ra -E $RADATA/racluster.out \
-w $RADATA/raunreach.out - unreach`;
The resulting file, $RADATA/raunreach.out, contains all the flows that
encountered unreachable ICMP events, and the ICMP events as well.
There are two things you can do with the file. The first is to get the
list of internal IP addresses that are definitely unreachable from where
this argus is monitoring. This will give you a great list of dark space
addresses (not the complete list, but a verifiable list of
unreachable IPs).
racluster -m daddr -r $RADATA/raunreach.out - \
\(not icmp\) and \(not src net $localaddr and dst net $localaddr\)
The file raunreach.out contains the flows that had ICMP Unreachables
mapped to them, as well as the actual ICMP flows themselves. So you need
to filter out the native icmp data ("not icmp"), and because you are
really only interested in the flows that originated from the "outside"
and targeted something on the "inside", we add
"not src net $localaddr and dst net $localaddr".
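To make the direction test concrete, here is a small Python sketch of the same check applied to one flow's addresses. It assumes the filter is read as "(not src net) and (dst net)", i.e. that "not" binds tighter than "and". The function name is_inbound() and the example networks (documentation ranges) are made up for illustration and are not part of argus.

```python
import ipaddress


def is_inbound(saddr: str, daddr: str, localnet: str) -> bool:
    """Return True if the flow came from outside localnet and targeted inside.

    Mirrors "not src net $localaddr and dst net $localaddr", assuming
    "not" applies only to the source test.
    """
    net = ipaddress.ip_network(localnet)
    src_local = ipaddress.ip_address(saddr) in net
    dst_local = ipaddress.ip_address(daddr) in net
    return (not src_local) and dst_local


# Hypothetical local network, standing in for $localaddr.
LOCALNET = "192.0.2.0/24"

print(is_inbound("198.51.100.7", "192.0.2.10", LOCALNET))  # outside -> inside
print(is_inbound("192.0.2.10", "192.0.2.20", LOCALNET))    # inside -> inside
```

Only the first flow passes: its source is external and its destination is local.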
If you wanted to build a history of all the unreachable addresses, you
could write this out to a file, like this:
racluster -m daddr -r $RADATA/raunreach.out -w darkaddresses.out - \
\(not icmp\) and \(not src net $localaddr and dst net $localaddr\)
But if you wanted to build a persistent database of proven unreachable
addresses (based on active ICMP messaging) that you want to grow
over time, you would do something like this.
Let's say you want a binary darkaddress.out file that tracks dark
addresses over time. The idea is to periodically process new argus data,
say every hour or every day, and add the new addresses to the persistent
database. In this case the database is just a file. With racluster(),
you can do this in two steps. First, you have racluster() read the old
darkaddress.out file to prime it. With that data in its cache, you have
racluster() open the new data file and process the data, which
automatically generates the new dark address space data. You have
racluster() write this new data to a temporary file, and then replace
the old with the new.
racluster -m daddr -r darkaddress.out $RADATA/raunreach.out -w temp.out - \
\(not icmp\) and \(not src net $localaddr and dst net $localaddr\)
mv temp.out darkaddress.out
By putting multiple files on the command line, racluster() will read
them in sequentially, in the order they appear (this is very important
for some operations, as one file is the cache, and the other is the new
data).
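The prime-then-merge-then-replace pattern can be sketched in Python as follows. This is only an analogy: it keeps one address per line in a plain text file, whereas the real darkaddress.out is a binary argus file that only racluster() should read and write. The function name merge_dark_addresses() and the file layout are assumptions made for this sketch.

```python
import os
import tempfile


def merge_dark_addresses(db_path, new_addresses):
    """Merge new dark addresses into a persistent one-address-per-line file.

    Step 1: read the old file to prime the set (the "cache").
    Step 2: add the new addresses, write the result to a temporary file,
            then replace the old file with the new one.
    """
    known = set()
    if os.path.exists(db_path):
        with open(db_path) as fh:
            known = {line.strip() for line in fh if line.strip()}

    merged = known | set(new_addresses)

    # Write next to the target so the final rename stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as fh:
        for addr in sorted(merged):
            fh.write(addr + "\n")

    # Equivalent of "mv temp.out darkaddress.out".
    os.replace(tmp_path, db_path)
    return merged
```

Writing to a temporary file first means the old database stays intact if the run fails partway through.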
So, you as the administrator will want to keep an eye on this file to
make sure that it reflects the real dark address space, but you'll find
that it does a very good job. This persistent list of real dark
addresses can be very useful to improve radark() performance. We'll add
this later to the actual scripts of radark(), in another installment.
The second thing you can do with this raunreach.out data is to build a
strong list of scanners, i.e. the external addresses that access
multiple unreachable IP addresses. To do this you first get the list of
"A -> B"s that had unreachable events, and then aggregate that list to
get the "A ->" records. Then a simple filter is all you need.
So, an example would be this command:
`racluster -M norep -m saddr daddr -r $RADATA/raunreach.out -- \
\(not icmp\) and \(not src net $localaddr and dst net $localaddr\)`;
What are we doing here? The "-M norep" option means we aren't
going to generate any aggregate statistics yet on this data; "no
report". Hopefully this will become clear in the next step. We aggregate
using the "-m saddr daddr" key, which gives us all the IP matrix data,
with the direction semantics preserved ("A -> B").
So, this command gives us the "A -> B" flows that were unreachable.
Now, if we aggregate the output using:
racluster -m saddr
we'll get records for each external IP address that accessed an
unreachable interior address. By doing it this way, we get in each
record an aggregation statistic that tells us how many unreachable
addresses the external address attempted to access. With this metric in
place, we can now get an answer to the question, "What external IP
addresses attempted to access X unreachable internal addresses?" Isn't
that a reasonable question?
So putting it all together:
`racluster -M norep -m saddr daddr -r $RADATA/raunreach.out -w - -- \
\(not icmp\) and \(not src net $localaddr and dst net $localaddr\) | \
racluster -m saddr -w radarkaddress.out`;
This set of commands is in the radark.pl program, and we'll build on
this as we go!
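The logic of the two aggregation passes can be sketched in a few lines of Python. This is not how racluster() is implemented; it only shows why collapsing to unique "A -> B" pairs first, and then counting per source, yields the number of distinct unreachable destinations each external address touched. The function names and the example addresses (documentation ranges) are made up for illustration.

```python
from collections import defaultdict


def distinct_pairs(flows):
    """Stage 1: collapse flows to unique (saddr, daddr) pairs ("A -> B")."""
    return {(saddr, daddr) for saddr, daddr in flows}


def targets_per_source(pairs):
    """Stage 2: count how many distinct destinations each source touched."""
    counts = defaultdict(int)
    for saddr, _daddr in pairs:
        counts[saddr] += 1
    return dict(counts)


def likely_scanners(flows, threshold):
    """Keep sources that touched more than `threshold` distinct destinations."""
    counts = targets_per_source(distinct_pairs(flows))
    return {saddr: n for saddr, n in counts.items() if n > threshold}


# Hypothetical unreachable flows: (source, destination).
flows = [
    ("198.51.100.7", "192.0.2.1"),
    ("198.51.100.7", "192.0.2.1"),  # repeat of the same pair, counted once
    ("198.51.100.7", "192.0.2.2"),
    ("198.51.100.7", "192.0.2.3"),
    ("203.0.113.9", "192.0.2.1"),
]

print(likely_scanners(flows, threshold=2))
```

Here only 198.51.100.7 is reported, with a count of 3, because the repeated pair collapses in the first stage and 203.0.113.9 touched a single address.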
Now to look at the output, use something like this:
rasort -m trans -s stime dur trans saddr pkts bytes state \
-r radarkaddress.out -- trans gt 100
Here we're picking addresses that accessed more than 100 internal
unreachable addresses (I only used this so the output wouldn't be too
long). A good number may be 2-5, depending on how you want to track
this stuff.
Against some old data I have, this is the output:
       Dur  Trans         SrcAddr  TotPkts  TotBytes  State
210.420624   1599    211.222.8.44     3997    247814    INT
2494.95947    340   218.104.78.69      390     23400    INT
302.197723    221  202.99.219.206      222     13320    INT
2511.13232    147     210.51.1.47      154      9240    INT
2532.34838    112   218.22.14.104      116      6960    INT
In this case the "trans" field (number of transactions) represents
the number of destination addresses that this IP address accessed. So
the higher the value, the more confidence we should have that this IP
address is scanning our address space.
Hopefully you will find this useful, and if so, I'll continue the
dialog.
Carter