country code discussion and example
Carter Bullard
carter at qosient.com
Tue Oct 2 23:33:19 EDT 2007
Gentle people,
I just wanted to document on the mailing list the support for printing
country codes. Why, what, how, etc.
While country codes are not "truth", they are useful for categorizing
traffic, and so I've added a bunch of support for mapping IP addresses
to publicly available country code information. The biggest win is when
we can filter and aggregate based on country codes, and so I spent the
time today to get these features into the ra* programs.
OK, there are two types of country code support in the ra* programs. The
first is printing support, which is a real-time local lookup of an IP
address against the RIR databases. This doesn't modify any record
content, and so this support doesn't provide filtering or aggregation,
just field printing. It is enabled automatically if you have either
"sco" or "dco" in the print field specification.
The ra* programs will read in the RIR database from the path specified
in the .rarc file. There is a new rarc file variable, RA_DELEGATED_IP,
where you specify the path to a file of the type found in
./support/Config/delegated-ip-latest. The client package Makefile will
install this file in /usr/local/argus by default, and so the sample rarc
file in ./support/Config has this path as its value.
Because the RIR databases change weekly (maybe daily?), in order to make
this useful, we have scripts to generate current IPv4 delegated address
maps.
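To make the printing lookup concrete, here is a minimal Python sketch of
matching an IPv4 address against records in the pipe-delimited RIR
"delegated" statistics format (registry|cc|type|start|count|date|status).
This is not the ra* implementation. The function names and the three
sample records are illustrative assumptions, and the sketch assumes the
file named by RA_DELEGATED_IP uses that RIR format.

```python
import bisect
import ipaddress

# Illustrative records only (assumed layout:
# registry|cc|type|start|count|date|status).
SAMPLE = """\
arin|US|ipv4|8.0.0.0|16777216|19921201|allocated
ripencc|DE|ipv4|53.0.0.0|16777216|19911001|allocated
apnic|JP|ipv4|133.0.0.0|16777216|19970301|allocated
"""


def load_delegated(text):
    """Return a sorted list of (first, last, country_code) for IPv4 records."""
    ranges = []
    for line in text.splitlines():
        fields = line.strip().split("|")
        # Skip comments, the version header, summary lines and non-IPv4 records.
        if len(fields) < 7 or fields[2] != "ipv4" or fields[1] in ("", "*"):
            continue
        first = int(ipaddress.IPv4Address(fields[3]))
        count = int(fields[4])  # for ipv4 records this is an address count
        ranges.append((first, first + count - 1, fields[1]))
    ranges.sort()
    return ranges


def country_code(ranges, addr):
    """Return the country code whose range contains addr, or None."""
    value = int(ipaddress.IPv4Address(addr))
    starts = [r[0] for r in ranges]
    # Find the last range that starts at or before the address.
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and ranges[i][0] <= value <= ranges[i][1]:
        return ranges[i][2]
    return None
```

A lookup like country_code(load_delegated(SAMPLE), "8.8.8.8") touches
only the table, never the record, which is why this form of support can
print a code but cannot filter or aggregate on it.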
The second form of support is where we add the country codes to the
actual argus records. When they are a part of the record, ra* clients
can filter, aggregate, sort, strip and anonymize the country codes.
There is a new client, called ralabel(), which will add country code
labels to an argus data stream, and from there you can actually do work
with the codes.
If a record has embedded country codes, the ra* programs will not
consult newer databases when processing the country codes. Basically, by
putting them in the records, you fix the values in time, which is a good
thing.
ralabel() uses several sources for country code information. The first
place it looks is in the local RIR databases that are provided in the
client distribution. These databases are not complete, so if ralabel()
can't find a country code, it performs a reverse DNS lookup to see if
there is a country code in the IP address's fully qualified domain name
(FQDN). Currently we use the first country code we get, first from the
RIR database, then the DNS. We can add other queries as well, such as
the whois database, but we'll start with this to see how it goes.
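The "first answer wins" order described above can be sketched in Python
as follows. This is only an illustration of the fallback logic, not
ralabel()'s code. The helper names, the tiny KNOWN_CCTLDS set, and the
idea of reading the code from the last label of the FQDN are assumptions
made for the example.

```python
import socket

# Deliberately tiny, illustrative subset of two-letter country TLDs.
KNOWN_CCTLDS = {"de", "it", "at", "nz", "jp", "fr", "se"}


def code_from_fqdn(fqdn):
    """Return an upper-case country code if the FQDN ends in a known ccTLD."""
    tld = fqdn.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld.upper() if tld in KNOWN_CCTLDS else None


def reverse_name(addr):
    """Reverse DNS lookup; return None when no name resolves."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:
        return None


def label(addr, rir_lookup, resolve=reverse_name):
    """Try the RIR table first, then fall back to the reverse DNS name."""
    code = rir_lookup(addr)
    if code is not None:
        return code
    name = resolve(addr)
    return code_from_fqdn(name) if name else None
```

Passing the RIR lookup and the resolver in as functions keeps the order
of sources explicit, and it would leave room for a further source such
as whois to be tried after the DNS step.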
Here is an example of some data that spans the summer for a new server.
(We reference the source country code as "sco"):
ralabel -nnnR datadir -w - | racluster -m sco -w - | rasort -m bytes -s stime dur sco trans pkts bytes state
StartTime                   Dur  sCo   Trans    TotPkts      TotBytes  State
2007/06/08.13:24:3  9666858.000  US   472501  267263652  645752494338  CON
2007/07/26.04:02:5  2607663.000  DE    40624    5642307    6217372807  CON
2007/07/26.14:22:3  2142552.750  IT     7245     754204    1199815337  CON
2007/08/09.10:09:0    22408.656  AT     3903     577621     637107386  CON
2007/09/03.00:52:2   871139.062  IN     1204      44309      70867839  CON
2007/06/13.06:12:0  9136759.000  CA     2199      58517      37835084  CON
2007/09/12.04:52:1   609061.438  HK      716      20316      34759598  CON
2007/08/01.13:49:0   932323.688  AP    16591      44937      20652629  CON
2007/09/07.00:59:1      262.234  NZ       34       5564      11564321  CON
2007/09/13.15:45:4     1553.102  GY      203       7719       8524481  CON
2007/06/28.10:56:4  7929962.500  GB      792       4615       3718261  CON
2007/09/19.08:29:2   772173.500  SA       35       1742       3456232  CON
2007/06/11.10:28:4  8061142.500  EU        8        387        595600  CON
2007/07/17.23:35:0  4928945.000  CN       25        117         34923  CON
2007/08/10.14:33:3  4089591.750  FR       25        263         33814  CON
2007/07/19.10:41:0        8.146  AF        4         36         18270  CON
2007/08/06.10:06:2  1395395.500  SE        6         75         16833  CON
2007/07/30.18:10:0  2706994.000  KR        5         48         16309  CON
2007/09/08.03:49:0        0.362  BY        1         20         12092  CON
2007/07/02.15:53:0    46291.188  EE        2         17          1539  CON
2007/07/09.15:37:2        0.524  CZ        1         13          1269  CON
2007/09/01.07:24:1     6793.704  AU        2         12          1215  CON
2007/07/18.10:28:5        0.251  IS        1         11          1175  CON
2007/08/03.14:05:3        0.465  TW        1          6           358  CON
2007/06/13.16:34:4        0.000  RU        1          2           128  CON
2007/08/23.18:05:3        0.000  ES        1          1            62  INT
2007/07/29.11:14:3        0.000  JP        1          1            60  INT
ralabel() adds the country codes to the records, and racluster() simply
merges records that have a matching country code string. rasort()
creates the ranks, and there you go, a decent activity table based on
country.
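For readers who want the merge-then-rank idea spelled out, here is a
small Python sketch of the same two steps: sum the counters of records
that share a source country code, then order the totals by bytes. It
mimics the effect of the racluster and rasort stages on made-up tuples
and says nothing about how those programs work internally. The record
layout (sco, trans, pkts, bytes) and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical labelled records: (sco, trans, pkts, bytes).
RECORDS = [
    ("US", 1, 10, 1500),
    ("DE", 1, 4, 600),
    ("US", 1, 20, 9000),
    ("IT", 1, 2, 120),
    ("DE", 1, 6, 400),
]


def cluster_by_sco(records):
    """Merge records with the same country code by summing their counters."""
    totals = defaultdict(lambda: [0, 0, 0])
    for sco, trans, pkts, nbytes in records:
        entry = totals[sco]
        entry[0] += trans
        entry[1] += pkts
        entry[2] += nbytes
    return {sco: tuple(entry) for sco, entry in totals.items()}


def rank_by_bytes(totals):
    """Return (sco, (trans, pkts, bytes)) pairs, largest byte count first."""
    return sorted(totals.items(), key=lambda item: item[1][2], reverse=True)


for sco, (trans, pkts, nbytes) in rank_by_bytes(cluster_by_sco(RECORDS)):
    print(f"{sco:<3} {trans:>6} {pkts:>9} {nbytes:>12}")
```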
The -nnn is there to guarantee that if we have to do a reverse DNS
lookup to find the country code, we actually get a name resolved,
instead of an address. I'll eliminate that on the next round so you
don't have to remember.
All country codes are two-character ASCII strings, so the data demand to
embed them in each record is not huge (8 bytes), but it is significant
when you have billions of records. We'll want to be able to rastrip()
the codes out of the records, and of course, with anonymization, I can
see a situation where you would want to anonymize the IP addresses but
leave the country codes intact.
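To put a number on "significant", a quick back-of-the-envelope
calculation in Python, taking the 8 bytes per record quoted above as
given. The record counts are hypothetical, and the function names are
made up for this example.

```python
BYTES_PER_RECORD = 8  # per-record cost of embedded country codes, as quoted above


def label_overhead_bytes(records):
    """Extra bytes needed to carry country codes in `records` records."""
    return records * BYTES_PER_RECORD


def as_gib(nbytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return nbytes / 2**30


for records in (10**6, 10**9, 10**10):  # hypothetical archive sizes
    extra = label_overhead_bytes(records)
    print(f"{records:>14,} records: {extra:>15,} bytes ({as_gib(extra):.2f} GiB)")
```

At a billion records that is 8,000,000,000 extra bytes, which is the
kind of overhead that makes stripping the labels back out worthwhile.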
Ok well, hopefully that is helpful. I'll add more later this week.
Carter