country code discussion and example

Carter Bullard carter at qosient.com
Tue Oct 2 23:33:19 EDT 2007


Gentle people,
I just wanted to document on the mailing list the support for printing
country codes: why, what, how, etc.

While country codes are not "truth", they are useful for categorizing
traffic, and so I've added a bunch of support for mapping IP addresses
to publicly available country code information.  The biggest win is
when we can filter and aggregate based on country codes, and so I spent
the time today to get these features into the ra* programs.

OK, there are two types of country code support in ra* programs.  The
first is printing support, which is a real-time local lookup of an IP
address against the RIR databases.  This doesn't modify any record
content, and so this support doesn't provide filtering or aggregation,
just field printing.  This is enabled automatically if you have either
an "sco" or a "dco" in the print field specification.

The ra* programs will read in the RIR database from the path specified
in the .rarc file.  There is a new rarc file variable, RA_DELEGATED_IP,
where you specify the path to a file of the type found in
./support/Config/delegated-ip-latest.  The client package Makefile will
install this file in /usr/local/argus, by default, and so the sample
rarc file in ./support/Config has this path as its value.

Because the RIR databases change weekly (maybe daily?), in order to make
this useful, we have scripts to generate current IPv4 delegated address
maps.
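
For anyone curious what the local lookup amounts to, here is a minimal
Python sketch.  It is not the ra* implementation.  It assumes the
delegated file uses the pipe-delimited RIR statistics layout
(registry|cc|type|start|value|date|status), where for ipv4 records
"value" is the number of addresses in the block.  The function names
and the sample lines (documentation address ranges) are made up for
illustration.

```python
import bisect
import ipaddress

def load_delegated(lines):
    """Parse RIR 'delegated' records into a sorted list of IPv4 ranges.

    Assumes lines of the form registry|cc|type|start|value|date|status.
    """
    ranges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        fields = line.split("|")
        if len(fields) < 7 or fields[2] != "ipv4":
            continue                      # skip header, summary, asn, ipv6
        cc, start, count = fields[1], fields[3], fields[4]
        if cc in ("", "*"):
            continue                      # no usable country code
        first = int(ipaddress.IPv4Address(start))
        ranges.append((first, first + int(count) - 1, cc))
    ranges.sort()
    return ranges

def lookup(ranges, addr):
    """Return the country code whose range contains addr, or None."""
    ip = int(ipaddress.IPv4Address(addr))
    # Find the last range whose first address is <= ip.
    i = bisect.bisect_right(ranges, (ip, float("inf"), "")) - 1
    if i >= 0 and ranges[i][0] <= ip <= ranges[i][1]:
        return ranges[i][2]
    return None

# Illustrative records only, not taken from a real delegated file.
SAMPLE = """\
# made-up sample
arin|US|ipv4|192.0.2.0|256|20070101|assigned
ripencc|DE|ipv4|198.51.100.0|256|20070101|allocated
ripencc|*|ipv4|*|1|summary
"""

ranges = load_delegated(SAMPLE.splitlines())
print(lookup(ranges, "192.0.2.17"))
```

Because the table is sorted once and searched with bisect, each lookup
is cheap enough to do per record at print time.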

The second form of support is where we add the country codes to the
actual argus records.  When they are a part of the record, ra* clients
can filter, aggregate, sort, strip and anonymize the country codes.
There is a new client, called ralabel(), which will add country code
labels to an argus data stream, and from there you can actually do work
with the codes.

If a record has embedded country codes, the ra* programs will not
consult newer databases when processing the country codes.  Basically,
by putting them in the records, you fix the values in time, which is a
good thing.

ralabel() uses several sources for country code information.  The first
place it looks is in the local RIR databases that are provided in the
client distribution.  These databases are not complete, so if ralabel()
can't find a country code, it performs a reverse DNS lookup to see if
there is a country code in the IP address's fully qualified domain name
(FQDN).  Currently we use the first country code we get, first from the
RIR database, then the DNS.  We can add other queries as well, such as
the whois database, but we'll start with this to see how it goes.
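
The "first answer wins" ordering can be sketched in Python as below.
This is only an illustration of the idea, not how ralabel() is written.
The function names are hypothetical, and treating a two-letter final
label of the FQDN as the country code is my assumption about what
"a country code in the FQDN" means.

```python
import socket

def cc_from_fqdn(fqdn):
    """Return the final DNS label, upper-cased, if it is two ASCII letters."""
    label = fqdn.rstrip(".").rsplit(".", 1)[-1]
    if len(label) == 2 and label.isascii() and label.isalpha():
        return label.upper()
    return None

def cc_from_dns(addr):
    """Reverse-resolve addr and look for a country code in its name."""
    try:
        name, _aliases, _addrs = socket.gethostbyaddr(addr)
    except OSError:
        return None                  # no PTR record or resolver failure
    return cc_from_fqdn(name)

def first_country_code(addr, sources):
    """Try each source in order and return the first code found."""
    for source in sources:
        cc = source(addr)
        if cc:
            return cc
    return None

# Stand-in for the RIR table lookup; always misses in this sketch.
def rir_miss(addr):
    return None

print(first_country_code("192.0.2.1", [rir_miss, lambda a: cc_from_fqdn("mail.example.de")]))
```

Keeping the sources in an ordered list means a whois query, or anything
else, could be appended later without touching the lookup loop.  Note
that a domain suffix is not always the ISO code (for example ".uk"
versus "GB"), so names found this way may need normalizing.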

Here is an example of some data that spans the summer for a new server.
(We reference the source country code as "sco"):

    ralabel -nnnR datadir -w - | racluster -m sco -w - | rasort -m bytes -s stime dur sco trans pkts bytes state

    StartTime                   Dur  sCo   Trans    TotPkts      TotBytes  State
    2007/06/08.13:24:3  9666858.000  US   472501  267263652  645752494338  CON
    2007/07/26.04:02:5  2607663.000  DE    40624    5642307    6217372807  CON
    2007/07/26.14:22:3  2142552.750  IT     7245     754204    1199815337  CON
    2007/08/09.10:09:0    22408.656  AT     3903     577621     637107386  CON
    2007/09/03.00:52:2   871139.062  IN     1204      44309      70867839  CON
    2007/06/13.06:12:0  9136759.000  CA     2199      58517      37835084  CON
    2007/09/12.04:52:1   609061.438  HK      716      20316      34759598  CON
    2007/08/01.13:49:0   932323.688  AP    16591      44937      20652629  CON
    2007/09/07.00:59:1      262.234  NZ       34       5564      11564321  CON
    2007/09/13.15:45:4     1553.102  GY      203       7719       8524481  CON
    2007/06/28.10:56:4  7929962.500  GB      792       4615       3718261  CON
    2007/09/19.08:29:2   772173.500  SA       35       1742       3456232  CON
    2007/06/11.10:28:4  8061142.500  EU        8        387        595600  CON
    2007/07/17.23:35:0  4928945.000  CN       25        117         34923  CON
    2007/08/10.14:33:3  4089591.750  FR       25        263         33814  CON
    2007/07/19.10:41:0        8.146  AF        4         36         18270  CON
    2007/08/06.10:06:2  1395395.500  SE        6         75         16833  CON
    2007/07/30.18:10:0  2706994.000  KR        5         48         16309  CON
    2007/09/08.03:49:0        0.362  BY        1         20         12092  CON
    2007/07/02.15:53:0    46291.188  EE        2         17          1539  CON
    2007/07/09.15:37:2        0.524  CZ        1         13          1269  CON
    2007/09/01.07:24:1     6793.704  AU        2         12          1215  CON
    2007/07/18.10:28:5        0.251  IS        1         11          1175  CON
    2007/08/03.14:05:3        0.465  TW        1          6           358  CON
    2007/06/13.16:34:4        0.000  RU        1          2           128  CON
    2007/08/23.18:05:3        0.000  ES        1          1            62  INT
    2007/07/29.11:14:3        0.000  JP        1          1            60  INT

ralabel() adds the country codes to the records, and racluster() simply
merges records that have a matching country code string.  rasort()
creates the ranks, and there you go, a decent activity table based on
country.  The -nnn is there to guarantee that if we have to do a reverse
DNS lookup to find the country code, we actually get a name resolved,
instead of an address.  I'll eliminate that on the next round so you
don't have to remember.
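
To make the merge-then-rank step concrete, here is a small Python
sketch of the idea behind "racluster -m sco" followed by
"rasort -m bytes".  It is a conceptual model only: the record layout
(plain dicts with sco, pkts and bytes keys) and the function names are
hypothetical, the three input records are invented, and I am assuming
"trans" counts the records merged into each row.

```python
from collections import defaultdict

def cluster_by_sco(records):
    """Merge records that share a source country code, summing counters."""
    totals = defaultdict(lambda: {"trans": 0, "pkts": 0, "bytes": 0})
    for rec in records:
        row = totals[rec["sco"]]
        row["trans"] += 1            # one more record merged into this row
        row["pkts"] += rec["pkts"]
        row["bytes"] += rec["bytes"]
    return dict(totals)

def rank_by_bytes(totals):
    """Order the merged rows by total bytes, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1]["bytes"], reverse=True)

# Invented records, purely to exercise the functions.
records = [
    {"sco": "US", "pkts": 10, "bytes": 1000},
    {"sco": "DE", "pkts": 5, "bytes": 4000},
    {"sco": "US", "pkts": 2, "bytes": 200},
]

for sco, row in rank_by_bytes(cluster_by_sco(records)):
    print(sco, row["trans"], row["pkts"], row["bytes"])
```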

All country codes are 2-character ASCII strings, and so the data demand
to embed them in each record is not huge (8 bytes), but it is
significant when you have billions of records, so we'll want to be able
to rastrip() the codes out of the records.  And of course, with
anonymization, I see a situation where you would want to anonymize the
IP addresses but leave the country codes intact.
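
As a back-of-the-envelope check on "significant", this Python snippet
multiplies the 8 bytes per record quoted above by a record count.  The
one billion figure is just an example input, and the function name is
made up.

```python
BYTES_PER_RECORD = 8  # per-record cost of the country code labels, from the text

def label_overhead_bytes(num_records, per_record=BYTES_PER_RECORD):
    """Total extra bytes needed to carry country codes in num_records records."""
    return num_records * per_record

overhead = label_overhead_bytes(1_000_000_000)
print(overhead / 1e9, "GB")
```

So every billion labeled records costs roughly 8 GB before any
compression, which is why stripping the codes back out is worth having.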

Ok well, hopefully that is helpful.  I'll add more later this week.

Carter

