Reducing argus-2.x log files for archiving

Dave Plonka plonka at doit.wisc.edu
Thu Feb 7 14:33:57 EST 2002


Russell,

The Cflow and flowdumper package, to which I recently added Argus
support, offers a couple of options for converting the files to other
formats.  The first option is built in, and the second is custom:

1) Using flowdumper's "-r" option, one can convert argus or flow-tools
   files to "cflow" format, which is the format produced by CAIDA's
   cflowd when collecting version 5 NetFlow records.  "cflow" format
   happens to be smaller and, inexplicably, compresses better.

   For instance, I took an "argus.out" file and converted it to a "cflow"
   format file like this (which yields a file called "argus.out.cflow"):

      $ flowdumper -r -o '%s.cflow' argus.out

   Note that this is a "lossy" conversion, but the information that I
   consider important for audit trails is still there (i.e. timestamps,
   IP addresses, protocol, ports, IP packet/byte counters, etc.).

   After converting that sample file, I compressed both files with
   gzip(1) at its default compression level, yielding these sizes:

     file                   bytes
     -------------------- -------
     argus.out            1838476
     argus.out.cflow      1506175
     argus.out.gz          420898
     argus.out.cflow.gz    133832

   So, the compressed cflow-encoded file was about 32% of the size of
   the compressed argus file.
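   To make the arithmetic explicit, here is a small Perl sketch that
   recomputes these percentages from the byte counts in the table.  It
   uses only the sizes shown above; the helper name percent_of is
   hypothetical.

   ```perl
   use strict;
   use warnings;

   # Byte counts copied from the table above.
   my %size = (
      'argus.out'          => 1838476,
      'argus.out.cflow'    => 1506175,
      'argus.out.gz'       => 420898,
      'argus.out.cflow.gz' => 133832,
   );

   # Hypothetical helper: $part expressed as a percentage of $whole.
   sub percent_of {
      my($part, $whole) = @_;
      return 100 * $part / $whole;
   }

   printf("cflow vs. argus (uncompressed):  %.1f%%\n",
      percent_of($size{'argus.out.cflow'}, $size{'argus.out'}));
   printf("cflow.gz vs. argus.gz:           %.1f%%\n",
      percent_of($size{'argus.out.cflow.gz'}, $size{'argus.out.gz'}));
   printf("cflow.gz vs. uncompressed argus: %.1f%%\n",
      percent_of($size{'argus.out.cflow.gz'}, $size{'argus.out'}));
   ```

   Run as-is, it prints about 81.9%, 31.8% and 7.3%.  The re-encoding
   alone saves under a fifth of the space; most of the gain comes from
   how much better the cflow file compresses.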

   Afterwards, you can use flowdumper or flow-tools to read the "cflow"
   format files, e.g. "flowdumper -s argus.out.cflow".

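   As an aside, David Moore of CAIDA mentioned that "gzip -4" or "-5"
   offers a good trade-off of CPU time vs. size for files containing
   this sort of data, i.e. "-9" is usually overkill.  Here are the sizes
   of the aforementioned files compressed with "gzip -4".  Compared with
   the default level, the cflow file grew only about 8% (144321 vs.
   133832 bytes) while the argus file grew about 24%, so his advice
   seems to hold, at least for the cflow-encoded data: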

     file                   bytes
     -------------------- -------
     argus.out.gz-4        523468
     argus.out.cflow.gz-4  144321

2) Alternatively, since flowdumper and Cflow provide a Perl API to read
   argus records, you can use Perl's "pack" function to create records
   in a new format of your own choosing.

   This is somewhat like the recent printf discussion on the mailing
   list.

   For instance, if you only cared to retain the IP addresses, port
   numbers, packet/byte counters, timestamps, protocol, TOS, and TCP
   flags, you could encode them this way:

      $ flowdumper -ne '
      print pack("N N N n n N N N N C C C",
           0x43de6,
           $srcaddr,
           $dstaddr,
           $srcport,
           $dstport,
           $pkts,
           $bytes,
           $startime,
           $endtime,
           $protocol,
           $tos,
           $tcp_flags)
      ' argus.out > argus.cflow_0x43de6

   (The 0x43de6 is a magic number to indicate the record format.  It's
   actually a bit-mask indicating which flow variables are present,
   which is how cflowd did it.)
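   To illustrate how such a mask is built, the sketch below ORs together
   one bit per field present in the record.  The per-field bit values
   are an assumption here, taken from cflowd's raw-flow field masks as
   recalled rather than quoted.  They do reproduce 0x43de6 for exactly
   the eleven fields in the one-liner, but check them against the
   cflowd sources before relying on them.  index_mask is a hypothetical
   helper.

   ```perl
   use strict;
   use warnings;

   # ASSUMPTION: cflowd-style field bits (verify against cflowd sources).
   my %field_bit = (
      srcaddr   => 0x00002,
      dstaddr   => 0x00004,
      srcport   => 0x00020,
      dstport   => 0x00040,
      pkts      => 0x00080,
      bytes     => 0x00100,
      startime  => 0x00400,
      endtime   => 0x00800,
      protocol  => 0x01000,
      tos       => 0x02000,
      tcp_flags => 0x40000,
   );

   # Hypothetical helper: OR together the bits of every field present.
   sub index_mask {
      my $mask = 0;
      foreach my $name (@_) {
         die "unknown field \"$name\"\n" unless exists $field_bit{$name};
         $mask |= $field_bit{$name};
      }
      return $mask;
   }

   my $mask = index_mask(qw(srcaddr dstaddr srcport dstport pkts bytes
                            startime endtime protocol tos tcp_flags));
   printf("0x%x\n", $mask);   # prints 0x43de6
   ```

   A reader can then test individual bits (e.g. $index & 0x40000 for
   TCP flags) to learn which fields follow, which is what makes the
   index self-describing.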

   Since this yields a custom format, you need a custom program or
   script to read the resulting file(s).  I've attached an example
   that does this, called "read_cflow_0x43de6".
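   If you want to try the format without an argus file or flowdumper at
   hand, here is a self-contained round-trip sketch using the same
   pack() format and magic number.  encode_record, decode_record and
   ip_to_int are hypothetical helpers (not part of Cflow), and the
   sample values in the note below are made up.

   ```perl
   use strict;
   use warnings;
   use Socket qw(inet_aton inet_ntoa);

   # Same layout and magic number as the one-liner and read_cflow_0x43de6.
   my $format = "N N N n n N N N N C C C";
   my $magic  = 0x43de6;
   my $record_length = length(pack($format, (0) x 12));   # 35 bytes
   my @fields = qw(srcport dstport pkts bytes startime endtime
                   protocol tos tcp_flags);

   # Dotted-quad string -> 32-bit integer, as packed with "N" above.
   sub ip_to_int {
      my($ip) = @_;
      my $packed = inet_aton($ip) or die "bad address \"$ip\"\n";
      return unpack("N", $packed);
   }

   # Hypothetical helper: encode one flow record from named fields.
   sub encode_record {
      my(%f) = @_;
      return pack($format, $magic,
                  ip_to_int($f{srcip}), ip_to_int($f{dstip}),
                  @f{@fields});
   }

   # Hypothetical helper: decode one record, checking length and magic.
   sub decode_record {
      my($record) = @_;
      die "short record\n" unless length($record) == $record_length;
      my($index, $srcaddr, $dstaddr, @rest) = unpack($format, $record);
      die sprintf("bad magic number 0x%x\n", $index) unless $index == $magic;
      my %f = (srcip => inet_ntoa(pack("N", $srcaddr)),
               dstip => inet_ntoa(pack("N", $dstaddr)));
      @f{@fields} = @rest;
      return \%f;
   }
   ```

   For example, encoding a flow from 192.0.2.1:1025 to 198.51.100.7:80
   yields a 35-byte string (4+4+4+2+2+4+4+4+4+1+1+1), and decode_record
   on that string gives back the same fields.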

Dave

-- 
plonka at doit.wisc.edu  http://net.doit.wisc.edu/~plonka  ARS:N9HZF  Madison, WI
-------------- next part --------------
#! /usr/local/bin/perl -w

# read_cflow_0x43de6 - a script to read raw binary flow files
# $Id: read_cflow_0x43de6,v 1.1 2001/05/18 16:39:07 dplonka Exp $

use strict;
use Socket qw(inet_ntoa);   # for inet_ntoa
use POSIX qw(strftime);     # for strftime

# Record layout; must match the pack() format used to write the file.
my $format = "N N N n n N N N N C C C";
my $magic = 0x43de6;        # the magic number that matches $format
# Pack dummy zeros just to learn the fixed record length (35 bytes).
my $record_length = length(pack($format, (0) x 12));

foreach my $file (@ARGV) {
   open(my $fh, '<', $file) || die "open \"$file\": $!\n";
   binmode($fh);
   my $record;

   while (1) {
      my $len = sysread($fh, $record, $record_length);
      die "read \"$file\": $!\n" unless defined($len);
      last if 0 == $len; # end of file

      die "short record in \"$file\": $len\n" if ($record_length != $len);

      my($index,
         $srcaddr,
         $dstaddr,
         $srcport,
         $dstport,
         $pkts,
         $bytes,
         $startime,
         $endtime,
         $protocol,
         $tos,
         $tcp_flags) = unpack($format, $record);

      die sprintf("bad magic number 0x%x in \"%s\"\n", $index, $file)
         unless $magic == $index;

      printf("%s (%us) %.15s.%hu -> %.15s.%hu %hu 0x%x 0x%x %u %u\n",
         strftime("%Y/%m/%d %H:%M:%S", localtime($startime)),
         $endtime - $startime,
         inet_ntoa(pack("N", $srcaddr)),
         $srcport,
         inet_ntoa(pack("N", $dstaddr)),
         $dstport,
         $protocol,
         $tcp_flags,
         $tos,
         $pkts,
         $bytes);
   }
   close($fh);
}

