Reducing argus-2.x log files for archiving
Dave Plonka
plonka at doit.wisc.edu
Thu Feb 7 14:33:57 EST 2002
Russell,
The Cflow and flowdumper package, to which I recently added Argus
support, offers a couple of options for converting the files to other
formats. The first option is built in, and the second is custom:
1) Using flowdumper's "-r" option, one can convert argus or flow-tools
files to "cflow" format, which is that produced by CAIDA's cflowd
when collecting version 5 NetFlow records. "cflow" format happens
to be smaller and, inexplicably, more susceptible to compression.
For instance, I took an "argus.out" file and converted it to a "cflow"
format file like this (which yields a file called "argus.out.cflow"):
$ flowdumper -r -o '%s.cflow' argus.out
Note that this is a "lossy" conversion, but the info that I consider
important for audit trails is still there (e.g. timestamps, IP
addresses, protocol, ports, IP pkt/byte counters, etc.)
After converting that sample file, I gzip(1)ped both files (with the
default compression level), yielding these sizes:
file                    bytes
--------------------  -------
argus.out             1838476
argus.out.cflow       1506175
argus.out.gz           420898
argus.out.cflow.gz     133832
So, the compressed cflow-encoded file was about 32% of the size of
the compressed argus file.
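
As a quick check of that arithmetic, here is a minimal Perl sketch that
recomputes the ratios from the byte counts in the table above. The
helper name percent_of is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Byte counts copied from the table above.
my %size = (
    'argus.out'          => 1838476,
    'argus.out.cflow'    => 1506175,
    'argus.out.gz'       => 420898,
    'argus.out.cflow.gz' => 133832,
);

# Size of the first file as a percentage of the size of the second.
# (percent_of is a hypothetical helper, not part of any package.)
sub percent_of {
    my ($numerator, $denominator) = @_;
    return 100 * $size{$numerator} / $size{$denominator};
}

printf("cflow vs. argus, uncompressed: %.0f%%\n",
       percent_of('argus.out.cflow', 'argus.out'));        # 82%
printf("cflow vs. argus, both gzipped: %.0f%%\n",
       percent_of('argus.out.cflow.gz', 'argus.out.gz'));  # 32%
printf("gzipped cflow vs. raw argus:   %.0f%%\n",
       percent_of('argus.out.cflow.gz', 'argus.out'));     # 7%
```

So most of the saving shows up only after compression: the conversion
alone trims the file to about 82% of its original size.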
Afterwards you can use flowdumper or flow-tools to read the "cflow"
format files, e.g. with "flowdumper -s argus.out.cflow".
As an aside, David Moore of CAIDA mentioned that "gzip -4" or "-5"
offers a good trade-off of CPU time vs. size for files containing
this sort of data. I.e. "-9" is usually overkill. Here are the sizes
for the aforementioned files compressed with "gzip -4", which show that
what he said seems to hold true:
file                    bytes
--------------------  -------
argus.out.gz-4         523468
argus.out.cflow.gz-4   144321
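
To try the same size vs. CPU-time comparison on your own data, a rough
sketch along these lines may help. It uses the core IO::Compress::Gzip
module instead of the gzip(1) command, so absolute sizes and timings
will not match the tables above exactly. The input here is synthetic
stand-in data, not real flow records, and gzip_at_level is a made-up
helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use Time::HiRes qw(time);

# Compress $data in memory at the given level (1..9) and return
# (compressed size in bytes, elapsed seconds).
# (gzip_at_level is a hypothetical helper for this sketch.)
sub gzip_at_level {
    my ($data, $level) = @_;
    my $compressed;
    my $start = time();
    gzip(\$data => \$compressed, Level => $level)
        or die "gzip failed: $GzipError\n";
    return (length($compressed), time() - $start);
}

# ASSUMPTION: synthetic, repetitive binary records used as a stand-in.
# Replace this with the contents of a real file to get meaningful numbers.
my $data = join('', map {
    pack("N n n", 0x0a000000 + $_ % 251, 1024 + $_ % 17, 80)
} 1 .. 20000);

# Level 6 is gzip(1)'s default; 4 and 9 are the levels discussed above.
for my $level (4, 6, 9) {
    my ($bytes, $seconds) = gzip_at_level($data, $level);
    printf("level %d: %d bytes in %.3fs\n", $level, $bytes, $seconds);
}
```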
2) Alternatively, since flowdumper and Cflow provide a perl API to read
argus records, you can use perl's "pack" function to create records
of a new format of your own choosing.
This is sort of like the recent printf discussion on the mailing
list.
For instance, if you only cared to retain the IP addresses, port
numbers, pkts/bytes counters, timestamps, protocols, TOS, and TCP
flags, you could encode it this way:
$ flowdumper -ne '
     print pack("N N N n n N N N N C C C",
                0x43de6,
                $srcaddr,
                $dstaddr,
                $srcport,
                $dstport,
                $pkts,
                $bytes,
                $startime,
                $endtime,
                $protocol,
                $tos,
                $tcp_flags)
  ' argus.out > argus.cflow_0x43de6
(The 0x43de6 is a magic number to indicate the record format. It's
actually a bit-mask indicating which flow variables are present,
which is how cflowd did it.)
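
The bit-mask idea can be sketched as follows. Note that the bit
positions below are an assumption: they are recalled from cflowd's raw
flow index and are not stated anywhere in this message. They are at
least consistent with it, since OR-ing one bit per packed field gives
0x43de6. The names field_bit and index_mask are made up for this
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: bit positions as recalled from cflowd's index field.
# Only the eleven fields packed in the example above are listed.
my @field_bit = (
    [ srcaddr   => 1 ],
    [ dstaddr   => 2 ],
    [ srcport   => 5 ],
    [ dstport   => 6 ],
    [ pkts      => 7 ],
    [ bytes     => 8 ],
    [ startime  => 10 ],
    [ endtime   => 11 ],
    [ protocol  => 12 ],
    [ tos       => 13 ],
    [ tcp_flags => 18 ],
);

# OR together one bit for each field present in the record.
sub index_mask {
    my $mask = 0;
    $mask |= 1 << $_->[1] for @_;
    return $mask;
}

my $mask = index_mask(@field_bit);
printf("index = 0x%x\n", $mask);    # prints "index = 0x43de6"

# Going the other way: list the fields that a given index says are present.
my @present = map { $_->[0] } grep { $mask & (1 << $_->[1]) } @field_bit;
print "fields: @present\n";
```

A reader that checks the index before unpacking, as the attached script
does, can then refuse files written with a different set of fields.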
Since this yields a custom format, it is necessary to use a custom
program/script to read the resulting file(s). I've attached an
example script that does this, called "read_cflow_0x43de6".
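
The record layout can also be checked without flowdumper or an argus
file. The following sketch packs a single record with the same
template and unpacks it again. All flow values in it are made up for
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Same template as above: N = 32-bit, n = 16-bit (both big-endian),
# C = 8-bit unsigned.
my $format = "N N N n n N N N N C C C";

# ASSUMPTION: made-up values, purely for illustration.
my @flow = (
    0x43de6,                                 # index (magic number)
    unpack("N", inet_aton("192.0.2.1")),     # srcaddr
    unpack("N", inet_aton("198.51.100.2")),  # dstaddr
    1025, 80,                                # srcport, dstport
    10, 4096,                                # pkts, bytes
    1013113977, 1013114037,                  # startime, endtime (epoch secs)
    6, 0, 0x1b,                              # protocol (TCP), tos, tcp_flags
);

my $record = pack($format, @flow);
printf("record length: %d bytes\n", length($record));   # 35 bytes

my @decoded = unpack($format, $record);
printf("%s.%u -> %s.%u, %u pkts, %u bytes, %us\n",
       inet_ntoa(pack("N", $decoded[1])), $decoded[3],
       inet_ntoa(pack("N", $decoded[2])), $decoded[4],
       $decoded[5], $decoded[6], $decoded[8] - $decoded[7]);
```

Every record is a fixed 35 bytes (nine 4-byte or 2-byte integers plus
three single bytes), which is why a reader can simply pull
fixed-size chunks from the file.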
Dave
--
plonka at doit.wisc.edu http://net.doit.wisc.edu/~plonka ARS:N9HZF Madison, WI
-------------- next part --------------
#! /usr/local/bin/perl -w
# read_cflow_0x43de6 - a script to read raw binary flow files
# $Id: read_cflow_0x43de6,v 1.1 2001/05/18 16:39:07 dplonka Exp $
use Socket; # for inet_ntoa
use POSIX; # for strftime
my $format = "N N N n n N N N N C C C";
my $record_length = length(pack($format));
foreach $file (@ARGV) {
   open(FILE, "<$file") || die "open \"$file\": $!\n";
   my($record, $len);
   while ($len = sysread(FILE, $record, $record_length)) {
      die "short record in \"$file\": $len\n" if ($record_length != $len);
      ($index,
       $srcaddr,
       $dstaddr,
       $srcport,
       $dstport,
       $pkts,
       $bytes,
       $startime,
       $endtime,
       $protocol,
       $tos,
       $tcp_flags) = unpack($format, $record);
      die unless 0x43de6 == $index; # the magic number that matches $format
      printf("%s (%us) %.15s.%hu -> %.15s.%hu %hu 0x%x 0x%x %u %u\n",
             strftime("%Y/%m/%d %H:%M:%S", localtime($startime)),
             $endtime-$startime,
             inet_ntoa(pack("N", $srcaddr)),
             $srcport,
             inet_ntoa(pack("N", $dstaddr)),
             $dstport,
             $protocol,
             $tcp_flags,
             $tos,
             $pkts,
             $bytes);
   }
}