argus suggestions please

Michael Hornung hornung at cac.washington.edu
Fri Oct 5 17:55:37 EDT 2007


None of those options works on a whole day's worth of data at once, not 
even the last one when it tries to cluster all the processed files from /tmp.

As Peter mentioned, it is possible to run stats on the individual smaller 
files throughout the day and process those later to produce the cumulative 
set of results.  I have some Perl that helps me do that as well.

For example, here is what I'm doing every five minutes after archiving the 
previous chunk of data:

	racluster -r ${file} -w - |  \
	rasort -r - -m bytes saddr daddr -w - - ip |  \
	ra -nn -s saddr sport daddr dport spkts dpkts  \
		pkts sloss dloss sbytes dbytes bytes  \
	> ${stats_dir}/${year}/${month}/${day}/${seconds}

At the end of the day I go through each of the reports generated above and 
do several things:

	1) Go through our ARP cache records (which we poll regularly) and 
	   associate the IPs with MACs based on the name of the report, 
	   which is the time the report was written (roughly when the 
	   device was talking on the network).

	2) Compile aggregate bytes transferred per MAC address (a rough 
	   sketch of this step follows the list).

	3) Publish a top-talkers list to a web page, including a graph 
	   (generated by gnuplot) of packet loss.
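
For what it's worth, steps 1 and 2 boil down to fairly simple text munging 
over the day's report files.  My actual code is Perl, but a rough shell 
sketch of the idea looks something like this (the ARP cache dump location 
and its "IP MAC" line format are assumptions about my local setup, and the 
time-based matching from step 1 is left out for brevity):

	arp_cache=/path/to/arp-cache.txt   # hypothetical dump of "IP MAC" pairs
	report_dir=${stats_dir}/${year}/${month}/${day}

	# Sum the last column (total bytes) of every report per source address,
	# skipping ra's header line, then roll the per-IP totals up by MAC.
	cat ${report_dir}/* | \
	awk '$1 ~ /^[0-9]/ { bytes[$1] += $NF }
	     END           { for (ip in bytes) print ip, bytes[ip] }' | \
	while read ip total; do
		mac=$(awk -v ip="$ip" '$1 == ip { print $2; exit }' ${arp_cache})
		echo "${mac:-unknown} ${total}"
	done | \
	awk '{ per_mac[$1] += $2 } END { for (m in per_mac) print per_mac[m], m }' | \
	sort -rn

Step 3 is then mostly just formatting and plotting on top of that output.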

It looks like it will be easiest to re-process each raw file at the same 
time I do the argus reporting above and add separate accounting for the 
number of flows per IP.  My end-of-day accounting can then go through this 
additional data and attribute the flow counts to the appropriate device 
(MAC address), and I can provide another report view that sorts results 
by top flows.
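
Concretely, I expect to reuse Carter's norep/rmon combination against each 
raw 5-minute file, right after the existing report above; something along 
these lines (the .flows suffix is just a placeholder name):

	racluster -r ${file} -M norep -w - -- ip | \
	racluster -M rmon -m saddr -w - | \
	ra -nn -s saddr trans \
	> ${stats_dir}/${year}/${month}/${day}/${seconds}.flows

As Carter describes, the trans column carries the flow count per address, 
so the end-of-day pass can attribute those counts to the owning MAC the 
same way it does bytes.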

-Mike

On Fri, 5 Oct 2007 at 15:58, Carter Bullard wrote:

|The solution is to count the flow records rather than the flows, or to run
|the programs against the individual 5m files, and then combine the output
|to generate the final file.
|
|The fastest way is to skip the first racluster, as it is the one that is
|eating the memory.
|
|Try this:
|  racluster -M rmon -m saddr -R archive/2007/10/04 -w - | \
|  rasort -m bytes -s saddr trans sbytes dbytes
|
|That should run if you have less than 1M addresses, give or take
|250K. If that works and you want to still count the unique flows,
|try this variant:
|
|  racluster -M ind -R archive/2007/10/04 -M norep -w - -- ip | \
|  racluster -M rmon -m saddr -w - | \
|  rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
|
|The "-M ind" option will cause racluster to process each file
|independantly, rather than treating the entire directory structure
|as a single stream.
|
|If none of these are successful then try doing the top x for each
|5 minute file, and then raclustering and rasorting the 5m files.
|
|Using bash (the mkdir is needed so the -w path under /tmp exists):
|  mkdir -p /tmp/archive/2007/10/04
|  for i in archive/2007/10/04/*; do echo $i; racluster -r $i -w - -- ip | \
|  racluster -M rmon -m saddr -w - | rasort -m bytes -w /tmp/$i.srt; done
|
|  racluster -R /tmp/archive/2007/10/04 -w - | \
|  rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
|
|then delete the /tmp/archive/2007/10/04 directory.
|
|Does any of that work?
|
|Carter
|
|
|Michael Hornung wrote:
|> Thanks Carter, this is what I was hoping to hear!  You guessed my setup
|> exactly, though I've got a problem with what you sent, and I suspect it may
|> be related to the amount of data.  The box I'm using to do processing is
|> x86 Linux (RHEL5) with 2 x dual-core 2GHz CPUs and 4GB of RAM.
|> 
|> % du -hs archive/2007/10/04
|> 20G     archive/2007/10/04
|> 
|> % racluster -R archive/2007/10/04 -M norep -w foo -- ip
|> Segmentation fault
|> 
|> strace shows:
|> brk(0x4e0be000)                         = 0x4e09d000
|> mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
|> = -1 ENOMEM (Cannot allocate memory)
|> mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1,
|> 0) = -1 ENOMEM (Cannot allocate memory)
|> mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1,
|> 0) = -1 ENOMEM (Cannot allocate memory)
|> mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1,
|> 0) = -1 ENOMEM (Cannot allocate memory)
|> mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1,
|> 0) = -1 ENOMEM (Cannot allocate memory)
|> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
|> +++ killed by SIGSEGV +++
|> 
|> 
|> I ran top while racluster was running and it seems that the process runs out
|> of memory, and I'm nearing the system's limits...so what can be done about
|> this?
|> 
|>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
|> 7551 argus     20   0 3062m 2.9g  836 R   76 83.2   0:38.64 racluster
|> 
|> -Mike
|> 
|> On Fri, 5 Oct 2007 at 14:19, Carter Bullard wrote:
|> 
|> |Hey Michael,
|> |I think asking these questions is great!!!  It gets examples into the
|> |mailing list, where people can search, etc.
|> |
|> |So, you have a daily directory and you want a report based on IP top
|> |talkers.  Let's say the directory is in the standard argus archive format,
|> |and we'll do yesterday.  Here is the set of commands that I would use:
|> |
|> |  racluster -R archive/2007/10/04 -M norep -w - -- ip | \
|> |  racluster -M rmon -m saddr -w - | \
|> |  rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
|> |
|> |So what does this do:
|> |  racluster -R archive... -M norep -w - -- ip
|> |      This program will read in a day's worth of IP data and assemble all
|> |      the flow status reports into individual flow reports.  We need to do
|> |      this because you said you wanted to know how many flows there were.
|> |      The "-M norep" option says don't report the merge statistics for
|> |      aggregations.  This allows a single record to be tallied as a single
|> |      flow.  And we write the output to stdout.
|> |
|> |  racluster -M rmon -m saddr -w -
|> |      This program will read in the stream of single flow reports from
|> |      stdin and generate the top talker stats.  The rmon option pushes the
|> |      identifiers to the src fields, the -m saddr option aggregates by
|> |      source address, and we write the output to stdout.
|> |
|> |  rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
|> |      This program sorts the output based on total bytes for each top
|> |      talker, and prints out the IP address, the number of flows, the bytes
|> |      transmitted by the talker and the bytes received.
|> |
|> |  Now if you want the top 20 talkers, you need to select the first 20
|> |  records from the rasort() output; to do this:
|> |  racluster -R archive/2007/10/04 -M norep -w - -- ip | \
|> |  racluster -M rmon -m saddr -w - | \
|> |  rasort -m bytes -w - | \
|> |  ra -N 20 -s saddr trans:10 sbytes:14 dbytes:14
|> |
|> |
|> |If you try this and get something weird, send mail!!  It would be
|> |good if we can get a "standard" set of calls that people understand.
|> |
|> |Carter
|> |
|> |Michael Hornung wrote:
|> |> I have an ra reading from a remote argus collector 24x7, and every 5
|> |> minutes the argus file is archived; at the end of a day I have 290 files
|> |> representing the traffic from that day.
|> |>
|> |> Let's say I want to make a list of the top talkers, sorted by total bytes
|> |> transferred.  Given those top talkers, I want to see the following as
|> |> text, and/or alternately graphed, for each top talker:
|> |>
|> |> IP
|> |> # flows
|> |> # bytes rcvd
|> |> # bytes sent
|> |>
|> |> Can you recommend a command-line that's going to give me this?  The
|> |> profusion of argus utilities and a lack of examples is making this hard
|> |> for me.  Thanks.
|> |>
|> |> -Mike
|> |
|> |
|> 
|>   
|
|


