argus suggestions please
Michael Hornung
hornung at cac.washington.edu
Mon Oct 8 12:24:12 EDT 2007
7,713,318 unique IPs (both src and dst) seen in one day (10/1, see bottom
of message for detail). This is a research institution and I'm
studying...ahem...active parts of our network.
Where I'm monitoring in the network I do not see end host MAC addresses,
which is why I have to carve those out in post-processing.
I think my current scripts are giving me the data I need, which is the top
talkers in bytes. My thinking was that saving the reports with per-flow
detail would let me return to them later for further useful information
if I no longer have the argus files for that period of time.
My recent email starting this thread was about adding *additional*
reporting to what I'm already generating: a daily sum of flows per local
source address. The other info I'm generating is still useful to me.
Here's a terse breakdown of the IPs seen in a day:
10,687 IPs local to monitored segments
7,702,631 other IPs
---------------------------------------
7,713,318 IPs total
Thanks for your help Carter. I'll try out some of your suggestions and
see how they work.
-Mike
On Mon, 8 Oct 2007 at 10:35, Carter Bullard wrote:
|So Michael,
|How many IP addresses are we talking about?
|
|I may be missing something, but I'm not sure your example
|scripts are doing what you'd like. I think your scripts will
|generate, at the end of each period, an ASCII printout of
|each flow sorted by total bytes. Is this what you intend?
|
|If all you want is total flows by IP address, why not print the ASCII
|list of just IP addresses, along with the appropriate metric? And
|could you print the MAC addresses so you have them right there
|(unless the MAC addresses in the argus records are not the MAC
|addresses you want to tally)?
|
| # generate single flow recs, aggregate by IP address, sort, then print
| racluster -M norep -f ${file} -w - - ip | \
| racluster -m smac saddr proto -M rmon -w - | \
| rasort -m bytes smac saddr -w - | \
| ra -nn -s smac saddr trans spkts dpkts pkts sloss dloss sbytes dbytes bytes \
|    > ${stats_dir}/.....
|
|This will give you (assuming your argus data has MAC addresses turned
|on) a report with MAC/IP address pairings, the flow count ('trans'),
|the 'in and out' packet counts, the reported loss, and the byte
|statistics, all based on IP address. So for each period you get the
|IP address, the flow counts, and a bunch of other interesting data.
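If you post-process that ASCII report later, folding per-period rows into
per-(MAC, IP) daily totals is straightforward. A hedged sketch follows;
the tuple layout is invented for the example, standing in for fields
parsed from the report columns:

```python
from collections import defaultdict

def fold(rows):
    """rows: (smac, saddr, trans, sbytes, dbytes) tuples, one per
    per-period report line. Returns daily totals keyed by (smac, saddr)
    as [flow count, source bytes, dest bytes]."""
    totals = defaultdict(lambda: [0, 0, 0])
    for smac, saddr, trans, sbytes, dbytes in rows:
        t = totals[(smac, saddr)]
        t[0] += trans   # flow count
        t[1] += sbytes  # bytes sent by the talker
        t[2] += dbytes  # bytes received
    return dict(totals)

rows = [
    ("aa:bb", "10.0.0.1", 3, 1000, 200),
    ("aa:bb", "10.0.0.1", 2, 500, 100),
]
print(fold(rows))
```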
|
|So how long are these periods? 1, 5 min?
|
|If I was doing it, and I wanted to generate your lists and a top 20 talkers
|list at the end of the day and graph it, and I was challenged on memory,
|I would do this (assuming the mac addresses in the argus records are
|the ones you want):
|
| racluster -M norep -f ${file} -w - - ip | \
| racluster -m smac saddr proto -M rmon -w - | \
| rasort -m bytes smac saddr -w - | \
| ra -N 1000 -w ${stats_dir}/...../day/period
|
|This will generate, at the end of each time period, the top 1000 talkers
|database; then when it's time to generate the top 20 talkers for the day:
|
| rasort -R ${stats_dir}/.../day -m bytes smac saddr -w - |\
| ra -N 20 -w top20.talkers.list
|
|That would really fly, I suspect.
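The two-stage trick described above, pruning to the top 1000 per period and
only sorting the survivors at day's end, is just repeated top-N selection.
A generic Python sketch (addresses and byte counts are made up):

```python
import heapq
from collections import Counter

def top_n(records, n, key=lambda r: r[1]):
    # Keep only the n largest records by key; mirrors 'rasort ... | ra -N n'.
    return heapq.nlargest(n, records, key=key)

# Prune each period to its local top talkers (top 2 here, top 1000 above)...
period1 = [("10.0.0.1", 900), ("10.0.0.2", 50), ("10.0.0.3", 10)]
period2 = [("10.0.0.4", 700), ("10.0.0.1", 100)]
pruned = [top_n(p, 2) for p in (period1, period2)]

# ...then merge duplicate addresses and take the final day-end top N.
totals = Counter()
for period in pruned:
    for addr, nbytes in period:
        totals[addr] += nbytes
day_top = top_n(totals.items(), 2)
print(day_top)
```

One trade-off of pruning per period: an address that sits just under the
cutoff in every period can be undercounted, which is usually acceptable
for a top-20 report.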
|
|The top 20 daily talker graph is easy, just need to get the list
|of IP addresses you want and then run ragraph with the appropriate
|set of IP address data:
|
| ra -s addr -r top20.talkers.list > addrs.list
| rafilteraddr -f addrs.list -R ${stats_dir}/..../daily > /tmp/data
| ragraph spkts dpkts saddr -M 1m -w /tmp/ragraph.png
|
|
|Or at least something like that.
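The rafilteraddr step above amounts to set membership on the address
field; a toy Python stand-in (the record shape is an assumption):

```python
def filter_by_addrs(records, addrs):
    """Keep only records whose first field (source address) appears in
    addrs; a stand-in for what 'rafilteraddr -f addrs.list' does."""
    wanted = set(addrs)
    return [r for r in records if r[0] in wanted]

recs = [("10.0.0.1", 10), ("10.0.0.9", 3)]
print(filter_by_addrs(recs, ["10.0.0.1"]))
```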
|
|
|Carter
|
|
|On Oct 5, 2007, at 5:55 PM, Michael Hornung wrote:
|
|> None of those options works on a whole day's worth of data at once;
|> even the last one fails when it tries to cluster all the processed
|> files from /tmp.
|>
|> As Peter mentioned it is possible to run stats on the individual smaller
|> files throughout the day and process those later to produce the cumulative
|> set of results. I have some Perl that helps me do that as well.
|>
|> For example, here is what I'm doing every five minutes after archiving the
|> previous chunk of data:
|>
|> racluster -r ${file} -w - | \
|> rasort -r - -m bytes saddr daddr -w - - ip | \
|> ra -nn -s saddr sport daddr dport spkts dpkts \
|> pkts sloss dloss sbytes dbytes bytes \
|> > ${stats_dir}/${year}/${month}/${day}/${seconds}
|>
|> At the end of the day I go through each of the reports generated above and
|> do several things:
|>
|> 1) Go through our ARP cache records (which we poll regularly) and
|> associate the IPs to MACs based on the name of the report which
|> is the time when the report was written (roughly when the
|> device was talking online).
|>
|> 2) Compile aggregate bytes transferred per MAC address.
|>
|> 3) Publish a top-talkers list to a web page, including a graph
|> (generated by gnuplot) of packet loss.
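Step 1 above, associating an IP to a MAC via the ARP poll nearest each
report's timestamp, could be sketched like this in Python; the snapshot
data layout (sorted list of timestamped IP-to-MAC dicts) is invented:

```python
import bisect

def mac_for(ip, when, arp_polls):
    """arp_polls: list of (timestamp, {ip: mac}) snapshots sorted by
    timestamp. Returns the MAC recorded in the poll closest in time to
    'when', or None if the IP (or any poll) is absent."""
    times = [t for t, _ in arp_polls]
    i = bisect.bisect_left(times, when)
    candidates = []
    if i < len(arp_polls):
        candidates.append(arp_polls[i])      # first poll at/after 'when'
    if i > 0:
        candidates.append(arp_polls[i - 1])  # last poll before 'when'
    best = min(candidates, key=lambda p: abs(p[0] - when), default=None)
    if best is None:
        return None
    return best[1].get(ip)

polls = [(100, {"10.0.0.1": "aa:bb"}), (400, {"10.0.0.1": "cc:dd"})]
print(mac_for("10.0.0.1", 120, polls))
```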
|>
|> It looks like it will be easiest to re-process each raw file at the same
|> time I do the argus reporting above and add separate accounting for number
|> of flows per IP. Then my end-of-day accounting can go through this
|> additional data and attribute the flow counts to the appropriate device
|> (MAC address). Then I can provide another report view that sorts results
|> by top flows.
|>
|> -Mike
|>
|> On Fri, 5 Oct 2007 at 15:58, Carter Bullard wrote:
|>
|> |The solution is to not count the flows but the flow records, or to
|> |run the program scripts against the individual 5m files, and then
|> |combine the output to generate the final file.
|> |
|> |The fastest way is to skip the first racluster, as it is the one
|> |that is eating the memory.
|> |
|> |Try this:
|> | racluster -M rmon -m saddr -R archive/2007/10/04 -w - | \
|> | rasort -m bytes -s saddr trans sbytes dbytes
|> |
|> |That should run if you have less than 1M addresses, give or take
|> |250K. If that works and you want to still count the unique flows,
|> |try this variant:
|> |
|> | racluster -M ind -R archive/2007/10/04 -M norep -w - -- ip | \
|> | racluster -M rmon -m saddr -w - | \
|> | rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
|> |
|> |The "-M ind" option will cause racluster to process each file
|> |independently, rather than treating the entire directory structure
|> |as a single stream.
|> |
|> |If none of these are successful then try doing the top x for each
|> |5 minute file, and then raclustering and rasorting the 5m files.
|> |
|> |Using bash (creating the /tmp target directory first, so the -w paths
|> |resolve):
|> | mkdir -p /tmp/archive/2007/10/04
|> | for i in archive/2007/10/04/*; do echo $i; racluster -r $i -w - -- ip | \
|> |   racluster -M rmon -m saddr -w - | rasort -m bytes -w /tmp/$i.srt; done
|> |
|> | racluster -R /tmp/archive/2007/10/04 -w - | \
|> | rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
|> |
|> |then delete the /tmp/archive/2007/10/04 directory.
|> |
|> |Does any of that work?
|> |
|> |Carter
|> |
|> |
|> |Michael Hornung wrote:
|> |> Thanks Carter, this is what I was hoping to hear! You guessed my setup
|> |> exactly, though I've got a problem with what you sent, and I suspect it
|> |> may be related to the amount of data. The box I'm using to do processing
|> |> is x86 linux (RHEL5) with 2 x dual core 2GHz CPUs and 4GB of RAM.
|> |>
|> |> % du -hs archive/2007/10/04
|> |> 20G archive/2007/10/04
|> |>
|> |> % racluster -R archive/2007/10/04 -M norep -w foo -- ip
|> |> Segmentation fault
|> |>
|> |> strace shows:
|> |> brk(0x4e0be000) = 0x4e09d000
|> |> mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
|> |> mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
|> |> mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
|> |> mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
|> |> mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
|> |> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
|> |> +++ killed by SIGSEGV +++
|> |>
|> |>
|> |> I ran top while racluster was running and it seems that the process
|> |> runs out of memory, and I'm nearing the system's limits... so what
|> |> can be done about this?
|> |>
|> |> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
|> |> 7551 argus 20 0 3062m 2.9g 836 R 76 83.2 0:38.64 racluster
|> |>
|> |> -Mike
|> |>
|> |> On Fri, 5 Oct 2007 at 14:19, Carter Bullard wrote:
|> |>
|> |> |Hey Michael,
|> |> |I think asking these questions is great!!! It gets examples into
|> |> |the mailing list, where people can search, etc.
|> |> |
|> |> |So, you have a daily directory and you want a report based on IP top
|> |> |talkers. Let's say the directory is in the standard argus archive
|> |> |format, and we'll do yesterday. Here is the set of commands that I
|> |> |would use:
|> |> |
|> |> | racluster -R archive/2007/10/04 -M norep -w - -- ip | \
|> |> | racluster -M rmon -m saddr -w - | \
|> |> | rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
|> |> |
|> |> |So what does this do:
|> |> |
|> |> | racluster -R archive... -M norep -w - -- ip
|> |> | This program will read in a day's worth of IP data and assemble
|> |> | all the flow status reports into individual flow records. We need
|> |> | to do this because you said you wanted to know how many flows
|> |> | there were. The "-M norep" option says don't report the merge
|> |> | statistics for aggregations. This allows for a single record to
|> |> | be tallied as a single flow. And we write the output to stdout.
|> |> |
|> |> | racluster -M rmon -m saddr -w -
|> |> | This program will read in the stream of single flow records from
|> |> | stdin and generate the top talker stats. The rmon option pushes
|> |> | the identifiers to the src fields, the -m option sets the
|> |> | aggregation key (saddr), and we write the output to stdout.
|> |> |
|> |> | rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
|> |> | This program sorts the output based on total bytes for each top
|> |> | talker, and prints out the IP address, the number of flows, the
|> |> | bytes transmitted by the talker, and the bytes received.
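What the "-M norep" pass accomplishes, collapsing multiple flow status
records into one record per flow before tallying, can be illustrated with
plain Python; the record format here is invented for the example:

```python
from collections import defaultdict

def count_flows(status_records):
    """status_records: (saddr, daddr, proto, sport, dport, bytes) tuples,
    possibly several status records per flow. First merge them into one
    record per 5-tuple, then tally flow counts and byte totals per source
    address, so each flow counts once no matter how many status records
    it produced."""
    flows = defaultdict(int)  # 5-tuple -> total bytes
    for rec in status_records:
        key, nbytes = rec[:5], rec[5]
        flows[key] += nbytes
    per_src = defaultdict(lambda: [0, 0])  # saddr -> [flow count, bytes]
    for (saddr, *_), nbytes in flows.items():
        per_src[saddr][0] += 1
        per_src[saddr][1] += nbytes
    return dict(per_src)

recs = [
    ("10.0.0.1", "192.0.2.1", "tcp", 1234, 80, 500),
    ("10.0.0.1", "192.0.2.1", "tcp", 1234, 80, 700),  # same flow, 2nd record
    ("10.0.0.1", "192.0.2.2", "udp", 53, 53, 100),
]
print(count_flows(recs))
```

Without the merge step, the first flow would be counted twice, which is
exactly the over-count the "-M norep" pre-pass avoids.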
|> |> |
|> |> | Now if you want the top 20 talkers, you need to select the first
|> |> | 20 records from the rasort(); to do this:
|> |> |
|> |> | racluster -R archive/2007/10/04 -M norep -w - -- ip | \
|> |> | racluster -M rmon -m saddr -w - | \
|> |> | rasort -m bytes -w - | \
|> |> | ra -N 20 -s saddr trans:10 sbytes:14 dbytes:14
|> |> |
|> |> |
|> |> |If you try this and get something weird, send mail!! It would be
|> |> |good if we can get a "standard" set of calls that people understand.
|> |> |
|> |> |Carter
|> |> |
|> |> |Michael Hornung wrote:
|> |> |> I have an ra reading from a remote argus collector 24x7, and every
|> |> |> 5 minutes the argus file is archived; at the end of a day I have
|> |> |> 290 files representing the traffic from that day.
|> |> |>
|> |> |> Let's say I want to make a list of the top talkers, sorted by total
|> |> |> bytes transferred. Given those top talkers, I want to see the
|> |> |> following as text, and/or alternately graphed, for each top talker:
|> |> |>
|> |> |> IP
|> |> |> # flows
|> |> |> # bytes rcvd
|> |> |> # bytes sent
|> |> |>
|> |> |> Can you recommend a command-line that's going to give me this? The
|> |> |> profusion of argus utilities and a lack of examples is making this
|> |> |> hard for me. Thanks.
|> |> |>
|> |> |> -Mike
|> |> |> |
|> |> |
|> |>
|> |>
|> |
|> |
|>
|
More information about the argus mailing list