argus suggestions please

Carter Bullard carter at qosient.com
Fri Oct 5 15:58:37 EDT 2007


The solution is to count the flow records rather than the flows, or to
run the programs against the individual 5m files and then combine
the output to generate the final file.

The fastest way is to skip the first racluster, as it is the one that
is eating the memory.

Try this:
   racluster -M rmon -m saddr -R archive/2007/10/04 -w - | \
   rasort -m bytes -s saddr trans sbytes dbytes

That should run if you have less than 1M addresses, give or take
250K. If that works and you want to still count the unique flows,
try this variant:

   racluster -M ind -R archive/2007/10/04 -M norep -w - -- ip | \
   racluster -M rmon -m saddr -w - | \
   rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14

The "-M ind" option will cause racluster to process each file
independently, rather than treating the entire directory structure
as a single stream.

If none of these are successful then try doing the top x for each
5 minute file, and then raclustering and rasorting the 5m files.

Using bash:
   mkdir -p /tmp/archive/2007/10/04
   for i in archive/2007/10/04/*; do echo "$i"; racluster -r "$i" -w - -- ip | \
   racluster -M rmon -m saddr -w - | rasort -m bytes -w "/tmp/$i.srt"; done

   racluster -R /tmp/archive/2007/10/04 -w - | \
   rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14

then delete the /tmp/archive/2007/10/04 directory.
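
If the per-file results are kept as text instead of argus records, the
combine step can also be sketched with awk. This is only a rough sketch
and not part of the argus tools: it assumes each input line holds the four
columns printed by "-s saddr trans sbytes dbytes", and the function name
combine_talkers and the file names are made up for illustration.

```shell
# Hypothetical helper: merge per-file top-talker text reports.
# Assumed input columns per line: saddr trans sbytes dbytes.
combine_talkers() {
    awk '
        # Skip label lines or anything without three numeric columns.
        NF == 4 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
            trans[$1]  += $2
            sbytes[$1] += $3
            dbytes[$1] += $4
        }
        END {
            # Prefix each row with total bytes so sort can order on it.
            for (a in trans)
                printf "%.0f %s %.0f %.0f %.0f\n",
                       sbytes[a] + dbytes[a], a, trans[a], sbytes[a], dbytes[a]
        }
    ' "$@" | sort -k1,1nr | cut -d" " -f2-
}
```

Something like "combine_talkers /tmp/archive/2007/10/04/*.txt | head -20"
would then print the 20 largest talkers, assuming each .txt file holds the
printed output of a per-file rasort run without the -w option.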

Does any of that work?

Carter


Michael Hornung wrote:
> Thanks Carter, this is what I was hoping to hear!  You guessed my setup 
> exactly, though I've got a problem with what you sent, and I suspect it 
> may be related to the amount of data.  The box I'm using to do processing 
> is x86 linux (RHEL5) with 2 x dual core 2ghz CPUs and 4GB of RAM.
>
> % du -hs archive/2007/10/04
> 20G     archive/2007/10/04
>
> % racluster -R archive/2007/10/04 -M norep -w foo -- ip
> Segmentation fault
>
> strace shows:
> brk(0x4e0be000)                         = 0x4e09d000
> mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
> 0) = -1 ENOMEM (Cannot allocate memory)
> mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, 
> -1, 0) = -1 ENOMEM (Cannot allocate memory)
> mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, 
> -1, 0) = -1 ENOMEM (Cannot allocate memory)
> mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, 
> -1, 0) = -1 ENOMEM (Cannot allocate memory)
> mmap2(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, 
> -1, 0) = -1 ENOMEM (Cannot allocate memory)
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
>
>
> I ran top while racluster was running and it seems that the process runs 
> out of memory, and I'm nearing the system's limits...so what can be done 
> about this?
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
>  7551 argus     20   0 3062m 2.9g  836 R   76 83.2   0:38.64 racluster
>
> -Mike
>
> On Fri, 5 Oct 2007 at 14:19, Carter Bullard wrote:
>
> |Hey Michael,
> |I think asking these questions is great!!!  It gets examples into the
> |mailing list, where people can search etc....
> |
> |So, you have a daily directory and you want a report based on IP top talkers.
> |Let's say the directory is in the standard argus archive format, and we'll do
> |yesterday.
> |Here is the set of commands that I would use:
> |
> |  racluster -R archive/2007/10/04 -M norep -w - -- ip | \
> |  racluster -M rmon -m saddr -w - | \
> |  rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
> |
> |So what does this do:
> |  racluster -R archive... -M norep -w - -- ip
> |      This program will read in a day's worth of IP data and assemble all
> |      the flow status reports into individual flow reports.  We need to do
> |      this because you said you wanted to know how many flows there were.
> |      The "-M norep" option sez don't report the merge statistics for
> |      aggregations.  This allows for a single record to be tallied as a
> |      single flow.  And we write the output to stdout.
> |
> |  racluster -M rmon -m saddr -w -
> |      This program will read in the stream of single flow reports from
> |      stdin and generate the top talker stats.  The rmon option pushes the
> |      identifiers to the src fields, the "-m saddr" option aggregates on
> |      the source address, and we write the output to stdout.
> |
> |  rasort -m bytes -s saddr trans:10 sbytes:14 dbytes:14
> |      This program sorts the output based on total bytes for each top
> |      talker, and prints out the IP address, the number of flows, the bytes
> |      transmitted by the talker and the bytes received.
> |
> |  Now if you want the top 20 talkers, you need to select the first 20 records
> |  from the rasort output.  To do this:
> |  racluster -R archive/2007/10/04 -M norep -w - -- ip | \
> |  racluster -M rmon -m saddr -w - | \
> |  rasort -m bytes -w - |\
> |  ra -N 20 -s saddr trans:10 sbytes:14 dbytes:14
> |
> |
> |If you try this and get something weird, send mail!!  It would be
> |good if we can get a "standard" set of calls that people understand.
> |
> |Carter
> |
> |Michael Hornung wrote:
> |> I have an ra reading from a remote argus collector 24x7, and every 5 minutes
> |> the argus file is archived; at the end of a day I have 290 files representing
> |> the traffic from that day.
> |> 
> |> Let's say I want to make a list of the top talkers, sorted by total bytes
> |> transferred.  Given those top talkers, I want to see the following as text,
> |> and/or alternately graphed, for each top talker:
> |> 
> |> IP
> |> # flows
> |> # bytes rcvd
> |> # bytes sent
> |> 
> |> Can you recommend a command-line that's going to give me this?  The profusion
> |> of argus utilities and a lack of examples is making this hard for me.
> |> Thanks.
> |> 
> |> -Mike
> |> 
> |
> |
>
>   


