Fwd: argus-clients-3.0.7.4 memory consumption

Carter Bullard carter at qosient.com
Tue Jan 22 06:34:48 EST 2013


Hey Woody,
If rabins()'s memory use climbs steadily as it runs, and finally consumes all
the memory after your 4 hours, that sounds like a memory leak.   If you don't mind
doing a little testing, could you run your rabins() under valgrind() for a
while?  It may show us where the leak is occurring.

You will want to have compiled the clients with the developers support,
so that valgrind() will tell us the routines and line numbers where the memory
was dropped.   To do this, in the clients root directory:

   % make clobber
   % touch .devel
   % ./configure
   % make
   % make install

Then to run rabins() under valgrind():

   % valgrind --leak-check=full --show-reachable=yes /usr/local/bin/rabins -q -S localhost:555 -m proto daddr/32 -B 1m -M hard time 1m -p0 -s stime proto daddr spkts sbytes -n -u -c, - dst net 192.168.10.0/24 or 172.16.10.0/24 or 192.168.1.0/24 or 192.168.50.0/24

Let that run for a while, say 20-30 minutes, and then type Control-C to terminate the program.
Valgrind may print out quite a few messages while it is running; these are important for the analysis.
If you can capture the screen output, then I may be able to figure it out.  To suppress
the argus record output, I've added the " -q " option to your command line.
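
If capturing the screen is awkward, here is one possible way to get the same
messages into a file.  This is only a sketch: valgrind's " --log-file " option
sends its messages to a file instead of the terminal, and the wrapper function
name and the log file name are placeholders of mine, not anything that ships
with the clients.  The rabins() arguments are the same as in the command above.

```shell
# Hypothetical wrapper, not part of argus-clients.
# $1 is the file that valgrind should write its messages to.
run_rabins_valgrind() {
    valgrind --leak-check=full --show-reachable=yes --log-file="$1" \
        /usr/local/bin/rabins -q -S localhost:555 -m proto daddr/32 -B 1m \
        -M hard time 1m -p0 -s stime proto daddr spkts sbytes -n -u -c, \
        - dst net 192.168.10.0/24 or 172.16.10.0/24 or 192.168.1.0/24 or 192.168.50.0/24
}
```

You would then run " run_rabins_valgrind valgrind-rabins.log ", stop it with
Control-C after 20-30 minutes, and send the log file.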

I don't have a source of netflow records, so I can't debug this myself, so thanks for helping.

We do not recommend processing Cisco netflow records using rabins().  If flow records show
up whose start times are earlier than ( ' the current time ' - ' the -B holding time ' ), rabins()
will throw the records away.  For Cisco netflow, that could be most of the records.  I've seen
routers send Cisco records from yesterday, on a regular basis.  I suspect that we have a memory
leak in the time range rejection stage of rabins().
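
To make the rejection rule concrete, here is a small sketch of the comparison
described above.  It is not the rabins() source; the function name is made up
for illustration, and all times are assumed to be in epoch seconds.

```shell
# Hypothetical helper mirroring the rule described above, not rabins() code.
# $1 = record start time, $2 = current time, $3 = -B holding time (seconds).
# Succeeds (exit 0) when the record is older than the holding window.
is_too_old() {
    [ "$1" -lt $(( $2 - $3 )) ]
}

now=$(date +%s)
hold=60    # corresponds to " -B 1m "

# A record that started a day ago falls outside the window.
if is_too_old $(( now - 86400 )) "$now" "$hold"; then
    echo "record from yesterday: thrown away"
fi

# A record that started 30 seconds ago is still inside the window.
if ! is_too_old $(( now - 30 )) "$now" "$hold"; then
    echo "record from 30 seconds ago: kept"
fi
```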

In the absence of a bug, the amount of memory needed to run rabins() is based
solely on the number of flows that it's caching.  With the " -B 1m " and the " -M time 1m "
options, you must have enough memory to hold 3 minutes' worth of flow data.
For some sites, that could be 3-6M flow records.  That many flow records will need a
lot of RAM.  So, if there isn't a memory leak, you will have to tune back your intervals.
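
As a back-of-the-envelope check, the cache size can be estimated from the flow
rate and the window length.  This is only a rough upper bound that ignores any
" -m " aggregation.  The 750 flows per second comes from the report quoted
below; the bytes-per-record figure is an assumed placeholder, not a measured
argus record size, so substitute your own number.

```shell
# Rough estimate only; function names are illustrative.
# $1 = flows per second, $2 = seconds of flow data held in memory.
cache_records() {
    echo $(( $1 * $2 ))
}

# $1 = number of cached records, $2 = ASSUMED bytes per cached record.
cache_bytes() {
    echo $(( $1 * $2 ))
}

flows_per_sec=750       # from the original report
window_sec=180          # 3 minutes of flow data
bytes_per_record=400    # ASSUMPTION: placeholder value, not measured

records=$(cache_records "$flows_per_sec" "$window_sec")
echo "cached records (upper bound): $records"
echo "approximate bytes: $(cache_bytes "$records" "$bytes_per_record")"
```

With 750 flows per second and a 180 second window, the record count works out
to 750 x 180 = 135,000 before aggregation.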

Or you will need to aggregate your data using the " -m flow key fields " option.  This will
reduce the number of cache entries that rabins() will have to track.  The question is, what
are you trying to do with rabins() and is there a better way?

We recommend using rasplit() to read / write your netflow records into file based bins,
and processing the bins when it's time to do whatever analytics you're doing.  You can
run rasplit() for a while, and then compare the creation time with the last write time
of each file to see how far back the router goes with the netflow records.  It can be as
long as 6 hours ... really.

Hope this is useful,

Carter

On Jan 21, 2013, at 5:36 PM, Woody K <woodyk at gmail.com> wrote:

> Hello,
> 
> I would like to start off by saying that the netflow v9 support in version 3.0.7.4 appears to be spot on.  I am however running into an issue with rabins consuming large amounts of RAM when it has been running for over 4 hours.  If I allow it to continue it will consume all available RAM and crash. Here is how I am currently using argus and details on the system it is on.
> 
> Version: argus-clients 3.0.7.4
> Installation: ./configure && make && make install
> System Memory: 64GB
> System CPU: Dual Intel(R) Xeon(R) E5-2690 0 @ 2.90GHz 8 Core
> Flows p/s: ~750
> Netflow version: v9 and v5
> 
> /usr/local/bin/radium -d -C 5000 -P 555
> /usr/local/bin/rabins -S localhost:555 -m proto daddr/32 -B 1m -M hard time 1m -p0 -s stime proto daddr spkts sbytes -n -u -c, - dst net 192.168.10.0/24 or 172.16.10.0/24 or 192.168.1.0/24 or 192.168.50.0/24
> 
> Any suggestions would be appreciated.  If there is any specific output or information that would be useful to you please let me know.  Thanks.
> 
> --
> Woody
> 
