ratop hangs

Carter Bullard carter at qosient.com
Sun Jan 13 23:00:15 EST 2013


Hey Craig,
You are probably running out of memory.
Type Ctrl-G while ratop is running and you'll see the records-per-second rate,
how many flows are in the flow cache, and how often they are being updated. You should
expect to be able to process somewhere between 50K and 100K flows per second, but you
can't track more than about 1M flows in your cache without getting bogged down.

The best approach is to filter, or to run with an aggregation model rather than the single 5-tuple flow model.
Something like
   " -m matrix/16 "

will let you see all the /16 CIDR address matrices. That is very valuable, and gets you
down to somewhere around 10-30K associations for some networks. If you're seeing more than
2-4M of these, then run with /10 instead; that should be manageable.
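
To illustrate why a CIDR matrix shrinks the cache so much, here is a minimal Python sketch. It is not how argus or ratop implement aggregation; the matrix_key() helper and the sample flows are made up for illustration, and the sketch keeps source and destination as an ordered pair, which may differ from what the matrix model does.

```python
import ipaddress
from collections import Counter


def matrix_key(src, dst, prefix_len=16):
    """Collapse a src/dst address pair to its enclosing CIDR networks.

    Hypothetical helper for illustration only, not part of the argus clients.
    """
    src_net = ipaddress.ip_network(f"{src}/{prefix_len}", strict=False)
    dst_net = ipaddress.ip_network(f"{dst}/{prefix_len}", strict=False)
    return (str(src_net), str(dst_net))


# Made-up 5-tuple flows: (src, sport, dst, dport, proto)
flows = [
    ("10.1.2.3", 40001, "192.168.5.9", 80, "tcp"),
    ("10.1.7.8", 40002, "192.168.9.1", 443, "tcp"),
    ("10.1.2.3", 40003, "192.168.5.9", 80, "tcp"),
    ("10.2.0.1", 53124, "172.16.4.4", 53, "udp"),
]

# One cache entry per distinct 5-tuple versus one per /16 network pair.
five_tuple_cache = Counter(flows)
matrix_cache = Counter(matrix_key(f[0], f[2]) for f in flows)

print(len(five_tuple_cache), "5-tuple entries")
print(len(matrix_cache), "/16 matrix entries")
```

Every distinct source port creates a new entry in the 5-tuple cache, while the /16 matrix only grows when a new pair of networks shows up. Passing prefix_len=10 to the helper coarsens the key further, which is the same idea as moving from /16 to /10.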

Another useful aggregation is

   " -m proto "

which will give you just a protocol table. That's pretty boring, but sometimes those large
aggregations are what you're really after, at least for a short period of time.

Generally, you use ratop() to look at realtime views that have fewer than 5-10K flows.
You can send filters to the source, say to look for flows from a specific network or host,
and that helps a great deal.

I suspect that you're bogging ratop() down so badly that it loses its external connection:
argus will decide that ratop() isn't processing fast enough, and it will disconnect.
After ratop() finally gets through the input, it has a lot of overhead to deal with,
so 1% CPU to go through all the caches and queues probably isn't too bad.

Carter



On Jan 13, 2013, at 10:11 PM, Craig Merchant <cmerchant at responsys.com> wrote:

> I’m running the 3.0.6.2 version of the clients on a CentOS 6.2 machine.
>  
> If I have ratop connect to a remote Argus daemon that is monitoring a network with traffic between 2.5 Gbps and 10 Gbps, ratop will run for about 60-90 seconds and then stop updating the flow data.  The CPU will steadily increase to 100% and then after it hangs, the CPU will slowly drop down to less than 1%.
>  
> I should mention that the remote Argus daemon is running on top of DNA/libzero and listens on an interface provided by the pfdnacluster_master app.
>  
> Any idea how I can troubleshoot this?
> 
> Thx.
> 
> Craig
