lost flows and memory leak in radium
Carter Bullard
carter at qosient.com
Sat Jan 26 18:32:15 EST 2013

Hey Craig,
This is interesting, as we haven't had much in the way of pure radium performance
reports with labeling.  The cycle requirements for labels will vary quite a bit depending
on the strategy.  Address-based labeling will perform the best, as we have a pretty fast
patricia tree structure for address and label lookup.  Flow-based labeling may be
the worst performing, as we have to switch out the search contexts for each rule.
There's no telling how fast the GeoIP labeling is, but it's been the most-used label to date, so
I think the GeoIP folks do a pretty good job.
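To illustrate why address-based labeling should be the cheapest strategy, here is a generic longest-prefix-match sketch in Python. It is not radium's patricia tree code: the names `PrefixLabeler`, `insert` and `lookup` are hypothetical, and it uses a plain (uncompressed) binary trie, whereas a true patricia tree also collapses single-child chains. The point it shows is that each lookup walks at most 32 bits of an IPv4 address, so its cost does not grow with the number of address rules.

```python
import ipaddress


class _Node:
    __slots__ = ("children", "label")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.label = None             # label if a prefix ends at this node


class PrefixLabeler:
    """Hypothetical IPv4 prefix -> label map (uncompressed binary trie)."""

    def __init__(self):
        self.root = _Node()

    def insert(self, cidr, label):
        net = ipaddress.IPv4Network(cidr, strict=False)
        bits = int(net.network_address)
        node = self.root
        # Walk one bit per level, most significant bit first.
        for i in range(net.prefixlen):
            b = (bits >> (31 - i)) & 1
            if node.children[b] is None:
                node.children[b] = _Node()
            node = node.children[b]
        node.label = label

    def lookup(self, addr):
        """Return the label of the longest matching prefix, or None."""
        bits = int(ipaddress.IPv4Address(addr))
        node = self.root
        best = node.label  # covers a 0.0.0.0/0 rule, if one was inserted
        for i in range(32):  # at most 32 steps, regardless of rule count
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.label is not None:
                best = node.label
        return best


if __name__ == "__main__":
    labeler = PrefixLabeler()
    labeler.insert("10.0.0.0/8", "internal")
    labeler.insert("10.1.0.0/16", "dc1")
    print(labeler.lookup("10.1.2.3"))  # dc1 (longest match wins)
    print(labeler.lookup("10.2.0.1"))  # internal
    print(labeler.lookup("8.8.8.8"))   # None (no covering prefix)
```

Flow-based rules have no equivalent single index to walk, which is consistent with them being the slower strategy.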
Can you try a few sample label strategies, just to tease out where the loads are?
Maybe start with a single rule in each label strategy, testing one strategy at a time,
and then ramp up to 2, 4, 8, etc. rules until we reach your level of complexity.
A good sample would be a label rule that labels everything with a small label,
vs. a rule that labels everything with a large label, so that we're accounting for
the impact of label size on performance.
That will help.  There are a lot of queues, a lot of buffering, a lot of things going on.
Can you share your radium.conf file and the ralabel.conf-style classifier file?
Carter
On Jan 26, 2013, at 2:45 PM, Craig Merchant <cmerchant at responsys.com> wrote:
> I tried rebooting the server with the label options commented out in radium.conf.  When the server came up, radium was running at 11% CPU and there were no pauses or loss of flows when clients connected.  I added the labeling config back to radium.conf and restarted.  The CPU ran at over 190% and the flow loss and pauses returned.
>  
> I commented those lines back out again and restarted radium.  Radium ran at around 150% CPU with flow loss and pauses.  I rebooted the server again and radium was back to normal.
>  
> From: argus-info-bounces+cmerchant=responsys.com at lists.andrew.cmu.edu [mailto:argus-info-bounces+cmerchant=responsys.com at lists.andrew.cmu.edu] On Behalf Of Craig Merchant
> Sent: Friday, January 25, 2013 4:44 PM
> To: Argus (argus-info at lists.andrew.cmu.edu)
> Subject: [ARGUS] lost flows and memory leak in radium
>  
> We’ve got one data center currently running argus on our IDS sensor (CentOS 6.2) and it listens on a DNA/libzero interface thanks to code from Chris Wakelin.  So, we do experience the bug in PF_RING where some select() call causes argusd to run at 100% CPU all the time.
>  
> We probably average between 4-8 Gbps of traffic.  A separate host is running radium and pulls the flows off of the sensor by connecting to tcp 561.  Top shows radium running at 190% CPU most of the time. 
>  
> If I connect any of the ra clients to radium (such as ra -S radium:561), flows will appear for 10-30 seconds and then pause for 30-60 seconds.  If I connect the ra clients directly to the remote argusd instance, they work fine.  We'll be deploying argus in a second data center soon, so we'd really like to take advantage of radium's ability to dedup flows.
>  
> Radium’s memory usage slowly climbed whether an ra client was connected or not.
>  
> I tried commenting out the two RADIUM_CLASSIFIER settings and restarted radium.  Our label file is something like 1500 lines long, so I thought that could be causing problems.    Radium uses about 30% less CPU and memory stays at 0.8%.  The intermittent pauses still happen though.
>  
> I then tried setting RADIUM_CLASSIFIER=no instead of commenting it out and the CPU went back up by 30% and the memory usage climbed steadily with no ra clients connected.  Does that not disable labeling in radium?
>  
> I’m not sure how to diagnose it any further.  My argus.conf and radium.conf are in the spreadsheet I sent you earlier.  Let me know what I can do to help diagnose this further.
>  
> Thanks.
> 
> Craig