racluster memory utilization

Carter Bullard carter at qosient.com
Fri May 23 00:17:23 EDT 2014


Hey Jason,
You really need to look at your data to see how it can best be aggregated.
Do you have a lot of DNS and UDP?  If not, then your approach won't help you
much, as you default to standard aggregation for everything else and hold
the transactions far too long.

If you do have a lot of DNS, your approach still won't help you much, as
you're not doing any real aggregation: you've specified the standard 5-tuple
aggregation model.
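The arithmetic behind that is easy to sketch outside of argus (illustrative Python, not argus code): a DNS client cycles through ephemeral source ports, so a key that includes sport keeps one record per lookup, while dropping sport collapses them all.

```python
# Illustrative only -- shows why aggregating DNS on (saddr, daddr, proto,
# dport) collapses far more records than a key that includes sport.
from collections import Counter

# Hypothetical lookups: one client querying one resolver on UDP/53,
# using 100 different ephemeral source ports.
flows = [("10.0.0.1", "8.8.8.8", "udp", sport, 53)
         for sport in range(40000, 40100)]

five_tuple = Counter((s, d, p, sp, dp) for s, d, p, sp, dp in flows)
four_tuple = Counter((s, d, p, dp) for s, d, p, sp, dp in flows)

print(len(five_tuple))  # 100 -- every lookup stays a separate record
print(len(four_tuple))  # 1   -- all lookups merge into one record
```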

Does this:

   racluster -r <filelist> -m saddr daddr proto dport -w /tmp/dns.nsport.out - udp and port domain
   racount -r /tmp/dns.nsport.out

go faster than this?

   racluster -r <filelist> -w /tmp/dns.out - udp and port domain
   racount -r /tmp/dns.out

If so, then you have a good candidate rule for your complex racluster.conf.

Write the output to a file so you can inspect it for relevance, correctness,
and all the other *nesses that will tell you whether any form of client/server
aggregation for DNS buys you anything.

Try tuning down your idle time to 10-30 seconds…
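For example, a racluster.conf along these lines, with the DNS rule dropping sport from its model and idle times tuned down (the filters, models, and idle values here are only a starting point; tune them against your own traffic mix):

   filter="udp and port domain" model="saddr daddr proto dport" status=0 idle=10
   filter="udp" model="saddr daddr proto sport dport" status=0 idle=30
   filter="" model="saddr daddr proto sport dport" status=0 idle=60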
Carter


On May 22, 2014, at 7:46 PM, Jason <dn1nj4 at gmail.com> wrote:

> (Changing the subject to be relevant to the current conversation)
> 
> Carter,
> 
> Based on your suggestions below, with 3.0.7.28, I conducted 2 tests against 50GB of flow files with the following:
> 
> racluster -r <filelist> -i -nn -c"," -m srcid saddr daddr proto dport -Zb -s stime saddr daddr proto sport dport sbytes runtime dbytes trans state 
> 
> On a beefy server, I let this run for 70 minutes before killing it.  In that time it consumed 33GB of RAM.
> 
> Next, I added the "-f racluster.conf" option with the following configuration: 
> 
> filter="udp and port domain" model="saddr daddr proto sport dport" status=0 idle=10
> filter="udp" model="saddr daddr proto sport dport" status=0 idle=60
> filter="" model="saddr daddr proto sport dport" status=0 idle=600
> 
> This version (which I was expecting to consume less memory based on previous list threads) I killed after 53 minutes with it consuming 39GB of RAM (read: less time, more RAM).
> 
> So even with your suggested changes, the amount of RAM utilization still seems really high.  Are there changes I should make to the racluster.conf file to reduce the memory footprint further?  Do you have any kind of statistics correlating volume of flow data to volume of memory utilization?
> 
> I know you mentioned rasqlinsert, but my performance testing for trying to process another large batch of files indicated the processing probably would not finish before the next batch needed to be processed.  So I'm thinking that's not really a viable option.
> 
> Appreciate all the help.
> Jason
> 
> On Thu, May 22, 2014 at 2:33 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey Jason,
> So you want to do service based tracking on an IP address basis,
> but you want to track client and server oriented stats.
> 
> Once you say that you want to track directionality, then
> the “-M rmon” option is not the correct tool.
> 
> Tracking single IP addresses and all the ports that they offer
> is a great way to go, and “-M rmon” is a good way to do that.
> 
> What do you get with "racluster -m srcid smac saddr sport"?
> You get the ethernet, IP address pairings, and any port that
> is used on that IP address.  This information can answer your
> server questions.  If the port of interest is in the output,
> it was used on that IP address; if there are lots of connections
> with traffic, then you may be able to infer that it is a server
> for that port, but it is not definitive.
> 
> You should use straight racluster() with a filter that
> assures that your port operations are valid.
> 
>    racluster -m srcid saddr daddr proto dport -r file - \(syn or synack\)
> 
> This will give you TCP flow records where the dport is the service port.
> You will end up with a list of records that are:
> 
>    client -> server.serverPort metrics
> 
> 
> You should get yourself a good racluster.conf file and do a decent job
> on defining a cluster scheme that really works.
> 
> Carter
> 
> 
> On May 22, 2014, at 12:58 PM, Jason <dn1nj4 at gmail.com> wrote:
> 
>> Let me clarify and provide a bit more context...  I expect the following flows: 
>> 
>> 1.2.3.4:23456 -> 5.6.7.8:34567
>> 1.2.3.4:45678 -> 6.7.8.9:34567
>> 
>> To result in the following output data: 
>> 
>> 1.2.3.4 23456 34567 
>> 1.2.3.4 45678 34567 
>> 5.6.7.8 34567 23456 
>> 6.7.8.9 34567 45678 
>> 
>> ((in addition to various other stats aggregated with the saddr,sport,dport fields as the key))
>> 
>> I'm then taking the above data and doing simplistic port groupings, such as "34567 is (typically) part of the app1 port group" (think 80, 8000, 8080 as typically "web").  Then I generate a report that says: 
>> 
>> 1.2.3.4, client to the app1 port group, X bytes from this client, Y bytes to this client, Z connections from this client
>> 
>> 5.6.7.8, server for the app1 port group, X bytes from this server, Y bytes to this server, Z connections to this server
>> 
>> 6.7.8.9, server for the app1 port group, X bytes from this server, Y bytes to this server, Z connections to this server
>> 
>> This is a gross oversimplification, but is there a better way to do the above?
>> 
>> Thanks!
>> Jason
> 
> 



More information about the argus mailing list