n00b Questions

Carter Bullard carter at qosient.com
Fri Aug 28 22:24:05 EDT 2009


Hey John,
Did you mean 84 Mbps(ec) or 84 Mbph(our)?  I do hope it's per second, as
you're generating way too much argus data for 84 Mbph.

So we normally don't store data on high-performance sensors, but if
it's working for you, great.  Your server can connect to your sensors
with radium(), and on the server you use rasplit() to write the data
into 5-minute data files.  You can process the 5-minute files 10-20
seconds after each 5-minute boundary (12:05:10, 12:10:10, ...),
assuming your sensors are time-synced.
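
A minimal sketch of that pipeline (hostnames, port numbers and paths
here are placeholders, and option syntax can vary a bit between
argus-clients versions):

   # on the server: radium attaches to each sensor and re-serves the
   # merged stream on the default argus port (561)
   radium -S sensor1.example.com:561 -S sensor2.example.com:561 -d

   # also on the server: rasplit reads from radium and writes
   # 5-minute files into a date-based archive tree
   rasplit -S localhost:561 -M time 5m \
           -w /argus/archive/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S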

You should be processing every 5 minutes, not hourly, as you don't
have enough RAM to process 2.5 GB argus files.  You are probably
swapping with these large files, and that will really suck.  You can
process the 5-minute files and write aggregated data (like top-talker
data) as 5-minute files.  You can then process the 12 5-minute files
to generate hourly report files, and the 24 hourly files to generate
daily top talkers.
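
A rough sketch of that roll-up, using a source-address top-talker
aggregation as the example (file names are made up; check the
racluster(1) and rasort(1) man pages for your version):

   # aggregate one 5-minute file by source address
   racluster -m saddr -r argus.12.05.00 -w talkers.12.05.00

   # roll 12 five-minute talker files into an hourly file
   racluster -m saddr -r talkers.12.* -w talkers.hour.12

   # roll 24 hourly files into a daily report, sorted by byte count
   racluster -m saddr -r talkers.hour.* -w - | \
      rasort -m bytes -r - > talkers.daily.txt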

This goes MUCH faster.

So what kind of stuff is going on, and what do you want to do with it?

Carter



On Aug 28, 2009, at 5:28 PM, John Kennedy wrote:

> My average bandwidth today is 84 Meg per hour. During peak times it  
> can get up to 120 Meg.  In 40 min I have 1.5 gig and in an hour I  
> will see about 2.3 - 2.5 gig of "primitive" data or more.
>
> So in 24 hours I will see about 60 gig of primitive data.  Even
> pumping just this one sensor's data to my server, which I described
> before, it cannot handle a daily Top Talker aggregation.  I have to
> chunk reports up by hour to keep the processor available for other
> reports.
>
> You can see that just in a week I would store close to 420 gigs of
> primitive data.  Processing a day's primitive data (from one sensor)
> tends to bring it to its knees just running racluster.  Now throw in
> 4 other sensors (one of which sees almost as much traffic) and that
> is a lot of raw data to process and store.
>
> Clearly I need to better utilize the tools to efficiently process  
> the raw data.  Hence the email to see how others may do it.
>
> Regards,
>
> John
>
> On Fri, Aug 28, 2009 at 8:56 AM, Carter Bullard <carter at qosient.com>  
> wrote:
> Hey John,
> Historically, this list has been pretty quiet as to how people are
> doing particular things, so you may not get a lot of responses.
> Hopefully I can help.
>
> Most universities and corporations that run argus use it along with
> snort or some other type of IDS at their enterprise border.  They
> use the IDS as their up-front security sensor, and argus as the
> "cover your behind" technology.  The two basic strategies are to
> keep all their argus data to support historical forensics, or to
> toss it after looking at the IDS logs and seeing that not much
> is/was happening.
>
> The first approach is usually chosen by sites that have technically
> advanced security personnel, that have been seriously attacked, or
> that for some reason have a real awareness of the issues and know
> that the commercial IDS/IPS market is lacking.  For sites that are
> underfunded or less technically oriented, argus, or argus-like
> strategies, usually aren't being used.  If these types of sites are
> using flow data, it's almost always Netflow data, and they are using
> a commercial report generator to give the data some utility.  These
> strategies normally do not store significant amounts of flow data,
> as that would be a cost to the customer.
>
> So when a site does collect a lot of flow data, they generally
> partition the data for scaling (like you are doing).  Universities
> and small corporations generate argus data in the
> subdomains/workgroups/dorms, where 500 GB can store a year's worth
> of flow data.
>
> When the point of collection is the enterprise boundary, and a site
> is really using the data and justifying the expense of collecting it
> all, the site invests in storage, but it also does a massive amount
> of preprocessing to get the data load down.
>
> Most sites generate 5m-1h files.  We recommend 5 minutes.  Most
> sites run racluster() with the default settings on their files,
> sometime early in the process, and then gzip the files.  Just
> running racluster() with the default parameters will usually reduce
> a particular file by 50-70%.  I took yesterday's data from one of my
> small workgroups, clustered it and compressed it, and got these
> listings:
>
>    thoth:tmp carter$ ls -lag data*
>    -rw-r--r--  1 wheel  93096940 Aug 28 10:30 data
>    -rw-r--r--  1 wheel  12534420 Aug 28 10:34 data.clustered
>    -rw-r--r--  1 wheel   2781879 Aug 28 10:30 data.clustered.gz
>
> So, from 93 MB down to under 3 MB is pretty good.  Reading these
> gzip'd files performs pretty well, but if you are going to process
> them repeatedly, then delaying compression for a few days is the
> norm.
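>
> For reference, the steps behind that listing are just the defaults
> (a sketch using the file names above; the ra* clients can also read
> the .gz file directly):
>
>    racluster -r data -w data.clustered
>    gzip < data.clustered > data.clustered.gz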
>
> Because searching hundreds of GB of primitive data is not very
> gratifying if you're looking for speed, almost all big sites process
> the data as it comes in to generate "derived views" that are their
> first-glance tables, charts and information systems.  After creating
> these "derived views" some sites toss the primitive data (the data
> from the probes).  For billing or quota-system verification, most
> sites generate the daily reports, retain the aggregated/processed
> argus records, and throw away the primitive data.  I've seen methods
> that toss, literally, 99.8% of the data within the first 24 hours,
> and still retain enough to do a good job on security awareness.
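>
> As one hedged illustration of such a "derived view" (the paths are
> made up; see ra(1) for the field names):
>
>    # keep a per-host traffic matrix for the day, then drop the
>    # primitive files once the view has been written
>    racluster -m saddr daddr proto dport \
>              -r /argus/2009/08/28/* -w /argus/views/2009.08.28.matrix
>    ra -r /argus/views/2009.08.28.matrix \
>       -s stime saddr daddr dport pkts bytes > /argus/reports/2009.08.28.txt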
>
> There was a time when scanning traffic generated most of the flow
> data (> 40%).  That has shifted in the last 3-4 years, but we have
> filters that can very quickly pull out the data headed to your dark
> address space and split it into other directories.  Some sites use
> the data, many sites toss it.
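>
> A sketch of that kind of split (the dark prefix here is only an
> example, and the expression at the end is an ordinary ra filter):
>
>    # peel traffic aimed at unused ("dark") address space into its
>    # own directory, and keep the rest separate
>    ra -r argus.12.05.00 -w dark/argus.12.05.00 - dst net 192.0.2.0/24
>    ra -r argus.12.05.00 -w live/argus.12.05.00 - not dst net 192.0.2.0/24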
>
> Some sites want to track their IP address space, because they have
> found that that is important to them; some want to retain flow
> records only for "the dorms".  The argus-clients package has
> programs to help facilitate all of this, but you need to figure out
> what will work for you.
>
> I'm aware that my response may not answer your questions, but just
> keep asking away, and maybe there will be an answer in there that  
> you can use.
>
> In terms of what kind of hardware to get?  Well, what's wrong with  
> what you're using?
>
> Carter
>
>
> On Aug 28, 2009, at 2:53 AM, John Kennedy wrote:
>
>> Reading the argus website for system auditing got me thinking: with
>> multiple ways to collect, analyze and store argus data, I am
>> curious how some of you have tackled the collection, processing,
>> management and storage of it.  I am always curious how others do it
>> because, like programming, there is almost always more than one way
>> to do it.  I would also like to find out if there are ways in which
>> I could be more efficient.
>>
>> I use argus strictly for Network Security Monitoring.  In an
>> ArcSight webinar I attended the other day the presenter said, "Your
>> business paints a picture every day... is anyone watching?"  For
>> me, argus helps connect the dots in order to see the picture(s).  I
>> could throw many more analogies in here, but I think you get the
>> point.
>>
>> It has come time for me to refresh some of the hardware that argus  
>> is running on.  In order to effectively put together a proposal  
>> that will meet the needs of my monitoring efforts for the  
>> enterprise, I would like to understand a little about how those on  
>> this list are deploying argus.
>>
>> For me, processing the data is the hardest hurdle I have to
>> overcome each day.  The server I run the reporting from is a
>> dual-core processor with 2 gigs of RAM and 500 gigs of storage.  Is
>> this typical?  Retention is also an issue.  On my sensors I run
>> argus and write the data to a file.  Every hour I have a script
>> that takes the file, compresses it, and copies it to an archive.
>> Every 4 hours I rsync it to the server.  On the server I have some
>> scripts that process the last four hours of files that were just
>> rsynced.  I realize that I could use radium() to save files to my
>> server; however, with only a 500 gig RAID it gets a little tight
>> with 5 sensors.  I keep archives on the sensors themselves to aid
>> in some retention.  The sensors, by the way, have a 200 gig RAID.
>> When I was first working with argus and finding equipment to use, I
>> was sure that 500 gigs would be plenty... It's 500 gigs, for crying
>> out loud.
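>>
>> In outline, the hourly sensor job is something like this (a
>> simplified sketch; the real paths, names and scheduling are
>> different):
>>
>>    #!/bin/sh
>>    # compress the most recently closed hourly argus file and move
>>    # it into the local archive on the sensor
>>    FILE=/argus/out/argus.last_hour    # placeholder for the rotated file
>>    gzip $FILE
>>    mv $FILE.gz /argus/archive/
>>    # a separate 4-hourly cron job pushes the archive to the server
>>    rsync -a /argus/archive/ report-server:/argus/incoming/sensor1/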
>>
>> So, give a n00b some feedback.
>>
>> Thanks
>>
>> John
