n00b Questions

Carter Bullard carter at qosient.com
Fri Aug 28 10:56:26 EDT 2009


Hey John,
Historically, this list has been pretty quiet about how people are doing
particular things, so you may not get a lot of responses.  Hopefully I
can help.

Most universities and corporations that run argus use it alongside snort
or some other type of IDS at their enterprise border.  They use the IDS
as their up-front security sensor, and argus as the "cover your behind"
technology.  The two basic strategies are to keep all the argus data to
support historical forensics, or to toss it after looking at the IDS
logs and seeing that not much is/was happening.

The first approach is usually chosen by sites that have technically
advanced security personnel, that have been seriously attacked, or that
for some reason have a real awareness of the issues and know that the
commercial IDS/IPS market is lacking.  Sites that are underfunded or
less technically oriented usually aren't using argus or argus-like
strategies.  If these sites are using flow data at all, it's almost
always NetFlow data, and they are using a commercial report generator to
give the data some utility.  These strategies normally do not store
significant amounts of flow data, as that would be a cost to the
customer.

So when a site does collect a lot of flow data, it generally partitions
the data for scaling (like you are doing).  Universities and small
corporations generate argus data in the subdomains/workgroups/dorms,
where 500 GB can store a year's worth of flow data.

When the point of collection is the enterprise boundary, and a site is
really using the data and justifying the expense of collecting it all,
the site invests in storage, but it also does a massive amount of
preprocessing to get the data load down.
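
The first step is usually carving the incoming stream into small
time-bounded files.  A sketch using rasplit(), assuming argus is
listening on its default port (561) on a probe named "probe", and a
made-up archive layout:

    rasplit -S probe:561 -M time 5m \
        -w /argus/archive/%Y/%m/%d/argus.%H.%M.%S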

Most sites generate 5m-1h files; we recommend 5 minutes.  Most sites run
racluster() with the default settings on their files, sometime early in
the process, and then gzip the files.  Just running racluster() with the
default parameters will usually reduce a particular file by 50-70%.  I
took yesterday's data from one of my small workgroups, clustered it and
compressed it, and got these listings:

    thoth:tmp carter$ ls -lag data*
    -rw-r--r--  1 wheel  93096940 Aug 28 10:30 data
    -rw-r--r--  1 wheel  12534420 Aug 28 10:34 data.clustered
    -rw-r--r--  1 wheel   2781879 Aug 28 10:30 data.clustered.gz
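
The commands behind those listings would look something like this (a
sketch; "data" is just the file name from the listing above):

    racluster -r data -w data.clustered         # aggregate with the default flow keys
    gzip -c data.clustered > data.clustered.gz  # keep a compressed copy alongside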

So, from 93 MB down to under 3 MB, roughly a 33:1 reduction, is pretty
good.  Reading these gzip'd files performs pretty well, but if you are
going to process them repeatedly, then delaying compression for a few
days is the norm.
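
The ra* programs read the compressed file directly, so a quick look is
just (the field list here is only an example):

    ra -r data.clustered.gz -s stime saddr daddr sbytes dbytes | head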

Because searching hundreds of GB of primitive data is not very
gratifying if you're looking for speed, almost all big sites process the
data as it comes in to generate "derived views" that are their
first-glance tables, charts, and information systems.  After creating
these "derived views," some sites toss the primitive data (the data from
the probes).  For billing or quota-system verification, most sites
generate the daily reports, retain the aggregated/processed argus
records, and throw away the primitive data.  I've seen methods that
toss, literally, 99.8% of the data within the first 24 hours, and still
retain enough to do a good job on security awareness.
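
As one concrete sketch of a "derived view" (paths and dates made up):
fold a day of 5-minute files into a single aggregated daily file, then
derive a per-address usage table from it before the primitives go:

    # fold a day of 5-minute files into one aggregated daily file
    racluster -r /argus/archive/2009/08/27/* -w /argus/daily/2009.08.27.agg
    # per-address totals, if your build supports racluster()'s rmon mode
    racluster -M rmon -m saddr -r /argus/daily/2009.08.27.agg \
        -w /argus/daily/2009.08.27.addrs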

There was a time when scanning traffic generated most of the flow data
(> 40% of it).  That has shifted in the last 3-4 years, but we have
filters that can very quickly pull out the data headed to your dark
address space and split it off to other directories.  Some sites use
that data; many sites toss it.
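
The filters are ordinary ra* filter expressions; a sketch, with
10.99.0.0/16 standing in for a made-up dark address block:

    ra -r argus.file -w dark.out   - dst net 10.99.0.0/16
    ra -r argus.file -w bright.out - not dst net 10.99.0.0/16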

Some sites want to track their IP address space, because they have found
that it is important to them; some want to retain flow records only for
"the dorms."  The argus-clients package has programs to help facilitate
all of this, but you need to figure out what will work for you.
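
For example, if your argus-clients build includes rafilteraddr(), a
sketch like this would keep just the "dorm" records (the address file
and names here are hypothetical):

    # dorms.addrs lists the dorm networks, one prefix per line
    rafilteraddr -f dorms.addrs -r argus.file -w dorms.out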

I'm aware that my response may not answer your questions, but just
keep asking away, and maybe there will be an answer in there that you  
can use.

As for what kind of hardware to get: well, what's wrong with what you're
using?

Carter


On Aug 28, 2009, at 2:53 AM, John Kennedy wrote:

> Reading the argus website's System Auditing pages got me thinking:
> with multiple ways to collect, analyze, and store Argus data, I am
> curious how some of you have tackled the collection, processing,
> management, and storage of it.  I am always curious how others do it
> because, like programming, there is almost always more than one way to
> do it.  I would also like to find out if there are ways in which I
> could be more efficient.
>
> I use argus strictly for Network Security Monitoring.  In an ArcSight
> webinar I attended the other day, the presenter said, "Your business
> paints a picture every day... is anyone watching?"  For me, argus
> helps connect the dots in order to see the picture(s).  I could throw
> out many more analogies here, but I think you get the point.
>
> It has come time for me to refresh some of the hardware that argus is
> running on.  To put together a proposal that will effectively meet the
> needs of my enterprise monitoring efforts, I would like to understand
> a little about how those on this list are deploying argus.
>
> For me, processing the data is the hardest hurdle I have to overcome
> each day.  The server I run the reporting from has a dual-core
> processor with 2 GB of RAM and 500 GB of storage.  Is this typical?
> Retention is also an issue.  On my sensors I run argus and write the
> data to a file.  Every hour a script takes the file, compresses it,
> and copies it to an archive.  Every 4 hours I rsync it to the server.
> On the server I have some scripts that process the last four hours of
> files that were just rsynced.  I realize that I could use radium() to
> save files to my server; however, with only a 500 GB RAID it gets a
> little tight with 5 sensors.  I keep archives on the sensors
> themselves to aid retention; the sensors, by the way, have a 200 GB
> RAID.  When I was first working with argus and finding equipment to
> use, I was sure that 500 GB would be plenty... it's 500 gigs, for
> crying out loud.
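>
> To be concrete, the hourly job is nothing fancy; roughly this, with
> made-up paths and file naming:
>
>     #!/bin/sh
>     # hourly: compress any capture files argus has finished writing,
>     # then move them into the local archive for the 4-hourly rsync
>     for f in /argus/out/argus.*.closed; do
>         [ -e "$f" ] || continue
>         gzip "$f" && mv "$f.gz" /argus/archive/
>     done
>     # separate cron entry, every 4 hours:
>     #   rsync -a /argus/archive/ reports:/argus/incoming/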
>
> So, give a n00b some feedback.
>
> Thanks
>
> John