Merging or aggregating data from multiple Argus collectors

Fri Feb 13 11:34:12 EST 2009

Hey Real,
For end-to-end monitoring, I suggest an argus in or near each end-system
and a monitor at key boundaries, such as the enterprise/ISP border
and, in the best of worlds, the Subnet/Campus Area Network borders.

You can get an argus for Windows by compiling argus under Cygwin
and then installing it as a service using the Cygwin utilities.
Distributing that to a large number of Windows hosts is not hard, as the
binary just needs the Cygwin DLL (cygwin1.dll) to work, at least in
theory.  I have done this successfully, but I don't have any Windows
machines myself.

To do real-time observation/analysis, ra* programs can attach to any
number of data sources at a time.  So when dealing with end-to-end
performance problems, most of the time I connect to all the probes that
can "see" the traffic of interest using a radium() that runs only for
the experimental period, and then attach a rasplit() to that radium() to
record all the data during the experiment to a single filesystem, with
the data split out like this:

rasplit -M time 5m -S radium -w experiment/\$srcid/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
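
To make the resulting layout concrete, here is a minimal Python sketch of
how that -w template turns into a file name.  It is only an illustration
and not rasplit's actual code: it assumes that $srcid is replaced by the
source identifier, that the time fields follow strftime conventions, and
that each file is named for the start of its 5-minute bin.  The helper
names bin_start() and expand() are made up for this example.

```python
from datetime import datetime

# Template copied from the rasplit command, with $srcid left as a placeholder.
TEMPLATE = "experiment/$srcid/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S"

def bin_start(ts: datetime, minutes: int = 5) -> datetime:
    """Floor a timestamp to the start of its N-minute bin.

    Assumes the bin size divides an hour evenly (5m does).
    """
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)

def expand(template: str, srcid: str, ts: datetime) -> str:
    """Fill in the time fields first, then substitute the source id."""
    return bin_start(ts).strftime(template).replace("$srcid", srcid)

if __name__ == "__main__":
    # Hypothetical source id and record time.
    print(expand(TEMPLATE, "192.168.0.1", datetime(2009, 2, 13, 15, 22, 41)))
```

With a layout like this, every source gets its own directory tree, and a
record's timestamp is enough to work out which file holds it.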

If I'm doing near-realtime analysis, I have all the clients attach to
the collection radium to get their data.

For persistent collection of all argus data in a domain of interest,
radium can be used to collect data from the various assets, such that
there are multiple radii, with one radium collecting in each "Local"
area, and a "Regional" radium connected to those "Local" radii to
establish a "Regional" repository of flow information, if resources are
available.

But the general design is to have a large number of repositories, and
to use ra* programs to fetch the data intelligently.

At each radium, I generally have a repository that is structured such
that the primitive data is stored in directories that have the '$srcid'
at the root, using programs like rasplit(), so that the records from
individual argi, or netflow sources, are separated from the beginning.
This helps to avoid issues such as poor time sync and availability
problems.

Usually I have a repository management agent, compressing files of
a certain age and deleting data that is past due.
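
As a rough idea of what such an agent can look like, here is a minimal
Python sketch.  It is not the agent I actually run: the function name
manage_repository(), the 2-day and 90-day thresholds, and the use of file
modification time as the "age" are all assumptions chosen for the
example.

```python
import gzip
import shutil
import time
from pathlib import Path

def manage_repository(root, compress_after_days=2, delete_after_days=90, now=None):
    """Compress aging argus files and delete expired ones under root.

    Returns (compressed, deleted) lists of paths, e.g. for logging.
    Thresholds are illustrative; pick values that fit your retention policy.
    """
    now = time.time() if now is None else now
    compressed, deleted = [], []
    # Snapshot the file list first so newly written .gz files are not revisited.
    for path in sorted(Path(root).rglob("argus.*")):
        if not path.is_file():
            continue
        age_days = (now - path.stat().st_mtime) / 86400
        if age_days > delete_after_days:
            path.unlink()
            deleted.append(path)
        elif age_days > compress_after_days and path.suffix != ".gz":
            gz_path = path.with_name(path.name + ".gz")
            with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original mtime so the file still ages out on schedule.
            shutil.copystat(path, gz_path)
            path.unlink()
            compressed.append(gz_path)
    return compressed, deleted

if __name__ == "__main__":
    done, gone = manage_repository("experiment")
    print(f"compressed {len(done)} files, deleted {len(gone)} files")
```

Run from cron or a similar scheduler, something like this keeps the
repository bounded without touching the most recent files.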

Once you have this repository set up, you can fetch specific files out
of the repository on demand.  The radium that is running on the
repository machine can "serve up" argus files to remote clients.  The
client requests an ARGUS_FILE transfer, rather than an ARGUS_STREAM
transfer.

    ra -S remoteRadium/path/to/specific/argus/file/argus.2009.02.13.15.20.00.gz

(This assumes the default port; the syntax is
"-S host:port/path/to/file/filename".)

How you find out the path is outside the scope of the simple ra*
programs, but you can structure the repository so that it works pretty
easily.

So, if you need to know what's up right now, attach to either the argi
directly, or to all the radii that collect from the probes of interest,
or to the one radium that collects from them all, if you're lucky enough
to have a single access point.

If you need to know what happened Tuesday between 12:00 and 12:02, use
ra* to collect the files from all the repositories, and then process
them locally to do your analysis.
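
Working out which files to ask for can be scripted.  The following Python
sketch lists the "-S host/path" targets that cover a time window,
assuming every repository uses the experiment/$srcid/%Y/%m/%d layout with
5-minute files described earlier.  The host names, source ids, and the
remote_targets() helper are hypothetical, and whether a file carries a
".gz" suffix depends on how old it is in your repository.

```python
from datetime import datetime, timedelta

def remote_targets(repos, start, end, root="experiment", minutes=5, suffix=""):
    """Build ra -S targets for every bin file overlapping [start, end).

    repos maps a radium host name to the source ids it stores.
    """
    step = timedelta(minutes=minutes)
    # Floor the start time to its bin boundary (bin size must divide an hour).
    first = start.replace(minute=start.minute - start.minute % minutes,
                          second=0, microsecond=0)
    targets = []
    for host, srcids in repos.items():
        for srcid in srcids:
            t = first
            while t < end:
                name = t.strftime("%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S")
                targets.append(f"{host}/{root}/{srcid}/{name}{suffix}")
                t += step
    return targets

if __name__ == "__main__":
    # Hypothetical repositories and probes; Feb 10, 2009 was a Tuesday.
    repos = {"radium-a": ["probe1"], "radium-b": ["probe2", "probe3"]}
    window = (datetime(2009, 2, 10, 12, 0), datetime(2009, 2, 10, 12, 2))
    for target in remote_targets(repos, *window, suffix=".gz"):
        print("ra -S " + target)
```

The script only prints the commands; how the fetched records are written
out and merged locally is left to the usual ra* tools.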

There are many factors in dealing with large amounts of distributed
data.  The model that I have been designing to uses multiple
repositories that are collection points for some number N of probes.
These repositories hold 'primitive' data (the data straight from the
argus or netflow monitors) and generate 'derived' data: summaries,
indexes and reports.

I have a lot of stuff to handle the summary information that you are
describing, but I'll leave that for another email.

Hope this answers your question, and that all is most excellent.

Carter

On Feb 10, 2009, at 1:37 PM, real.melancon at videotron.ca wrote:

> Hi Carter,
>
> We are in the process of looking for an end-to-end application  
> monitoring, and would like to have your opinion on the best way to  
> merge data coming from different collectors.
>
> Low bandwidth between collectors can be a factor, and we have to  
> consider it.
>
> At the end we would have a web application that would display the  
> various flows, number of connections initiated, active connections,  
> etc...
>
> Also, is there such thing as a Windows Argus service ?
>
> Thanks in advance!
>
> ____________________________
> Real Melanson

Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York  10022

+1 212 588-9133 Phone
+1 212 588-9134 Fax