Graph of the Week at http://qosient.com/argus

carter at qosient.com carter at qosient.com
Tue Sep 19 07:39:27 EDT 2006


Hey Richard,
Binary files are more efficient than ASCII, and when you are generating/transporting/processing/filtering 1M+ records an hour, which is common for Argus in large networks, efficiency matters.

We have a lot of database support in gargoyle, so the experience below is all derived from that work.

So, as a general rule, we don't put flow records into databases, because there are far too many of them.  We do, however, routinely put heavily aggregated flow data into databases, along with indexes into the flow data.  We use this strategy to help find/reference/manage the primitive/original flow data, say by time, address, or event.  When we do put actual flow data into something like MySQL, we generally store the key fields and a few attributes, whatever was "key" at the time of analysis, but then put the entire binary flow record in as a blob, so that even if the schema is an abbreviated one, say only source address and time, we still have the complete flow data available.  This works very well.
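
The "key fields plus blob" strategy above can be sketched as follows.  This is only an illustration: the table and column names are hypothetical, and SQLite stands in for the MySQL setup the post describes.

```python
import sqlite3

# Hypothetical abbreviated schema: index only time and source address,
# but keep the complete binary flow record alongside as a blob.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE flow_index (
        start_time TEXT NOT NULL,   -- abbreviated key fields...
        src_addr   TEXT NOT NULL,
        record     BLOB NOT NULL    -- ...plus the full binary flow record
    )
""")

raw_record = b"\x00\x01binary-flow-record"   # stand-in for an Argus binary record
con.execute(
    "INSERT INTO flow_index (start_time, src_addr, record) VALUES (?, ?, ?)",
    ("2006-08-08 14:46:06", "10.1.2.3", raw_record),
)

# Even though the schema only indexes two fields, the complete record
# can always be pulled back out for deeper analysis.
blob, = con.execute(
    "SELECT record FROM flow_index WHERE src_addr = ?", ("10.1.2.3",)
).fetchone()
```

The point of the design is that the abbreviated columns serve queries, while the blob guarantees nothing is lost if the analysis later needs fields the schema omitted.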

When we build collections of data, say for forensic analysis and reporting, which involves a little data from here, a little data from there, some enrichment and annotation, and raw flow data, we usually provide all the 'derived' data as well as any original primitive data, so the analysis can be self-contained, shipped around, and worked independently of the original data.  Here, supporting binary data is also very important, since some of the evidence may be an image, a program, or something else non-ASCII.

Hopefully, there is something useful in my response!!!

Carter


Carter Bullard
QoSient LLC
150 E. 57th Street Suite 12D
New York, New York 10022
+1 212 588-9133 Phone
+1 212 588-9134 Fax  

-----Original Message-----
From: "Richard Bejtlich" <taosecurity at gmail.com>
Date: Sun, 17 Sep 2006 05:23:42 
To:carter at qosient.com
Cc:argus-info-bounces at lists.andrew.cmu.edu, "Olaf Gellert" <olaf.gellert at intrusion-lab.net>, Argus <argus-info at lists.andrew.cmu.edu>, "Bamm Visscher" <bamm.visscher at gmail.com>
Subject: Re: [ARGUS] Graph of the Week at http://qosient.com/argus

On 9/15/06, carter at qosient.com <carter at qosient.com> wrote:
> Hey Richard, et al,
> Why doesn't Sguil eat Argus records yet ;o)
>

Hi Carter,

This is an issue we have debated.  Maybe if I explain our current
situation you can imagine a solution?

Currently we use SANCP (www.metre.net/sancp.html) in the following
manner.  SANCP watches traffic and writes results to files with text
data like the following:

1|4960894957268645250|2006-08-08 14:46:06|2006-08-08 14:46:16|10|6|1167053256|57239|1123635987|443|9|1469|10|2103|27|27
1|4960894957268571650|2006-08-08 14:46:06|2006-08-08 14:46:17|11|6|1167053256|57238|1123636051|443|11|5063|9|1940|27|27

These records are bi-directional and generally unique for each session.
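
For clarity, one of these pipe-delimited lines can be parsed like this.  The field order is inferred from the sample records and the schema Sguil uses; the decimal-to-dotted-quad conversion assumes SANCP's unsigned 32-bit IP encoding.

```python
# Field names matching the sancp table Sguil inserts these records into.
FIELDS = ("sid", "sancpid", "start_time", "end_time", "duration", "ip_proto",
          "src_ip", "src_port", "dst_ip", "dst_port", "src_pkts", "src_bytes",
          "dst_pkts", "dst_bytes", "src_flags", "dst_flags")

def parse_sancp(line):
    rec = dict(zip(FIELDS, line.strip().split("|")))
    # SANCP writes IPs as unsigned 32-bit decimals; convert for readability.
    for key in ("src_ip", "dst_ip"):
        n = int(rec[key])
        rec[key] = ".".join(str((n >> s) & 0xFF) for s in (24, 16, 8, 0))
    return rec

rec = parse_sancp(
    "1|4960894957268645250|2006-08-08 14:46:06|2006-08-08 14:46:16"
    "|10|6|1167053256|57239|1123635987|443|9|1469|10|2103|27|27")
print(rec["src_ip"], rec["dst_port"])   # 69.143.209.200 443
```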

A Sguil component (sensor_agent.tcl) periodically checks the directory
into which the SANCP records are written, reads the files, and then
inserts them into a MySQL database like the following:

mysql> describe sancp;
+------------+----------------------+------+-----+---------+-------+
| Field      | Type                 | Null | Key | Default | Extra |
+------------+----------------------+------+-----+---------+-------+
| sid        | int(10) unsigned     | NO   | MUL | NULL    |       |
| sancpid    | bigint(20) unsigned  | NO   |     | NULL    |       |
| start_time | datetime             | NO   | MUL | NULL    |       |
| end_time   | datetime             | NO   |     | NULL    |       |
| duration   | int(10) unsigned     | NO   |     | NULL    |       |
| ip_proto   | tinyint(3) unsigned  | NO   |     | NULL    |       |
| src_ip     | int(10) unsigned     | YES  | MUL | NULL    |       |
| src_port   | smallint(5) unsigned | YES  | MUL | NULL    |       |
| dst_ip     | int(10) unsigned     | YES  | MUL | NULL    |       |
| dst_port   | smallint(5) unsigned | YES  | MUL | NULL    |       |
| src_pkts   | int(10) unsigned     | NO   |     | NULL    |       |
| src_bytes  | int(10) unsigned     | NO   |     | NULL    |       |
| dst_pkts   | int(10) unsigned     | NO   |     | NULL    |       |
| dst_bytes  | int(10) unsigned     | NO   |     | NULL    |       |
| src_flags  | tinyint(3) unsigned  | NO   |     | NULL    |       |
| dst_flags  | tinyint(3) unsigned  | NO   |     | NULL    |       |
+------------+----------------------+------+-----+---------+-------+
16 rows in set (0.02 sec)
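
The periodic read-and-insert step that sensor_agent.tcl performs might look roughly like this in Python.  This is a sketch only: SQLite stands in for MySQL, and the column types are simplified from the schema above.

```python
import sqlite3

# Simplified version of the sancp table shown above.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE sancp (
        sid INTEGER, sancpid INTEGER, start_time TEXT, end_time TEXT,
        duration INTEGER, ip_proto INTEGER, src_ip INTEGER, src_port INTEGER,
        dst_ip INTEGER, dst_port INTEGER, src_pkts INTEGER, src_bytes INTEGER,
        dst_pkts INTEGER, dst_bytes INTEGER, src_flags INTEGER, dst_flags INTEGER
    )
""")

# In Sguil, these lines come from files SANCP writes to a spool directory;
# here a single sample record stands in for that input.
lines = [
    "1|4960894957268645250|2006-08-08 14:46:06|2006-08-08 14:46:16"
    "|10|6|1167053256|57239|1123635987|443|9|1469|10|2103|27|27",
]
con.executemany(
    "INSERT INTO sancp VALUES (" + ",".join("?" * 16) + ")",
    (line.split("|") for line in lines),
)
count, = con.execute("SELECT COUNT(*) FROM sancp").fetchone()
print(count)   # 1
```

The same loop, pointed at ra's text output instead of SANCP files, is essentially the first of the two integration options discussed below.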

I guess we grapple with Argus for a few reasons.  One, support for
SANCP is built into Sguil.  We haven't built an API to accept other
data sources, although Bamm is considering it.  When an API is in
place (maybe Sguil 2.0?) we would aim for accepting Argus, NetFlow,
etc.

Two, we're not sure how best to accommodate Argus' record creation
model, where data is written to a non-text format with potentially
multiple records for the same session.  Do we let Argus write records,
run ra against them, output to a text file, and then parse the results
for insertion into the database?  Or do we avoid a db entirely and
have Sguil invoke ra against Argus records?

In any case we would appreciate insights on how best to accommodate
Argus with Sguil, since obviously several of us use Argus alongside
Sguil components already.

Thank you,

Richard
