Graph of the Week at http://qosient.com/argus
carter at qosient.com
Thu Sep 21 07:20:36 EDT 2006
Hey Bammkkkk,
I looked at your use examples (tunnel detection) and we've been doing this type of analysis for over 10 yrs now (byte ratio deviations, etc.). But you don't have to have the data in a database to do this type of analysis. I admit that having the flexibility that SQL provides is great, and helps you do some mining, but if you want to find this exploitation in real time, a stream processor is a better approach than "load a day's worth of data into the database, and then do a SQL call" (not criticizing by any means).
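To make the stream-processor idea concrete, here is a minimal sketch of byte-ratio deviation detection over a stream of flow records. It is illustrative only: the class name and the EWMA baseline are assumptions, not anything argus or sguil ships.

    # Flag flows whose client/server byte ratio deviates sharply from a
    # running (exponentially weighted) baseline -- the "byte ratio
    # deviations" style of analysis mentioned above.
    class ByteRatioDetector:
        def __init__(self, alpha=0.05, threshold=3.0):
            self.alpha = alpha          # EWMA smoothing factor
            self.threshold = threshold  # alarm at this many std deviations
            self.mean = None            # running mean of the ratio
            self.var = 0.0              # running variance of the ratio

        def update(self, src_bytes, dst_bytes):
            """Feed one flow record; return True if it looks anomalous."""
            ratio = src_bytes / float(dst_bytes + 1)  # +1 avoids divide-by-zero
            if self.mean is None:                     # first record seeds the baseline
                self.mean = ratio
                return False
            dev = ratio - self.mean
            anomalous = self.var > 0 and abs(dev) > self.threshold * self.var ** 0.5
            self.mean += self.alpha * dev
            self.var = (1 - self.alpha) * (self.var + self.alpha * dev * dev)
            return anomalous

    detector = ByteRatioDetector()
    for src_bytes, dst_bytes in [(1469, 2103), (5063, 1940), (900000, 1200)]:
        if detector.update(src_bytes, dst_bytes):
            print("possible tunnel: byte ratio deviates from baseline")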
And I agree, the database problem with large record numbers is both performance and management!!!
General anomaly detection strategies are great, and you will find success down that path, but for problems like covert/overt channel exploits, you need better data than simple flow data.
Argus approaches tunnel detection by having some user data in the flow record, so you can actually see what protocol is presented in the flow, and for some protocols, protocol-specific objects, like sequence numbers, are used as part of the flow key, which makes detecting anomalous behavior, like DNS tunneling, much easier.
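For example, once argus is capturing some user data (the ARGUS_CAPTURE_DATA_LEN knob in recent argus.conf files), you can pull just the DNS flows and eyeball the captured payload bytes. A hedged sketch; the suser/duser field names and the -s syntax vary across argus-clients versions, so check your ra(1) man page:

    # Print DNS flows plus captured user data from an argus file, so
    # payload-bearing "DNS" (e.g. a tunnel) can be inspected directly.
    import subprocess

    cmd = ["ra", "-r", "argus.out", "-n",           # -n: no name resolution
           "-s", "stime", "saddr", "sport", "daddr", "dport", "suser", "duser",
           "-", "udp", "and", "port", "53"]         # filter follows the lone "-"
    for line in subprocess.check_output(cmd).decode("utf-8", "replace").splitlines():
        print(line)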
Gotta go, but keep the thread going if you find it useful!!!
Carter
Carter Bullard
QoSient LLC
150 E. 57th Street Suite 12D
New York, New York 10022
+1 212 588-9133 Phone
+1 212 588-9134 Fax
-----Original Message-----
From: "Bamm Visscher" <bamm.visscher at gmail.com>
Date: Tue, 19 Sep 2006 09:57:37
To: carter at qosient.com
Cc: "Richard Bejtlich" <taosecurity at gmail.com>, "Olaf Gellert" <olaf.gellert at intrusion-lab.net>, Argus <argus-info at lists.andrew.cmu.edu>
Subject: Re: [ARGUS] Graph of the Week at http://qosient.com/argus
I don't have a network where I can test SANCP at the 1 million plus
records/hour rate. The most I have dealt with personally is ~15
million/day. I don't think getting the data into the DB is the tough
part. MySQL (MyISAM) is pretty quick at loading it, assuming your HW
can handle it (although I can understand concerns about this being
"safe").
Management of the DB with that many records can be a nightmare though.
At some point you are probably going to hit a HW or SW wall, whether
it be limited disk space or mysql performance, and that is why we
started using MERGE tables. We've found the performance is acceptable
(I have users with over 1 billion </austinpowers> records in their
sancp tables) and management of the data is improved immensely. For
example, say you hit some HW or SW limit and needed to remove 30 days
or millions of records of data. With a non-MERGE DB, you would have to
perform a DELETE FROM foo WHERE bar (and maybe a CREATE TABLE foo ...
SELECT bar if you want to archive). The DELETE and CREATE...SELECT
might not be too bad, but you will have to analyze and optimize the
table to gain anything, and this can lock the table/DB for an
unacceptably long time. With MERGE you can simply delete or move the
specific tables (sancp_sensor_20060101), restart mysql, and restart
sguild, letting it quickly rebuild a new MERGE definition.
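To make that concrete, here is a hedged sketch of the rotation using the MySQLdb driver. The connection details are stand-ins, and redefining the MERGE via ALTER TABLE is an alternative to the restart dance described above, not what sguild itself does:

    # Drop expired per-day MyISAM tables, then point the MERGE table at
    # the survivors. DROP TABLE is instant; DELETE FROM is not.
    import MySQLdb

    db = MySQLdb.connect(host="localhost", user="sguil", passwd="secret", db="sguildb")
    cur = db.cursor()

    def prune_and_rebuild(keep):
        cur.execute("SHOW TABLES LIKE 'sancp\\_sensor\\_%'")
        for (table,) in cur.fetchall():
            if table not in keep:
                cur.execute("DROP TABLE " + table)
        # redefining the MERGE union is a metadata-only operation
        cur.execute("ALTER TABLE sancp UNION=(" + ",".join(keep) + ")")

    prune_and_rebuild(["sancp_sensor_20060919", "sancp_sensor_20060920"])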
So, why do we want flow data in a DB anyway? For starters, it gives
us a centralized location to query flow data from multiple sensors.
It's a simple query if I want to know if any systems from any of the
networks I monitor made a connection out to bad.ip.org (a query
sketch follows the links below). We also get the power of SQL (did I
just write that?) to do some data mining. Here are some examples of
stuff users are doing:
http://www.inliniac.net/blog/?p=24
http://infosecpotpourri.blogspot.com/2006/06/tracking-your-most-active-network.html
http://infosecpotpourri.blogspot.com/2006/05/traffic-analysis-approach-to-detecting.html
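That bad.ip.org style lookup is a one-liner against the sancp schema quoted further down. A hedged sketch using the MySQLdb driver; the connection details and the target address are stand-ins:

    # Find every monitored system, on any sensor, that connected to a
    # given address; dst_ip is stored as a decimal integer, so INET_ATON
    # does the conversion on the MySQL side.
    import MySQLdb

    db = MySQLdb.connect(host="localhost", user="sguil", passwd="secret", db="sguildb")
    cur = db.cursor()
    cur.execute("SELECT sid, start_time, INET_NTOA(src_ip), src_port, dst_port"
                " FROM sancp WHERE dst_ip = INET_ATON(%s) ORDER BY start_time",
                ("192.0.2.1",))   # illustrative address
    for row in cur.fetchall():
        print(row)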
I should also point out that I am not sure how well SANCP will scale
in large-bandwidth environments. A good friend of mine wrote it,
mainly at my request, but I know he also uses it in his own work.
I cc'd him on this reply; maybe he has more insight.
Finally, I really don't have a lot of experience with Argus. I am
going to subscribe to the mailing lists after I send this out. I've
talked with Rich and others about how best to use/access Argus data
from the Sguil interface. We just haven't been able to come up with a
final solution. Do we simply provide a hook for the analyst to easily
run the ra client and grok data associated with a highlighted alert?
Do we take the time to parse the data and put it into a DB (on the
sensor or centralized)? Do you have any comments on how you use
Argus in correlation with security events?
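For what it's worth, the ra-hook option could be as thin as building a filter from the highlighted alert's addresses and ports. A hedged sketch; the argus file path and the alert fields are stand-ins for whatever Sguil would actually pass:

    # Shell out to ra with a filter built from an alert's endpoints and
    # return the matching flow records as text for the analyst pane.
    import subprocess

    def flows_for_alert(alert, argus_file="/var/log/argus/argus.out"):
        filt = "host %s and host %s and port %d" % (
            alert["src_ip"], alert["dst_ip"], alert["dst_port"])
        cmd = ["ra", "-n", "-r", argus_file, "-"] + filt.split()
        return subprocess.check_output(cmd).decode("utf-8", "replace")

    print(flows_for_alert({"src_ip": "10.1.1.5",
                           "dst_ip": "192.0.2.1",
                           "dst_port": 443}))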
Final, finally ;) I apologize if this email seems to bounce around. I
didn't want to wait to reply, so I've been composing it in between
various other projects, and my wife is convinced I don't multi-task
well.
Bammkkkk
On 9/19/06, carter at qosient.com <carter at qosient.com> wrote:
> Hey Richard,
> Binary files are more efficient than ascii, so when generating/transporting/processing/filtering 1M+ records an hour or so, which is common for argus in large networks, we need to be efficient.
>
> We have a lot of database support in gargoyle, so the experience below is all derived from that work.
>
> So, we don't put flow records into databases, as a general rule, because there are way too many flow records, but we do put heavily aggregated flow data into databases and indexes to flow data in databases all the time. We use this strategy to help find/reference/manage the primitive/original flow data, say based on time, or address or event. When we do put actual flow data into something like mysql, we generally will have the key fields and a few attributes, whatever was "key" at the time of analysis, but then put the entire binary flow record in as a blob, so that even if the schema is an abbreviated one, say only src address and time, we can have the complete flow data available. This works very well.
>
> When we build collections of data, say for forensics analysis and reporting, which involves a little data from here, a little data from there, some enrichment and annotation, and raw flow data, we usually provide all the 'derived' data, as well as any original primitive data, so the analysis can be self-contained, and shipped around and worked independent of the original data. Here, providing/supporting binary data is also very important, since some of the evidence may be an image or program or something non-ASCII.
>
> Hopefully, there is something useful in my response!!!
>
> Carter
>
>
>
> Carter Bullard
> QoSient LLC
> 150 E. 57th Street Suite 12D
> New York, New York 10022
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
>
> -----Original Message-----
> From: "Richard Bejtlich" <taosecurity at gmail.com>
> Date: Sun, 17 Sep 2006 05:23:42
> To: carter at qosient.com
> Cc: argus-info-bounces at lists.andrew.cmu.edu, "Olaf Gellert" <olaf.gellert at intrusion-lab.net>, Argus <argus-info at lists.andrew.cmu.edu>, "Bamm Visscher" <bamm.visscher at gmail.com>
> Subject: Re: [ARGUS] Graph of the Week at http://qosient.com/argus
>
> On 9/15/06, carter at qosient.com <carter at qosient.com> wrote:
> > Hey Richard, et al,
> > Why doesn't Sguil eat Argus records yet ;o)
> >
>
> Hi Carter,
>
> This is an issue we have debated. Maybe if I explain our current
> situation you can imagine a solution?
>
> Currently we use SANCP (www.metre.net/sancp.html) in the following
> manner. SANCP watches traffic and writes results to files with text
> data like the following:
>
> 1|4960894957268645250|2006-08-08 14:46:06|2006-08-08 14:46:16|10|6|1167053256|57239|1123635987|443|9|1469|10|2103|27|27
> 1|4960894957268571650|2006-08-08 14:46:06|2006-08-08 14:46:17|11|6|1167053256|57238|1123636051|443|11|5063|9|1940|27|27
>
> These records are bi-directional and generally unique for each session.
>
> A Sguil component (sensor_agent.tcl) periodically checks the directory
> into which the SANCP records are written, reads the files, and then
> inserts them into a MySQL database like the following:
>
> mysql> describe sancp;
> +------------+----------------------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +------------+----------------------+------+-----+---------+-------+
> | sid | int(10) unsigned | NO | MUL | NULL | |
> | sancpid | bigint(20) unsigned | NO | | NULL | |
> | start_time | datetime | NO | MUL | NULL | |
> | end_time | datetime | NO | | NULL | |
> | duration | int(10) unsigned | NO | | NULL | |
> | ip_proto | tinyint(3) unsigned | NO | | NULL | |
> | src_ip | int(10) unsigned | YES | MUL | NULL | |
> | src_port | smallint(5) unsigned | YES | MUL | NULL | |
> | dst_ip | int(10) unsigned | YES | MUL | NULL | |
> | dst_port | smallint(5) unsigned | YES | MUL | NULL | |
> | src_pkts | int(10) unsigned | NO | | NULL | |
> | src_bytes | int(10) unsigned | NO | | NULL | |
> | dst_pkts | int(10) unsigned | NO | | NULL | |
> | dst_bytes | int(10) unsigned | NO | | NULL | |
> | src_flags | tinyint(3) unsigned | NO | | NULL | |
> | dst_flags | tinyint(3) unsigned | NO | | NULL | |
> +------------+----------------------+------+-----+---------+-------+
> 16 rows in set (0.02 sec)
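> The insert step amounts to splitting the pipe-delimited record into those columns. Here is a hedged Python rendering, for illustration, of what sensor_agent.tcl does (Sguil's actual agent is tcl; the handling of the leading field and of sid is assumed here):
>
>     def parse_sancp_line(line, sid):
>         """Split one SANCP record into the columns above; sid comes from the agent."""
>         f = line.strip().split("|")
>         return {"sid": sid, "sancpid": int(f[1]),
>                 "start_time": f[2], "end_time": f[3],
>                 "duration": int(f[4]), "ip_proto": int(f[5]),
>                 "src_ip": int(f[6]), "src_port": int(f[7]),   # IPs stay decimal
>                 "dst_ip": int(f[8]), "dst_port": int(f[9]),   # ints, per the schema
>                 "src_pkts": int(f[10]), "src_bytes": int(f[11]),
>                 "dst_pkts": int(f[12]), "dst_bytes": int(f[13]),
>                 "src_flags": int(f[14]), "dst_flags": int(f[15])}
>
>     rec = parse_sancp_line(
>         "1|4960894957268645250|2006-08-08 14:46:06|2006-08-08 14:46:16"
>         "|10|6|1167053256|57239|1123635987|443|9|1469|10|2103|27|27", sid=1)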
>
> I guess we grapple with Argus for a few reasons. One, support for
> SANCP is built into Sguil. We haven't built an API to accept other
> data sources, although Bamm is considering it. When an API is in
> place (maybe Sguil 2.0?) we would aim for accepting Argus, NetFlow,
> etc.
>
> Two, we're not sure how best to accommodate Argus' record creation
> model, where data is written to a non-text format with potentially
> multiple records for the same session. Do we let Argus write records,
> run ra against them, output to a text file, and then parse the results
> for insertion into the database? Or do we avoid a db entirely and
> have Sguil invoke ra against Argus records?
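> One hedged sketch of the first route: let ragator (argus 2.x) or racluster (3.x) merge the multiple status records per session first, then parse the text output for insertion, much like the sancp path above. Field names and options vary by argus-clients version, so treat the specifics as assumptions:
>
>     import subprocess
>
>     # aggregate argus status records back into one record per session,
>     # then read the delimited text for loading into the database
>     agg = subprocess.Popen(["racluster", "-n", "-r", "argus.out",
>                             "-s", "stime", "proto", "saddr", "sport",
>                             "daddr", "dport", "spkts", "dpkts",
>                             "sbytes", "dbytes"],
>                            stdout=subprocess.PIPE)
>     for line in agg.stdout:
>         fields = line.decode("utf-8", "replace").split()
>         # ... INSERT INTO a flow table, as with the sancp records above
>     agg.wait()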
>
> In any case we would appreciate insights on how best to accommodate
> Argus with Sguil, since obviously several of us use Argus alongside
> Sguil components already.
>
> Thank you,
>
> Richard
>
>
--
sguil - The Analyst Console for NSM
http://sguil.sf.net