New To Argus

Nick Diel ndiel at engr.colostate.edu
Mon Feb 25 14:01:52 EST 2008


Carter,

First of all, thanks for your detailed response and the updated
clients.  And I am glad you like twists.

Let me tell you a little more about the research setup.  The research
project I am part of (made up of several universities in the US) has
several collection boxes in different large commercial environments.
The boxes were customized specifically for high-speed packet capture
(RAID, Endace capture card, etc.).  We run a 12-hour capture and then
analyze it for some time, sometimes up to several months.  So I do
have time to correctly create my argus output files and do any other
processing I need to do.

Some of the researchers focus on packet-based research, whereas other
parts of the group focus more on flow-based analysis.  So Argus looks
like a great match for us: immediately after the capture, we can
create Argus flow records and do our flow analysis with the Argus
clients.

So for my first question: is Argus capable of capturing at high line
speeds (at least 1 Gbit/s) where a packet capture using libpcap and a
standard NIC may fail (libpcap dropping packets)?  Or, since Argus is
flow based, does it simply not care if it misses packets?  Some of the
anomalies we research require us to account for almost every packet in
the anomaly, so dropping even every 100th or every 1000th packet could
hamper us.  The reason I ask about high-speed capture with Argus is
that if it is very capable at high speeds, it would let us deploy more
collection boxes (these boxes would then primarily be used by the
flow-based researchers) without buying an expensive capture card for
each collection box.

As for reading multiple files into Argus, one easy way to accomplish
this would be to have Argus read pcap files from stdin.  Then one
could use a utility such as mergecap or tcpslice to feed Argus a list
of out-of-order files:

   % mergecap -r /packets/*.pcap -w - | argus -r - ....

My files are named so that chronological order equals lexical order,
so "argus -r *" would work in my case (this naming helps us with a
number of utilities we use).  I do understand that actually
implementing this in Argus would probably require a number of things,
such as dying when files are out of order and telling the user what
order argus read the files in.  Still, doing this would be quite a bit
faster than having tcpslice or mergecap feed Argus the pcap files.
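
For what it is worth, a quick way to sanity-check that lexical order
really is time order (a minimal sketch, assuming tcpdump is installed;
the /packets path is illustrative): print the first-packet timestamp
of each file in lexical order, and let sort verify the sequence never
decreases.

   % for f in /packets/*.pcap; do \
         ts=$(tcpdump -r "$f" -c 1 -tt 2>/dev/null | awk '{print $1}') ; \
         echo "$ts $f" ; \
     done | sort -c -n && echo "lexical order matches time order"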

Now let me ask about what I have been working on (merging flows across
argus data files).  First, if I were capturing with Argus directly
(not reading pcap files, but capturing off the wire: argus | rasplit),
wouldn't I run into the same problem of having flows broken up across
different argus files?

If racluster is merging records as it finds them (not reading all
records into memory first), it might be nice to be able to specify a
memory limit for racluster on the command line.  Then, as racluster
approached the memory limit, it could remove the oldest records from
memory and write them to the output.

I was able to use your suggestion successfully to merge most of my
flows together, though I needed to make a few modifications to the
filter.  I moved parentheses: "tcp and ((syn or synack) and ((fin or
finack) or reset))" vs. "tcp and (((syn or synack) and (fin or
finack)) or reset)".  And I added "not con" to filter out the many,
many packet scans, though this also means the syn-synack flows that
sit at the end of the argus output files do not get merged.  This
filter still used most of the memory, but not much time was spent in
the upper range where swapping slowed the system to a crawl.  Without
"not con" I would reach the upper limits of memory usage quite fast
and grind to a crawl with the swapping.
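
For reference, here is roughly what my racluster.conf looks like now.
This is only a sketch: I am assuming "not con" belongs in the
pass-through rule (so the scans get passed straight through instead of
being tracked); the exact placement is the part I experimented with.

------- start racluster.conf ---------

filter="(tcp and ((syn or synack) and ((fin or finack) or reset))) or not con" status=-1 idle=0
filter="" model="saddr daddr proto sport dport"

------- end racluster.conf --------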

Thanks again for all your help,
Nick


Carter Bullard wrote:
> Hey Nick,
> The argus project from the very beginning has been trying
> to move people away from capturing packets, and toward
> capturing comprehensive flow records that account for every
> packet on the wire.  This is because capturing packets at modern
> speeds seems impractical, and there are a lot of problems that can
> be worked out without all that data.
>
> So to use argus in the way you want to use argus is a bit of a
> twist on the model.  But I like twists ;o)
>
> >>> To start out with something simple I want to be able to count the
> >>> number of flows over TCP port 25.
>
> The easiest way to do that right now is to do something like this in 
> bash:
>
>    % for i in pcap*; do argus -r $i -w - - tcp and port 25 | \
>         rasplit -M time 5m -w argus.data/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S ; \
>      done
>
> That will put the tcp:25  "micro flow" argus records into a manageable
> set of files.  Now the files themselves need to be processed to
> get the flows merged together:
>
>    % racluster -M replace -R argus.data
>
> So now you'll get the data needed to ask questions, split into 5m bins,
> so to speak.  Changing the "5m" to "1h", "4h", or "1d" may generate
> file structures that you can work with, but eventually you will hit a
> memory wall without doing something clever.
>
> Now that you have these intermediate files, in order to merge the
> tcp flows that span multiple files, you will need to give racluster()
> a different aggregation strategy than the default.  Try a
> racluster.conf file that contains these lines against the argus files
> you have.
>
> ------- start racluster.conf ---------
>
> filter="tcp and ((syn or synack) and ((fin or finack) or reset))"  
> status=-1 idle=0
> filter="" model="saddr daddr proto sport dport"
>
> ------- end racluster.conf --------
>
> What this will do is:
>    1. any tcp connection that is complete, where we saw the beginning
>       and the end, just pass it through, don't track anything.
>    2. any partial tcp connection, track and merge records that match.
>
> So it only allocates memory for flows that are 'continuation' records.
> The output is unsorted, so you will need to run rasort() if you want
> to do any time oriented operations on the output.
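>
> Something like this, for example (a sketch; the file names are made
> up, and check rasort's -m sort-key syntax on your build):
>
>    % rasort -m stime -r argus.merged.out -w argus.sorted.out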
>
> In testing this, I found a problem with parsing "-1" from the status
> field in some weird conditions, so I fixed it.  Grab the newest
> clients from the dev directory if you want to try this method.
>
> ftp://qosient.com/dev/argus-3.0/argus-clients-3.0.0.rc.69.tar.gz
>
> Give that a try, and send email to the list with whatever results
> you get.
>
> With so many pcap files, we probably need to make some other
> changes.
>
> The easiest way for you to do what you eventually want to do
> would be for you to say something like this:
>    argus -r * -w - | rawhatever
>
> This currently won't work, and there is a reason, but maybe we
> can change it.  Argus can read multiple input files, but you need
> to specify each file with a "-r filename -r filename" style command
> line list.  With 1000's of files, that is somewhat impractical.  It
> is this way on purpose, because argus really does need to see
> packets in time order.
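>
> (In a pinch you can generate that "-r" list from the shell, e.g.
>
>    % argus $(printf -- '-r %s ' pcap*) -w - | rawhatever
>
> though with thousands of files you will eventually hit the shell's
> argument-length limit, so treat that as a stopgap, not a fix.)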
>
> If you try to do something like this:
>
>    argus -r * -w - | rasplit -M time 5m -w argus.out.%Y.%m.%d.%H.%M.%S
>
> which is designed to generate argus record files that represent packet
> behavior with hard cutoffs every 5 minutes, on the hour; if the
> packet files are not read in time order, you get really weird
> results.  It's as if the realtime argus were jumping into the future and
> then into the past and then back to the future again.
>
> Now, if you name your pcap files so they can be sorted, I can
> make it so "argus -r *" can work.  How do you name your pcap files?
>
>
> Because argus has the same timestamps as the packets in your
> pcap files, the timestamps can be used as an "external key" if
> you will.  If you build a database that has tuples (entries) like:
>
>    "pcap_filename start_time end_time"
>
> then by looking at a single argus record, which has a start time
> and an end time, you can find the pcap files that contain its packets.
> And with something like perl and tcpdump or wireshark, you can
> write a simple shell script that searches those pcap files for packets
> with this type of filter:
>
>    (ether host $smac and ether host $dmac) and (host $saddr and \
>    host $daddr) and (port $sport and port $dport)
>
> and you get all the packets that are referenced in the record.
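>
> A minimal sketch of that pipeline, with made-up file names, using
> tcpdump to build the index (reading every file end-to-end just to
> get the last timestamp is slow; this only shows the shape of the
> thing):
>
>    # build the index: "pcap_filename start_time end_time" per file
>    % for f in /packets/*.pcap; do \
>          s=$(tcpdump -r "$f" -c 1 -tt 2>/dev/null | awk '{print $1}') ; \
>          e=$(tcpdump -r "$f" -tt 2>/dev/null | tail -1 | awk '{print $1}') ; \
>          echo "$f $s $e" ; \
>      done > pcap.index
>
>    # given a record's start/end times, list the overlapping files
>    % awk -v s="$STIME" -v e="$LTIME" '$2 <= e && $3 >= s {print $1}' pcap.index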
>
>
> Carter
>
>
>
>
> On Feb 21, 2008, at 4:49 PM, Nick Diel wrote:
>
>> I am new to Argus, but have found it has great potential for the 
>> research project I work on.  We collect pcap files from several high 
>> traffic networks (20k-100k packets/second).  We collect for 
>> approximately 12 hours and have ~1000 pcap files that are roughly 
>> 500MB each.
>> I want to do a number of different flow analyses and think Argus
>> might be perfect for me.  I am having a hard time grasping some
>> of the fundamentals of Argus, but I think once I get some of the
>> basics I will be able to really start to use Argus.
>>
>> To start out with something simple I want to be able to count the 
>> number of flows over TCP port 25.  I know I need to use RACluster to 
>> merge the Argus output (I have one argus file for each pcap file I 
>> have),  that way I can combine identical flow records into one.  I 
>> can do this fine on one argus output file, but I know many flows span 
>> the numerous files I have.  I also know I can't load all the files at 
>> once into RACluster as it fills all available memory.  So my question 
>> is how can I accomplish this while making sure I capture most flows 
>> that span multiple files.
>>
>> Once I understand this, I hope to be able to do things like create a 
>> list of flow sizes (in bytes) for port 25.  Basically I will be 
>> asking a lot of questions involving all flows that match a certain 
>> filter and I am not sure how to accommodate for flows spanning 
>> multiple files.
>>
>> A separate question.  I don't think Argus has this ability, but I 
>> wanted to know if the community already had a utility for this.  I am 
>> looking into creating a DB of some sort that would match Argus's flow 
>> IDs to pcap file name(s) and packet numbers.  This way one could 
>> extract the packets for a flow that needed further investigation.
>>
>> And finally, thanks for the great tool.  It does a number of things I 
>> have been doing manually for a while.
>>
>> Thanks,
>> Nick
>>
>>
>
