New To Argus

Carter Bullard carter at qosient.com
Fri Feb 22 14:12:52 EST 2008


Hey Nick,
The argus project from the very beginning has been trying
to get people away from capturing packets, and instead
capturing comprehensive flow records that account for every
packet on the wire.  This is because capturing packets at modern
speeds seems impractical, and there are a lot of problems that can
be worked out without all that data.

So to use argus in the way you want to use argus is a bit of a
twist on the model.  But I like twists ;o)

 >>> To start out with something simple I want to be able to count the  
number of flows over TCP port 25.

The easiest way to do that right now is to do something like this in  
bash:

    % for i in pcap*; do \
          argus -r $i -w - - tcp and port 25 | \
          rasplit -M time 5m -w argus.data/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S; \
      done

That will put the tcp:25 "micro flow" argus records into a manageable
set of files.  Now the files themselves need to be processed to
get the flows merged together:

    % racluster -M replace -R argus.data

So now you'll get the data needed to ask questions, split into 5m bins,
so to speak.  Changing the "5m" to "1h", "4h", or "1d" may generate
file structures that you can work with, but eventually you will hit a
memory wall without doing something clever.
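With the records merged in place, the original question (how many
flows over tcp port 25?) reduces to counting output lines.  A minimal
sketch, assuming ra prints one record per line with no header (subtract
one if your version prints a title line); the argus commands are shown
as comments and the executable part just demonstrates the counting step:

```shell
# The real pipeline (as comments) merges, then counts one line per flow:
#   racluster -M replace -R argus.data
#   ra -R argus.data - tcp and port 25 | wc -l
# Executable stand-in for the counting step: three records, three lines.
printf 'flow1\nflow2\nflow3\n' | wc -l    # prints 3
```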

Now that you have these intermediate files, you will need to give
racluster() an aggregation strategy different from the default in
order to merge the tcp flows that span multiple files.  Try a
racluster.conf file that contains these lines (passed to racluster
with "-f racluster.conf") against the argus files you have.

------- start racluster.conf ---------

filter="tcp and ((syn or synack) and ((fin or finack) or reset))" status=-1 idle=0
filter="" model="saddr daddr proto sport dport"

------- end racluster.conf --------

What this will do is:
    1. any tcp connection that is complete, where we saw the beginning
        and the end, just pass it through, don't track anything.
    2. any partial tcp connection, track and merge records that match.

So it only allocates memory for flows that are 'continuation' records.
The output is unsorted, so you will need to run rasort() if you want
to do any time oriented operations on the output.

In testing this, I found a problem with parsing "-1" from the status
field in some weird conditions, so I fixed it.  Grab the newest
clients from the dev directory if you want to try this method.

ftp://qosient.com/dev/argus-3.0/argus-clients-3.0.0.rc.69.tar.gz

Give that a try, and send email to the list with any kind of result
you get.

With so many pcap files, we probably need to make some other
changes.

The easiest way for you to do what you eventually want to do,
would be for you to say something like this:
    argus -r * -w - | rawhatever

This currently won't work, and there is a reason, but maybe we
can change it.  Argus can read multiple input files, but you
need to specify each one with a "-r filename -r filename" style
command line list.  With 1000's of files, that is somewhat
impractical.  It is this way on purpose, because argus really
does need to see packets in time order.
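In the meantime, the shell can build that long "-r file -r file" list
for you.  A sketch, assuming lexicographic filename order matches
capture-time order; the touched files below are placeholders for your
pcaps, and the final command is echoed rather than executed:

```shell
# Expand a sorted glob into the per-file -r arguments argus expects.
dir=$(mktemp -d)
touch "$dir/pcap.001" "$dir/pcap.003" "$dir/pcap.002"

args=""
for f in $(ls "$dir"/pcap.* | sort); do
    args="$args -r $f"
done

# The command you would actually run (echoed here for illustration):
echo "argus$args -w -"
```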

If you try to do something like this:

    argus -r * -w - | rasplit -M time 5m -w argus.out.%Y.%m.%d.%H.%M.%S

which is designed to generate argus record files that represent
packet behavior with hard cutoffs every 5 minutes, on the hour.
If the packet files are not read in time order, you get really
weird results: it's as if the realtime argus were jumping into
the future, then into the past, and then back to the future again.

Now, if you name your pcap files so they can be sorted, I can
make it so "argus -r *" can work.  How do you name your pcap files?


Because argus has the same timestamps as the packets in your
pcap files, the timestamps can be used as an "external key" if
you will.  If you build a database that has tuples (entries) like:

    "pcap_filename start_time end_time"

then by looking at a single argus record, which has a start time
and an end time, you can find the pcap files that contain its packets.
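The lookup itself is simple interval arithmetic.  A sketch with a
hypothetical index file and made-up epoch timestamps; a pcap file
overlaps the record when the file ends after the record starts and
starts before the record ends:

```shell
# Hypothetical index: one "pcap_filename start_time end_time" per line.
index=$(mktemp)
cat > "$index" <<'EOF'
pcap.001 1203600000 1203600300
pcap.002 1203600300 1203600600
pcap.003 1203600600 1203600900
EOF

# A flow record spanning 1203600250-1203600400: print every pcap file
# whose [start,end] window overlaps that interval.
rec_start=1203600250
rec_end=1203600400
awk -v s="$rec_start" -v e="$rec_end" \
    '$3 >= s && $2 <= e { print $1 }' "$index"
# prints pcap.001 and pcap.002, one per line
```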
And with something like perl and tcpdump or wireshark, you can
drive a simple shell script that searches those pcap files for
packets with this type of filter:

    (ether host $smac and $dmac) and (host $saddr and $daddr) and \
    (port $sport and $dport)

and you get all the packets that are referenced in the record.
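Gluing it together, a sketch of the extraction step.  The endpoint
variables are stand-ins for values you would pull out of the argus
record with ra, and the tcpdump invocation is shown as a comment:

```shell
# Stand-in flow endpoints (in practice, parsed from ra output).
smac=00:11:22:33:44:55; dmac=66:77:88:99:aa:bb
saddr=10.0.0.1; daddr=192.0.2.9
sport=33123; dport=25

# Build the tcpdump filter for this flow's packets.
filter="(ether host $smac and $dmac) and (host $saddr and $daddr) and (port $sport and $dport)"
echo "$filter"

# Then, for each pcap file the index lookup returned:
#   tcpdump -r "$pcapfile" -w "flow.$pcapfile" "$filter"
```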


Carter




On Feb 21, 2008, at 4:49 PM, Nick Diel wrote:

> I am new to Argus, but have found it has great potential for the  
> research project I work on.  We collect pcap files from several high  
> traffic networks (20k-100k packets/second).  We collect for  
> approximately 12 hours and have ~1000 pcap files that are roughly  
> 500MB each.
> I am wanting to do a number of different flow analysis and think  
> Argus might be perfect for me.  I am having a hard time grasping  
> some of the fundamentals of Argus, but I think once I get some of  
> the basics I will be able to really start to use Argus.
>
> To start out with something simple I want to be able to count the  
> number of flows over TCP port 25.  I know I need to use RACluster to  
> merge the Argus output (I have one argus file for each pcap file I  
> have),  that way I can combine identical flow records into one.  I  
> can do this fine on one argus output file, but I know many flows  
> span the numerous files I have.  I also know I can't load all the  
> files at once into RACluster as it fills all available memory.  So  
> my question is how can I accomplish this while making sure I capture  
> most flows that span multiple files.
>
> Once I understand this, I hope to be able to do things like create a  
> list of flow sizes (in bytes) for port 25.  Basically I will be  
> asking a lot of questions involving all flows that match a certain  
> filter and I am not sure how to accommodate for flows spanning  
> multiple files.
>
> A separate question.  I don't think Argus has this ability, but I  
> wanted to know if the community already had a utility for this.  I  
> am looking into creating a DB of some sort that would match Argus's  
> flow IDs to pcap file name(s) and packet numbers.  This way one  
> could extract the packets for a flow that needed further  
> investigation.
>
> And finally, thanks for the great tool.  It does a number of things  
> I have been doing manually for a while.
>
> Thanks,
> Nick
>
>


