Order when reading pcap files

Carter Bullard carter at qosient.com
Fri Aug 15 11:55:16 EDT 2008


Hey Nick,
If you are running into problems with this scale of data, then
send email to the list.  I have methods that I use for dealing with
TBs of argus data, and I think it would be very good for the
list to talk about some of the issues and solutions.  Hopefully argus
is working well enough for you that you can actually do something
with the data.

rasplit() is a powerful solution for sorting massive amounts of
data, and I think the more we talk/type about it, the better.

One thing that I haven't mentioned about multiple pcap files is
the need to tag the argus records with distinct source IDs
when the pcap files come from different locations/sources.
This is important when the packet files "overlap", i.e. when two
or more packet files contain packets from the same flows.  I have
programs that correlate flow records from disjoint monitors to
generate metrics like one-way delay, loss, jitter variation mods,
etc., and the only way they can work is to have distinct source
identifiers (use the "-e <id>" option in argus).
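
A minimal sketch of the tagging step, assuming one directory of pcap
files per monitor.  The directory layout, the source IDs and the output
file names are made up for illustration; only the "-e <id>" option comes
from the note above.  The function prints the commands rather than
running them, so they can be checked before use.

```shell
#!/bin/sh
# Print one argus command per pcap file, tagging every file found in
# directory $2 with source ID $1.  Nothing is executed here; pipe the
# output to sh once the commands look right.
# Assumption: file names contain no whitespace.
tag_cmds() {
    srcid=$1
    dir=$2
    for f in "$dir"/*.pcap; do
        [ -e "$f" ] || continue      # skip when the glob matches nothing
        printf 'argus -e %s -r %s -w %s.argus\n' "$srcid" "$f" "${f%.pcap}"
    done
}

# Hypothetical usage: two monitors, two made-up source IDs.
#   tag_cmds 10.0.0.1 ./siteA
#   tag_cmds 10.0.0.2 ./siteB
```

With a distinct ID per monitor, records for the same flow seen at two
locations stay distinguishable after the data is merged.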

The ability to "sort" input packets is not hard to implement, and it is
something that argus could do.  But the more I think about it, the more
I wonder: is this an issue because the pcap files are collected from
different locations?  Or are the files primarily from the same interface
(so that the packets are in time order within each file), and the files
are simply processed out of order?
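
If it is the second case, the files can be put in order before argus
sees them.  A rough sketch, not something argus provides: read the
timestamp of the first packet in each file and sort the file names by
it.  It assumes classic libpcap files (not pcapng) written in the byte
order of the machine doing the sort, and file names without whitespace;
the function name is made up.

```shell
#!/bin/sh
# Print the given pcap file names ordered by the timestamp (seconds) of
# the first packet in each file.
# Layout assumed: 24-byte pcap file header, then the first record
# header, whose first 4 bytes are ts_sec.
pcaps_by_start_time() {
    for f in "$@"; do
        ts=$(od -A n -t u4 -j 24 -N 4 "$f" | tr -d ' ')
        [ -n "$ts" ] || continue     # too short to hold a packet, skip it
        printf '%s %s\n' "$ts" "$f"
    done | sort -n | cut -d ' ' -f 2-
}

# Hypothetical usage, mirroring the "for" loop from the thread:
#   for f in $(pcaps_by_start_time *.pcap); do
#       argus -r "$f" -w MyData.argus
#   done
```

This only fixes the order of the files; it does nothing for files that
overlap in time.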

Hope all is most excellent,

Carter

On Aug 15, 2008, at 11:24 AM, Nick Diel wrote:

> David,
>
> I wanted to throw out another idea.  Carter's solution is quite
> elegant and pure Argus, but here is another possibility that might
> be beneficial if you are running into memory problems (otherwise I
> personally prefer Carter's solution).
>
> Use mergecap (part of the Wireshark family) to merge the files
> in correct order and feed the result to argus.
>
> mergecap -w - /pcapDir/* | argus -r - -w MyData.argus
>
> Since you would be using this solution when you are running into
> memory problems (the pcaps would be quite large compared to system
> memory), mergecap will be slow.  Though I think the tradeoff is a
> simpler approach for you, and you let the system do the work.
>
> For me, Argus and its clients can be a little difficult when you
> are short on memory.  I just feed Argus as much memory as I can get
> (I now have a box with 32 GB of RAM).  But I collect terabytes of
> pcaps.
>
> Nick
>
> PS  I am constantly feeding groups of pcaps to Argus, though I just
> make sure they are named so that lexicographical order =
> chronological order.  You might not have that luxury.
>
> On Fri, Aug 15, 2008 at 8:28 AM, Carter Bullard <carter at qosient.com> wrote:
> Hey David,
> Argus doesn't require packets to be in order to do its flow tracking,
> but it is best for the flow cache flushing logic to see the packets
> in some form of order.  Since you are running argus on each file
> (assuming each file has packets in some order), it won't be a problem.
>
> The way you are running argus, though, you will want to sort the
> resulting flow data file, to get the flow records in order, so I would
> add this after your "for" command:
>
>
>   for file in *; do argus -r "$file" -w MyData.argus; done
>   rasort -M replace -r MyData.argus
>
> This works pretty well until MyData.argus gets bigger than memory;
> then it will perform very poorly.  If you run into this problem,
> you can use rasplit() to get the data into a series of time-ordered
> files, which can then be sorted independently.  Try this, if you're
> interested:
>
>   argus -r * -w - | rasplit -M time 5m \
>        -w ./data/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>
> This will result in the argus records being stored in a series of
> argus files that are 5 minutes long, organized by year, month
> and day.    There is no guarantee that within each file, the records
> will be in order, but the data is now organized in a file system that
> is "grossly" sorted.
>
> All the ra* programs can recursively descend these types of archive
> directories and process the data in date order, based on the filename.
> This allows you to sort the entire set with a single command.
>
>   rasort -R ./data -M replace
>
> This will cause rasort() to sort each file in place, which hopefully
> will be doable on your machine.  Then, to get all of the flow data
> into a single sorted file, you would do:
>
>   ra -R ./data -w MyData.argus
>
>
> No IRC channel that I am aware of.
>
> Hopefully this is helpful.  If it raises more questions, just send
> more email!!!!
>
> Carter
>
>
> On Aug 15, 2008, at 9:17 AM, David wrote:
>
> I read in a whole bunch of pcap files using argus, like so:
>
> $ for file in *; do argus -r $file -w MyData.argus; done
>
> However, these aren't guaranteed to be in date order.  Will that
> screw up argus at all?  If so, I can get an ordered list and read
> them in properly, just wondering.
>
> Also, is there an IRC channel for argus?
>
> Regards,
>
> David
