[flow-tools] performance question [resend]

Craig A. Finseth fin@finseth.com
Fri, 24 Jan 2003 10:12:50 -0600 (CST)


   > 2) Over 50% of the total wall clock time is spent in flow-nfilter.
   > This step involves producing a flow file for each customer that
   > contains only flows to or from that customer.  The source of data for
   > this step is the set of flow files that contains data from all parts
   > of the network merged together.  The CPU is about 40% busy during this
   > step.  A typical filter file is:

       So where's the bottleneck on this one?  Disk?  Memory?  

Good question.  I will keep an eye on my performance tools during the
next run.

On a follow-up note, I wrote alternate code for flow-tag.  Using a
roughly 30 MB flow file, I timed the following four configurations:

- original flow-tag, trivial tag file (one entry)
- original flow-tag, full tag file
- alternate flow-tag, trivial tag file (one entry)
- alternate flow-tag, full tag file

The results:

                    trivial tag file    full tag file

        original    1.5 sec             4:15 (min:sec)
        alternate   1.4 sec             15 sec

The trivial files show what time is required to read the data, write it,
and handle all other overhead (except for tag file loading).
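
As a rough check on these numbers, the trivial runs can be treated as
baseline overhead and subtracted from the full runs to estimate the time
spent on tag evaluation alone.  The sketch below does that arithmetic.
It assumes the "4:15" entry means 4 minutes 15 seconds; the variable and
function names are illustrative only and are not part of flow-tools.

```python
# Timings copied from the table above, in seconds.
# Assumption: the table's "4:15" entry means 4 minutes 15 seconds.
timings = {
    "original": {"trivial": 1.5, "full": 4 * 60 + 15},
    "alternate": {"trivial": 1.4, "full": 15.0},
}


def tag_eval_seconds(run):
    """Estimate tag-evaluation time as full run minus trivial baseline."""
    return run["full"] - run["trivial"]


def summarize(all_timings):
    """Return per-implementation tag time and its share of the full run."""
    summary = {}
    for name, run in all_timings.items():
        tag_time = tag_eval_seconds(run)
        summary[name] = {
            "tag_eval_sec": tag_time,
            "tag_eval_share": tag_time / run["full"],
        }
    return summary


summary = summarize(timings)
overall_speedup = timings["original"]["full"] / timings["alternate"]["full"]

for name, stats in summary.items():
    print(f"{name}: {stats['tag_eval_sec']:.1f} sec on tags "
          f"({stats['tag_eval_share']:.0%} of the full run)")
print(f"overall speedup on the full tag file: {overall_speedup:.0f}x")
```

This is only an estimate: it lumps tag file loading in with tag
evaluation, since the trivial runs exclude both.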

Comparing the full-tag-file runs shows that most of the original's time
was spent evaluating the tags.  My alternate implementation[*] addresses
this problem.

[*] The alternate version is as general as reasonable but still specific
to my needs.

I will be happy to supply my changes if someone will tell me where to
send them.

Craig