[flow-tools] performance question [resend]
Craig A. Finseth
fin@finseth.com
Fri, 24 Jan 2003 10:12:50 -0600 (CST)
> 2) Over 50% of the total wall clock time is spent in flow-nfilter.
> This step involves producing a flow file for each customer that
> contains only flows to or from that customer. The source of data for
> this step is the set of flow files that contains data from all parts
> of the network merged together. The CPU is about 40% busy during this
> step. A typical filter file is:
So where's the bottleneck on this one? Disk? Memory?
Good question. I will keep an eye on my performance tools during the
next run.
On a follow-up note, I wrote alternate code for flow-tag. On a
roughly 30 MB flow file, I timed the following four configurations:
- original flow-tag, trivial tag file (one entry)
- original flow-tag, full tag file
- alternate flow-tag, trivial tag file (one entry)
- alternate flow-tag, full tag file
The results:
             trivial    full
original     1.5sec     4:15
alternate    1.4sec     15sec
The trivial files show what time is required to read the data, write it,
and handle all other overhead (except for tag file loading).
My alternate implementation[*] shows that, with the full tag file, most
of the original's run time was spent evaluating the tags. This problem
has been addressed.
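
For anyone who wants to check that reading of the numbers, here is a
rough sketch of the arithmetic in Python. It assumes that "4:15" means
minutes:seconds and that the trivial-tag-file run is a fair baseline for
read/write overhead. The helper names (to_seconds, tag_overhead) are
made up for this illustration and are not part of flow-tools. Since the
trivial runs exclude tag file loading, the difference computed here
covers both loading the full tag file and evaluating the tags.

```python
# Timings copied from the results table above (strings as reported).
timings = {
    "original": {"trivial": "1.5sec", "full": "4:15"},
    "alternate": {"trivial": "1.4sec", "full": "15sec"},
}


def to_seconds(text):
    """Convert '1.5sec', '15sec', or 'M:SS' to seconds.

    Assumption: a value containing ':' is minutes:seconds.
    """
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + float(seconds)
    if text.endswith("sec"):
        text = text[: -len("sec")]
    return float(text)


def tag_overhead(row):
    """Time attributable to the full tag file: full run minus baseline."""
    return to_seconds(row["full"]) - to_seconds(row["trivial"])


original_overhead = tag_overhead(timings["original"])
alternate_overhead = tag_overhead(timings["alternate"])

# Fraction of the original full run that is not plain read/write overhead.
original_share = original_overhead / to_seconds(timings["original"]["full"])

# End-to-end speedup on the full tag file.
speedup = (to_seconds(timings["original"]["full"])
           / to_seconds(timings["alternate"]["full"]))

print(f"original tag overhead:  {original_overhead:.1f} s")
print(f"alternate tag overhead: {alternate_overhead:.1f} s")
print(f"share of original run:  {original_share:.1%}")
print(f"full-file speedup:      {speedup:.1f}x")
```

Under those assumptions, the tag-related time comes to about 253.5
seconds for the original versus about 13.6 seconds for the alternate,
and the full-tag-file run is about 17 times faster end to end.
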
[*] The alternate version is as general as reasonable but still specific
to my needs.
I will be happy to supply my changes if someone will tell me where to
send them.
Craig