Racluster Scalability Request

Nick Diel nick at engineerity.com
Fri Aug 7 14:12:08 EDT 2009


I realize that racluster is designed to work on a large number of different
files, so this may be a very long-term request (or not feasible at all), but
I would like to see racluster's run time improved.  Right now racluster
appears to run in O(N^2) time in the number of records.  For larger data
sets I find it much faster to use rasort (which appears to run in time close
to O(N)) and then use perl or some other text-processing tool to get the
results I want.
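
As a rough illustration of the sort-then-post-process approach described
above, here is a minimal sketch in Python rather than perl. It assumes the
records have already been sorted by saddr and printed as text, one record per
line. The column layout ("saddr pkts bytes", whitespace separated) and the
function name aggregate_sorted are made up for the example and are not
something rasort or racluster defines. Because the input is sorted, all
records for one address are adjacent, so a single linear pass is enough to
merge them.

```python
import itertools


def aggregate_sorted(lines):
    """Merge consecutive records that share a source address.

    Assumes `lines` is already sorted by saddr, so each address forms one
    contiguous run and one pass over the input is sufficient.
    """
    # Hypothetical layout: "<saddr> <pkts> <bytes>", whitespace separated.
    rows = (line.split() for line in lines if line.strip())
    for saddr, group in itertools.groupby(rows, key=lambda r: r[0]):
        pkts = 0
        nbytes = 0
        for _, p, b in group:
            pkts += int(p)
            nbytes += int(b)
        yield saddr, pkts, nbytes


if __name__ == "__main__":
    # Made-up sample records, already sorted by saddr.
    sample = [
        "10.0.0.1 3 180",
        "10.0.0.1 5 400",
        "10.0.0.2 1 60",
    ]
    for saddr, pkts, nbytes in aggregate_sorted(sample):
        print(saddr, pkts, nbytes)
```

The merge pass itself is linear in the number of records; the overall cost is
then dominated by whatever the sort step takes.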

To give a better picture, I have attached a PDF of the performance metrics I
collected.

I was clustering on saddr and measured the processing time for different
numbers of records (I have included both the total number of records
processed and the total number of unique records).

Thanks,
Nick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Racluster Times.pdf
Type: application/pdf
Size: 117770 bytes
Desc: not available
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20090807/cb414739/attachment.pdf>

