[mplspm]: In a sorted list [better]

Jim Anderson jim at acadcam.com
Fri Mar 22 11:12:38 CST 2002


On Fri, Mar 22, 2002 at 10:59:12AM -0600, John J. Trammell wrote:
> On Fri, Mar 22, 2002 at 09:40:54AM -0600, Jim Anderson wrote:
> > And with 2 million strings, building a hash is a non-trivial amount of
> > time.

A couple other points.  The original premise had the list sorted to start
with.  Has it always been sorted?  Building the hash may be faster than
doing a sort if it wasn't sorted to start with.  Where did the list come
from?  If it came from inside a database, having an index on the field
could eliminate the problem entirely.  If the data is in a flat file on
the disk, and searches are relatively infrequent, it may be best to do
a binary search on the file itself, and not even reading the list into
memory.  There used to be a Unix utility that did exactly that (It might
have been called 'look').  If a great deal of searches are being done,
it might work to create the hash, especially if the very long startup
time to build the hash can be tolerated.  Another possiblity, if the
list is relatively stable, is to use a tie'd hash to a DBM file.  That
way, you can build the hash once, and then use an index into the hash to
only read the bit of the file that has the string you're looking for.
This has the advantage of very fast searches, combined with little or no
setup time for the user program.  There is a very slow initial startup
time to build the hash in the first place, though.

-- 
Jim Anderson			(612) 782-0456     jim at acadcam.com
Anderson CAD/CAM, Inc		Lucifer designed MS-DOS to try
3800 Apache Lane NE		    men's souls.
St Anthony, MN  55421		Then he had a better idea...


--------------------------------------------------
Minneapolis Perl Mongers mailing list

To unsubscribe, send mail to majordomo at pm.org
with "unsubscribe mpls" in the body of the message.



More information about the Mpls-pm mailing list