[Kc] Apache Lucy 0.1.0 released

Peter Karman peter at peknet.com
Sat Jun 11 18:12:23 PDT 2011


David Nicol wrote on 6/8/11 10:59 AM:
> Just for conversation's sake, anyone familiar enough with Lucy and with SQLite
> FTSE to compare and contrast?

Good topic for conversation, David.

I've read over the SQLite full-text search docs[0] and off-the-cuff I'd say that
there are pros/cons to both approaches.

The architecture underlying both is basically the same: an inverted index of
tokenized terms.
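
To make "inverted index of tokenized terms" concrete, here is a minimal Python
sketch. It is not how either Lucy or SQLite implements it; the function names
(tokenize, build_index, search) are made up for illustration, and the tokenizer
assumes plain ASCII text.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase, then split on runs of non-alphanumeric characters.
    # ASCII-only by assumption; real tokenizers are far more careful.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs):
    # docs: dict mapping doc id -> text.
    # Returns a dict mapping each term -> sorted list of doc ids containing it.
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    # AND semantics: return ids of docs that contain every query term.
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)
```

The lookup goes from term to documents rather than from document to terms,
which is the "inverted" part. Stemming, ranking, and on-disk storage are all
left out of this sketch.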

Obviously if you want to provide search on top of an existing SQLite database,
using the built-in FTS features is very convenient. If your text is mostly
ASCII and you don't require custom tokenizing (or stemming beyond the supplied
Porter stemmer), then SQLite is probably going to serve you well for
small-to-medium projects.
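
As a rough sketch of the SQLite side, the following Python uses the standard
sqlite3 module to create an FTS3 table with the Porter tokenizer described in
the FTS docs[0]. It assumes your SQLite build has FTS3 compiled in (most do, but
it is a build option); the table name docs, the column name body, and the helper
names are all made up for illustration.

```python
import sqlite3


def build_fts_db(bodies):
    # In-memory database with one FTS3 virtual table.
    # tokenize=porter selects the built-in Porter stemmer.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts3(body, tokenize=porter)")
    conn.executemany("INSERT INTO docs(body) VALUES (?)", [(b,) for b in bodies])
    return conn


def fts_search(conn, query):
    # Full-text query via the MATCH operator; returns matching rowids in order.
    rows = conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]
```

With the Porter tokenizer, a query for "search" should also match rows
containing "searching" or "searched", since all three reduce to the same stem.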

If you need to scale your search application beyond a few gigs of data, or your
doc collection isn't already in a SQLite db, or you need i18n support (esp. for
stemming in multiple languages), then you're probably going to need an IR
library like Lucy. First, it's a library, so you can customize your indexing and
searching code to fit your particular application. Second, it has Perl bindings
(which for this audience should be a win). Third, it provides very flexible
tokenizing and stemming options (Lucy ships with Snowball support). Lucy is in
the same camp as Lucene, Sphinx, Xapian, etc. It's for when you Get Serious
about your search application.


[0] http://www.sqlite.org/fts3.html


-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com

