[Pdx-pm] Multisearch engines
zak.zebrowski at gmail.com
Wed May 2 16:32:24 PDT 2012
Lucy comes from KinoSearch, which came from Plucene, which came from
Lucene. Look at Lucy. :)
If you're doing a metasearch engine, look into WWW::Mechanize if the other
tools' APIs are stable, and roll your own.
On May 2, 2012 7:27 PM, "Ben Prew" <ben.prew at gmail.com> wrote:
> I've had experience using solr (http://lucene.apache.org/solr/), but have
> been attracted to elasticsearch (http://www.elasticsearch.org/).
> Also, for something like the above (including Lucene), you'll probably
> want a slightly different architecture than you describe below. Instead of
> having "subsearch" handlers, you'll have the individual islands (dbs, etc)
> add their "documents" to the search engine, and then it will search across
> Something like solr or elasticsearch can be scaled across multiple
> machines. Solr also has a nice web-administration frontend, so you know
> what each head is doing and what its replication status is.
> Perl has Plucene if you're so inclined too, but I don't have much
> experience with it.
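
To make the "islands add their documents to the search engine" idea concrete, here is a minimal sketch in Python of a central inverted index. It is a toy stand-in, not how Solr, elasticsearch or Lucy work internally; the class name TinyIndex, the document ids and the source names are all made up for illustration.

```python
from collections import defaultdict


class TinyIndex:
    """Toy central index: every island pushes its documents here."""

    def __init__(self):
        self.docs = {}                    # doc_id -> {"source", "text"}
        self.postings = defaultdict(set)  # term -> set of doc_ids

    def add(self, doc_id, source, text):
        # Store the document and index each whitespace-separated term.
        # (Re-adding an id does not clean up old postings; this is a sketch.)
        self.docs[doc_id] = {"source": source, "text": text}
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: a hit must contain every query term.
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [dict(id=i, **self.docs[i]) for i in sorted(ids)]


# Each island (hypothetical names) pushes its own documents.
idx = TinyIndex()
idx.add("db:1", "orders-db", "Customer order export job")
idx.add("wiki:7", "wiki", "How to export customer data")
idx.add("svc:3", "billing-api", "Invoice lookup service")

for hit in idx.search("customer export"):
    print(hit["id"], hit["source"])
```

The point of the design is that the query side only ever talks to one index; the per-resource work moves to whatever pushes documents in.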
> On Wed, May 2, 2012 at 3:13 PM, Joshua Keroes <joshua at keroes.com> wrote:
>> In my company we have databases, webservers, web services, and search
>> engines; dozens of each. Almost every resource is an island unto itself.
>> Few are linked. Some are redundant. We have so many resources that I have
>> to show people our particular resources every few days.
>> A single search engine to rule them all would be a big win.
>> This problem has been solved before. It's called federated search, multi
>> search, meta search, and search aggregation. There's a nice picture at
>> http://en.wikipedia.org/wiki/Metasearch_engine .
>> A rough overview of the project could look like this:
>> Web frontend:
>> 1. optionally run auto-completion while user is typing into the form
>> (that's a whole different topic)
>> 2. send complete query to Metasearch API frontend
>> Metasearch API frontend:
>> 1. validate query
>> 2. normalize query (improve queries if possible)
>> 3. send normalized query to every subsearch handler
>> 4. optionally inform frontend about all subsearches (to initialize
>> progress bars; etc.)
>> 5. normalize response (add useful info to response or delete things
>> the user shouldn't see)
>> 6. return subsearch response
>> Subsearch handlers:
>> 1. optionally validate and normalize query (things specific to just
>> this resource)
>> 2. search. Depending on the type of resource this can mean many
>> things: search a database, check an index, fetch a web service, make a
>> webpage query and scrub the results; etc.
>> 3. normalize response
>> 4. return response
>> 1. Run indexers
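
The frontend/handler split outlined above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the handler names, the sample data and the response shape are invented, and real handlers would call a database, an index or a web service instead of filtering a list.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def validate_query(query):
    # Step 1: reject queries that are empty or only whitespace.
    if not query.strip():
        raise ValueError("empty query")


def normalize_query(query):
    # Step 2: collapse whitespace and lowercase.
    return " ".join(query.split()).lower()


# Hypothetical subsearch handlers, one per "island".
def search_wiki(query):
    pages = ["Deploy guide", "Search setup", "Oncall rota"]
    return [p for p in pages if query in p.lower()]


def search_hosts(query):
    hosts = ["search01.example.com", "db01.example.com"]
    return [h for h in hosts if query in h]


def search_broken(query):
    raise RuntimeError("backend down")


def metasearch(query, handlers):
    """Fan the normalized query out to every handler in parallel."""
    validate_query(query)
    normalized = normalize_query(query)
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(handlers))) as pool:
        futures = {pool.submit(fn, normalized): name
                   for name, fn in handlers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"ok": True, "hits": future.result()}
            except Exception as exc:
                # One failing island should not sink the whole search.
                results[name] = {"ok": False, "error": str(exc)}
    return results


handlers = {"wiki": search_wiki, "hosts": search_hosts,
            "broken": search_broken}
for name, response in sorted(metasearch("  Search ", handlers).items()):
    print(name, response)
```

Collecting results with as_completed is also where a frontend could be told about each finished subsearch, for the progress bars mentioned in step 4.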
>> Anyone familiar with projects in Perl-land (or outside the bubble) for
>> solving this? Failing that, know of any related projects I should check out
>> and/or leverage like Lucy/Lucene?
>> Pdx-pm-list mailing list
>> Pdx-pm-list at pm.org