[Pdx-pm] Multisearch engines
joshua at keroes.com
Wed May 2 15:13:52 PDT 2012
In my company we have databases, webservers, web services, and search
engines; dozens of each. Almost every resource is an island unto itself.
Few are linked. Some are redundant. We have so many resources that I have
to show people our particular resources every few days.
A single search engine to rule them all would be a big win.
This problem has been solved before. It goes by several names: federated
search, multisearch, metasearch, and search aggregation. There's a nice picture at
A rough overview of the project could look like this:
1. optionally run auto-completion while user is typing into the form
(that's a whole different topic)
2. send complete query to Metasearch API frontend
Metasearch API frontend:
1. validate query
2. normalize query (improve queries if possible)
3. send normalized query to every subsearch handler
4. optionally inform frontend about all subsearches (to initialize
progress bars; etc.)
5. normalize response (add useful info to response or delete things the
user shouldn't see)
6. return subsearch response
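To make the Metasearch API frontend steps concrete, here is a minimal sketch in Python (it ports directly to Perl). Everything in it is an assumption for illustration: the function names, the 200-character limit, the rule that fields starting with an underscore are hidden from the user, and the idea that each subsearch handler is a plain callable returning a list of dicts. It fans the normalized query out to every handler in parallel and keeps one failing resource from sinking the rest.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def validate(query):
    """Reject empty or oversized queries (the limits are illustrative)."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("empty query")
    if len(query) > 200:
        raise ValueError("query too long")


def normalize_query(query):
    """Collapse whitespace and lowercase; real normalization would do more."""
    return " ".join(query.split()).lower()


def normalize_response(name, hits):
    """Tag each hit with its source and drop fields the user shouldn't see."""
    return [
        {"source": name, **{k: v for k, v in hit.items() if not k.startswith("_")}}
        for hit in hits
    ]


def metasearch(query, handlers):
    """Send the normalized query to every subsearch handler.

    `handlers` maps a subsearch name to a callable that takes the
    normalized query and returns a list of dicts (hypothetical contract).
    """
    validate(query)
    q = normalize_query(query)
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(handlers))) as pool:
        futures = {pool.submit(fn, q): name for name, fn in handlers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = normalize_response(name, future.result())
            except Exception as exc:  # record the failure, keep the other results
                errors[name] = str(exc)
    # "subsearches" lets a UI set up one progress bar per handler.
    return {"query": q, "subsearches": list(handlers), "results": results, "errors": errors}
```

A real service would probably stream each subsearch response back as it arrives instead of collecting them all, and would put a timeout on slow resources.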
Subsearch handler:
1. optionally validate and normalize query (things specific to just this
subsearch)
2. search. Depending on the type of resource this can mean many things:
search a database, check an index, fetch a web service, make a webpage
query and scrub the results, etc.
3. normalize response
4. return response
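One way to picture a subsearch handler is a small base class with the steps above as overridable methods, plus one subclass per resource. The sketch below is hypothetical throughout: the class names, the `docs` table, the example URLs, and the title/url response shape are all made up. It uses an in-memory SQLite database as a stand-in for "search a database".

```python
import sqlite3


class SubsearchHandler:
    """Template for one resource: normalize query, search, normalize response."""

    name = "base"

    def normalize_query(self, query):
        return query.strip()

    def search(self, query):
        raise NotImplementedError

    def normalize_response(self, rows):
        # Every handler returns the same shape so results can be merged.
        return [{"source": self.name, "title": t, "url": u} for t, u in rows]

    def __call__(self, query):
        return self.normalize_response(self.search(self.normalize_query(query)))


class SqliteHandler(SubsearchHandler):
    name = "docs_db"

    def __init__(self, conn):
        self.conn = conn

    def search(self, query):
        # Parameterized query; LIKE wildcards in the input are not escaped here.
        cur = self.conn.execute(
            "SELECT title, url FROM docs WHERE title LIKE ? ORDER BY title",
            (f"%{query}%",),
        )
        return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("Deploy guide", "http://intranet.example/deploy"),
        ("Search FAQ", "http://intranet.example/faq"),
    ],
)
handler = SqliteHandler(conn)
print(handler(" search "))
```

A handler for a web service or a scraped page would override only `search` (and perhaps `normalize_response`), so the Metasearch API frontend never needs to know what kind of resource sits behind it.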
1. Run indexers
Is anyone familiar with projects in Perl-land (or outside the bubble) for
solving this? Failing that, does anyone know of related projects, such as
Lucy/Lucene, that I should check out and/or leverage?