[Pdx-pm] Multisearch engines

Joshua Keroes joshua at keroes.com
Wed May 2 15:13:52 PDT 2012


At my company we have databases, webservers, web services, and search
engines; dozens of each. Almost every resource is an island unto itself.
Few are linked. Some are redundant. We have so many that I end up
pointing people to the right resource every few days.

A single search engine to rule them all would be a big win.

This problem has been solved before. It goes by several names: federated
search, multisearch, metasearch, and search aggregation. There's a nice
picture at http://en.wikipedia.org/wiki/Metasearch_engine .

A rough overview of the project could look like this:

Web frontend:

   1. optionally run auto-completion while user is typing into the form
   (that's a whole different topic)
   2. send complete query to Metasearch API frontend


Metasearch API frontend:

   1. validate query
   2. normalize query (improve queries if possible)
   3. send normalized query to every subsearch handler
   4. optionally inform frontend about all subsearches (to initialize
   progress bars, etc.)
   5. normalize each response (add useful info or delete things the
   user shouldn't see)
   6. return the subsearch responses
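
The frontend steps above could be sketched roughly like this. It is only
an illustration in Python, not an existing library: every name here
(metasearch, validate, normalize_query, normalize_response, and the two
demo handlers) is made up, and "normalizing" is reduced to whitespace and
case cleanup plus dropping underscore-prefixed fields. Handlers are
assumed to be plain callables that take a query string and return a list
of dicts.

```python
from concurrent.futures import ThreadPoolExecutor


def validate(query):
    """Step 1: reject anything that is not a non-empty string."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("empty query")


def normalize_query(query):
    """Step 2: collapse whitespace and lowercase (stand-in for real cleanup)."""
    return " ".join(query.split()).lower()


def normalize_response(source, hits):
    """Step 5: tag each hit with its source, drop '_'-prefixed private fields."""
    cleaned = []
    for hit in hits:
        public = {k: v for k, v in hit.items() if not k.startswith("_")}
        cleaned.append({"source": source, **public})
    return cleaned


def metasearch(query, handlers):
    """Fan a query out to every handler; return (results, errors)."""
    validate(query)
    q = normalize_query(query)
    results, errors = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(handlers))) as pool:
        # Step 3: all subsearches run concurrently.
        futures = {pool.submit(fn, q): name for name, fn in handlers.items()}
        # Collect in submission order so the output order is predictable.
        for future, name in futures.items():
            try:
                results.extend(normalize_response(name, future.result()))
            except Exception as exc:
                # One failing resource should not sink the whole search.
                errors[name] = str(exc)
    return results, errors  # Step 6


# Hypothetical handlers, for demonstration only.
def wiki_handler(q):
    return [{"title": f"Wiki page about {q}", "_internal_id": 17}]


def broken_handler(q):
    raise RuntimeError("backend down")


if __name__ == "__main__":
    handlers = {"wiki": wiki_handler, "broken": broken_handler}
    print(metasearch("  Perl   Mongers ", handlers))
```

Step 4 (telling the web frontend which subsearches are running) and
per-handler timeouts are left out to keep the sketch short. Collecting
with concurrent.futures.as_completed instead of submission order would
let results stream back as each resource answers, which is what progress
bars would want.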


Subsearch handlers:

   1. optionally validate and normalize query (things specific to just this
   resource)
   2. search. Depending on the type of resource this can mean many things:
   search a database, check an index, call a web service, query a web
   page and scrape the results, etc.
   3. normalize response
   4. return response
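
One way to picture a subsearch handler is a small class with one method
per step above. This is a sketch under assumptions, not a real framework:
SubsearchHandler, SqliteHandler, the hosts table, and the
source/title/summary result shape are all invented for illustration. It
uses an in-memory SQLite database from the Python standard library as the
"resource".

```python
import sqlite3


class SubsearchHandler:
    """Hypothetical base class; one subclass per searchable resource."""

    name = "base"

    def prepare(self, query):
        # Step 1 (optional): resource-specific validation/normalization.
        return query

    def search(self, query):
        # Step 2: the actual lookup; differs for every resource type.
        raise NotImplementedError

    def normalize(self, raw):
        # Step 3: map one raw result onto the common result shape.
        raise NotImplementedError

    def __call__(self, query):
        # Step 4: return the normalized response.
        return [self.normalize(raw) for raw in self.search(self.prepare(query))]


class SqliteHandler(SubsearchHandler):
    """Searches a made-up 'hosts' table by substring."""

    name = "hosts_db"

    def __init__(self, conn):
        self.conn = conn

    def prepare(self, query):
        # Escape LIKE wildcards so user input is matched literally.
        return (query.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))

    def search(self, query):
        cur = self.conn.execute(
            "SELECT hostname, role FROM hosts "
            "WHERE hostname LIKE ? ESCAPE '\\' ORDER BY hostname",
            (f"%{query}%",),
        )
        return cur.fetchall()

    def normalize(self, row):
        hostname, role = row
        return {"source": self.name, "title": hostname, "summary": role}


# Demo data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (hostname TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [("web01", "webserver"), ("db01", "database"), ("web_02", "webserver")],
)
handler = SqliteHandler(conn)

if __name__ == "__main__":
    print(handler("web"))
```

A handler for a web service or a scraped page would override the same
three methods, so whatever calls the handlers never needs to know what
kind of resource sits behind each one.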


Backend:

   1. Run indexers


Is anyone familiar with projects in Perl-land (or outside the bubble) that
solve this? Failing that, do you know of any related projects I should
check out and/or leverage, like Lucy/Lucene?

Thanks,
Joshua