[Pdx-pm] Multisearch engines

Wed May 2 16:27:27 PDT 2012

I've had experience using Solr (http://lucene.apache.org/solr/), but have
been attracted to elasticsearch (http://www.elasticsearch.org/).

Also, for something like the above (including Lucene), you'll probably want
a slightly different architecture than you describe below.  Instead of
having "subsearch" handlers, you'll have the individual islands (dbs, etc.)
add their "documents" to the search engine, and then it will search across
them.
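
To make the difference concrete, here is a minimal sketch of that
push-to-a-central-index shape. It is a toy in-memory inverted index in plain
Python, not the Solr or elasticsearch API; the class name CentralIndex, its
add/search methods, and the sample sources ("wiki", "tickets-db", "intranet")
are all made up for illustration. A real engine would add tokenizing,
ranking, persistence and so on.

```python
from collections import defaultdict


class CentralIndex:
    """Toy in-memory inverted index standing in for a real search engine.

    Hypothetical illustration only; not the Solr or elasticsearch API.
    """

    def __init__(self):
        self.docs = {}                    # doc_id -> document dict
        self.postings = defaultdict(set)  # term -> set of doc_ids

    def add(self, source, key, text):
        """Called by each island to push one of its records into the index."""
        doc_id = f"{source}:{key}"
        self.docs[doc_id] = {"source": source, "key": key, "text": text}
        # Naive tokenizing: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, query):
        """Return documents containing every query term, from any source."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.docs[d] for d in sorted(hits)]


# Each island pushes its own documents (sample data is invented).
index = CentralIndex()
index.add("wiki", "42", "Deploy guide for the billing service")
index.add("tickets-db", "1001", "Billing service timeout on deploy")
index.add("intranet", "home", "Cafeteria menu")

# One query, answered from the single index rather than per-island handlers.
results = index.search("billing deploy")
print([r["source"] for r in results])
```

The point of the design is that the query path touches only the one index;
the per-resource work moves to indexing time, where each island decides what
to submit as a document.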

Something like Solr or elasticsearch can be scaled across multiple
machines.  Solr also has a nice web-administration frontend, so you know
what each head is doing and what its replication status is.

Perl also has Plucene, if you're so inclined, but I don't have much
experience with it.

--Ben

On Wed, May 2, 2012 at 3:13 PM, Joshua Keroes <joshua at keroes.com> wrote:

> In my company we have databases, webservers, web services, and search
> engines; dozens of each. Almost every resource is an island unto itself.
> Few are linked. Some are redundant. We have so many resources that I have
> to show people our particular resources every few days.
>
> A single search engine to rule them all would be a big win.
>
> This problem has been solved before. It's called federated search, multi
> search, meta search, and search aggregation. There's a nice picture at
> http://en.wikipedia.org/wiki/Metasearch_engine .
>
> A rough overview of the project could look like this:
>
> Web frontend:
>
>    1. optionally run auto-completion while user is typing into the form
>    (that's a whole different topic)
>    2. send complete query to Metasearch API frontend
>
>
> Metasearch API frontend:
>
>    1. validate query
>    2. normalize query (improve queries if possible)
>    3. send normalized query to every subsearch handler
>    4. optionally inform frontend about all subsearches (to initialize
>    progress bars; etc.)
>    5. normalize response (add useful info to response or delete things
>    the user shouldn't see)
>    6. return subsearch response
>
>
> Subsearch handlers:
>
>    1. optionally validate and normalize query (things specific to just
>    this resource)
>    2. search. Depending on the type of resource this can mean many
>    things: search a database, check an index, fetch a web service, make a
>    webpage query and scrub the results; etc.
>    3. normalize response
>    4. return response
>
>
> Backend:
>
>    1. Run indexers
>
>
> Anyone familiar with projects in Perl-land (or outside the bubble) for
> solving this? Failing that, know of any related projects I should check out
> and/or leverage like Lucy/Lucene?
>
> Thanks,
> Joshua
>
> _______________________________________________
> Pdx-pm-list mailing list
> Pdx-pm-list at pm.org
> http://mail.pm.org/mailman/listinfo/pdx-pm-list
>