I've used Solr (<a href="http://lucene.apache.org/solr/">http://lucene.apache.org/solr/</a>), but have been drawn to elasticsearch (<a href="http://www.elasticsearch.org/">http://www.elasticsearch.org/</a>).<br>


<br>Also, for something like the above (including Lucene), you'll probably want a slightly different architecture than the one you describe below.  Instead of having "subsearch" handlers, each individual island (database, web service, etc.) adds its "documents" to the search engine, which then searches across all of them.<br>
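<br>A minimal sketch of that push model, assuming a toy in-memory index in place of a real engine. The <code>CentralIndex</code> class, its <code>add</code> and <code>search</code> methods, and the source names are all made up for illustration; they are not the Solr or elasticsearch API:<br>

```python
from collections import defaultdict


class CentralIndex:
    """Toy in-memory inverted index standing in for Solr or elasticsearch."""

    def __init__(self):
        self.docs = {}                      # doc_id -> document dict
        self.postings = defaultdict(set)    # term -> set of doc_ids

    def add(self, source, key, text):
        """Called by each island to push one document into the shared index."""
        doc_id = f"{source}:{key}"
        self.docs[doc_id] = {"source": source, "key": key, "text": text}
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, query):
        """Return documents containing every query term, from any source."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.docs[d] for d in sorted(hits)]


# Hypothetical islands pushing their own documents into the one index.
index = CentralIndex()
index.add("wiki", "42", "Deploy checklist for the billing service")
index.add("tickets_db", "1007", "Billing service times out on deploy")
index.add("webserver", "status", "Cache hit ratio report")

# One query is answered from a single index, regardless of origin.
results = index.search("billing deploy")
```

<br>The point is that the query path never touches the islands: they only write to the index, and the search reads from it.<br>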


<br>Something like Solr or elasticsearch can be scaled across multiple machines.  Solr also has a nice web administration frontend, so you can see what each head is doing and what its replication status is.<br><br>Perl has Plucene too, if you're so inclined, but I don't have much experience with it.<br>


<br><br clear="all">--Ben<br>

<br><br><div class="gmail_quote">On Wed, May 2, 2012 at 3:13 PM, Joshua Keroes <span dir="ltr">&lt;<a href="mailto:joshua@keroes.com" target="_blank">joshua@keroes.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>In my company we have databases, webservers, web services, and search engines; dozens of each. Almost every resource is an island unto itself. Few are linked. Some are redundant. We have so many resources that I have to show people our particular resources every few days.</div>


<div><br></div><div>A single search engine to rule them all would be a big win.</div><div><br></div><div>This problem has been solved before. It's called federated search, multi search, meta search, and search aggregation. There's a nice picture at <a href="http://en.wikipedia.org/wiki/Metasearch_engine" target="_blank">http://en.wikipedia.org/wiki/Metasearch_engine</a> .</div>


<div><br></div><div>A rough overview of the project could look like this:</div><div><div><br></div><div>Web frontend:</div></div><div><ol><li>optionally run auto-completion while user is typing into the form (that's a whole different topic)</li>


<li>send complete query to Metasearch API frontend</li></ol></div><div><br></div><div>Metasearch API frontend:</div><div><ol><li>validate query</li><li>normalize query (improve queries if possible)</li><li>send normalized query to every subsearch handler</li>


<li>optionally inform frontend about all subsearches (to initialize progress bars; etc.)</li><li>normalize response (add useful info to response or delete things the user shouldn't see) </li><li>return subsearch response</li>


</ol></div><div><br></div><div>Subsearch handlers:</div><div><ol><li>optionally validate and normalize query (things specific to just this resource)</li><li>search. Depending on the type of resource this can mean many things: search a database, check an index, fetch a web service, make a webpage query and scrub the results; etc.</li>


<li>normalize response</li><li>return response</li></ol><div><br></div></div><div>Backend:</div><div><ol><li>Run indexers</li></ol><div><br></div></div><div>Anyone familiar with projects in Perl-land (or outside the bubble) for solving this? Failing that, know of any related projects I should check out and/or leverage like Lucy/Lucene?</div>
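<div><br></div><div>The Metasearch API frontend and subsearch handler steps listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the handler functions, their canned data, and the <code>HANDLERS</code> table are hypothetical stand-ins for real databases and web services.</div>

```python
from concurrent.futures import ThreadPoolExecutor


def normalize_query(query):
    """Validate and normalize: collapse whitespace, lowercase, reject empties."""
    cleaned = " ".join(query.split()).lower()
    if not cleaned:
        raise ValueError("empty query")
    return cleaned


def search_database(query):
    # Hypothetical handler: a real one would run SQL against one database.
    rows = ["billing invoices table", "user accounts table"]
    return [{"title": r} for r in rows if query in r]


def search_wiki(query):
    # Hypothetical handler: a real one would call a web service.
    pages = ["Billing runbook", "Onboarding guide"]
    return [{"title": p} for p in pages if query in p.lower()]


HANDLERS = {"database": search_database, "wiki": search_wiki}


def metasearch(query):
    """Send the normalized query to every handler and merge the responses."""
    normalized = normalize_query(query)
    merged = []
    with ThreadPoolExecutor(max_workers=len(HANDLERS)) as pool:
        futures = {name: pool.submit(fn, normalized) for name, fn in HANDLERS.items()}
        for name, future in futures.items():
            try:
                hits = future.result(timeout=5)
            except Exception:
                hits = []  # one failing resource should not sink the whole search
            # Normalize the response: tag each hit with the resource it came from.
            merged.extend({"source": name, **hit} for hit in hits)
    return merged


results = metasearch("  Billing ")
```

<div>Each handler runs in its own thread, so a slow resource delays only its own results up to the timeout, and a failing one simply contributes nothing.</div>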


<div><br></div><div>Thanks,<br>Joshua</div>

</blockquote></div><br>