[Pdx-pm] Multisearch engines
Zachary Zebrowski
zak.zebrowski at gmail.com
Wed May 2 16:32:24 PDT 2012
Lucy comes from KinoSearch, which came from Plucene, which came from
Lucene. Look at Lucy. :)
If you're building a meta search engine and the other tools' APIs are
stable, look into WWW::Mechanize and roll your own.
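
For what rolling your own might look like, here is a rough sketch of the
fan-out from Joshua's outline quoted further down (validate, normalize,
send to every subsearch handler, normalize the responses). It's in Python
only to keep it short; the handler names, the in-memory "resources" and the
record shape are all made up for illustration, and real handlers would hit
a database, an index, or a web service instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real resources (databases, indexes, web services).
WIKI_PAGES = ["Deploy guide", "Search setup", "Oncall rota"]
TICKETS = ["Search is slow", "Deploy failed on web03"]


def make_handler(source, items):
    """Build a subsearch handler doing a case-insensitive substring match."""
    def handler(query):
        hits = [item for item in items if query in item.lower()]
        # Normalize the response: every handler returns the same record shape.
        return [{"source": source, "title": hit} for hit in hits]
    return handler


HANDLERS = {
    "wiki": make_handler("wiki", WIKI_PAGES),
    "tickets": make_handler("tickets", TICKETS),
}


def metasearch(raw_query, handlers=HANDLERS):
    """Normalize and validate the query, fan out to every handler, merge results."""
    query = " ".join(raw_query.lower().split())  # normalize case and whitespace
    if not query:
        raise ValueError("empty query")
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(handlers))) as pool:
        # Submit every subsearch first so they run concurrently.
        futures = {name: pool.submit(h, query) for name, h in handlers.items()}
        for name, future in futures.items():
            try:
                results.extend(future.result())
            except Exception as err:
                # One failing island should not sink the whole search.
                results.append({"source": name, "error": str(err)})
    return results
```

Calling metasearch("deploy") returns one record per hit, tagged with the
handler it came from. A handler that throws turns into an error record
rather than an exception, so the frontend can still show partial results.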
$0.02
Zak
On May 2, 2012 7:27 PM, "Ben Prew" <ben.prew at gmail.com> wrote:
> I've had experience using solr (http://lucene.apache.org/solr/), but have
> been attracted to elasticsearch (http://www.elasticsearch.org/).
>
> Also, for something like the above (including Lucene), you'll probably
> want a slightly different architecture than you describe below. Instead of
> having "subsearch" handlers, you'll have the individual islands (dbs, etc)
> add their "documents" to the search engine, and then it will search across
> them.
>
> Something like Solr or elasticsearch can be scaled across multiple
> machines. Solr also has a nice web-administration frontend, so you know
> what each head is doing and what its replication status is.
>
> Perl has Plucene if you're so inclined too, but I don't have much
> experience with it.
>
>
> --Ben
>
>
> On Wed, May 2, 2012 at 3:13 PM, Joshua Keroes <joshua at keroes.com> wrote:
>
>> In my company we have databases, webservers, web services, and search
>> engines; dozens of each. Almost every resource is an island unto itself.
>> Few are linked. Some are redundant. We have so many resources that I have
>> to show people our particular resources every few days.
>>
>> A single search engine to rule them all would be a big win.
>>
>> This problem has been solved before. It's called federated search, multi
>> search, meta search, and search aggregation. There's a nice picture at
>> http://en.wikipedia.org/wiki/Metasearch_engine .
>>
>> A rough overview of the project could look like this:
>>
>> Web frontend:
>>
>> 1. optionally run auto-completion while user is typing into the form
>> (that's a whole different topic)
>> 2. send complete query to Metasearch API frontend
>>
>>
>> Metasearch API frontend:
>>
>> 1. validate query
>> 2. normalize query (improve queries if possible)
>> 3. send normalized query to every subsearch handler
>> 4. optionally inform frontend about all subsearches (to initialize
>> progress bars; etc.)
>> 5. normalize response (add useful info to response or delete things
>> the user shouldn't see)
>> 6. return subsearch response
>>
>>
>> Subsearch handlers:
>>
>> 1. optionally validate and normalize query (things specific to just
>> this resource)
>> 2. search. Depending on the type of resource this can mean many
>> things: search a database, check an index, fetch a web service, make a
>> webpage query and scrub the results; etc.
>> 3. normalize response
>> 4. return response
>>
>>
>> Backend:
>>
>> 1. Run indexers
>>
>>
>> Anyone familiar with projects in Perl-land (or outside the bubble) for
>> solving this? Failing that, know of any related projects I should check out
>> and/or leverage like Lucy/Lucene?
>>
>> Thanks,
>> Joshua
>>
>> _______________________________________________
>> Pdx-pm-list mailing list
>> Pdx-pm-list at pm.org
>> http://mail.pm.org/mailman/listinfo/pdx-pm-list
>>