Perl regexp-based Web search portal

Chris Radcliff chris at velocigen.com
Thu Jul 6 13:35:08 CDT 2000


~sdpm~

"John R. Comeau" wrote:
> Does anyone know of a Web search portal (like Yahoo or AltaVista) in
> which the search syntax is based on Perl regular expressions, or any
> regular expressions, for that matter?  On Yahoo, I always get pages of
> matches for things I don't want because I'm not able to enter the
> search the way I want.
> 

I don't know of one that's based on Perl regex, but it's a good idea. It
would take some doing, though: Perl regexes aren't built for searching
and ranking millions of documents, and most Web search engines rely on
hash tables and similar structures to make searches fast. (Used this
way, a hash table works like the index in the back of a book: it records
which pages contain a given word, fnord for example, so the engine
doesn't have to scan every page on each search.)

While it's easy to build hash tables from documents (just pull out all
the words in a document and create an entry for each one), it would be
hard to do the same thing for a regex. The document "The quick brown
fox." matches /fox/ and /brown/, but it also matches /\w+\sf\w+/ and a
zillion other patterns, so there's no fixed set of keys to file it
under; every document would have to be checked against every query.
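To make the problem concrete, here's a sketch in Python, whose re module treats these simple patterns the same way Perl does. With no index to consult, the only option is to run the pattern against every document; the docs dict and the regex_search name are hypothetical.

```python
import re

def regex_search(docs, pattern):
    """Hypothetical full-scan search: there's no index to consult,
    so every document is tested against the compiled pattern."""
    compiled = re.compile(pattern)
    return sorted(doc_id for doc_id, text in docs.items() if compiled.search(text))

# Made-up documents for illustration.
docs = {
    "page1": "The quick brown fox.",
    "page2": "Fnord is everywhere.",
}

for pattern in (r"fox", r"brown", r"\w+\sf\w+"):
    print(pattern, regex_search(docs, pattern))
```

All three patterns find page1, but nothing about /\w+\sf\w+/ tells you ahead of time which words to look up, so the cost of each query grows with the size of the whole collection.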

An intermediate approach is the one AltaVista takes, where you can write
queries with the hash table in mind. For instance,

+url:globalspin +perl

will find only those pages with 'globalspin' in the URL and 'perl' on
the page or in the description somewhere. Similarly,

+url:.pl -host:.pl

will find pages that have .pl in the URL (good for finding Perl CGI
scripts) but not pages with .pl in the hostname (good for weeding out
sites in Poland, which use the .pl domain).
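A rough sketch of how queries like these could be evaluated, again in Python. This is a simplified model, not AltaVista's actual implementation: the page records, the field names (url, host, text), plain substring matching, and treating unsigned terms as required are all assumptions, and the URLs are made up.

```python
def parse_query(query):
    """Split a query into (sign, field, term) triples.
    Assumed syntax: optional +/- prefix, then optional 'field:' prefix.
    Unsigned terms are treated as required (a simplification)."""
    terms = []
    for token in query.split():
        sign = "+"
        if token[0] in "+-":
            sign, token = token[0], token[1:]
        field, sep, term = token.partition(":")
        if not sep:                      # no field prefix: search the page text
            field, term = "text", token
        if term:                         # skip empty tokens such as a lone "-"
            terms.append((sign, field, term.lower()))
    return terms

def matches(page, terms):
    """Required terms must appear in their field; excluded terms must not."""
    for sign, field, term in terms:
        found = term in page.get(field, "").lower()
        if (sign == "+") != found:
            return False
    return True

def search(pages, query):
    terms = parse_query(query)
    return [page["url"] for page in pages if matches(page, terms)]

# Hypothetical pages; 'text' stands in for page body plus description.
pages = [
    {"url": "http://globalspin.example.com/perl/", "host": "globalspin.example.com",
     "text": "Perl tips and tricks"},
    {"url": "http://example.com/cgi-bin/search.pl", "host": "example.com",
     "text": "A search script"},
    {"url": "http://www.example.pl/index.html", "host": "www.example.pl",
     "text": "Strona po polsku"},
]

print(search(pages, "+url:globalspin +perl"))
print(search(pages, "+url:.pl -host:.pl"))
```

The second query keeps the CGI script and drops the page hosted under .pl, which is the filtering described above. A real engine would look each term up in per-field word indexes rather than testing substrings page by page.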

It's not regex, but it's something.

~chris
~sdpm~
