a regexp question

Tkil tkil-sdpm at scrye.com
Fri May 9 15:23:04 CDT 2003


~sdpm~
>>>>> "John" == John Chung <chung at scripps.edu> writes:

John> 	s/href="([^"])+"/appendit($1)/eg;

John> I noticed that the $1, instead of it being the entire URL
John> inside the anchor tag (between <a href=" and  ">), is
John> usually just the last letter of that URL.

John> I'm confused.  Could someone help me so that I can just
John> take the whole URL inside the anchor tag and pass it or
John> refer to it?

You misplaced your parentheses; in this case, the plus quantifier
modifies the grouping, not the character class, so the group matches a
single character on each repetition and $1 keeps only the last one.
Simplest fix is:

   s/href="([^"]+)"/appendit($1)/eg;
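To see the difference concretely, here is a small standalone check (the sample anchor tag is made up). With the quantifier outside the group, each repetition overwrites the capture, so $1 ends up as the last character only:

```perl
use strict;
use warnings;

my $html = '<a href="http://example.com/page.html">link</a>';

# Quantifier outside the group: the group matches one character per
# repetition, and the capture keeps only the last one.
my ($last) = $html =~ /href="([^"])+"/;

# Quantifier inside the group: the capture holds the whole URL.
my ($whole) = $html =~ /href="([^"]+)"/;

print "outside: $last\n";    # outside: l
print "inside:  $whole\n";   # inside:  http://example.com/page.html
```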

This still isn't quite right, though, since the whole href="..."
attribute is replaced, not just the URL inside it.  Maybe:

   s/(href=")([^"]+)(")/$1 . appendit($2) . $3/eg;
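A minimal runnable sketch of that substitution. Your appendit() wasn't posted, so the version here is a hypothetical stand-in that just tacks '?sid=xxx' onto the URL (the same string used in the tips below):

```perl
use strict;
use warnings;

# Hypothetical stand-in for appendit(); assumed here to append a
# session id to the URL it is given.
sub appendit {
    my ($url) = @_;
    return $url . '?sid=xxx';
}

my $html = '<a href="/docs/index.html">Docs</a>';

# Capture the href=" prefix and the closing quote so they can be put
# back, leaving only the URL itself to be rewritten by appendit().
(my $out = $html) =~ s/(href=")([^"]+)(")/$1 . appendit($2) . $3/eg;

print "$out\n";   # <a href="/docs/index.html?sid=xxx">Docs</a>
```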

Comments:

1. /e is slower than plain interpolation, and its doubled form /ee
   (which evals the result as Perl code) is dangerous on untrusted
   input.  Consider doing the replacement inline:

      s/(href=")([^"]+)(")/$1$2?sid=xxx$3/g;

2. The href URL might already contain a '?', in which case adding
   another one is wrong (the separator should be "&" or ";"):

      s/(href=")([^"?]+)([^"]*)(")/$1 . $2 . $3 . ($3 ? "&" : "?") . "sid=xxx" . $4/eg;

3. HTML attribute names are case-insensitive.  Consider using /i:

      s/(href=")([^"]+)(")/$1$2?sid=xxx$3/ig;

4. "href" is also used by tags other than A, such as LINK, AREA and BASE.  :)
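Putting tips 1-3 together, a rough sketch might look like the following. The add_sid() name, the sid value and the sample tags are all made up for illustration; it uses /e only because the separator has to be chosen per match:

```perl
use strict;
use warnings;

# Sketch combining tips 1-3: match href case-insensitively (/i) and pick
# '?' or '&' depending on whether the URL already has a query string.
# add_sid() is a hypothetical helper name.
sub add_sid {
    my ($html, $sid) = @_;
    # $2 = URL up to any '?', $3 = existing query string (or empty)
    $html =~ s{(href=")([^"?]+)([^"]*)(")}
              {$1 . $2 . $3 . ($3 ? '&' : '?') . "sid=$sid" . $4}egi;
    return $html;
}

my @samples = (
    '<a href="page.html">plain</a>',
    '<a HREF="page.html?lang=en">query</a>',
);

print add_sid($_, 'xxx'), "\n" for @samples;
# Prints:
#   <a href="page.html?sid=xxx">plain</a>
#   <a HREF="page.html?lang=en&sid=xxx">query</a>
```

Note that this still rewrites every href, whatever tag it appears in (tip 4).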

This gets ugly in a hurry.  The slightly better answer is to parse
things out in more detail; a regex that you might find helpful is
discussed in:

   http://archive.lug.boulder.co.us/bymonth/2001.08/msg00573.html
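As a middle ground before a full parser, you can at least require the href to sit inside an <a ...> tag. This is not the regex from the thread linked above, just a hypothetical illustration (add_sid_to_anchors() is a made-up name), and it assumes double-quoted values and no '>' inside attribute values:

```perl
use strict;
use warnings;

# Only rewrite href attributes that appear inside <a ...> tags.
# Assumes double-quoted attribute values and no '>' inside attributes;
# real-world HTML can break both assumptions.
sub add_sid_to_anchors {
    my ($html, $sid) = @_;
    $html =~ s{(<a\s[^>]*?\bhref\s*=\s*")([^"]*)(")}
              {$1 . $2 . (index($2, '?') >= 0 ? '&' : '?') . "sid=$sid" . $3}egi;
    return $html;
}

my $html = '<link href="style.css"><a class="nav" href="next.html">Next</a>';
print add_sid_to_anchors($html, 'xxx'), "\n";
# <link href="style.css"><a class="nav" href="next.html?sid=xxx">Next</a>
```

It is still fooled by things like a data-href attribute, since \b also matches after a hyphen, which is a good sign that a real parser is the better long-term answer.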

Hopefully the tips above are enough to get you started, though.  If
your HTML is regular enough to begin with, then just moving the + to
be inside the parens should be enough.

t.
~sdpm~

The posting address is: san-diego-pm-list at hfb.pm.org

List requests should be sent to: majordomo at hfb.pm.org

If you ever want to remove yourself from this mailing list,
you can send mail to <majordomo at happyfunball.pm.org> with the following
command in the body of your email message:

    unsubscribe san-diego-pm-list

If you ever need to get in contact with the owner of the list,
(if you have trouble unsubscribing, or have questions about the
list itself) send email to <owner-san-diego-pm-list at happyfunball.pm.org> .
This is the general rule for most mailing lists when you need
to contact a human.



