a regexp question
tkil-sdpm at scrye.com
Fri May 9 15:23:04 CDT 2003
>>>>> "John" == John Chung <chung at scripps.edu> writes:
John> I noticed that the $1, instead of it being the entire URL
John> inside the anchor tag (between <a href=" and ">), is
John> usually just the last letter of that URL.
John> I'm confused. Could someone help me so that I can just
John> take the whole URL inside the anchor tag and pass it or
John> refer to it?
You misplaced your parentheses; in this case, the plus quantifier
modifies the grouping, not the character set. Simplest fix is:
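The original pattern is not quoted in this message, so purely as an illustration, assume it looked something like /<a href="([^"])+">/. The sketch below (sample URL and variable names are made up) shows why $1 then holds a single character, and what changes when the + moves inside the parentheses:

```perl
use strict;
use warnings;

# Made-up sample input, for illustration only.
my $html = '<a href="http://example.com/page">link</a>';

# Quantifier outside the group: the group matches one character per
# repetition, and the capture keeps only the last repetition.
my ($last) = $html =~ /<a href="([^"])+">/;

# Quantifier inside the group: the group captures the whole run of
# non-quote characters.
my ($url) = $html =~ /<a href="([^"]+)">/;

print "outside the parens: $last\n";
print "inside the parens:  $url\n";
```

With the quantifier outside, the capture is the final character of the URL, which matches the symptom described.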
Although this still isn't correct, since you remove the "href" portion
of the tag as well. Maybe:
s/(href=")([^"]+)(")/$1 . appendit($2) . $3/eg;
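To see that substitution run end to end, here is a minimal sketch. The body of appendit() is not shown in this thread, so the version below is a hypothetical stand-in that just tacks a fixed query string onto the URL; the sample HTML is made up as well.

```perl
use strict;
use warnings;

# Hypothetical helper: the real appendit() is not shown in the thread,
# so this stand-in simply appends a fixed query string.
sub appendit {
    my ($url) = @_;
    return $url . '?sid=xxx';
}

my $html = '<a href="/one.html">1</a> <a href="/two.html">2</a>';

# /e evaluates the replacement as Perl code; /g repeats it for every match.
# $1 and $3 put the surrounding href=" and closing quote back unchanged.
(my $out = $html) =~ s/(href=")([^"]+)(")/$1 . appendit($2) . $3/eg;

print "$out\n";
```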
1. /e is slow, and potentially insecure. Consider doing the
2. The href url might already have a '?', so another one is incorrect
(should be ";" or "&")
s/(href=")([^"?]+)([^"]*)(")/$1 . $2 . $3 . ($3 ? "&" : "?") . "sid=xxx" . $4/eg;
3. HTML tag attributes are case-insensitive. Consider using /i:
4. "href" is also used by other tags, such as LINK and AREA. :)
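Points 2 and 3 can be combined in one substitution. The following is only a sketch: add_sid() and its arguments are names invented for this example, and it keeps any existing query string while choosing "?" or "&" as the separator.

```perl
use strict;
use warnings;

# Illustrative helper (name and signature are assumptions): append
# "sid=<value>" to every href, using "&" if the URL already has a
# query string and "?" otherwise.
sub add_sid {
    my ($html, $sid) = @_;
    # $2 is the URL up to any "?", $3 is the existing query string (if any).
    # /i makes HREF and href both match; $1 preserves the original case.
    $html =~ s{(href=")([^"?]+)([^"]*)(")}
              {$1 . $2 . $3 . (length($3) ? '&' : '?') . "sid=$sid" . $4}egi;
    return $html;
}

my $in  = '<A HREF="/a.html">a</A> <a href="/b.cgi?x=1">b</a>';
my $out = add_sid($in, 'xxx');
print "$out\n";
```

This still rewrites every href it sees, whatever tag it belongs to, so point 4 remains open.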
This gets ugly in a hurry. The slightly better answer is to parse
things out in more detail; a regex that you might find helpful is
Hopefully the tips above are enough to get you started, though. If
your HTML is regular enough to begin with, then just moving the + to
be inside the parens should be enough.
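As one possible reading of "parse things out in more detail", the sketch below first isolates each opening <a ...> tag and only then rewrites the href inside it, so other tags that carry an href are left alone. Both function names are invented for this example, and the tag pattern assumes no attribute value contains a ">" character; a real HTML parser would be more robust.

```perl
use strict;
use warnings;

# Illustrative helper: rewrite the first href inside a single tag.
sub add_sid_to_href {
    my ($tag, $sid) = @_;
    $tag =~ s{(href=")([^"?]+)([^"]*)(")}
             {$1 . $2 . $3 . (length($3) ? '&' : '?') . "sid=$sid" . $4}ei;
    return $tag;
}

# Illustrative helper: find each opening <a ...> tag and hand it to the
# function above. \b keeps <area>, <abbr> and similar tags from matching.
sub add_sid_to_anchors {
    my ($html, $sid) = @_;
    $html =~ s{(<a\b[^>]*>)}{add_sid_to_href($1, $sid)}egi;
    return $html;
}

my $in  = '<link href="s.css"> <a class="x" href="/p?q=1">p</a>';
my $out = add_sid_to_anchors($in, 'xxx');
print "$out\n";
```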