From chung at scripps.edu Fri May 9 12:45:47 2003
From: chung at scripps.edu (John Chung)
Date: Thu Aug 5 00:20:46 2004
Subject: a regexp question
Message-ID: <200305091745.KAA03725@lentil.scripps.edu>

~sdpm~
Howdy

I want to do a simple substitution in html files where I
can append some string to URL's (inside anchor tag).
So the entire code is simply something like:

----------------------------------------------------------
#!/usr/bin/perl

while (<>) {
    s/href="([^"])+"/appendit($1)/eg;
    print $_;
}

sub appendit {
    my $url = shift;
    $url .= "?sid=blahblah";
    return $url;
}
----------------------------------------------------------

I noticed that the $1, instead of it being the entire URL
inside the anchor tag (between ), is
usually just the last letter of that URL.

I'm confused. Could someone help me so that I can just
take the whole URL inside the anchor tag and pass it or
refer to it?

Many thanks,

John Chung
The Scripps Research Institute
~sdpm~

The posting address is: san-diego-pm-list@hfb.pm.org

List requests should be sent to: majordomo@hfb.pm.org

If you ever want to remove yourself from this mailing list,
you can send mail to with the following
command in the body of your email message:

    unsubscribe san-diego-pm-list

If you ever need to get in contact with the owner of the list,
(if you have trouble unsubscribing, or have questions about the
list itself) send email to .
This is the general rule for most mailing lists when you need
to contact a human.

From cabney at ucsd.edu Fri May 9 14:43:03 2003
From: cabney at ucsd.edu (C. Abney)
Date: Thu Aug 5 00:20:46 2004
Subject: a regexp question
In-Reply-To: <200305091745.KAA03725@lentil.scripps.edu>
References: <200305091745.KAA03725@lentil.scripps.edu>
Message-ID: <1052509383.27235.64.camel@vespa>

On Fri, 2003-05-09 at 10:45, John Chung wrote:
> I want to do a simple substitution in html files where I
> can append some string to URL's (inside anchor tag).
Your code does more than simple substitution, it deletes everything
not in the parens. Maybe that's what you want.

> while (<>) {
>     s/href="([^"])+"/appendit($1)/eg;
>     print $_;
> }

maybe perldoc perlrequick to start.

perldoc perlintro (header Regular Expressions)
perldoc perlfaq6
perldoc perlretut

The parens enclose only one character, one that is not a double-quote.
The plus is looking for one or more copies of that character? You need
something more like:

    s/(href=")(.*?)"/$1$2&sid=blah"/

Yours,

Charles
-- 
Charles Abney
Polymorphism Research Laboratory, 0603
UCSD School of Medicine
9500 Gilman Dr.
La Jolla, CA 92093-0603

From cabney at ucsd.edu Fri May 9 15:08:16 2003
From: cabney at ucsd.edu (C. Abney)
Date: Thu Aug 5 00:20:46 2004
Subject: a regexp question
In-Reply-To: <1052509383.27235.64.camel@vespa>
References: <200305091745.KAA03725@lentil.scripps.edu>
    <1052509383.27235.64.camel@vespa>
Message-ID: <1052510896.27235.78.camel@vespa>

On Fri, 2003-05-09 at 12:43, C. Abney wrote:
> s/(href=")(.*?)"/$1$2&sid=blah"/

or, better...

    s/(href=".*?)"/$1&sid=blah"/g

-- 
Charles Abney
Polymorphism Research Laboratory, 0603
UCSD School of Medicine
9500 Gilman Dr.
La Jolla, CA 92093-0603

From tkil-sdpm at scrye.com Fri May 9 15:23:04 2003
From: tkil-sdpm at scrye.com (Tkil)
Date: Thu Aug 5 00:20:46 2004
Subject: a regexp question
In-Reply-To: <200305091745.KAA03725@lentil.scripps.edu>
References: <200305091745.KAA03725@lentil.scripps.edu>
Message-ID:

>>>>> "John" == John Chung writes:

John> s/href="([^"])+"/appendit($1)/eg;

John> I noticed that the $1, instead of it being the entire URL
John> inside the anchor tag (between ), is
John> usually just the last letter of that URL.

John> I'm confused. Could someone help me so that I can just
John> take the whole URL inside the anchor tag and pass it or
John> refer to it?

You misplaced your parentheses; in this case, the plus quantifier
modifies the grouping, not the character set. Simplest fix is:

    s/href="([^"]+)"/appendit($1)/eg;

Although this still isn't correct, since you remove the "href" portion
of the tag as well. Maybe:

    s/(href=")([^"]+)(")/$1 . appendit($2) . $3/eg;

Comments:

1. /e is slow, and potentially insecure. Consider doing the
   replacement inline:

       s/(href=")([^"]+)(")/$1$2?sid=xxx$3/g;

2. The href url might already have a '?', so another one is incorrect
   (should be ";" or "&")

       s/(href=")([^"?]+)([^"]*)(")/$1 . $2 . $3 . ($3 ? "&" : "?" ) . "sid=xxx" . $4/eg;

3. HTML tag attributes are case-insensitive. Consider using /i:

       s/(href=")([^"]+)(")/$1$2?sid=xxx$3/ig;

4. "href" is also used for other tags (LINK, AREA, BASE). :)

This gets ugly in a hurry.
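
[Editor's sketch, not part of the original thread. It pulls the comments above into one small script, assuming double-quoted href values and a plain "sid" parameter. The helper name add_sid and the sample HTML strings are made up for illustration. It keeps any existing query string and picks "&" or "?" accordingly; it makes no attempt to handle fragments, single-quoted attributes, or other tags that carry href.]

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name and interface are assumptions, not from
# the thread): append "sid=<value>" to every double-quoted href in $html.
sub add_sid {
    my ($html, $sid) = @_;

    # $1 = 'href="' in any letter case (because of /i)
    # $2 = the URL up to, but not including, any '?'
    # $3 = the existing query string, or empty if there is none
    # $4 = the closing double quote
    $html =~ s{(href=")([^"?]+)([^"]*)(")}
              {$1 . $2 . $3 . (length($3) ? '&' : '?') . "sid=$sid" . $4}egi;

    return $html;
}

# Why the pattern in the question captured a single character: in ([^"])+
# the group is repeated once per character, and $1 keeps only the capture
# from the last repetition. Moving the + inside the parentheses makes the
# group span the whole URL.
my ($last_char) = 'href="page.html"' =~ /href="([^"])+"/;    # "l"
my ($whole_url) = 'href="page.html"' =~ /href="([^"]+)"/;    # "page.html"

print "$last_char\n";
print "$whole_url\n";
print add_sid('<a href="index.html">x</a>', 'blah'), "\n";
print add_sid('<A HREF="cgi?a=1">y</A>',    'blah'), "\n";
```

[For real-world HTML an actual parser is likely a safer choice than any single regex; the sketch is only meant to show the capture-group placement and the "?" versus "&" decision.]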
The slightly better answer is to parse
things out in more detail; a regex that you might find helpful is
discussed in:

    http://archive.lug.boulder.co.us/bymonth/2001.08/msg00573.html

Hopefully the tips above are enough to get you started, though. If
your HTML is regular enough to begin with, then just moving the + to
be inside the parens should be enough.

t.

From chung at scripps.edu Fri May 9 15:33:23 2003
From: chung at scripps.edu (chung)
Date: Thu Aug 5 00:20:46 2004
Subject: a regexp question
Message-ID: <3EBC10F4@neo>

Wow!

My thanks to Dave, Steve, Charles, Tkil

Time to get out the 'mastering regular expressions' book and do
some practicin.

You guys are all very cool.

john

From dgwilson1 at cox.net Sun May 11 19:25:16 2003
From: dgwilson1 at cox.net (Douglas Wilson)
Date: Thu Aug 5 00:20:46 2004
Subject: Meet up w/tilly
Message-ID: <001301c3181c$f6b65040$e93a0544@oc.cox.net>

Greetings!
This is just to announce an informal gathering I'm
trying to organize to meet up with a NY monger/monk who's
coming through the area, Ben Tilly
(http://www.perlmonks.org/index.pl?node_id=26179),
whose troubles have been the subject of a slashdot article
(http://slashdot.org/article.pl?sid=02/03/21/0139244&mode=thread),
who has been a very helpful and knowledgeable programmer
on perlmonks, and who has helped develop some core perl
modules, including Carp and Exporter.

This is purely a social gathering, no presentations, and since I'm
in OC, and I'm also posting this on the LA mailing list, I thought
a good central location would be Santa Ana. Sorry, I know it's
a drive for you guys, but please come if you can make it. The
date is tentatively Wednesday, June 18th, 7pm (or later if anyone
really wants to come and wants to push back the time). Oh yeah,
and the place is Santa Ana, at The Olde Ship
(www.theoldeship.com). I hope some of you can make it :-)

-Doug

From rkleeman at energoncube.net Tue May 13 12:32:17 2003
From: rkleeman at energoncube.net (Bob Kleemann)
Date: Thu Aug 5 00:20:46 2004
Subject: Meet up w/tilly
In-Reply-To: <001301c3181c$f6b65040$e93a0544@oc.cox.net>
References: <001301c3181c$f6b65040$e93a0544@oc.cox.net>
Message-ID: <20030513173217.GA17656@energoncube.net>

Well, Santa Ana is a bit far away (it's showing about an hour and a
half for me), but hey, meeting Perl Mongers from around the world is
part of what this is all about.
I'm fairly certain I'll show up, just don't let us forget about it.

From rkleeman at energoncube.net Tue May 13 12:36:32 2003
From: rkleeman at energoncube.net (Bob Kleemann)
Date: Thu Aug 5 00:20:46 2004
Subject: Meeting next week.
Message-ID: <20030513173632.GB17656@energoncube.net>

Hey folks,

Just a reminder, we have a meeting next Tuesday. I'm currently
searching for locations that will be appropriate, but if I don't find
anything better, plan on meeting at the same location as last month,
the food court at UTC. If you have any better ideas please let me
know, otherwise I'll send out a reminder on Monday and I'll see you
next Tuesday evening.

At the meeting we can talk about finding a good place to meet, meeting
up with Tilly next month, projects and/or presentations that we'd like
to do, and anything else we can think of.

From scott at storagepoint.net Tue May 13 14:07:00 2003
From: scott at storagepoint.net (Scott Zimmerman)
Date: Thu Aug 5 00:20:46 2004
Subject: meeting site
Message-ID: <200305131909.h4DJ9e230333@mail.pm.org>

fyi, the SD Python user's group and one of the linux groups meet
in nice rooms at the SD County Board of Education off Linda Vista
(central within the county). Maybe they would also provide space
for Perl.

From rkleeman at energoncube.net Mon May 19 17:14:25 2003
From: rkleeman at energoncube.net (Bob Kleemann)
Date: Thu Aug 5 00:20:46 2004
Subject: Meeting Tues
Message-ID: <20030519221425.GB6376@energoncube.net>

Perl Mongers,

Just the friendly reminder that there is a meeting this Tues evening. I
was not able to secure an alternate location for this meeting, so it will
be at the same place it was last month, the food court at the UTC mall.
Look for us by the Rubio's location. I'll try to secure a location
away from the ice rink, close to the large set of windows.

Topics for this month include possible meeting locations for next month,
the Open Source/Perl Conference (in Portland), and whatever other topics
come to mind.

If you need directions or any other assistance, let me know. Otherwise
I'll see everyone there about 7PM.

From menolly at mib.org Mon May 19 17:50:22 2003
From: menolly at mib.org (Menolly)
Date: Thu Aug 5 00:20:46 2004
Subject: Meeting Tues
In-Reply-To: <20030519221425.GB6376@energoncube.net>
Message-ID:

Alas, this conflicts with the final episode of Buffy. But I'll be
attending more starting next month.

-- 
 )\._.,--....,'``.       | menolly@mib.org
/,   _.. \   _\  (`._ ,. | http://www.livejournal.com/~nolly/
`._.-(,_..'--(,_..'`-.;.' fL| Paranoid Cynical Optimist
"A corpse in the basement is just _wrong_!"
"Well, yeah. That's why they got rid of it."