[Philadelphia-pm] Perl one liner, regex capture group problem

Chris Nehren c.nehren/phl at shadowcat.co.uk
Fri Oct 28 09:07:47 PDT 2011


On Oct 28, 2011, at 11:43, Walt Mankowski wrote:

> On Fri, Oct 28, 2011 at 11:14:56AM -0400, Kyle R. Burton wrote:
>>> command: curl -so- http://www.wikihow.com/Make-Easy-Homemade-Biscuits|perl -nE "say $1 if /src='(\S+(?:png|jpg))'/"
>>> abbreviated output:
>> 
>> Stan,
>> 
>> You may just be hitting shell replacement since the expression is in
>> double quotes - try backslashing the $1:
>> 
>> ... perl -nE "say \$1 if /src='(\S+(?:png|jpg))'/"
> 
> An alternative would be to enclose the perl in single quotes instead
> of double quotes. Then you don't have to worry about backslashing the
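> $1, but a single quote can't appear inside a single-quoted string, so
> each one has to be written as '\'' (close the quote, add a backslashed
> quote, reopen the quote):
> 
>  perl -nE 'say $1 if /src='\''(\S+(?:png|jpg))'\''/'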
> 
> But unfortunately, if you're on Windows you can't use single quotes,
> so you'll need to stick with double quotes. Since cmd.exe doesn't
> expand $1, the unescaped double-quoted command works there as is.

Rather than trying to parse HTML with a regex (which is doomed to fail), you're really better off using a proper parser such as HTML::TreeBuilder. It may not be the answer you're looking for, but it has the virtue of being the right one, and it's easier to work with if you end up keeping this code.
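
A minimal sketch of the parser approach, still as a shell pipeline, assuming HTML::TreeBuilder (from the HTML-Tree distribution on CPAN) is installed. list_images is a hypothetical function name chosen for this example, and the .png/.jpg filter only mirrors the original regex; this is an illustration, not code from the thread.

```shell
# list_images (hypothetical name): read an HTML page on stdin and print
# the src of every <img> whose src ends in .png or .jpg.
list_images() {
  perl -MHTML::TreeBuilder -E '
    # Build a parse tree from the whole of standard input.
    my $tree = HTML::TreeBuilder->new_from_file(\*STDIN);
    # look_down returns every img element, in document order.
    for my $img ($tree->look_down(_tag => "img")) {
      my $src = $img->attr("src");
      say $src if defined $src && $src =~ /\.(?:png|jpg)\z/i;
    }
    # Free the tree explicitly.
    $tree->delete;
  '
}
```

With the page from the original command, that is curl -so- http://www.wikihow.com/Make-Easy-Homemade-Biscuits | list_images. Because it works on the parsed tree rather than line by line, it also picks up src values in double quotes or no quotes, and every image on a line, where the perl -n regex prints only the first single-quoted match per line.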
-- 
Thanks and best regards,
Chris Nehren

