[Chicago-talk] Question about removing '&rsquo;'

Doug Bell madcityzen at gmail.com
Fri Sep 28 07:53:22 PDT 2012


On Sep 28, 2012, at 9:41 AM, Jay Strauss <me at heyjay.com> wrote:

> Hi,
> 
> I'm scraping a web page (code below) using HTML::TreeBuilder. I'm trying to get the info between the <td> </td>, but embedded in some of the values is a &rsquo; like:
> 
> <td align="left" nowrap>Today&rsquo;s Volume</td>
> 
> What I want to do is remove the "&rsquo;" or convert it to a single quote, within the HTML::TreeBuilder object, figuring that's probably a more reliable approach.

That &foo; construct is an "HTML Entity", which the HTML::Entities module can decode for you, like:

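use HTML::Entities qw( decode_entities );
binmode STDOUT, ':encoding(UTF-8)';    # decoded text contains U+2019; avoids "Wide character in print"
print decode_entities( 'That&rsquo;s all folks!' ), "\n";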

That entity is specifically the right single quotation mark (U+2019), so if that exact character is not what you want, you could then use a regular expression on the decoded text to change it to a straight single quote (the ' character).
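A minimal sketch of that regular-expression step, assuming the text has already been decoded into a Perl character string (which is what decode_entities() returns). The helper name straighten_quotes is hypothetical, and it also folds the left single quotation mark (U+2018), which the original question did not mention:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: replace curly single quotes with a straight apostrophe.
    # Assumes $text is an already-decoded character string, not raw HTML.
    sub straighten_quotes {
        my ($text) = @_;
        # U+2018 LEFT SINGLE QUOTATION MARK, U+2019 RIGHT SINGLE QUOTATION MARK
        $text =~ s/[\x{2018}\x{2019}]/'/g;
        return $text;
    }

    my $cell = "Today\x{2019}s Volume";     # the <td> text once the entity is decoded
    print straighten_quotes($cell), "\n";   # prints: Today's Volume

Because the substitution runs on decoded characters rather than on the literal "&rsquo;" string, it also catches the same quote when a page writes it as a numeric reference such as &#8217;.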

Doug Bell
madcityzen at gmail.com
