[Chicago-talk] Question about removing '’'
me at heyjay.com
Fri Sep 28 08:33:20 PDT 2012
Thanks Doug. I'm not sure how that's different than what I'm doing?
In that I want to actually change the contents within the HTML::TreeBuilder
object, and not just decode (or regex) the output of $cell->as_HTML.
Maybe I missed something.
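
For illustration, here is a minimal sketch of changing the text inside the tree itself rather than post-processing the output of as_HTML. It assumes the page contains the numeric entity &#146; (the archive appears to have already decoded the entity in this thread, so the exact entity is a guess) and that the parser has turned it into either U+0092 or U+2019. The helper name straighten_quotes and the sample markup are hypothetical; content_refs_list is the HTML::Element method that hands back references to each content item, so text nodes can be edited in place.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: walk an HTML::Element tree and replace curly
# right single quotes in its text nodes with a straight apostrophe.
sub straighten_quotes {
    my ($element) = @_;
    for my $item_ref ( $element->content_refs_list ) {
        if ( ref $$item_ref ) {
            # Child element: recurse into it.
            straighten_quotes($$item_ref);
        }
        else {
            # Text node: U+0092 is what a decoded &#146; is assumed to
            # become; U+2019 covers &rsquo; and &#8217;.
            $$item_ref =~ s/[\x{92}\x{2019}]/'/g;
        }
    }
    return $element;
}

# Sample markup modelled on the cell quoted in the thread (entity assumed).
my $html = '<table><tr><td align="left" nowrap>Today&#146;s Volume</td></tr></table>';
my $tree = HTML::TreeBuilder->new_from_content($html);

straighten_quotes($tree);

my ($cell) = $tree->look_down( _tag => 'td' );
my $text = $cell->as_text;
print "$text\n";

$tree->delete;
```

Because the replacement happens on the text nodes, later calls to as_text or as_HTML on any cell should see the straight apostrophe without further cleanup.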
On Fri, Sep 28, 2012 at 9:53 AM, Doug Bell <madcityzen at gmail.com> wrote:
> On Sep 28, 2012, at 9:41 AM, Jay Strauss <me at heyjay.com> wrote:
> > Hi,
> > I'm scraping a web page (code below) using HTML::TreeBuilder. I'm
> > trying to get the info between the <td> </td>, but embedded in some of the
> > values is a ’ like:
> > <td align="left" nowrap>Today’s Volume</td>
> > What I want to do is remove the "’" or convert to a single quote,
> > within the HTML::TreeBuilder object, figuring that's probably a more
> > reliable approach.
> That &foo; construct is an "HTML Entity", which the HTML::Entities module
> can decode for you, like:
> use HTML::Entities qw( decode_entities );
> print decode_entities( 'That’s all folks!' );
> That entity is specifically a right single quotation mark (a curly quote), so if that exact
> character is not what you want, then you could use your regular expression
> to change it to a straight single quote (the ' character).
> Doug Bell
> madcityzen at gmail.com
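
A minimal sketch of the decode-then-substitute approach Doug describes, again assuming the source entity is &#146; (a guess, since the archive shows it already decoded). decode_entities is the real HTML::Entities function; the variable names are only for illustration.

```perl
use strict;
use warnings;
use HTML::Entities qw( decode_entities );

# Assumed input: the entity as it might appear in the raw page source.
my $raw = 'That&#146;s all folks!';

# Called in non-void context, decode_entities returns the decoded copy.
my $decoded = decode_entities($raw);

# Swap the curly quote (U+0092 from &#146;, or U+2019 from &rsquo;)
# for a straight apostrophe.
( my $plain = $decoded ) =~ s/[\x{92}\x{2019}]/'/g;

print "$plain\n";
```

This works on a string that has already left the tree, which is the difference from editing the text nodes in place.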