[tpm] Split a string in half

Sergio Salvi sergio at salvi.ca
Sun Mar 15 10:10:03 PDT 2009

On Sat, Mar 14, 2009 at 11:21 PM, Madison Kelly <linux at alteeve.com> wrote:
> Hi all,
>  I need to split a string in half, and I can assume that the middle character
> in the string will be a space ' ' character. I cannot predict how many
> spaces will be in the string, though.
>  In short; How could I split:
> 'foo bar foo bar' => 'foo bar', 'foo bar'
> 'baz baz'         => 'baz', 'baz'
> 'foo bar baz foo bar baz' => 'foo bar baz', 'foo bar baz'
>  In Detail;
>  I use WWW::Mechanize, and I am running into a problem with a website that
> uses an image and text in a single anchor. Specifically, the alt text is the
> same as the text, so the 'find_all_links' function returns:
> foo bar foo bar

Hi Madi,

You must be getting this string from the $link_obj->text method, aren't
you? So I believe you want to extract the text of the link and not the
URL (which you can easily get with $link_obj->url).

WWW::Mechanize uses HTML::TokeParser to extract the link text, and by
default HTML::TokeParser *will* include the contents of "IMG ALT" and
"APPLET ALT" in that text. To turn this off you have to pass
"textify => {}" to HTML::TokeParser->new(), but WWW::Mechanize gives
you no way to do that. You can either send the maintainer a patch that
lets options be passed through to HTML::TokeParser, or subclass
WWW::Mechanize, which is not really clean because you have to copy its
private %link_tags variable:

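package WWW::Mechanize::NoAltText;

use strict;
use warnings;
use parent "WWW::Mechanize";
use HTML::TokeParser;

# Tag name => attribute that holds the link target. WWW::Mechanize
# keeps this map in a private variable, so it has to be copied here.
my %link_tags = (
    a      => 'href',
    area   => 'href',
    frame  => 'src',
    iframe => 'src',
    link   => 'href',
    meta   => 'content',
);

# Override of WWW::Mechanize's private _extract_links. The only change
# is the empty textify map, which stops HTML::TokeParser from adding
# IMG/APPLET ALT text to the text of each link.
sub _extract_links {
    my $self = shift;

    $self->{links} = [];
    if ( defined $self->{content} ) {
        my $parser = HTML::TokeParser->new(
            doc     => \$self->{content},
            textify => {},
        );
        while ( my $token = $parser->get_tag( keys %link_tags ) ) {
            my $link = $self->_link_from_token( $token, $parser );
            push( @{$self->{links}}, $link ) if $link;
        } # while
    }

    $self->{_extracted_links} = 1;

    return;
}

1;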
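If you want to see what textify changes before touching Mechanize, here is a small standalone check using HTML::TokeParser alone on the HTML from your example. The link_text helper is just a name I made up for this test; it passes textify to the constructor only when you give it one, so calling it without a second argument uses the module's default (ALT text included).

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the trimmed text of the first <a> in $html.
# If $textify is undef, HTML::TokeParser's default textify map is used.
sub link_text {
    my ( $html, $textify ) = @_;
    my $parser = HTML::TokeParser->new(
        doc => \$html,
        defined $textify ? ( textify => $textify ) : (),
    );
    $parser->get_tag('a') or return;
    return $parser->get_trimmed_text('/a');
}

my $html = '<a href="/x"><img src="i.png" alt="foo bar"> foo bar</a>';
print link_text($html), "\n";        # prints "foo bar foo bar"
print link_text( $html, {} ), "\n";  # prints "foo bar"
```

The subclass above does the same thing, just inside _extract_links, so the WWW::Mechanize::Link objects you get back should carry the shorter text.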
Care to send a patch with tests to Andy Lester?
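If you'd rather keep the stock WWW::Mechanize and just split the duplicated string you already get, here is a minimal sketch that relies only on your stated assumption that the middle character is a space. split_in_half is a made-up helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string at its middle character, assuming
# that character is a space. Returns the two halves, or an empty list
# if the string has even length or the middle character is not a space.
sub split_in_half {
    my ($s) = @_;
    my $len = length $s;
    return unless $len % 2;                  # half + ' ' + half is odd
    my $mid = ( $len - 1 ) / 2;              # index of the middle character
    return unless substr( $s, $mid, 1 ) eq ' ';
    return ( substr( $s, 0, $mid ), substr( $s, $mid + 1 ) );
}

for my $s ( 'foo bar foo bar', 'baz baz', 'foo bar baz foo bar baz' ) {
    my ( $left, $right ) = split_in_half($s);
    print "'$left', '$right'\n";
}
```

If you also want to confirm that the two halves really are duplicates, a backreference does it in one match: my ($half) = $s =~ /\A(.+) \1\z/s;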

Sergio Salvi

>  For:
> <a href="..."><img src="..." alt="foo bar"> foo bar</a>
>  I never know what the link will be, only that the alt and text will be the
> same.
> Thanks!
> Madi
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> http://mail.pm.org/mailman/listinfo/toronto-pm
