[tpm] Split a string in half

Sergio Salvi sergio at salvi.ca
Sun Mar 15 10:10:03 PDT 2009


On Sat, Mar 14, 2009 at 11:21 PM, Madison Kelly <linux at alteeve.com> wrote:
> Hi all,
>
>  I need to split a string in half, and I can assume the middle character
> in the string will be a space ' ' character. I cannot predict how many
> spaces will be in the string though.
>
>  In short; How could I split:
>
> 'foo bar foo bar' => 'foo bar', 'foo bar'
> 'baz baz'         => 'baz', 'baz'
> 'foo bar baz foo bar baz' => 'foo bar baz', 'foo bar baz'
>
>  In Detail;
>
>  I use WWW::Mechanize, and I am running into a problem with a website that
> uses an image and text in a single anchor. Specifically, the alt text is the
> same as the text, so the 'find_all_links' function returns:
>
> foo bar foo bar
>

Hi Madi,

You must be getting this string from the $link_obj->text method, right?
So I believe you want to extract the text of the link and not the
URL (which you can easily get with $link_obj->url).
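
If all you need is the literal split you asked about, here is a minimal
sketch. It assumes the string has an odd length and that its middle
character is a space, as in your examples. split_in_half is a
hypothetical helper name, not part of WWW::Mechanize.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string around its middle character.
# Returns an empty list unless the length is odd and the middle
# character is a single space.
sub split_in_half {
    my ($text) = @_;
    my $len = length $text;
    return unless $len % 2 == 1;

    my $mid = ( $len - 1 ) / 2;    # index of the middle character
    return unless substr( $text, $mid, 1 ) eq ' ';

    return ( substr( $text, 0, $mid ), substr( $text, $mid + 1 ) );
}

for my $text ( 'foo bar foo bar', 'baz baz', 'foo bar baz foo bar baz' ) {
    my ( $left, $right ) = split_in_half($text);
    print "'$left', '$right'\n";
}
```

This only hides the symptom, though. It breaks as soon as the alt text
and the link text differ, so the real fix is to stop the alt text from
being included in the first place.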

WWW::Mechanize uses HTML::TokeParser to extract the link text, and by
default HTML::TokeParser *will* include the contents of "IMG ALT" and
"APPLET ALT". To turn that off you have to pass "textify => {}" to
HTML::TokeParser->new(). The only ways to do this are either to send a
patch to the maintainer that allows options to be passed through to
HTML::TokeParser, or to subclass WWW::Mechanize. Subclassing is not
really clean, because _extract_links depends on the private %link_tags
variable, which the subclass has to copy:

#####
package WWW::Mechanize::NoAltText;

use strict;
use warnings;
use parent "WWW::Mechanize";
use HTML::TokeParser;    # loaded explicitly instead of relying on the parent class

# Copy of the private %link_tags from WWW::Mechanize, which is not
# visible to a subclass.
my %link_tags = (
    a      => 'href',
    area   => 'href',
    frame  => 'src',
    iframe => 'src',
    link   => 'href',
    meta   => 'content',
);

sub _extract_links {
    my $self = shift;


    $self->{links} = [];
    if ( defined $self->{content} ) {
        my $parser = HTML::TokeParser->new(
            doc     => \$self->{content},
            textify => {},    # do not substitute ALT text for IMG/APPLET
        );
        while ( my $token = $parser->get_tag( keys %link_tags ) ) {
            my $link = $self->_link_from_token( $token, $parser );
            push( @{$self->{links}}, $link ) if $link;
        } # while
    }

    $self->{_extracted_links} = 1;

    return;
}

1;
#####
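
To see what "textify => {}" changes without going through
WWW::Mechanize at all, here is a small sketch that runs
HTML::TokeParser directly on markup shaped like your example. The
href, the src and the link_text helper are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Markup shaped like the example in the question (URLs are made up).
my $html = '<a href="/x"><img src="x.png" alt="foo bar"> foo bar</a>';

# Hypothetical helper: return the text of the first <a> element,
# passing any extra options through to HTML::TokeParser->new().
sub link_text {
    my (%opts) = @_;
    my $parser = HTML::TokeParser->new( doc => \$html, %opts );
    $parser->get_tag('a');
    return $parser->get_trimmed_text('/a');
}

print "default textify: '", link_text(), "'\n";
print "textify => {}:   '", link_text( textify => {} ), "'\n";
```

With the default settings the alt text should show up in front of the
link text, and with an empty textify hash only the link text itself
should be left.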

Care to send a patch with tests to Andy Lester?

Regards,
Sergio Salvi


>  For:
>
> <a href="..."><img src="..." alt="foo bar"> foo bar</a>
>
>  I never know what the link will be, only that the alt and text will be the
> same.
>
> Thanks!
>
> Madi
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> http://mail.pm.org/mailman/listinfo/toronto-pm
>

