[tpm] Split a string in half
Sergio Salvi
sergio at salvi.ca
Sun Mar 15 10:10:03 PDT 2009
On Sat, Mar 14, 2009 at 11:21 PM, Madison Kelly <linux at alteeve.com> wrote:
> Hi all,
>
> I need to split a string in half, and I can assume that the middle character
> in the string will be a space ' ' character. I cannot predict how many
> spaces will be in the string though.
>
> In short, how could I split:
>
> 'foo bar foo bar' => 'foo bar', 'foo bar'
> 'baz baz' => 'baz', 'baz'
> 'foo bar baz foo bar baz' => 'foo bar baz', 'foo bar baz'
>
> In detail:
>
> I use WWW::Mechanize, and I am running into a problem with a website that
> uses an image and text in a single anchor. Specifically, the alt text is the
> same as the text, so the 'find_all_links' function returns:
>
> foo bar foo bar
>
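The literal question (splitting a string at its middle space) has a short answer in plain Perl. This is a minimal sketch that assumes, as the question states, that the middle character is always a space. `split_in_half` and `split_duplicate` are hypothetical helper names, not part of any module. The second helper uses a backreference so that it only succeeds when both halves are identical, which matches the duplicated "foo bar foo bar" link text.

```perl
use strict;
use warnings;

# Split at the middle character, assuming (as in the question) that the
# string has odd length and that its middle character is a space.
sub split_in_half {
    my ($str) = @_;
    my $mid = int( length($str) / 2 );    # index of the middle character
    die "middle character of '$str' is not a space\n"
        unless substr( $str, $mid, 1 ) eq ' ';
    return ( substr( $str, 0, $mid ), substr( $str, $mid + 1 ) );
}

# Alternative: match "X X" directly, which also checks that both halves
# are identical. Returns the empty list if the string is not a duplicate.
sub split_duplicate {
    my ($str) = @_;
    my ($half) = $str =~ /\A(.+) \1\z/s or return;
    return ( $half, $half );
}

for my $s ( 'foo bar foo bar', 'baz baz', 'foo bar baz foo bar baz' ) {
    my ( $left, $right ) = split_in_half($s);
    print "'$left', '$right'\n";
}
```

Run as-is, the loop prints the three pairs from the question, starting with `'foo bar', 'foo bar'`.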
Hi Madi,
You must be getting this string from the $link_obj->text method, aren't
you? So I believe you want to extract the text of the link and not the
URL (which you can easily get with $link_obj->url).
WWW::Mechanize uses HTML::TokeParser to extract the text, and by
default HTML::TokeParser *will* include the contents of "IMG ALT" and
"APPLET ALT" in that text. To turn this off you have to pass
"textify => {}" to HTML::TokeParser->new(), but WWW::Mechanize gives
you no way to do that. You can either send a patch to the maintainer so
that options can be passed through to HTML::TokeParser, or subclass
WWW::Mechanize. Subclassing is not really clean, because WWW::Mechanize
keeps its %link_tags variable private, so it has to be copied:
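#####
package WWW::Mechanize::NoAltText;

use strict;
use warnings;
use parent "WWW::Mechanize";
use HTML::TokeParser;

# Copy of the tag => attribute map from WWW::Mechanize. There it is a
# file-scoped lexical (my %link_tags), so a subclass cannot reach it.
my %link_tags = (
    a      => 'href',
    area   => 'href',
    frame  => 'src',
    iframe => 'src',
    link   => 'href',
    meta   => 'content',
);

# Override of WWW::Mechanize's private _extract_links(). The only change is
# "textify => {}", which stops HTML::TokeParser from substituting the ALT
# text of <img> and <applet> tags into the link text. Because this relies on
# private internals, it may break with other WWW::Mechanize versions.
sub _extract_links {
    my $self = shift;

    $self->{links} = [];
    if ( defined $self->{content} ) {
        my $parser = HTML::TokeParser->new(
            doc     => \$self->{content},
            textify => {},
        );
        while ( my $token = $parser->get_tag( keys %link_tags ) ) {
            my $link = $self->_link_from_token( $token, $parser );
            push( @{ $self->{links} }, $link ) if $link;
        }
    }
    $self->{_extracted_links} = 1;

    return;
}

1;
#####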
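The ALT-text behaviour can be checked with HTML::TokeParser alone, without WWW::Mechanize. This is a minimal sketch, assuming the HTML::Parser distribution (which provides HTML::TokeParser) is installed. `link_text` is a hypothetical helper, and the URLs are placeholders for the markup in the original question.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # from the HTML::Parser distribution

# The markup from the original question (URLs are placeholders).
my $html = '<a href="/page"><img src="/pic.png" alt="foo bar"> foo bar</a>';

# Return the whitespace-trimmed text of the first <a> element, passing any
# extra options straight to HTML::TokeParser->new().
sub link_text {
    my (%opt) = @_;
    my $parser = HTML::TokeParser->new( doc => \$html, %opt );
    $parser->get_tag('a');
    return $parser->get_trimmed_text('/a');
}

print link_text(), "\n";                   # default textify: ALT text included
print link_text( textify => {} ), "\n";    # empty textify: ALT text ignored
```

The first call should return "foo bar foo bar", the same doubled text that find_all_links reports. The second should return only "foo bar". With the subclass above loaded, creating the agent with WWW::Mechanize::NoAltText->new instead of WWW::Mechanize->new should give the second behaviour for every link.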
Care to send a patch with tests to Andy Lester?
Regards,
Sergio Salvi
> For:
>
> <a href="..."><img src="..." alt="foo bar"> foo bar</a>
>
> I never know what the link will be, only that the alt and text will be the
> same.
>
> Thanks!
>
> Madi
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> http://mail.pm.org/mailman/listinfo/toronto-pm
>