[tpm] find, manipulate, then output

Liam R E Quin liam at holoweb.net
Mon May 28 17:22:05 PDT 2012


On Mon, 2012-05-28 at 09:59 -0400, Antonio Sun wrote:


> Here is an example that you can work on. Given the following input,
> I want to output, "<last-name>, <first-name>" on each line.

For my part, I always want readable, maintainable code.

For your example, I'd use XQuery -

for $book in /bookstore/book
return ($book/last-name, ", ", $book/first-name, "&#xa;")

You could use the BaseX Perl API to run this (as an example).

If you want to use regular expressions, here's a longer version:

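# Return "last, first" for one <book>...</book> element held in a string.
sub get_name
{
    my ($book) = @_;

    # /s lets "." match newlines, since a book element spans several lines.
    die "get_name needs a book element"
        unless ($book =~ m{^\s*<book\b.*</book>\s*$}s);

    my ($first, $last) = ("", "");

    if ($book =~ m{<first-name>\s*([^<>]*\S)\s*</first-name>}) {
        $first = $1;
    }

    if ($book =~ m{<last-name>\s*([^<>]*\S)\s*</last-name>}) {
        $last = $1;
    }

    my $result = $last;
    # Add the separator only when both parts are present.
    if ($result ne "" && $first ne "") {
        $result .= ", ";
    }
    $result .= $first;
    return $result;
}

# $blob holds the whole XML document as a single string.
# \b stops "<book" from also matching "<bookstore>".
while ($blob =~ m{(<book\b[^<>]*>.*?</book>)}gs) {
    print get_name($1), "\n";
}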
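
To try the regular-expression approach without a separate input file, here is a compact, self-contained sketch. The sample data and the helper names field and format_names are made up for illustration (the original input is not quoted in this message), and it assumes first-name and last-name appear as plain-text elements somewhere inside each book.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; the element layout is an assumption.
my $blob = <<'XML';
<bookstore>
  <book genre="novel">
    <first-name>Herman</first-name>
    <last-name>Melville</last-name>
  </book>
  <book genre="autobiography">
    <first-name> Benjamin </first-name>
    <last-name>Franklin</last-name>
  </book>
</bookstore>
XML

# Return the trimmed text of the first <$tag> element in $xml, or "".
sub field {
    my ($xml, $tag) = @_;
    return $xml =~ m{<\Q$tag\E>\s*([^<>]*\S)\s*</\Q$tag\E>} ? $1 : "";
}

# Return one "last, first" string per <book> element found in $xml.
sub format_names {
    my ($xml) = @_;
    my @names;
    while ($xml =~ m{(<book\b[^<>]*>.*?</book>)}gs) {
        my $book = $1;    # copy before field() runs its own matches
        # Drop empty parts so a missing name leaves no stray comma.
        push @names, join ", ",
            grep { $_ ne "" } field($book, "last-name"), field($book, "first-name");
    }
    return @names;
}

print "$_\n" for format_names($blob);
```

With the sample data above this prints "Melville, Herman" and "Franklin, Benjamin", one per line. It is still pattern matching on text, not XML parsing, so comments, CDATA sections, or nested markup inside the name elements would need a real parser.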

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml


