[Boulder.pm] Tricky (for a newbie) String Replacement

Peter Hutnick pm at hutnick.com
Thu May 6 08:30:00 CDT 2004


Luke Palmer wrote:

> You had me staring at this for five minutes wondering why it didn't
> work.

I almost said "$text =~ $_; #does nothing."  Guess I should have :-(

> I completely missed the fact that you didn't have an eval.

Ah, eval.  That's exactly the incantation I was looking for!

> Here's what you want (note that I fixed your regexes a little, too):
> 
>     @rule = (
>         's[\\\\emph{(.*?)}]    [<em>$1</em>]g',
>         's[\\\\chapter{(.*?)}] [<h1>$1</h1>]g',
>         );
> 
>     # this only works if the text is in $_
>     # you might have to do a little BS to get it there
>     foreach my $rule (@rule) {
>         eval $rule;
>     }
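
A minimal sketch of one way to get the text into $_ for the loop above:
"for" aliases $_ to a named variable, so each eval'd substitution edits
that variable in place.  The names $text and @rules and the sample input
are made up for illustration, and the braces are escaped here on the
assumption that newer perls may object to a bare "{" in a pattern.

```perl
use strict;
use warnings;

# Rule strings in the same style as the quoted example.
my @rules = (
    's[\\\\emph\{(.*?)\}]    [<em>$1</em>]g',
    's[\\\\chapter\{(.*?)\}] [<h1>$1</h1>]g',
);

# Hypothetical sample input.
my $text = 'A \emph{small} example from \chapter{One}.';

# "for" aliases $_ to $text, so each eval'd s/// modifies $text itself.
for ($text) {
    foreach my $rule (@rules) {
        eval $rule;
        die "bad rule '$rule': $@" if $@;   # report rules that fail to compile
    }
}

print "$text\n";
```

Checking $@ after each eval is optional, but without it a rule with a
syntax error is skipped silently.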

One of my philosophical clashes with Perl is that I like explicit, 
descriptive variable names and expressions.  Perl allows (encourages) 
stuff like that, where you have an expression whose results are all 
implicit.

My answer is to just write things more explicitly than I have to.  I'm 
okay with that ;-)
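
For comparison, here is a sketch of a more explicit style that avoids
both $_ and string eval.  It is one possible arrangement, not the script
discussed in this thread; apply_rules, @rules, and the sample input are
hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Each rule pairs a compiled pattern with a sub that builds the replacement
# from the captured text.
my @rules = (
    [ qr/\\emph\{(.*?)\}/,    sub { my ($inner) = @_; return "<em>$inner</em>" } ],
    [ qr/\\chapter\{(.*?)\}/, sub { my ($inner) = @_; return "<h1>$inner</h1>" } ],
);

# Apply every rule to a copy of the text and return the converted copy.
sub apply_rules {
    my ($text, @rule_list) = @_;
    foreach my $rule (@rule_list) {
        my ($pattern, $make_replacement) = @$rule;
        $text =~ s/$pattern/$make_replacement->($1)/ge;
    }
    return $text;
}

print apply_rules('\chapter{Intro} Some \emph{text}.', @rules), "\n";
```

The cost is a little more typing per rule; the gain is that the input
and output are named and nothing is compiled at run time.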

Oh, and what is with the extra backslash?  (I.e. \\emph -> \\\\emph)
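
A likely explanation, sketched below: the rule lives in a single-quoted
string, which turns each \\ into one \, and the regex in turn needs \\
to match one literal backslash.  So four backslashes in the source
become two in the string and match one in the text.  The variable names
and the sample input are made up for the demonstration.

```perl
use strict;
use warnings;

my $two  = '\\emph';     # single quotes collapse \\ to \, so this holds \emph
my $four = '\\\\emph';   # this holds \\emph

print "$two\n";
print "$four\n";

# Hypothetical LaTeX input; \e is left alone inside single quotes.
my $latex = 'some \emph{word} here';

# Used as a pattern, \emph is read as the escape \e (the ESC character)
# followed by "mph", so it does not match the LaTeX command.
my $two_matches  = ($latex =~ /$two/)  ? 1 : 0;

# \\emph is read as a literal backslash followed by "emph".
my $four_matches = ($latex =~ /$four/) ? 1 : 0;

print "two backslashes match:  $two_matches\n";
print "four backslashes match: $four_matches\n";
```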

> But I'm sure someone's written a LaTeX to XHTML converter before.  Do a
> little searching around (or maybe you have).

Eh.  There's one called ltoh that would work, except that it has an 
unacceptable license term.  (It sticks a little ad for itself in the 
output, and the license disallows deleting it.)  This script will be 
used in support of a copyleft project so I don't really have any 
latitude in the matter.

The other thing is that I will be converting specific documents with 
this script.  Because of functional differences between HTML and LaTeX, 
a universal translator is really impossible.  I figure one that works 
really well, but only for me, is the best solution.

> I'd also suggest using Parse::RecDescent if you want this to scale, or
> if you plan on using this on multiple documents in the future.  LaTeX
> has a hierarchical structure, and Perl 5's regexes don't work very well
> with that.
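
To illustrate the nesting concern, here is a small sketch using a
made-up input with one \emph inside another.  The non-greedy (.*?)
stops at the first closing brace, so the outer command is closed too
early and the inner one is left unconverted.

```perl
use strict;
use warnings;

# Hypothetical input with nested commands.
my $nested = '\emph{outer \emph{inner} tail}';

# Copy the input, then run the same kind of rule on the copy.
(my $converted = $nested) =~ s/\\emph\{(.*?)\}/<em>$1<\/em>/g;

print "$converted\n";
# prints: <em>outer \emph{inner</em> tail}
```

Input that never nests the same kind of command does not hit this
problem, which is the case the regex rules rely on.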

I don't think that is necessary.  I am counting on the LaTeX file being 
well formed.  Since I am only working with my own document I think that 
is okay.

Again, if this were a general-purpose app, parsing the whole thing 
semantically would be the way to go.  For pure simplicity you can't beat 
"``=<q>" and "''=</q>" etc.

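A sketch of how literal rules like those two might be applied, assuming
they are stored as from/to pairs.  The names @quote_rules and
convert_quotes and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# Literal from/to pairs: LaTeX opening and closing quotes to <q> tags.
my @quote_rules = (
    [ '``' => '<q>'  ],
    [ "''" => '</q>' ],
);

# Replace every literal occurrence of each "from" string with its "to" string.
sub convert_quotes {
    my ($text) = @_;
    foreach my $pair (@quote_rules) {
        my ($from, $to) = @$pair;
        $text =~ s/\Q$from\E/$to/g;   # \Q...\E treats $from as plain text
    }
    return $text;
}

print convert_quotes("He said ``hello'' and left."), "\n";
```
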
As an update, I went through last night and made a bunch of rules, and 
it worked pretty well.  I got everything working except internal links 
and the <span>s I need for font work (e.g. \Large), and I am getting a 
few extra paragraphs.  I think the extra paragraphs are unavoidable due 
to the way LaTeX uses whitespace.  (Well, unavoidable without parsing 
the input semantically . . .)

Thanks a million for the advice.

-Peter


