[Boulder.pm] Tricky (for a newbie) String Replacement
Peter Hutnick
pm at hutnick.com
Thu May 6 08:30:00 CDT 2004
Luke Palmer wrote:
> You had me staring at this for five minutes wondering why it didn't
> work.
I almost said "$text =~ $_; #does nothing." Guess I should have :-(
> I completely missed the fact that you didn't have an eval.
Ah, eval. That's exactly the incantation I was looking for!
> Here's what you want (note that I fixed your regexes a little, too):
>
> @rule = (
>     's[\\\\emph{(.*?)}] [<em>$1</em>]g',
>     's[\\\\chapter{(.*?)}] [<h1>$1</h1>]g',
> );
>
> # this only works if the text is in $_
> # you might have to do a little BS to get it there
> foreach my $rule (@rule) {
>     eval $rule;
> }
One of my philosophical clashes with Perl is that I like explicit,
descriptive variable names and expressions. Perl allows (encourages)
stuff like that, where you have an expression whose results are all
implicit.
My answer is to just write things more explicitly than I have to. I'm
okay with that ;-)
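A minimal sketch of what "more explicit" could look like here, assuming the rules are stored as compiled patterns plus replacement callbacks instead of strings passed to eval. The names @rules and apply_rules are hypothetical, and this swaps the string-eval technique from the quoted code for s///e with a callback:

```perl
use strict;
use warnings;

# Hypothetical rule table: each rule pairs a compiled pattern with a
# callback that builds the replacement from the captured text.
my @rules = (
    [ qr/\\emph\{(.*?)\}/,    sub { "<em>$_[0]</em>" } ],
    [ qr/\\chapter\{(.*?)\}/, sub { "<h1>$_[0]</h1>" } ],
);

# Apply every rule to an explicitly named string and return the result.
sub apply_rules {
    my ($text) = @_;
    for my $rule (@rules) {
        my ($pattern, $build) = @$rule;
        # /e evaluates the replacement as Perl code, so the callback runs
        # once per match with the captured group as its argument.
        $text =~ s/$pattern/$build->($1)/ge;
    }
    return $text;
}

print apply_rules('\chapter{Intro} Some \emph{important} text.'), "\n";
```

Nothing here touches $_, so the text being rewritten is always named at the call site.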
Oh, and what is with the extra backslash? (I.e. \\emph -> \\\\emph)
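As far as the backslashes go, the rule text passes through two layers: the single-quoted string and then the regex compiled by eval. The sketch below assumes the rule is kept in a single-quoted string, as in the quoted code, with the braces escaped so that newer Perls accept the pattern:

```perl
use strict;
use warnings;

my $rule = 's[\\\\emph\{(.*?)\}][<em>$1</em>]g';

# Layer 1: inside single quotes each \\ collapses to one backslash,
# so the stored rule text contains \\emph (two backslashes).
print "$rule\n";

# Layer 2: when that text is eval'd as code, the regex reads \\ as
# "match one literal backslash", which is what \emph in the input needs.
local $_ = 'Some \emph{important} text.';
eval $rule;
die $@ if $@;
print "$_\n";    # Some <em>important</em> text.
```

With only '\\emph' in the single-quoted string, the regex would see \emph, and \e in a Perl regex is the escape character rather than a backslash followed by "e".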
> But I'm sure someone's written a LaTeX to XHTML converter before. Do a
> little searching around (or maybe you have).
Eh. There's one called ltoh that would work, except that it has an
unacceptable license term. (It sticks a little ad for itself in the
output, and the license disallows deleting it.) This script will be
used in support of a copyleft project so I don't really have any
latitude in the matter.
The other thing is that I will be converting specific documents with
this script. Because of functional differences between HTML and LaTeX, a
universal translator is really impossible. I figure one that works
really well, but only for me, is the best solution.
> I'd also suggest using Parse::RecDescent if you want this to scale, or
> if you plan on using this on multiple documents in the future. LaTeX
> has a hierarchical structure, and Perl 5's regexes don't work very well
> with that.
I don't think that is necessary. I am counting on the LaTeX file being
well-formed. Since I am only working with my own document, I think that
is okay.
Again, if this were a general-purpose app, parsing the whole thing
semantically would be the way to go. For pure simplicity you can't beat
"``=<q>" and "''=</q>" etc.
As an update, I went through last night and made a bunch of rules, and it
worked pretty well. I actually got everything working except internal
links and the <span>s I need for some font work (e.g. \Large), and I am
getting a few extra paragraphs. I think that the paragraphs are
unavoidable due to the way LaTeX uses whitespace. (Well, unavoidable
without parsing the input semantically . . .)
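One thing that might cut down on the extra paragraphs without a real parser is to split on runs of blank lines and throw away chunks that contain only whitespace. This is only a sketch; it assumes paragraphs are delimited by blank lines, and the name paragraphs is hypothetical:

```perl
use strict;
use warnings;

# Assumption: a paragraph break is one or more blank lines.
sub paragraphs {
    my ($text) = @_;
    return map  { "<p>$_</p>" }
           grep { /\S/ }                                   # drop empty chunks
           map  { my $p = $_; $p =~ s/^\s+|\s+$//g; $p }   # trim each chunk
           split /\n\s*\n/, $text;
}

print "$_\n" for paragraphs("\n\nOne\n\n\n\nTwo\n");
```

Several consecutive blank lines then count as a single break, and leading or trailing blank lines produce no paragraph at all.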
Thanks a million for the advice.
-Peter