jon at pielaet.net
Sat Apr 30 19:22:31 PDT 2011
Well, I am responsible for all of the alternatively formatted
materials at Disability Services for Students at the University of
For the most part, this means producing DAISY Digital Talking Books
(DTBook XML is at the core of that standard) but we also emboss some
Braille from time to time.
Right now, we convert paper books to UTF-8 text using production
scanners and OCR software; then student employees mark up that text in
Microsoft Word and export it to DTBook XML using an open-source
plug-in. I would love to get rid of Word, but teaching students the
inner workings of this XML is not really realistic for the most part.
I am basically the only one who ever looks at the XML, and there are
common fixes that I would like to automate.
Perl seems like a good choice for this task. I am just starting to
learn the basics of Perl, but I have been using Unix/Linux for years.
More recently, I began studying extended RegEx. In fact, that is what
has attracted me to Perl. I already have some useful expressions
written for egrep, sed, and awk, so scripting in Perl seems like the
next logical step.
I am working my way through "The Llama" (Learning Perl - O'Reilly) right now.
If anyone has any tips for working with XML in Perl, I would love to hear them.
I like to talk about what I do, so let me know if you have questions.