SPUG: Day meeting in Bellevue
cos at indeterminate.net
Wed Dec 7 15:53:13 PST 2005
On Wed, 7 Dec 2005, Tim Maher wrote:
> On Wed, Dec 07, 2005 at 12:50:19PM -0800, John Costello wrote:
> > Duane,
> > Sorry for the late response. Wednesday the 14th in Bellevue sounds great.
> > Azteca is close to my office, but Dixie's BBQ works as well.
> > In advance of the meeting: Could someone point me to an app (preferable)
> > or C library (less preferable but oh well) that decodes MS Word docs?
> > John
> > -----
> > John Costello - cos at indeterminate dot net
> As someone who's been recently forced to convert a large
> manuscript into Word (my upcoming Perl book), I suddenly
> find myself in need of a grep-like utility for Word docs.
> I'd naturally prefer an Open Source, Perlish solution, but I'd
> consider other options that do the job well. Apart from using
> regexes to match and extract plain text, I'd like to
> match text by /attributes/ such as style and font in
> addition to character patterns (JGsoft's $149 "powergrep"
> sounds like "strings file.doc | grep 'pattern'", which isn't
> quite good enough.)
That's a lot of money to pay for "strings file.doc |grep 'pattern'"!
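For anyone curious what that pipeline amounts to, here is a rough Python sketch of the same idea: pull printable runs out of the raw bytes, then filter them with a regex. The function names (extract_strings, grep_doc) and the minimum run length of 4 are my own assumptions for illustration, not anything from PowerGREP or from Word's file format. It also looks for UTF-16LE runs, on the assumption that some of the text in a .doc is stored that way. Like the shell version, it knows nothing about styles or fonts, so it does not solve Tim's attribute-matching problem.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable runs found in raw bytes, roughly as `strings` does.

    Hypothetical helper: scans for plain ASCII runs and for UTF-16LE runs
    (an ASCII byte followed by a NUL byte), each at least min_len characters.
    """
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    runs = [m.group().decode("ascii") for m in ascii_run.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in utf16_run.finditer(data)]
    return runs


def grep_doc(data: bytes, pattern: str) -> list[str]:
    """Return the extracted runs that match the regex, like `| grep pattern`."""
    rx = re.compile(pattern)
    return [s for s in extract_strings(data) if rx.search(s)]
```

To try it on a real file, read the bytes with open("file.doc", "rb").read() and pass them to grep_doc along with a pattern. Expect noise: the runs come out in storage order, and a phrase that Word has split across internal pieces will not match as a whole.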
> I know Word has a built-in "find" utility with its own (lame)
> regex dialect, but I need to automate my searches, not babysit
> them with mouse in hand.
My ideal would be to have one module for parsing Word docs and another
module for writing them. Such modules exist for Excel
(Spreadsheet::WriteExcel and Spreadsheet::ParseExcel), and they are
extremely useful if you need to spew data at managers. They work well on
*NIX, too. I haven't been able to find enough documentation of Word's
file formats to gauge whether writing such modules for Word docs is
feasible.
John Costello - cos at indeterminate dot net
"You cannot propel yourself forward by patting yourself on the back."--Unknown