SPUG: Day meeting in Bellevue

John Costello cos at indeterminate.net
Wed Dec 7 15:53:13 PST 2005

On Wed, 7 Dec 2005, Tim Maher wrote:
> On Wed, Dec 07, 2005 at 12:50:19PM -0800, John Costello wrote:
> > Duane,
> > 
> > Sorry for the late response.  Wednesday the 14th in Bellevue sounds great.  
> > Azteca is close to my office, but Dixie's BBQ works as well.
> > 
> > In advance of the meeting: could someone point me to an app (preferred)
> > or C library (less preferred, but oh well) that decodes MS Word docs?
> > John
> > -----
> > John Costello - cos at indeterminate dot net
>
> As someone who's been recently forced to convert a large
> manuscript into Word (my upcoming Perl book), I suddenly
> find myself in need of a grep-like utility for Word docs.
> I'd naturally prefer an Open Source, Perlish solution, but I'd
> consider other options that do the job well. Apart from using
> regexes to match and extract plain text, I'd like to
> match text by /attributes/, such as style and font, in
> addition to character patterns. (JGsoft's $149 "PowerGREP"
> sounds like "strings file.doc | grep 'pattern'", which isn't
> quite good enough.)

That's a lot of money to pay for "strings file.doc | grep 'pattern'"!
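
For comparison, here is a rough sketch of what that pipeline amounts to,
written in Python rather than Perl. The function names are made up for
illustration. It also scans for UTF-16LE runs, on the assumption that a
binary .doc may store some of its text as 16-bit characters, which a plain
"strings" call would skip by default. It knows nothing about styles or
fonts, so it does not address matching by attributes.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Yield printable runs from raw bytes, roughly like strings(1).

    Looks for runs of printable ASCII stored one byte per character,
    then for the same characters stored as UTF-16LE (each followed by
    a zero byte). Non-ASCII text is ignored in this sketch.
    """
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in ascii_run.finditer(data):
        yield match.group().decode("ascii")
    for match in utf16_run.finditer(data):
        yield match.group().decode("utf-16-le")


def grep_strings(data: bytes, pattern: str):
    """Return the extracted runs that match the regex, like '| grep'."""
    regex = re.compile(pattern)
    return [s for s in extract_strings(data) if regex.search(s)]


# Hypothetical usage against a real file:
#     with open("file.doc", "rb") as fh:
#         for line in grep_strings(fh.read(), r"pattern"):
#             print(line)
```

Because it only sees byte runs, a phrase that Word splits across internal
chunks may not match as a single string.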

> I know Word has a built-in "find" utility with its own (lame)
> regex dialect, but I need to automate my searches, not babysit
> them with mouse in hand.

My ideal would be to have a module for parsing Word docs and another 
module for writing them.  Such modules exist for Excel 
(Spreadsheet::WriteExcel and Spreadsheet::ParseExcel), and they are 
extremely useful if you need to spew data at managers.  They work well on 
*NIX, too.  I haven't found enough documentation on Word's formats to 
judge whether such modules could be written for Word docs.

> -Tim

John Costello - cos at indeterminate dot net
"You cannot propel yourself forward by patting yourself on the back."--Unknown
