SPUG: Day meeting in Bellevue

DeRykus, Charles E charles.e.derykus at boeing.com
Wed Dec 7 14:57:57 PST 2005



On Wed, Dec 07, 2005 at 12:50:19PM -0800, John Costello wrote:
> Duane,
> In advance of the meeting: Could someone point me to an app
> (preferable) or C library (less preferable but oh well) that decodes
> MS Word docs?
> John
> -----
> John Costello - cos at indeterminate dot net

>>As someone who's been recently forced to convert a large manuscript into Word 
>>(my upcoming Perl book), I suddenly find myself in need of a grep-like utility 
>>for word docs.

>>I'd naturally prefer an Open Source, Perlish solution, but I'd consider other 
>>options that do the job well. Apart from using regexes to match and extract plain 
>>text, I'd like to match text by /attributes/ such as style and font in
>>addition to character patterns. (JGsoft's $149 "powergrep" sounds like "strings
>>file.doc | grep 'pattern'", which isn't quite good enough.)

>>I know Word has a built-in "find" utility with its own (lame) regex dialect, but 
>>I need to automate my searches, not babysit them with mouse in hand.

This may not help, but the open-source 'antiword' does a better job than 'strings'
at yanking text out of Word docs while preserving formatting. Feeding its output
into Perl should be a win in many cases...
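
For what it's worth, here is a rough sketch of that "extract, then grep" idea. It is written in Python purely for illustration (a Perl one-liner reading antiword's output would do the same job), and it assumes antiword is installed and on your PATH. The helper names grep_lines and antiword_text are made up for this example. Note that it only matches the extracted plain text; it does nothing about matching by style or font.

```python
import re
import shutil
import subprocess


def grep_lines(lines, pattern):
    """Yield (line_number, line) for every line that matches the regex."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


def antiword_text(path):
    """Return the plain text that antiword extracts from a Word .doc file.

    Assumption: the antiword executable is installed and on PATH.
    """
    if shutil.which("antiword") is None:
        raise RuntimeError("antiword not found on PATH")
    result = subprocess.run(
        ["antiword", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def grep_doc(path, pattern):
    """Grep a .doc file by running antiword and filtering its output."""
    return list(grep_lines(antiword_text(path).splitlines(), pattern))
```

Something like grep_doc("manuscript.doc", r"regex(es)?") would then return the matching lines with their line numbers in the extracted text, where "manuscript.doc" is just a placeholder file name.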


--
Charles DeRykus

