SPUG: Day meeting in Bellevue
DeRykus, Charles E
charles.e.derykus at boeing.com
Wed Dec 7 14:57:57 PST 2005
On Wed, Dec 07, 2005 at 12:50:19PM -0800, John Costello wrote:
> In advance of the meeting: Could someone point me to an app
> (preferable) or C library (less preferable but oh well) that decodes
> MS Word docs? John
> John Costello - cos at indeterminate dot net
>>As someone who's been recently forced to convert a large manuscript into Word
>>(my upcoming Perl book), I suddenly find myself in need of a grep-like utility
>>for Word docs.
>>I'd naturally prefer an Open Source, Perlish solution, but I'd consider other
>>options that do the job well. Apart from using regexes to match and extract plain
>>text, I'd like to match text by /attributes/ such as style and font in
>>addition to character patterns. (JGsoft's $149 "powergrep" sounds like "strings
>>file.doc | grep 'pattern'", which isn't quite good enough.)
>>I know Word has a built-in "find" utility with its own (lame) regex dialect, but
>>I need to automate my searches, not babysit them with mouse in hand.
This may not help, but the open-source 'antiword' does a better job than
'strings' at yanking text out of Word files while preserving formatting. Feeding
the stream into Perl should be a win in many cases...
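
For what it's worth, here is a rough sketch of that pipeline. It is written in Python rather than Perl purely for illustration, and it assumes the antiword binary is installed and on your PATH. The names grep_lines and antiword_lines are made up for this example. Note that it only matches the plain text antiword emits, so it does nothing for the style and font matching asked about above.

```python
import re
import subprocess
import sys


def grep_lines(lines, pattern):
    """Yield (line_number, line) for each line that matches the regex."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


def antiword_lines(path):
    """Run antiword on a Word file and return its text output as a list of lines.

    Assumption: the antiword binary is installed and reachable via PATH.
    """
    result = subprocess.run(
        ["antiword", path], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python docgrep.py 'pattern' file.doc
    for number, line in grep_lines(antiword_lines(sys.argv[2]), sys.argv[1]):
        print(f"{number}: {line}")
```

The matching is kept separate from the antiword call so the filter can be run unattended over many files, or tried on any list of lines without a Word file at hand.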