SPUG: Day meeting in Bellevue
John Costello
cos at indeterminate.net
Wed Dec 7 15:39:16 PST 2005
On Wed, 7 Dec 2005, DeRykus, Charles E wrote:
> On Wed, Dec 07, 2005 at 12:50:19PM -0800, John Costello wrote:
> > Duane,
> > In advance of the meeting: could someone point me to an app
> > (preferred) or C library (less preferred, but oh well) that decodes
> > MS Word docs?
> > John
> > -----
> > John Costello - cos at indeterminate dot net
>
> >>As someone who's been recently forced to convert a large manuscript into Word
> >>(my upcoming Perl book), I suddenly find myself in need of a grep-like utility
> >>for Word docs.
>
> >>I'd naturally prefer an Open Source, Perlish solution, but I'd consider other
> >>options that do the job well. Apart from using regexes to match and extract plain
> >>text, I'd like to match text by /attributes/ such as style and font in
> >>addition to character patterns (JGsoft's $149 "powergrep" sounds like "strings
> >>file.doc | grep 'pattern'", which isn't quite good enough.)
>
> >>I know Word has a built-in "find" utility with its own (lame) regex dialect, but
> >>I need to automate my searches, not babysit them with mouse in hand.
>
> This may not help, but the Open Source 'antiword' does a better job than 'strings' at
> yanking text out of Word while preserving formatting. Feeding the stream into Perl
> should be a win in many cases...
Funny you mention that. I've been looking at antiword's code today to
see what it can divulge from a Word doc, but I didn't pay attention to its
output methods. I assumed it just imported documents. Silly of me to
make that assumption, but I'm blaming my nascent head cold.
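A minimal sketch of the "feed antiword's output into a script" approach Charles describes, written in Python rather than Perl. It assumes the `antiword` binary is installed and on the PATH, and it uses only antiword's default plain-text output (no flags). That means it matches character patterns line by line; it does not expose the style and font attributes asked about above. The helper names `word_doc_text` and `grep_lines` are hypothetical, not part of antiword.

```python
import re
import subprocess
from typing import Iterable, Iterator, Tuple


def word_doc_text(path: str) -> str:
    """Return the plain text that antiword prints for a .doc file.

    Assumption: the `antiword` binary is on PATH. It is run with no extra
    flags, so only its default text output is captured.
    """
    result = subprocess.run(
        ["antiword", path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line containing a regex match.

    Line numbers start at 1; trailing newlines are stripped from the
    yielded line, similar to how grep -n reports matches.
    """
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")
```

For example, `grep_lines(r"\bPerl\b", word_doc_text("book.doc").splitlines())` would yield a `(line_number, line)` pair for every line of the extracted text that mentions Perl (`book.doc` is a placeholder filename). Keeping extraction and matching in separate functions means the matcher can be reused on any text source, and swapped for a richer extractor later if attribute-level searching turns out to be needed.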
> --
> Charles DeRykus
John
-----
John Costello - cos at indeterminate dot net
"You cannot propel yourself forward by patting yourself on the back."--Unknown
More information about the spug-list mailing list