SPUG: MS-Word grepper using "antiword"--attached!

Tim Maher tim at consultix-inc.com
Wed Dec 7 16:12:30 PST 2005


On the basis of Charles' tip, I've dashed off the script
included below--FYI--which successfully allows me to extract
section headings, table titles, figure titles, and code-listing
titles from Word docs--without any mousing around! 8-}

Thanks again, Charles!

-Tim
==============================================================
| Tim Maher, Ph.D.                    tim(AT)TeachMePerl.com | 
| SPUG Leader Emeritus               spug(AT)TeachMePerl.com |
| Seattle Perl Users Group        http://www.SeattlePerl.org |
| SPUG Wiki Site            http://Mediawiki.seattleperl.org |
==============================================================

#! /bin/sh
# Wed Dec  7 15:59:30 PST 2005
# Tim Maher, Tim at TeachMePerl.com
#
# wordgrep_listings,
#   wordgrep_figures,
#   wordgrep_tables,
#   wordgrep_headings
# (All names linked to this single file.)
#################################################
## Script to extract headings, code-listing, figure,
## table-titles from Word document using "antiword".
#################################################

case "$0" in
        *listings)      # Extract titles for code listings
                RE='^Listing [1-9]\.[0-9]  *[A-Z]' 
                ;;
        *tables)        # Extract titles for tables
                RE='^Table [1-9]\.[0-9]  *[A-Z]' 
                ;;
        *figures)       # Extract titles for figures
                RE='^Figure [1-9]\.[0-9]  *[A-Z]' 
                ;;
        *headings)      # Extract numbered headings
                RE='^[0-9](\.[0-9])+  *[A-Z][a-z]' 
                ;;
        *)
                echo "$0 is an unknown invocation name" >&2; exit 3
                ;;
esac

multifiles=""
test "$#" -gt 1 && multifiles=yes

for doc in "$@"
do
        # Set antiword options in AW_OPTS env-var
        antiword $AW_OPTS "$doc" |
                egrep "$RE" > $doc.o
                if
                        test -s "$doc.o" -a -n "$multifiles"
                then
                        # Prepend filename:
                        perl -wpli.bak -e "s/^/$doc:/" $doc.o
                fi
        # Show matching lines, if any
        test -s "$doc.o" &&
                cat "$doc.o" &&
                        rm "$doc.o"
done


More information about the spug-list mailing list