SPUG: MS-Word grepper using "antiword"--attached!
Tim Maher
tim at consultix-inc.com
Wed Dec 7 16:12:30 PST 2005
On the basis of Charles' tip, I've dashed off the script
included below--FYI--which successfully allows me to extract
section headings, table titles, figure titles, and code-listing
titles from Word docs--without any mousing around! 8-}
Thanks again, Charles!
-Tim
==============================================================
| Tim Maher, Ph.D. tim(AT)TeachMePerl.com |
| SPUG Leader Emeritus spug(AT)TeachMePerl.com |
| Seattle Perl Users Group http://www.SeattlePerl.org |
| SPUG Wiki Site http://Mediawiki.seattleperl.org |
==============================================================
#! /bin/sh
# Wed Dec 7 15:59:30 PST 2005
# Tim Maher, Tim at TeachMePerl.com
#
# wordgrep_listings,
# wordgrep_figures,
# wordgrep_tables,
# wordgrep_headings
# (All names linked to this single file.)
#################################################
## Script to extract headings, code-listing, figure,
## table-titles from Word document using "antiword".
#################################################
case "$0" in
    *listings)  # Extract titles for code listings
        RE='^Listing [1-9]\.[0-9] *[A-Z]'
        ;;
    *tables)    # Extract titles for tables
        RE='^Table [1-9]\.[0-9] *[A-Z]'
        ;;
    *figures)   # Extract titles for figures
        RE='^Figure [1-9]\.[0-9] *[A-Z]'
        ;;
    *headings)  # Extract numbered headings
        RE='^[0-9](\.[0-9])+ *[A-Z][a-z]'
        ;;
    *)
        echo "$0 is an unknown invocation name" >&2; exit 3
        ;;
esac
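multifiles=""
test "$#" -gt 1 && multifiles=yes

for doc in "$@"
do
    # Set antiword options in AW_OPTS env-var
    # ($AW_OPTS is left unquoted on purpose so it can hold several options)
    antiword $AW_OPTS "$doc" |
        grep -E "$RE" > "$doc.o"

    if
        test -s "$doc.o" && test -n "$multifiles"
    then
        # Prepend filename: it is passed through the environment so that
        # a "/" or "@" in the name cannot break the substitution
        DOC="$doc" perl -wpli -e 's/^/$ENV{DOC}:/' "$doc.o"
    fi

    # Show matching lines, if any
    test -s "$doc.o" && cat "$doc.o"

    # Remove the temporary file even when nothing matched
    rm -f "$doc.o"
done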
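
For anyone who wants to try the patterns without a Word document handy, here is a minimal sketch that runs the same four regexes over a few lines of made-up text. The sample lines and the variable names (sample, listings, tables, figures, headings) are hypothetical and stand in for whatever antiword would print for a real document.

```shell
#!/bin/sh
# Hypothetical stand-in for antiword's plain-text output.
sample='1.2 Getting Started
Listing 1.1 Hello World
Table 2.3 Operator Summary
Figure 3.1 Data Flow
Some ordinary body text.'

# Run one of the script's patterns over the sample text.
extract() {
    printf '%s\n' "$sample" | grep -E "$1"
}

# The same regexes the script selects by invocation name.
listings=$(extract '^Listing [1-9]\.[0-9] *[A-Z]')
tables=$(extract '^Table [1-9]\.[0-9] *[A-Z]')
figures=$(extract '^Figure [1-9]\.[0-9] *[A-Z]')
headings=$(extract '^[0-9](\.[0-9])+ *[A-Z][a-z]')

echo "listings: $listings"
echo "tables:   $tables"
echo "figures:  $figures"
echo "headings: $headings"
```

To use the real script, save it under one of the four names and link the other three to it (for example with ln), then run something like "wordgrep_headings chapter1.doc chapter2.doc". Note that the patterns as written expect a single-digit chapter and item number, so a title such as "Listing 10.12" would need the regex widened. Setting AW_OPTS to antiword's line-width option (-w, as described in its man page) may help if long titles get wrapped.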