SPUG: MS-Word grepper using "antiword"--attached!
William Julien
moonbeam at catmanor.com
Wed Dec 7 23:11:16 PST 2005
I use antiword and smbclient in a Perl CGI web script that queries
a Windows directory, lets the user pick a Word doc from a
list, and renders that doc in plain text. Works great!
William
vi bill
On Dec 7, 2005, at 4:12 PM, Tim Maher wrote:
> On the basis of Charles' tip, I've dashed off the script
> included below--FYI--which successfully allows me to extract
> section headings, table titles, figure titles, and code-listing
> titles from Word docs--without any mousing around! 8-}
>
> Thanks again, Charles!
>
> -Tim
> ==============================================================
> | Tim Maher, Ph.D. tim(AT)TeachMePerl.com |
> | SPUG Leader Emeritus spug(AT)TeachMePerl.com |
> | Seattle Perl Users Group http://www.SeattlePerl.org |
> | SPUG Wiki Site http://Mediawiki.seattleperl.org |
> ==============================================================
>
> #! /bin/sh
> # Wed Dec 7 15:59:30 PST 2005
> # Tim Maher, Tim at TeachMePerl.com
> #
> # wordgrep_listings,
> # wordgrep_figures,
> # wordgrep_tables,
> # wordgrep_headings
> # (All names linked to this single file.)
> #################################################
> ## Script to extract headings, code-listing, figure,
> ## table-titles from Word document using "antiword".
> #################################################
>
> case "$0" in
>     *listings) # Extract titles for code listings
>         RE='^Listing [1-9]\.[0-9] *[A-Z]'
>         ;;
>     *tables) # Extract titles for tables
>         RE='^Table [1-9]\.[0-9] *[A-Z]'
>         ;;
>     *figures) # Extract titles for figures
>         RE='^Figure [1-9]\.[0-9] *[A-Z]'
>         ;;
>     *headings) # Extract numbered headings
>         RE='^[0-9](\.[0-9])+ *[A-Z][a-z]'
>         ;;
>     *)
>         echo "$0 is an unknown invocation name" >&2; exit 3
>         ;;
> esac
>
> multifiles=""
> test "$#" -gt 1 && multifiles=yes
>
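> for doc in "$@"
> do
>     # Set antiword options in AW_OPTS env-var
>     # (left unquoted on purpose, so several options split into words)
>     antiword $AW_OPTS "$doc" |
>         egrep "$RE" > "$doc.o"
>     if
>         test -s "$doc.o" -a -n "$multifiles"
>     then
>         # Prepend filename; pass it through the environment so that
>         # slashes in the path cannot break the substitution
>         DOC="$doc" perl -wpl -i -e 's/^/$ENV{DOC}:/' "$doc.o"
>     fi
>     # Show matching lines, if any, then always remove the temp file
>     test -s "$doc.o" && cat "$doc.o"
>     rm -f "$doc.o"
> done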
> _____________________________________________________________
> Seattle Perl Users Group Mailing List
> POST TO: spug-list at pm.org
> SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
> MEETINGS: 3rd Tuesdays, Location: Amazon.com Pac-Med
> WEB PAGE: http://seattleperl.org/
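
For anyone wanting to try the "all names linked to this single file" arrangement, here is a rough sketch of one way to set it up. The base name "wordgrep", the temporary directory, and the stand-in script are all made up for the demo; the stand-in only reports which branch of the case statement its invocation name selects, and does not call antiword. Symlinks are used here, though hard links should behave the same way.

```shell
#!/bin/sh
# Sketch only: "wordgrep" and the temp directory are hypothetical names.
bindir=$(mktemp -d) || exit 1

# Stand-in for the real script: same dispatch on "$0", but it just
# prints the selected mode instead of running antiword.
cat > "$bindir/wordgrep" <<'EOF'
#!/bin/sh
case "$0" in
    *listings) echo "mode: listings" ;;
    *tables)   echo "mode: tables" ;;
    *figures)  echo "mode: figures" ;;
    *headings) echo "mode: headings" ;;
    *) echo "$0 is an unknown invocation name" >&2; exit 3 ;;
esac
EOF
chmod +x "$bindir/wordgrep"

# One link per invocation name, all pointing at the same file
for name in listings figures tables headings
do
    ln -s wordgrep "$bindir/wordgrep_$name"
done

# Invoking through a link selects the matching branch
tables_mode=$("$bindir/wordgrep_tables")

# Invoking under the plain name falls through to the error branch
plain_status=0
"$bindir/wordgrep" 2>/dev/null || plain_status=$?

echo "$tables_mode; plain name exits with status $plain_status"
rm -rf "$bindir"
```

The dispatch works because "$0" holds the name the script was invoked under, so each link ends in a different suffix for the case patterns to match.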
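
The four patterns in the script can also be tried on plain text, without antiword or a Word doc at hand. The following is a minimal sketch along those lines; the sample lines and the match helper are invented for illustration, and grep -E stands in for egrep.

```shell
#!/bin/sh
# Sketch only: the sample text below is invented, not taken from a real doc.
sample='1.2 Getting Started
Listing 1.1 Hello World
Table 2.3 Operator Precedence
Figure 3.4 Data Flow
Some ordinary paragraph text
Table 12.1 Two-Digit Chapter Number'

# match REGEX: print the sample lines that match the extended regex
match() {
    printf '%s\n' "$sample" | grep -E "$1"
}

# The same four patterns the script selects by invocation name
listings=$(match '^Listing [1-9]\.[0-9] *[A-Z]')
tables=$(match '^Table [1-9]\.[0-9] *[A-Z]')
figures=$(match '^Figure [1-9]\.[0-9] *[A-Z]')
headings=$(match '^[0-9](\.[0-9])+ *[A-Z][a-z]')

printf 'listings: %s\n' "$listings"
printf 'tables:   %s\n' "$tables"
printf 'figures:  %s\n' "$figures"
printf 'headings: %s\n' "$headings"
```

As written, the title patterns expect a single-digit chapter number and a single-digit item number, so a line such as "Table 12.1 ..." is not picked up. Widening [1-9] and [0-9] to allow repeats would be one way to cover longer documents.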