SPUG: MS-Word grepper using "antiword"--attached!

William Julien moonbeam at catmanor.com
Wed Dec 7 23:11:16 PST 2005


I use antiword and smbclient in a Perl CGI script that queries
a Windows directory, lets the user pick a Word doc from a
list, and renders that doc as plain text. Works great!
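
The CGI script itself is Perl and is not shown here. As a rough
illustration of the same fetch-and-convert idea, here is a minimal
shell sketch. The share name, the credentials file, and the function
names list_docs and render_doc are all placeholders invented for this
example; only smbclient's -A and -c options, its "ls" and "get"
commands, and plain "antiword FILE" are relied on.

```shell
#!/bin/sh
# Hypothetical sketch: list the Word docs on a Windows share, then fetch
# one and print it as plain text. SHARE and AUTHFILE are placeholders.
SHARE=${SHARE:-//fileserver/docs}       # assumed share name
AUTHFILE=${AUTHFILE:-$HOME/.smbauth}    # assumed smbclient credentials file

# list_docs: print the share's directory listing, keeping only .doc entries
list_docs() {
        smbclient "$SHARE" -A "$AUTHFILE" -c 'ls' | grep -i '\.doc'
}

# render_doc NAME: copy NAME from the share to a temp file and convert it
render_doc() {
        tmp=$(mktemp) || return 1
        smbclient "$SHARE" -A "$AUTHFILE" -c "get \"$1\" \"$tmp\"" >/dev/null &&
                antiword "$tmp"
        status=$?
        rm -f "$tmp"    # never leave the downloaded copy behind
        return $status
}
```

A web front end would presumably show the output of list_docs as a
pick list and then call something like render_doc "report.doc" for
the chosen file.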

William
vi bill

On Dec 7, 2005, at 4:12 PM, Tim Maher wrote:

> On the basis of Charles' tip, I've dashed off the script
> included below--FYI--which successfully allows me to extract
> section headings, table titles, figure titles, and code-listing
> titles from Word docs--without any mousing around! 8-}
>
> Thanks again, Charles!
>
> -Tim
> ==============================================================
> | Tim Maher, Ph.D.                    tim(AT)TeachMePerl.com |
> | SPUG Leader Emeritus               spug(AT)TeachMePerl.com |
> | Seattle Perl Users Group        http://www.SeattlePerl.org |
> | SPUG Wiki Site            http://Mediawiki.seattleperl.org |
> ==============================================================
>
> #! /bin/sh
> # Wed Dec  7 15:59:30 PST 2005
> # Tim Maher, Tim at TeachMePerl.com
> #
> # wordgrep_listings,
> #   wordgrep_figures,
> #   wordgrep_tables,
> #   wordgrep_headings
> # (All names linked to this single file.)
> #################################################
> ## Script to extract headings, code-listing, figure,
> ## table-titles from Word document using "antiword".
> #################################################
>
> case "$0" in
>         *listings)      # Extract titles for code listings
>                 RE='^Listing [1-9]\.[0-9]  *[A-Z]'
>                 ;;
>         *tables)        # Extract titles for tables
>                 RE='^Table [1-9]\.[0-9]  *[A-Z]'
>                 ;;
>         *figures)       # Extract titles for figures
>                 RE='^Figure [1-9]\.[0-9]  *[A-Z]'
>                 ;;
>         *headings)      # Extract numbered headings
>                 RE='^[0-9](\.[0-9])+  *[A-Z][a-z]'
>                 ;;
>         *)
>                 echo "$0 is an unknown invocation name" >&2; exit 3
>                 ;;
> esac
>
> multifiles=""
> test "$#" -gt 1 && multifiles=yes
>
> for doc in "$@"
> do
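>         # Set antiword options in AW_OPTS env-var
>         # (left unquoted on purpose, so it can hold several options)
>         antiword $AW_OPTS "$doc" |
>                 egrep "$RE" > "$doc.o"
>         if
>                 test -s "$doc.o" -a -n "$multifiles"
>         then
>                 # Prepend filename; passed via the environment so that
>                 # slashes in the path cannot break the s/// expression
>                 DOC="$doc" perl -wpl -i -e 's/^/$ENV{DOC}:/' "$doc.o"
>         fi
>         # Show matching lines, if any
>         test -s "$doc.o" &&
>                 cat "$doc.o"
>         # Always remove the temporary file, even when it is empty
>         rm -f "$doc.o"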
> done
> _____________________________________________________________
> Seattle Perl Users Group Mailing List
>      POST TO: spug-list at pm.org
> SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
>     MEETINGS: 3rd Tuesdays, Location: Amazon.com Pac-Med
>     WEB PAGE: http://seattleperl.org/
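
The quoted script picks its pattern from the name it was invoked
under. The sketch below mimics that dispatch in a function and runs
each pattern over a few sample lines, to show what the regexes accept.
The sample text and the names pick_re and run_demo are invented for
illustration; the sample is not real antiword output.

```shell
#!/bin/sh
# Sketch: check the quoted script's patterns against made-up sample lines.
sample='1.2 Getting Started
Listing 3.1  Hello world
Table 2.4 Option summary
Figure 5.0 Data flow
See Table 2.4 for details'

# pick_re NAME: same idea as the case "$0" dispatch, keyed on NAME
pick_re() {
        case "$1" in
                *listings) printf '%s\n' '^Listing [1-9]\.[0-9]  *[A-Z]' ;;
                *tables)   printf '%s\n' '^Table [1-9]\.[0-9]  *[A-Z]' ;;
                *figures)  printf '%s\n' '^Figure [1-9]\.[0-9]  *[A-Z]' ;;
                *headings) printf '%s\n' '^[0-9](\.[0-9])+  *[A-Z][a-z]' ;;
                *)         return 3 ;;
        esac
}

# run_demo: apply each pattern in turn to the sample text
run_demo() {
        for name in wordgrep_headings wordgrep_listings \
                    wordgrep_tables wordgrep_figures
        do
                printf '%s\n' "$sample" | grep -E "$(pick_re "$name")"
        done
}

run_demo
```

Because every pattern is anchored with ^, the line "See Table 2.4 for
details" is skipped: only titles that start a line are reported. To
get the four invocation names for the real script, link the one file
under each name, for example "ln wordgrep_listings wordgrep_tables".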


