[HRPM] windoze word parsing
chicks at chicks.net
Thu Oct 26 08:52:18 CDT 2000
On Thu, 26 Oct 2000, Troy E. Webster wrote:
> I need to open up an MS Word 2000 document and parse through it,
> stripping out all the extraneous control characters and
> non-printables. End result will be an HTML document with custom
> formatting. Has anyone done this before? Does anyone have any ideas
> for approaching this? Any advice besides the usual RTFM?
I use mswordview, which I installed as an RPM:
Name : mswordview
Version : 0.5.2
Release : 1
Group : Utilities/Text
Size : 2137284
License : GPL
Vendor : Caolan McNamara <Caolan.McNamara at ul.ie>
Packager : Ryan Weaver <ryanw at infohwy.com>
URL : http://www.csn.ul.ie/~caolan/docs/MSWordView.html
Summary : MSWord 8 binary file format -> HTML converter
Description :
MSWordView is a program that understands the Microsoft Word 8
binary file format (Office97) and is able to convert Word
documents into HTML, which can then be read with a browser.
It does OK with some Word 2000 documents and complains about others.
YMMV.
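If all you want is the crude "strip the control characters and
non-printables" step from the question, here is a minimal sketch in
Python. It is not what mswordview does, it knows nothing about the Word
binary format, and it assumes you already have the text decoded into a
string. The function name strip_nonprintables is made up for
illustration.

```python
import unicodedata

def strip_nonprintables(text: str) -> str:
    """Drop control and other non-printable characters.

    Newlines and tabs are kept (an assumption about what counts as
    wanted whitespace); everything else in a Unicode "C" category
    (control, format, surrogate, private use, unassigned) is removed.
    """
    keep = {"\n", "\t"}
    return "".join(
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("C")
    )

if __name__ == "__main__":
    sample = "Hello\x07 World\x00\n"
    print(repr(strip_nonprintables(sample)))
```

Run on raw .doc bytes this would leave a lot of junk behind, since much
of the file is binary structure rather than text, which is why a real
converter is the better starting point.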
> P.S. Nice meeting last night, I learned a lot from Matt's Tk talk
It was very well done. He's going to turn the slides into HTML after
making some minor corrections and we'll post it on norfolk.pm.org.
--
</chris>
"The number of Unix installations has grown to 10, with more expected."
-- The Unix Programmer's Manual, 2nd edition, June '72