I figured I was probably not the first one to think about this problem, so I went trolling the intertubes for ready-made solutions. Since Perl is my glueware language of choice, I searched until I found the following handy snippet from prlmnks.org:
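use strict;
use warnings;
use XML::LibXML;
# Recover mode lets the parser keep going past malformed markup.
my $parser = XML::LibXML->new();
$parser->recover(1);
# Parse the file named on the command line.
my $doc = $parser->parse_file($ARGV[0]);
# toString(1) serializes the document with indentation (pretty-printing).
print $doc->toString(1);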
Very, very nice! Now I am part of the way there. Next, I took a pre-existing MS Word document of similar make and model, and prepended it. With a little manual massaging, I got the script above to parse it, and even pretty-print it (a nice bonus). Unfortunately, Microsoft Word still doesn't like the resultant "document."
I'm still working on this problem, but that's decent progress for an hour of work.