Parsing MS Word files in Perl

Status
Not open for further replies.

Nebbish

Programmer
Apr 7, 2004
73
US
...is it possible? Seems like this wacky OLE module that I've never used can open Word and manipulate files within the program, but I'd prefer to keep things simple and parse some .doc files within the warm, comfortable confines of my Perl script, without opening any helper programs. CPAN isn't turning up anything. Any thoughts?

Thanks,

-Nick
 
I don't think you'll be able to read Word files without the OLE module. These files use a special binary format, so you can't handle them as text files.
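To illustrate the point about the format: binary .doc files are OLE2 compound files, which begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. Below is a minimal sketch of checking for that signature in plain Perl. The helper name looks_like_ole2 is made up for this example, and a match only tells you the file is an OLE2 container (binary Excel files match too), not that it is a Word document.

```perl
#!perl
use strict;
use warnings;

# OLE2 compound file signature (the first 8 bytes of a binary .doc)
my $OLE2_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# Hypothetical helper: true if the file starts with the OLE2 signature
sub looks_like_ole2 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                        # raw bytes, no newline translation
    my $got = read($fh, my $head, 8);   # try to read the first 8 bytes
    close $fh;
    return defined $got && $got == 8 && $head eq $OLE2_MAGIC;
}

# Usage: pass a file name on the command line
if (@ARGV) {
    print looks_like_ole2($ARGV[0]) ? "OLE2 container\n" : "Not an OLE2 file\n";
}
```

Opening such a file in a text editor shows why line-by-line text parsing fails: the text is stored inside binary streams within the container.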

Here is an example of how to use OLE with Word.

#!perl
use strict;
use warnings;
use Win32::OLE;

# Check whether Word is installed; GetActiveObject dies if the
# class is not registered, so trap that with eval
my $x;
eval { $x = Win32::OLE->GetActiveObject('Word.Application') };
die "Word not installed" if $@;

# Start Word if it is not already running; die if unable to
unless (defined $x) {
    $x = Win32::OLE->new('Word.Application', sub { $_[0]->Quit; })
        or die 'Cannot start Word';
}

# Create new document
my $d = $x->Documents->Add;
# Define selection
my $s = $x->Selection;
# Set lines to be written to document
my @line = (
    'This is a test line',
    'This is second test Line',
    'This is the third line',
);

# $c is the color
# $start is the start of Range
# $end is the end of Range
# $r is the Range object
my ($c, $start, $end, $r) = (2, 0, 0);
foreach (@line) {
    $end += length($_) + 1;
    # Put the text
    $s->TypeText($_);
    # Define the Range
    $r = $d->Range($start, $end);
    # Set font size to 12 and set the color
    $r->Font->{Size} = 12;
    $r->Font->{ColorIndex} = $c++;
    $s->TypeText("\n");
    $start = $end;
}

# List Range object properties
ListObj($r);
# List Document object properties
ListObj($d);

# Print every property of the OLE object passed in
sub ListObj {
    my ($obj) = @_;
    foreach (sort keys %$obj) {
        my $v = defined $obj->{$_} ? $obj->{$_} : '';
        print "Keys: $_ - $v\n";
    }
}

undef $x;
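
Since the original question was about reading existing .doc files rather than creating them, here is a rough sketch of pulling the plain text out of a document through the same OLE interface. It assumes Windows with Word installed. The helper names read_doc_text and split_paragraphs are invented for this example, and the assumption that paragraphs in the returned text are separated by carriage returns should be checked against your own documents.

```perl
#!perl
use strict;
use warnings;

# Assumption: Word marks paragraph ends with "\r" in Range text.
# Split on that and drop empty paragraphs.
sub split_paragraphs {
    my ($text) = @_;
    return grep { length } split /\r/, $text;
}

# Hypothetical helper: open a document read-only via OLE and
# return its plain text. Needs Windows with Word installed.
sub read_doc_text {
    my ($path) = @_;    # an absolute path is safest
    require Win32::OLE;
    my $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit; })
        or die 'Cannot start Word';
    # Named arguments are passed as a hash reference
    my $doc = $word->Documents->Open({ FileName => $path, ReadOnly => 1 })
        or die "Cannot open $path: " . Win32::OLE->LastError;
    my $text = $doc->Content->Text;
    $doc->Close(0);     # 0 = wdDoNotSaveChanges
    return $text;
}

# Usage: pass the path of a .doc file on the command line
if (@ARGV) {
    print "$_\n" for split_paragraphs(read_doc_text($ARGV[0]));
}
```

This still launches Word in the background, so it does not avoid the helper program; it only keeps the parsing logic itself in the Perl script.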

----------------------------------------------------------

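# Checking whether a Word document is protected

use Win32::OLE;
# Attach to an already running Word instance
my $word = Win32::OLE->GetActiveObject('Word.Application')
    or die "Word is not running";
# ProtectionType is -1 (wdNoProtection) for an unprotected document
my $i = $word->ActiveDocument->{ProtectionType};
if ($i < 0) { print "Not Protected\n" }
else        { print "Protected\n" }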
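
The check against zero works because ProtectionType returns a value from Word's WdProtectionType enumeration, where -1 means no protection. As a small sketch, the numeric value can be mapped to the constant names; the helper name protection_name is made up here, and newer Word versions may define values beyond the ones listed.

```perl
use strict;
use warnings;

# Values of Word's WdProtectionType enumeration
my %PROTECTION_NAME = (
    -1 => 'wdNoProtection',
     0 => 'wdAllowOnlyRevisions',
     1 => 'wdAllowOnlyComments',
     2 => 'wdAllowOnlyFormFields',
     3 => 'wdAllowOnlyReading',
);

# Hypothetical helper: translate a ProtectionType value to its name
sub protection_name {
    my ($type) = @_;
    return exists $PROTECTION_NAME{$type}
        ? $PROTECTION_NAME{$type}
        : "unknown ($type)";
}
```

With this, the result of the ProtectionType call can be passed to protection_name and printed, instead of printing only Protected or Not Protected.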
 
Thanks for the code...I'll give OLE a try if a Word parser module doesn't show up. Someone created a handy module that parses binary Excel files in Perl, and I guess I was hoping for something like that. But this'll do.

Thanks,

Nick
 