I'm spiking a proof of concept that will extract a paragraph of text from an MS Word document. Having no experience with the Office interop assemblies, I have been able to piece together the first code block below,
which works except that it selects all the text. A typical document looks like this:
[tt]
bolder header text
+------------------------+
| |
| |
| |
| Image |
| |
| |
| |
+------------------------+
A paragraph of text....
[/tt]
or
[tt]
bolder header text
A paragraph of text....
+------------------------+
| |
| |
| |
| Image |
| |
| |
| |
+------------------------+
[/tt]
Selecting the entire document returns
[tt]bolder header text[]A paragraph of text....[/tt] or [tt]bolder header textA paragraph of text....[][/tt]
I would like to find a way to return just
[tt]A paragraph of text....[/tt]
I realize I need to determine which paragraph to select, but that's another problem I can solve. Right now I just want to select a single paragraph, not the entire document. In fact, getting the text as an array of strings, one per paragraph, would be good: something like the foreach loop in the second code block below,
but Paragraph doesn't appear to have a Select method.
Jason Meckley
Programmer
Specialty Bakers, Inc.
faq855-7190
faq732-7259
Code:
// Requires: using System.Reflection; using Microsoft.Office.Interop.Word;
object missing = Missing.Value;
object doNotSaveChanges = WdSaveOptions.wdDoNotSaveChanges;
object originalFormat = WdOriginalFormat.wdOriginalDocumentFormat;
Application msword = null;
try
{
    // Start Word in the background.
    msword = new Application { Visible = false };
    Document document = null;
    try
    {
        object fileName = "path to file";
        object readOnly = true;
        object addToRecentFiles = false;
        document = msword.Documents.Open(ref fileName, ref missing, ref readOnly, ref addToRecentFiles,
                                         ref missing, ref missing, ref missing, ref missing, ref missing,
                                         ref missing, ref missing, ref missing, ref missing, ref missing,
                                         ref missing, ref missing);
        // This selects the whole document, so the text of every paragraph is returned.
        document.Select();
        return document.ActiveWindow.Selection.Text;
    }
    finally
    {
        // Always close the document without saving.
        if (document != null)
        {
            document.Close(ref doNotSaveChanges, ref originalFormat, ref missing);
        }
    }
}
finally
{
    // Always shut Word down, even if opening or reading failed.
    if (msword != null)
    {
        msword.Quit(ref doNotSaveChanges, ref originalFormat, ref missing);
    }
}
Code:
foreach (var p in document.Paragraphs)
{
    p.Select();
    yield return document.ActiveWindow.Selection.Text;
}
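One direction that might work for the loop above, offered as an untested sketch rather than a confirmed fix: as far as I can tell, Paragraphs is only enumerable through the non-generic IEnumerable, so `var p` ends up typed as `object`, and the Paragraph interface itself exposes its text through its Range property rather than through a Select method. The class name ParagraphReader and the method name ReadParagraphs are hypothetical, and the code assumes a reference to Microsoft.Office.Interop.Word and a document that is already open.

```csharp
using System.Collections.Generic;
using Microsoft.Office.Interop.Word;

// Hypothetical helper; the names are illustrative and not part of the Word API.
static class ParagraphReader
{
    // Yields the text of each paragraph in an already-open document.
    public static IEnumerable<string> ReadParagraphs(Document document)
    {
        // Declare the loop variable as Paragraph explicitly; with var it
        // would be typed as object because the collection is non-generic.
        foreach (Paragraph p in document.Paragraphs)
        {
            // Range.Text includes the trailing paragraph mark ('\r'),
            // so strip it before handing the text back.
            yield return p.Range.Text.TrimEnd('\r');
        }
    }
}
```

Because the method is an iterator, nothing is read until the result is enumerated. If it is used with the open/close code above, the paragraphs would need to be materialized (for example with ToList() from System.Linq) before the finally block closes the document.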