PDF to text file coming out funky

Status
Not open for further replies.

mike777

Programmer
Jun 3, 2000
387
US
Hello.
We're having a problem converting a PDF file to a text file. The PDF is 7 pages long, and we are using Adobe Acrobat's "Save As" to do the conversion.
Everything is fine until we get to the last page. There, the first (say) 50 rows of the text file contain only the first column of the last page. The next 50 rows contain columns 2 through 6, the next 50 contain column 7, and the final 50 contain column 8.
This is confusing, I know. The PDF file looks like this:
Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8
data1 data2 data3 data4 data5 data6 data7 data8
data1 data2 data3 data4 data5 data6 data7 data8
data1 data2 data3 data4 data5 data6 data7 data8
data1 data2 data3 data4 data5 data6 data7 data8
data1 data2 data3 data4 data5 data6 data7 data8
data1 data2 data3 data4 data5 data6 data7 data8

After conversion, the first six pages look fine. The 7th page looks like this:
Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8
data1
data1
data1
data1
data1
data1

data2 data3 data4 data5 data6
data2 data3 data4 data5 data6
data2 data3 data4 data5 data6
data2 data3 data4 data5 data6
data2 data3 data4 data5 data6
data2 data3 data4 data5 data6

data7
data7
data7
data7
data7
data7

data8
data8
data8
data8
data8
data8

I don't know how the formatting of this post will hold up once I actually submit it. It looks good right now, so please bear with me if it gets mangled.

One more thing, which is very strange: when we saved the file as RTF and opened it in MS Word, it looked OK. The last page, however, came through as a Word table. I don't know if any of this means anything to anyone, but I thought I'd mention it.

I'm thinking that the person who creates this PDF file (a third party) outputs the first x pages as lines of text and then, for some reason, outputs the last page as a table. Just a theory. Why Acrobat takes this table and chops it up when converting to a text file is yet another question.
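
As a possible stopgap while the cause is unknown, the split-up last page could be stitched back together after the conversion. The following is only a rough sketch in Python, not something Acrobat provides. It assumes the header is the first line of the page, that the column groups are separated by blank lines as in the sample above, and that every group has the same number of rows. The names split_blocks and rejoin_columns are made up for illustration.

```python
def split_blocks(lines):
    """Group consecutive non-blank lines into blocks; blank lines separate blocks."""
    blocks, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def rejoin_columns(page_text):
    """Rebuild table rows from column groups that were written out one after another.

    Assumption: the first line is the header row, and each following
    blank-line-separated block holds one group of columns for all rows.
    """
    lines = page_text.splitlines()
    if not lines:
        return page_text
    header, body = lines[0], lines[1:]
    blocks = split_blocks(body)
    if not blocks:
        return page_text
    # Every block must describe the same rows, otherwise the pairing is ambiguous.
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("column groups have different row counts")
    # zip(*blocks) pairs the i-th line of every block, giving the i-th table row.
    rows = [" ".join(parts) for parts in zip(*blocks)]
    return "\n".join([header] + rows)
```

Running rejoin_columns on the text of the last page only (not the whole file) should give one line per row again. If the real output does not have blank lines between the groups, the splitting step would have to be based on the known row count instead.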

Anything you can do to help is very much appreciated.

-Mike
 