Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Convert PDF File with 2 records per person to Excel 2

Status
Not open for further replies.

IngDep

MIS
Feb 7, 2008
89
US
Insight Needed.

Twice a week, the analyst manually needs to type in data to an MS Excel worksheet from a 15 page PDF file.

Data on the PDF file is arranged in two records per customer. For example, on the first line are Name, Birthdate, Account Number, Sales Rep. On the second line are Order Date, Receipt Date, Facility Date, Phone.

Is there a vba solution to this issue or a recommended inexpensive software that can be used to make this a easier task?

Currently experimenting with saving the PDF file as text and then using Excel's Import Text Wizard. Also, have heard mention of a software program such as "Monarch."
 




Are these data items actually separated by COMMAS? Like...
[tt]
Skip Vought, 1/1/1900, 123456, Some O Guy
[/tt]
Please post in Forum707 for VBA answers.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Delimiters are a combination of comma, spaces and hyphens with data spread over two lines for each customer.

Most of the fields have a space as a delimiter. The name field is in the format of "LastName, FirstName MiddleName."

 




Please post a representative sample of data and how you want to see the results.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Data is similar to the following:

Two Heading Lines:
LastName,FirstName MiddleName---BirthDate--AcctNo--SalesRep
--OrderDate---ReceiptDate-------FacilityName---DiagCode

Data Example - Two Lines for each record:
Smith, Larry Joe----04/15/2007--1st - InsTTA------WillaimTA
--05/16/08-----05/18/08---------ToTjIUM--------Headache

The account number has a hyphen. Also, there are breaks throughout the PDF file with different sections. For example, the first section is titled "BR" and the second section is titled "INP" and so on. There is a blank line at the end of each section.

Note, the hyphens above represent spaces, but not the actual spaces, between the data on the PDF Report.
 




"Note, the hyphens above represent spaces, but not the actual spaces, between the data on the PDF Report. "

I do not need a representation of the data.

Rather I need to see EXACTLY WHAT YOU HAVE.

Please COPY 'n' PASTE without any comments inserted.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
PATIENT NAME SEX BIRTHDATE INSURANCE FOUND POLICY NUMBER REPRESENTATIVE
ACCOUNT NUBMER ADM DATE DIS DATE FACILITY NAME DEPART DIAGNOSIS
ER - EMERGENCY ROOM (Outpatient)
SMITH,JOHN MANUEL M 06/15/2008 1st Ins - NMISTAR 28993465 MARY WISEN
DD00994588 01/15/08 01/20/08 ABC Hospital INDIGESTION, URI
 
When you save the PDF file to a text file can you click the "Save as Type" DropDownListBox to see if it offers a comma delimited or tab delimited txt file format?

If the answer yes the above should be quite easy.

sam
 



It will be a nearly impossible task without field delimiters.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Field delimiters are spaces.

It appears that the most challenging aspect of this task is that there are two lines per patient.

I would like to convert from a two lines per patient PDF format to a one line per patient in MS Excel.

Upon saving the PDF as a text file and then opening in MS Excel, the data is within one column and "stacked" - data overlaps. In other words, Name, Birthdate, Account No., etc. is all within the leftmost column of the worksheet.
 



"Field delimiters are spaces."

And so are the characters between words within a field.

THEREFORE, spaces are NOT delimiters!!!

"It appears that the most challenging aspect of this task is that there are two lines per patient.

I'd say that the most challenging aspect of this task is that there are NO delimiters to identify distinct fields.

Your last statement is self evident.





Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Skip is absolutely correct (he usually is), there are no delimeters between distinct fields.

The "space" (if used as a field delimiter) in:
SMITH,JOHN "space" MANUEL
would be a nightmare to distinguish fields from:
SMITH, JOHN "space" B. "space" MANUEL or
SMITH, JOHN "space" MANUEL "space" Jr. "space" Esq.

The reason all the data is stacked in col A is because there are 5 CRLF's (carriage return line feed)in the data,

one after Representative
one after Diagnosis
one after (outpatient) and
one after WISEN

Therefor you get 5 lines of data each treated a single field.

I only have Adobe PDF reader so I can't experiment with "save as".

Are you certain you can't save as a "comma delimited" text file and maybe omit the headers?

If you can at least do that, it would be easy to fix the CRLF problem and the fields would fix themselves.



 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top