Tek-Tips is the largest IT community on the Internet today!

extracting data from PDF file into SAS

Status
Not open for further replies.

eguva

Programmer
Jun 23, 2007
US
Is there a way to read PDF files into SAS?
I saw a presentation online that mentions Ghostscript, but I am not sure what it means or how to use it.
Can someone help me if you have an idea of how to read a PDF file into SAS?

Thanks,
Eguva
 
Oooh, sounds interesting. I don't suppose you have a link to the presentation do you? I've never seen this done, so I can't help right away, but the idea sounds good, and I'd be keen to learn more myself.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
You would have to 'read' the text from a PDF file into a SAS dataset. You would then have to parse the dataset to get any data you want. You could use a tool called PDF2TXT (I am not sure if they have a command-line version), but this should work. This tool seems to use the iText PDF library (Java or .NET) to extract the text from a PDF file. I think that it's more work to do it this way, as you would need a massive program to parse the incoming data. The reason is that in a PDF, spaces in the text are not really spaces but positions on the page (layout). So what looks like a column break may actually be only one whitespace character away from what you believe to be the previous column. But hey, if you figure something out, let me know.
Klaz
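
To make the column problem above concrete, here is a minimal Python sketch. It assumes the PDF text has already been extracted to plain lines by some external tool, and it splits each line wherever two or more spaces occur. The function name `split_columns`, the `min_gap` parameter, and the sample lines are all made up for illustration; this is not what PDF2TXT or SAS does internally.

```python
import re

def split_columns(line, min_gap=2):
    """Split one line of extracted text into cells wherever at least
    `min_gap` consecutive spaces occur (hypothetical helper)."""
    cells = re.split(r" {%d,}" % min_gap, line.strip())
    return [cell for cell in cells if cell]

# Made-up sample lines standing in for extracted PDF text.
well_spaced = "Smith   42    Boston"
tight = "Smith   42 Boston"  # layout left only one space between two columns

print(split_columns(well_spaced))  # ['Smith', '42', 'Boston']
print(split_columns(tight))        # ['Smith', '42 Boston']
```

The second line shows the failure mode: when the layout leaves a single space between two columns, a gap-based split merges them, so a real parser would need extra rules per report layout.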
 
Thank you for the link.
I was planning to write a generic SAS macro that compares two PDF files and produces an output dataset, which the user could then subset or post-process to reduce the output size. But it looks like there is nothing in SAS to read a PDF file. Even if I use this tool, not every user can run my macro unless he or she also has the tool installed.

Thanks.
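
For what it is worth, the comparison step described above could be sketched outside SAS roughly like this, assuming both PDFs have already been converted to lists of text lines by some external tool. The function name `diff_lines` and the sample lines are hypothetical; the only library used is Python's standard `difflib`.

```python
import difflib

def diff_lines(old_lines, new_lines):
    """Return (status, text) rows for lines that differ between two
    already-extracted texts (hypothetical helper)."""
    rows = []
    for entry in difflib.ndiff(old_lines, new_lines):
        tag = entry[:2]
        if tag == "- ":
            rows.append(("removed", entry[2:]))
        elif tag == "+ ":
            rows.append(("added", entry[2:]))
        # "  " (unchanged) and "? " (intraline hint) entries are skipped
    return rows

# Made-up sample lines standing in for text extracted from two PDFs.
old_text = ["Total 100", "Region East"]
new_text = ["Total 120", "Region East"]

for status, text in diff_lines(old_text, new_text):
    print(status, text)
```

Rows of this shape could be written to a delimited file and read into a dataset for subsetting, but the dependency on an external extraction tool remains.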
 