How do I get data from a PDF file that is on a website?

Status
Not open for further replies.

Premalm

Programmer
Mar 20, 2002
164
US
Hi guys,

Here's what I need to do.
I need to log in to a website and open a particular link. Clicking it opens a PDF file. I need to get some data from the PDF file, which is in the form of a table.
This all has to be done programmatically using Visual Basic.

Is there any way to do this? Is there any particular software I need to use?

Thanks
Premal
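
The first half of the task described above, logging in and fetching the PDF, can be sketched roughly as follows. This is an illustration in Python rather than Visual Basic, and it assumes a site with a plain HTML form login that sets a session cookie. The URLs, the function names, and the form field names `username` and `password` are all hypothetical and would have to be taken from the real login page.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_session():
    """Return an opener that remembers cookies between requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(login_url, username, password):
    """Build (but do not send) a form POST for a hypothetical login page."""
    # Assumed field names; a real site may use different ones.
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(login_url, data=form.encode("ascii"), method="POST")


def download_pdf(opener, login_request, pdf_url):
    """Log in, then fetch the PDF with the same cookie-carrying opener."""
    opener.open(login_request).close()  # the server is assumed to set a session cookie
    with opener.open(pdf_url) as response:
        data = response.read()
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF (login may have failed)")
    return data
```

Usage would look something like `download_pdf(build_session(), build_login_request("https://example.com/login", "me", "secret"), "https://example.com/report.pdf")`, with the real addresses substituted. Sites that log in through JavaScript or tokens would need more than this.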
 
I was just asking around at work about exactly the same thing, only to be told it cannot be done. So, with the exuberance of youth and a determination to prove them all wrong, I searched Tek-Tips for an answer, but judging by the lack of response I guess I may have to concede defeat :(

Which begs the question: what is the benefit of a PDF?
 
No, you cannot import data from a PDF file. The only way to get data is by cutting and pasting from the document (which is not supported by all PDF writers).

The reason for using PDF is compatibility and consistency. PDFs can be created from nearly any software (a PDF writer installs itself as a printer device).

Once a PDF is written it cannot be 'edited'.

This means that whoever opens the document, on whatever platform, will always see the same thing. Think of a PDF as an image of a document.

Hope this gives you some insight.

Trancemission
=============
If it's logical, it'll work!
 
The benefit of PDF is precisely the reason why you cannot do what you want to do. PDF was conceived to STOP people from doing exactly what you want to do: getting data/text out of documents. That is the purpose of PDF, plus a "standard" document print output. In theory, if you have a PDF printer driver, all prints of the PDF will look the same, regardless of application and hardware.

Trancemission is bang on... PDFs are images of files. The "real" data/information is not present, only a picture of it.

Gerry
 
Fair enough. I guess I was thinking specifically about the PDF I was trying to get data from (an official publication) and why they would want to stop people reading data from it. I can see that commercial sites might want to prevent people screen-scraping, but I see no benefit here.
 
You are confusing "reading data", which of course they CAN do, with "getting data". Someone looking at the PDF is, in fact, reading the data, literally. They just cannot GET the data.

At least not easily, which is the point.

Gerry
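
On the "not easily" point: in many PDFs the page text is stored as string operands inside content streams rather than as a picture, which is why copy and paste works at all. Below is a minimal Python sketch of pulling those strings out (Python is used only for illustration; the thread asks for Visual Basic). It assumes a simple PDF whose streams are either uncompressed or Flate-compressed and whose text is drawn with the `Tj` operator using plain literal strings. The function name `extract_text_strings` is hypothetical. It does not handle `TJ` arrays, hex strings, custom font encodings, encryption, or scanned pages, and it recovers loose strings, not the table layout.

```python
import re
import zlib

# One stream body: the bytes between "stream" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A literal string shown with the Tj operator, e.g. "(Hello) Tj".
TJ_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_text_strings(pdf_bytes):
    """Return the literal strings drawn with Tj, in file order."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # Assumption: compressed streams use FlateDecode (zlib).
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not zlib data; treat the stream as uncompressed
        for raw in TJ_RE.findall(data):
            # Only escaped parentheses and backslashes are unescaped here;
            # other escapes (such as \n or octal codes) are left as-is.
            text = re.sub(rb"\\([\\()])", rb"\1", raw)
            found.append(text.decode("latin-1"))
    return found
```

Turning the recovered strings back into rows and columns would additionally require the text positions, which this sketch ignores. That part is where "not easily" still applies.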
 