Library needs help - Extracting ISBN from web page

sromine · Oct 25, 2006

Hello,

I work for a library, and as a side project I am trying to make our library catalog more useful by mashing it up with Amazon.com. I have posted this in several forums, not knowing which forum or which language would be best. Thank you to anyone who can get me started on this script. I know just enough programming to modify what others have created. If you need more info, please let me know.

What I am looking for is help with code that takes a URL, isolates the ISBN into a variable, and then appends it to the end of another URL.

Here is a sample URL from which the ISBN would need to be extracted:

http://web2.co.douglas.or.us/web2/t...vers=1home&index=default&query=0307276902

Here is an example of that ISBN at the end of a URL that accesses the item's info on Amazon:

http://www.amazon.com/gp/product/0307276902
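
As a rough illustration of the URL-to-URL step described above, here is a minimal Python sketch. It assumes the ISBN always appears as the value of a `query=` parameter, as in the sample catalog URL, which is used exactly as posted (including its truncated middle). The function names `isbn_from_catalog_url` and `amazon_url` are made up for this example.

```python
import re

# Base of the Amazon product URL shown in the post.
AMAZON_PRODUCT_BASE = "http://www.amazon.com/gp/product/"


def isbn_from_catalog_url(url):
    """Return the value of the 'query' parameter, or None if it is absent.

    Assumption: the parameter holds only ISBN characters
    (digits, hyphens, and a possible trailing X).
    """
    match = re.search(r"[?&]query=([0-9Xx-]+)", url)
    return match.group(1) if match else None


def amazon_url(isbn):
    """Append the ISBN to the Amazon product URL base."""
    return AMAZON_PRODUCT_BASE + isbn


# The sample catalog URL, copied as posted (its middle is truncated).
catalog_url = (
    "http://web2.co.douglas.or.us/web2/t...vers=1home"
    "&index=default&query=0307276902"
)

isbn = isbn_from_catalog_url(catalog_url)
if isbn is not None:
    print(amazon_url(isbn))
```

A regular expression is used here instead of a full URL parser because the sample URL is truncated and may no longer contain the `?` that a parser relies on to find the query string.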

KevinADC · Oct 25, 2006

The best way would be to use whatever your current catalog software uses.

- Kevin, perl coder unexceptional!

stevexff · Oct 26, 2006

Perhaps I'm being a little dense here, but the first URL you posted already has the ISBN as part of the query string. If you already have it, why do you need to 'extract' it?

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)

sromine · Oct 26, 2006

Thanks for your quick response. The URL is just an example of how to get to the page that holds the data I need extracted. If you click on the URL, you will see the ISBN I am talking about.

The issue is that the web catalog is proprietary, and the company does not have very good documentation for tapping into the underlying data, mainly because they sell a product (for thousands of dollars) that enhances the catalog with additional data.

What I am trying to do is build a separate web catalog page that is a mashup between our catalog and Amazon.com. One way to do this is to screen scrape the ISBN from the catalog page.

Hope this clears it up a little. I would appreciate any help anybody might be able to provide.
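
The screen-scraping idea mentioned above could look roughly like the following Python sketch. It assumes the catalog page shows a 10-digit ISBN somewhere in its HTML and tells real ISBNs apart from other 10-digit numbers by checking the standard ISBN-10 check digit. The HTML fragment and the function names are invented for illustration, since the real page layout is not shown in this thread.

```python
import re


def is_valid_isbn10(candidate):
    """Check the ISBN-10 check digit: weights 10 down to 1, sum divisible by 11."""
    values = [10 if ch in "Xx" else int(ch) for ch in candidate]
    total = sum(weight * value for weight, value in zip(range(10, 0, -1), values))
    return total % 11 == 0


def find_isbn10s(html):
    """Return every standalone 10-character run that passes the ISBN-10 check."""
    candidates = re.findall(r"\b\d{9}[\dXx]\b", html)
    return [c for c in candidates if is_valid_isbn10(c)]


# Hypothetical fragment of a catalog record page (not the vendor's real markup).
sample_html = (
    "<tr><td>ISBN:</td><td>0307276902 (pbk.)</td></tr>"
    "<tr><td>Item ID:</td><td>1234567890</td></tr>"
)

print(find_isbn10s(sample_html))
```

In real use, the page text would first be downloaded (for example with `urllib.request.urlopen`) and passed to `find_isbn10s`. Any change to the vendor's page layout could still break the match, so this is a sketch and not a robust solution.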

stevexff · Oct 26, 2006

Can I make the following observations:

1. Screen scraping is a nightmare. If you can, look at the raw data. Has your vendor written their own DBMS? I doubt it. It's more likely to be a relational DBMS of some sort. There are plenty of tools you can use to interrogate the DB schema to see what it looks like, Perl included (DBI module).

2. Screen scraping a third party is even more of a nightmare, because they will change their formatting regularly at the whim of the marketing department, and they certainly won't tell you in advance. Every few weeks, suddenly your application is broken again. Amazon have some public web services that you can use to inquire on their products. The results come back in XML, and the formats don't change. AFAIK they are free, but you may need to create a (free) account to register. Perl has modules for calling and consuming web services, and they are simple to use.

Steve

KevinADC · Oct 26, 2006

Excellent points Steve.

- Kevin, perl coder unexceptional!
