You know those sites such as pricegrabber.com and bestbookbuys.com, the ones that pull information like prices, shipping costs, and availability from other retailers' websites and then compare them to find the best deal. I'm wondering how this is done.
Now, I know that XML and RSS are used to create "news aggregators"; however, I don't believe pricegrabber and similar websites are using any sort of RSS feed to dynamically pull new content from the sites they compare.
So, my question is this: how are they actually getting this information? Are they opening a network socket to each website, pulling the HTML document via HTTP, and then parsing the document to extract the data, sort of like a web crawler?
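For what it's worth, that guess is essentially how screen scraping works. A minimal sketch in Python, using only the standard library's `html.parser`; the page snippet, the `"price"` class name, and the overall page structure here are hypothetical, since every retailer's markup differs:

```python
from html.parser import HTMLParser

# Hypothetical fragment of a retailer's product page; in practice you would
# first fetch the live document, e.g.:
#   html = urllib.request.urlopen(url).read().decode()
SAMPLE_HTML = """
<html><body>
  <div class="product">
    <span class="title">Some Book</span>
    <span class="price">$12.99</span>
  </div>
</body></html>
"""

class PriceScraper(HTMLParser):
    """Collects the text of every element whose class attribute is 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False  # True while inside a <... class="price"> tag
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())
            self._in_price = False

scraper = PriceScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.prices)  # ['$12.99']
```

A comparison site would run a fetcher like this against each merchant on a schedule, normalize the extracted prices, and store them for side-by-side display. The fragile part is that each merchant needs its own parsing rules, which break whenever the site's markup changes.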
Any guidance as to the method these types of websites use would be much appreciated.