csteinhilber
Programmer
We're running a content management system on a Solaris web server (iPlanet). The CMS is built to write a page out to a cache once all the elements have been assembled from a database (essentially a static page at that point). The caching occurs on the first hit to that URL. What we're looking to do is build up the cache automatically, as much as possible, so that the first user to hit a page doesn't incur the performance hit.
We've already built the hooks that can fire off the cache-building process... the question is what to use during the process itself.
Right now, we've started with wget. We assemble a list of pages that need to be hit (cached) and pass it to wget via the -i / --input-file= argument. It's working all right... but it's darned slow. So we're looking for alternatives.
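For reference, the invocation is along these lines (the list path is just a placeholder):

    wget --input-file=/tmp/cache-urls.txt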
We really don't need something that will save the downloaded file, nor do we need it to traverse (spider) links within the page (or process the download in any way, actually)... in fact, if it didn't need to actually download the file, all the better. All we need it to do is pretend to be a browser (being able to set a particular user agent is a must) and hit each URL in a given list, so the CMS can do its thing and cache the page.
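In wget terms, the closest spelling of those requirements that I can see is something like the line below; the user-agent string is only an example. (--spider would skip the download entirely, but as far as I can tell it issues HEAD requests, which may not be enough to make the CMS assemble the page, and --delete-after still downloads the file before discarding it.)

    wget --input-file=/tmp/cache-urls.txt \
         --user-agent="Mozilla/5.0 (compatible; CachePrimer/1.0)" \
         --output-document=/dev/null --quiet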
Anybody know of any possible solutions/alternatives to wget for this process? While httrack is multi-threaded and probably somewhat faster at raw downloads than wget, it's actually slower for us because it processes each downloaded page for further links to traverse (and it doesn't appear that you can simply pass it a list of URLs, like you can with wget).
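One crude direction, just to illustrate the kind of thing we're after: fan the existing list out across several wget processes at once. A rough sketch (the -P flag assumes GNU xargs, which the stock Solaris xargs may not have, and the batch size and process count are arbitrary):

    # run up to 8 wget processes, each taking 25 URLs at a time from the list
    xargs -n 25 -P 8 wget --user-agent="Mozilla/5.0 (compatible; CachePrimer/1.0)" \
        --output-document=/dev/null --quiet < /tmp/cache-urls.txt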
Any comments/input would be greatly appreciated.
TIA!
-Carl