HTML Class!

Status
Not open for further replies.

Guest_imported

New member
Jan 1, 1970
0
Hello all,

Here is my requirement:

My program should take the URL of any website and save the web page (including images and style sheets) to the local drive.

For this, I will get the content using the java.net.URL class. After getting the content, I have to search for any framesets. If framesets exist, I have to get their content as well. Next, I have to search for any images. After getting the HTML content and images, I have to replace the 'src' attributes of the image and frame tags in the HTML so that they point to the local drive, and then store everything there.
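The fetch-and-store step described above could be sketched roughly as follows. This is only a minimal sketch: the class name PageSaver and its two methods are hypothetical names invented for illustration, and it assumes a plain HTTP GET with no redirects, cookies, or character-set handling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not an existing library class.
class PageSaver {

    // Copies everything from the stream into the target file,
    // creating missing parent directories first.
    // Returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the address with java.net.URL and stores the raw bytes locally.
    // Works the same way for HTML pages, images, and style sheets.
    static long download(String address, Path target) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            return save(in, target);
        }
    }
}
```

Saving raw bytes rather than decoded text keeps images intact and avoids guessing the page encoding at this stage.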

Once everything is saved, anybody opening the web page from my local drive should not cause any contact with the original site. I have to store each and every entity on the local drive and replace the entity names so that they point to the local copies.
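The 'src' replacement described above might look something like the sketch below. It is a rough illustration only: LocalSrcRewriter, localName, and rewrite are hypothetical names, the regular expression handles only quoted src attributes on img, frame, and iframe tags, and it assumes no two resources share the same file name. It does not see anything that scripts add to the page.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an existing library class.
class LocalSrcRewriter {

    // Group 1: tag text up to and including "src=", group 2: the quote
    // character, group 3: the attribute value.
    private static final Pattern SRC = Pattern.compile(
        "(<(?:img|frame|iframe)\\b[^>]*?\\bsrc\\s*=\\s*)([\"'])(.*?)\\2",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Derives a local file name from the last path segment of the URL.
    // Simplification: name collisions are not handled.
    static String localName(String url) {
        String path = url;
        int cut = path.indexOf('#');
        if (cut >= 0) path = path.substring(0, cut);
        cut = path.indexOf('?');
        if (cut >= 0) path = path.substring(0, cut);
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? "index.html" : name;
    }

    // Rewrites each src value to its local name and records
    // original value -> local name in 'downloads' so the caller
    // knows what still has to be fetched.
    static String rewrite(String html, Map<String, String> downloads) {
        Matcher m = SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String remote = m.group(3);
            String local = localName(remote);
            downloads.put(remote, local);
            String replacement = m.group(1) + m.group(2) + local + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The map filled in by rewrite gives the list of resources to download next; relative src values would still need to be resolved against the page URL before fetching.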

Is there any Java class that provides this functionality?

For example, in a browser, once the web page has loaded, the Document Object Model is available to JavaScript. The window, document, location, anchor, img, and form objects are created automatically, so it is very easy to change the 'src' attribute of img and frame tags.


Waiting for your valuable reply with thousands of eyes.

Thanks in advance,
V.Thandava Krishna.



 
It is easiest to do this by writing a proxy server.
Then just let a browser do all parsing. Otherwise
this will be a huge task. You will need to interpret
JavaScript and whatever else might be on the page in
order to get all the content.

One common pitfall with your first proxy server is
forgetting to chop the server name from the request.

ex
Browser requests
you open a connection to and do a GET abc.gif

You'll see what I mean once you get it breathing.
This is a cool thing to learn because you will see
the transactions between browser and server first
hand char by char.
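
The server-name chopping mentioned above can be sketched roughly like this. It is a minimal sketch under assumptions: ProxyRequestLine, Target, and toOriginForm are hypothetical names, example.com in the usage note is a placeholder host, and only plain HTTP requests with a full URL in the request line are handled (no CONNECT, no HTTPS). Note that the path sent to the server keeps its leading slash.

```java
import java.net.URI;

// Hypothetical helper, not an existing library class.
class ProxyRequestLine {

    // Where to connect, and the request line to send there.
    static final class Target {
        final String host;
        final int port;
        final String requestLine;

        Target(String host, int port, String requestLine) {
            this.host = host;
            this.port = port;
            this.requestLine = requestLine;
        }
    }

    // A browser talking to a proxy sends "METHOD full-URL VERSION".
    // The origin server expects "METHOD /path VERSION", so the scheme
    // and server name are chopped off and used for the connection instead.
    static Target toOriginForm(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + requestLine);
        }
        URI uri = URI.create(parts[1]);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("No server name in: " + parts[1]);
        }
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return new Target(uri.getHost(), port, parts[0] + " " + path + " " + parts[2]);
    }
}
```

With this sketch, a request line such as "GET http://example.com/abc.gif HTTP/1.0" would lead to a connection to example.com on port 80 and the line "GET /abc.gif HTTP/1.0" being forwarded.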
 