Screen Scrape and C# special chars

Status
Not open for further replies.

Badgers

Programmer
Nov 20, 2001
187
US
Hi

I want to screen scrape a page in our website, replace some masks, and then spit it back out.

Here is an example of the initial code:

string s = System.Text.Encoding.UTF8.GetString(new System.Net.WebClient().DownloadData("
Obviously just using iii.co.uk as an example.

The problem is that I am left with a load of whitespace and escaped characters: \n, \r, \" and so on.

I don't want any of this, as it means I can't reformat and re-output the page.

Does anyone know of a way to do this, so that you get a clean HTML string, ready to be re-output to the browser?

An example of what I don't want is below:

<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \" xmlns=\" xml:lang=\"en\" lang=\"en\">\n<head>\n\n<title>


Thanks
 
Use a regex to scrub the result of the web request.
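
A minimal sketch of that suggestion, assuming the goal is simply to collapse runs of whitespace (spaces, tabs, \r, \n) into single spaces. The class name HtmlScrubber and the method CollapseWhitespace are hypothetical names chosen for illustration; they are not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original thread.
public static class HtmlScrubber
{
    // Replaces every run of whitespace (spaces, tabs, \r, \n) with a single
    // space and trims the ends. Assumption: the page has no <pre>, <textarea>
    // or inline script content whose whitespace is significant.
    public static string CollapseWhitespace(string html)
    {
        if (html == null)
        {
            throw new ArgumentNullException(nameof(html));
        }

        return Regex.Replace(html, @"\s+", " ").Trim();
    }
}
```

It could be called on the downloaded string, for example HtmlScrubber.CollapseWhitespace(s), before replacing the masks. Note that the quotes need no scrubbing: \" is how C# writes a double quote inside a string literal, and debugger views typically display strings in that escaped form, so if the sequences were seen in a watch window the string itself most likely holds plain " characters and real line breaks.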

Jason Meckley
Programmer
Specialty Bakers, Inc.

faq855-7190