Regex to look for missing string

Status
Not open for further replies.

RickBeddoe

Programmer
Nov 18, 2008
US
Hello Folks,

I'm processing an HTML file that contains comments. Unfortunately, the application that generates the file malforms them. We are working with the tool's developer to get the problem fixed, but in the meantime I'd like to develop a workaround. A malformed comment looks like this:

<!-- Some Comment <some tag>

I'd like to create a regex that finds lines containing the comment opener "<!--" but missing the closing "-->", so that I can then delete the malformed comment.

Any suggestions would be appreciated.
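
One way to express the request above as a single pattern is a negative lookahead. The following is a minimal sketch in C#, not a tested fix for the real file. The class and method names (CommentCleaner, StripUnclosedComment) are made up for illustration. It assumes each line is checked on its own and that an unclosed comment runs to the end of its line, so anything after the opener on that line is deleted with it.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class CommentCleaner
{
    // Matches "<!--" that is NOT followed by "-->" later on the same line,
    // plus everything after it up to the end of the line.
    private static readonly Regex UnclosedComment =
        new Regex(@"<!--(?!.*-->).*$");

    // Assumption: the input is a single line without its line terminator,
    // such as a string returned by TextReader.ReadLine().
    public static string StripUnclosedComment(string line)
    {
        return UnclosedComment.Replace(line, "");
    }
}
```

Under those assumptions, calling CommentCleaner.StripUnclosedComment on the line "<p>ok</p><!-- Some Comment <some tag>" returns "<p>ok</p>", while a line whose comment is properly closed comes back unchanged. If markup after the opener has to be kept, deleting to the end of the line is too aggressive and the pattern would need some other marker for where the comment ends.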

 
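In case someone runs across this later, here's how I did it. Instead of deleting the malformed comment, I close it by inserting " comment -->" before the closing </div> on that line: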

using System.IO;
using System.Text.RegularExpressions;

public static string cleanupHtml(string sourceHtml)
{
    // The repaired copy is written to the temp directory.
    string outputPath = Path.Combine(Path.GetTempPath(), "replaced.html");

    Regex rgx = new Regex(@"<!--");   // comment opener
    Regex rgxS = new Regex(@"-->");   // comment closer

    // The using blocks flush and close both files, even if an exception is thrown.
    using (TextReader tr = new StreamReader(sourceHtml))
    using (StreamWriter sw = new StreamWriter(outputPath))
    {
        string t;
        while ((t = tr.ReadLine()) != null)
        {
            // A line that opens a comment but never closes it is malformed.
            if (rgx.IsMatch(t) && !rgxS.IsMatch(t))
            {
                // Close the comment just before the closing </div>.
                t = t.Replace("</div>", " comment --></div>");
            }
            // Write every line exactly once, repaired or not.
            sw.WriteLine(t);
        }
    }
    return outputPath;
}
 