Stretchwickster
Programmer
I have some HTML code in a TRichEdit and I want to strip out all the tags, leaving only the text I am interested in. For example, filtering this HTML code:
Code:
<HTML>
<HEAD>
<TITLE> My Site </TITLE>
</HEAD>
<BODY>
<B> Lots of useful information </B>
<H1> And some more </H1>
</BODY>
</HTML>
would give the following as output:
Code:
My Site
Lots of useful information
And some more
Here is the Delphi code I have so far:
Code:
startPos := 0;
lineNo := 0;
with richEditHTML do
begin
  textLen := Length(richEditHTML.Text);
  repeat
    beginFound := richEditHTML.FindText('<', startPos, textLen, []);
    if beginFound <> -1 then
    begin
      startPos := beginFound;
      textLen := textLen - startPos;
      endFound := richEditHTML.FindText('>', startPos, textLen, []);
      SelStart := beginFound;
      SelLength := (endFound - beginFound) + 1;
      SelText := '' + #13#10;
      SelStart := SelStart + 1;
      Inc(lineNo);
    end;
  until (beginFound = -1) OR (lineNo = 189);
end;
Unfortunately, I had to limit how many tags it removes because it seems to go wrong when it reaches the 190th tag! The code works as required up to that point. Another problem is that a lot of whitespace is left behind afterwards. By the way, the text is about 61,000 characters over 1,300 lines.
Any help would be much appreciated!
Clive
Ex nihilo, nihil fit (Out of nothing, nothing comes)
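One possible explanation for the failure around the 190th tag, not verified against the original data: `textLen := textLen - startPos` subtracts the absolute position of each tag on every pass, so the search window passed to `FindText` shrinks by the running sum of all tag positions and can run out long before the text does. The extra whitespace is likely helped along by `SelText := '' + #13#10`, which puts a line break where each tag was. A minimal sketch of an alternative that avoids both issues is to work on the text as a plain string instead of through the selection. The names `StripTags`, `DropBlankLines` and `Sample` are hypothetical helpers introduced here for illustration, not part of the original post, and the scan is deliberately naive: it treats everything between `<` and the next `>` as a tag, so it does not cope with `>` inside attribute values, comments, scripts, or character entities.

```delphi
program StripTagsDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: copies every character that lies outside a <...>
// pair and skips everything inside one. Nothing is inserted in place of a
// tag, so no extra line breaks are created.
function StripTags(const S: string): string;
var
  I, OutLen: Integer;
  InTag: Boolean;
begin
  SetLength(Result, Length(S)); // the output is never longer than the input
  OutLen := 0;
  InTag := False;
  for I := 1 to Length(S) do
  begin
    if InTag then
    begin
      if S[I] = '>' then
        InTag := False;
    end
    else if S[I] = '<' then
      InTag := True
    else
    begin
      Inc(OutLen);
      Result[OutLen] := S[I];
    end;
  end;
  SetLength(Result, OutLen);
end;

// Hypothetical helper: trims each line and drops lines that end up empty.
function DropBlankLines(const S: string): string;
var
  Lines: TStringList;
  I: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.Text := S;
    for I := Lines.Count - 1 downto 0 do
    begin
      Lines[I] := Trim(Lines[I]);
      if Lines[I] = '' then
        Lines.Delete(I);
    end;
    Result := Trim(Lines.Text); // remove the trailing line break
  finally
    Lines.Free;
  end;
end;

const
  // The sample HTML from the post.
  Sample =
    '<HTML>'#13#10 +
    '<HEAD>'#13#10 +
    '<TITLE> My Site </TITLE>'#13#10 +
    '</HEAD>'#13#10 +
    '<BODY>'#13#10 +
    '<B> Lots of useful information </B>'#13#10 +
    '<H1> And some more </H1>'#13#10 +
    '</BODY>'#13#10 +
    '</HTML>';

begin
  // Prints the three text lines from the sample, one per line.
  Writeln(DropBlankLines(StripTags(Sample)));
end.
```

Applied to the control from the post, this would presumably be a single assignment such as `richEditHTML.Text := DropBlankLines(StripTags(richEditHTML.Text));`. That replaces the whole content in one step rather than editing the selection once per tag, which should also be noticeably faster on 61,000 characters.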