Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Stuck - need to remove HTML tags and text either in SQL or using a formula within the report

Status
Not open for further replies.

ratzo02

Programmer
Jun 28, 2012
4
US
Using BO Crystal Report XI. I have a column of text that includes all the HTML tags. I need to remove those tags from the field. I started by using "replace(replace(replace(replace........" formula, but the tags change with different languages, I had a formula that was extremely long. What I would like to do, either with a formula or in SQL, is create (what should be simple, but I can't come up with it) a formula that remove "<" and all the text between ">".
This formula would have to be able to repeat multiple times.

I found and tried this:
if right({YourField},1) = "]" then
left({YourField},instr({Yourfield},"[")-1)
else
{YourField}


But BO XI doesn't use the "instr" operator.
Any suggestions?
 
Instead of instr(), look at the pos() function that works like instr().

If you have access to create functions in the database, this might be more efficient if you create a stored function that will do this for you in the database instead of doing it in Crystal. I've been able to do something similar in Oracle using regular expressions which would potentially eliminate your need to loop through the data. Then, if your report is just using tables, you would create a SQL Expression that calls the function with field that you want to strip. If your report uses a command, you can just call the function in the select statement of your command.

-Dell

DecisionFirst Technologies - Six-time SAP BusinessObjects Solution Partner of the Year
 
I agree Dell about having the DB do this work. And that's the plan in a few months (moving data to Netezza) but in the short term I stuck trying to make this work.
We are alot closer, at least I can make BO except the new formula, but now it looks for the last > and whipes out everything if the last character is > (there is a character limit on this field).
Here is the Formula I've started with, and have been tweaking, but so far its all (everything removed) or none (nothing removed)*except*:
=If Right([Case Text];1) = ">" Then
Left([Case Text];Pos([Case Text];"<")-1)
Else [Case Text]​
*except*:
=If Right([Case Text];1) = ">" Then
Left([Case Text];Pos([Case Text];"<")+1)
Else [Case Text]​
This leaves the first < and character of the entire field as expected. I need to come up with a way for it to see the first "<" then go the the very next ">" and delete everything in between. IF it doesn't have a "<" infront of it, skip over it.
Any thoughts?

Thanks awesome start Dell
-Kurt
 
You could try something like this:
Code:
Numbervar nStart := 0;
Numbervar nEnd := 0;
StringVar sProcess := {Your Field};
StringVar sResult := '';

nStart := pos(sProcess, '>');
While nStart > 0 do
(
  nEnd := pos(sProcess, '<');
  sResult := sResult + left(sProcess, nStart - 1) + ' ';
  sProcess := mid(sProcess, nEnd + 1);
  nStart := pos(sProcess, '>');
)
RTrim(sResult)

This should walk through the whole string and remove everything that starts with "<" and ends with ">".

-Dell

DecisionFirst Technologies - Six-time SAP BusinessObjects Solution Partner of the Year
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top