Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Replace special characters

Status
Not open for further replies.

redoakhg

Programmer
Nov 30, 2006
38
US
Hi,

I was provided a csv file with a list of affirmations for a project that I am working on.

The issue is, apparently the original file was generated in Word, pasted into Excel and output to csv. When I send text messages with the affirmations I am seeing a lot of odd characters i.e. "I am sick and tired of being sick and tired of smoking. I quit now.Â" (Note the  at the end of the string)

Is there a function that can be used to clean this up?
 
If you know each bad character, you can use REPLACE function in T-SQL to replace these characters with nothing.
 
I have a function I use that may help. Understand that the performance of this thing may not be that great, but if this is a once and done cleanup task, the performance shouldn't matter too much.

Code:
ALTER Function [dbo].[RemoveNonAlphaNumericCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
	Declare @AcceptableCharacters VarChar(100)

	Set @AcceptableCharacters = [!]'a-zA-Z0-9 .-'[/!]
	While PatIndex('%[^' + @AcceptableCharacters + ']%', @Temp collate Latin1_General_BIN
) > 0
		Set @Temp = Stuff(@Temp, PatIndex('%[^' + @AcceptableCharacters + ']%', @Temp Collate Latin1_General_BIN
), 1, '')

	Return @Temp
End

Notice the parts in RED. This code will remove any character(s) found that are not included in this list. As is, the only characters left in the original string should be a through z, A through Z, space, period and dash (-). You will likely want to modify the list of acceptable characters before putting this code in to production.

Using the function above, I tested like this:

Code:
Select dbo.RemoveNonAlphaNumericCharacters('I am sick and tired of being sick and tired of smoking.  I quit now.Â')

Which returned this...

[tt][blue]
I am sick and tired of being sick and tired of smoking. I quit now.
[/blue][/tt]


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Just to be clear. If you know the special characters you want to remove, use the replace method because it will perform better. If you don't know, and simply want to keep certain acceptable characters, use the function I provided above.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
And in case of the text type, you would convert to varchar(8000) first, right? So, we may want to use 8000 chars for parameters in UDF, correct?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top