Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Removing email address from Text

Status
Not open for further replies.

NEveritt

MIS
Dec 30, 2002
25
0
0
GB
I am looking to remove email address from text strings and replace it with the text (email removed). The email address' will not be in every string and the the domains will vary.

e.g.

fields named 'Comments'

and an example of a field is:

"what a sunny day said email@email.com yesterday"

i want to return:

"what a sunny day said (email removed) yesterday"

can anyone help me out with this?
 
You'll need to create a custom function.

You may want to take advantage of built-in functions like STUFF, SUBSTRING, LEN, REPLACE, SPACE (more available in the help section under 'string functions') within your function, to avoid looping through the string.

Or, you may want to loop through the string, find an @ symbol, note it's position, note the position of the closest 'space' characters (the one before the e-mail and the one after), then use the STUFF function with the space positions.

Hope that helps some,
Jason
 
Here is some code that works, and is in crude form, but probably worthy of turning into a user-defined-function that takes the string as input and returns the redacted string as output:

Code:
declare @str varchar(255)
set @str='what a sunny day said email@email.com yesterday'

declare @email varchar(255)
declare @AtIndex int, @StartIndex int, @EndIndex int

-- Get @ sign char pos
set @AtIndex = charindex('@', @str)
if @AtIndex>0
 begin

   -- Get end char position
   set @EndIndex = @AtIndex
   while @EndIndex<=len(@str) and 
         substring(@str, @EndIndex, 1)<>' ' and
         substring(@str, @EndIndex, 1)<>',' and
         substring(@str, @EndIndex, 1)<>';' and
         substring(@str, @EndIndex, 1)<>')' and
         substring(@str, @EndIndex, 1)<>'!'
     begin
         set @EndIndex = @EndIndex + 1
     end

   -- Get start char position
   set @StartIndex = @AtIndex
   while @StartIndex>=1 and
         substring(@str, @StartIndex-1, 1)<>' ' and
         substring(@str, @StartIndex-1, 1)<>',' and
         substring(@str, @StartIndex-1, 1)<>';' and
         substring(@str, @StartIndex-1, 1)<>'(' and
         substring(@str, @StartIndex-1, 1)<>'!'
     begin
         set @StartIndex = @StartIndex - 1
     end
   
   -- Get email, set replacement str
   set @email = substring(@str, @StartIndex, @EndIndex-@StartIndex)
   set @str = replace(@str, @email, '(email removed)')
 end

-- select out results
select @str
 
Oh, and a great enhancement to the above is to make sure that the last 4 characters of the @email found end in one of the expected domains (.org, .net, .com, etc...) and only do the replacement if they do.

TJR
 
I always like to see a set based (per say - lol) answer that does not use loops or cursors. Here is a function I "crudely" threw together. Run the code in Query Analyzer to create the UDF (User-Define Function).

Code:
CREATE  FUNCTION fnc_ReplaceEmailAddress
( @OldEmailText VarChar(1000) )
RETURNS varchar(1000)
AS
BEGIN
  BEGIN 
	DECLARE @AtSignPos		Int
	DECLARE @EmailStartPos		Int
	DECLARE @EmailEndPos		Int

	DECLARE @PullEmailOut		VarChar(1000)

	DECLARE @RevisedEmailText	VarChar(1000)

	SET	@AtSignPos = CharIndex('@',@OldEmailText)

	SET	@EmailStartPos = (@AtSignPos - CharIndex(' ',REVERSE(SUBSTRING(@OldEmailText,1,@AtSignPos)))) + 2

	SET	@EmailEndPos = (CharIndex(' ',SUBSTRING(@OldEmailText,@AtSignPos,LEN(@OldEmailText))) - 1) + @AtSignPos

	SET	@PullEmailout = SUBSTRING(@OldEmailText,@EmailStartPos,(@EmailEndPos - @EmailStartPos))

	SELECT	@RevisedEmailText = REPLACE(@OldEmailText,@PullEmailOut,'(Email Removed)')

  END

RETURN @RevisedEmailText

END

To execute it ... run the following snippette in Query Analyzer against your table. Of course change the field and table names up.

Code:
SELECT dbo.fnc_ReplaceEmailAddress(EmailAddress)
FROM MyTable

Enjoy!

Thanks

J. Kusch
 
I believe Sql Server supports Regular Expressions. A pattern string something along the lines of \b.+@+*\b is a beginning. I'm not an expert but that says get anything within word boundaries that contains an @ with at least one character before and after it. Check out and some of the links there. He may have an example. If you look on MSDN, I also think I remember they gave an email extraction example as part of the Regular Expressions help. Good Luck!

Have a great day!
 
J. Kusch, that looks like my first version which used NO while loops and also used reverse.

But alas, I dismissed that a valid solution as it only works for strings with email addresses that are bounded by space characters and not for strings with email addresses bounded by other characters (punctuation, etc), or those that might be the only text within a string.

This really is something that has to be done by a regular expression parser...if SQL has one, I have never used it.

TJR
 
Sorry, I posted to quickly. The above example will not work with lower case email addresses. The following ones will:
Code:
\b[a-zA-Z0-9._%-]+@[a-zA-Z0-9._%-]+\.[a-zA-Z]{2,4}\b 
 
\b(?:mailto:)?([a-zA-Z0-9._%-]+@[a-zA-Z0-9._%-]+\.[a-zA-Z]{2,4})\b


Have a great day!
 
all sorted now, i edited the query given by JayKusch to remove the full email address, plus I had to cope with text that did not have an email address:

CREATE FUNCTION fnc_ReplaceEmailAddress
( @OldEmailText VarChar(1000) )
RETURNS varchar(1000)
AS
BEGIN
BEGIN
DECLARE @AtSignPos Int
DECLARE @EmailStartPos Int
DECLARE @EmailEndPos Int

DECLARE @PullEmailOut VarChar(1000)

DECLARE @RevisedEmailText VarChar(1000)

SET @AtSignPos = CharIndex('@',@OldEmailText)

SET @EmailStartPos = (@AtSignPos - CharIndex(' ',REVERSE(SUBSTRING(@OldEmailText,1,@AtSignPos)))) + 2

SET @EmailEndPos = (CharIndex(' ',SUBSTRING(@OldEmailText + space(1),@AtSignPos + 1,2000)) ) + @AtSignPos

SET @PullEmailout = SUBSTRING(@OldEmailText,@EmailStartPos,(@EmailEndPos - @EmailStartPos))

SELECT @RevisedEmailText = case when CharIndex('@',@OldEmailText) > 0 then REPLACE(@OldEmailText,@PullEmailOut,'(Email Removed)') else @OldEmailText end

END

RETURN @RevisedEmailText
return @emailendpos

END


--------

SELECT dbo.fnc_ReplaceEmailAddress(comments) ,
comments,
CharIndex('@',comments) ,
(23 - CharIndex(' ',REVERSE(SUBSTRING(comments,1,23)))) + 2,
(CharIndex(' ',SUBSTRING(comments,23,LEN(comments))) ) + 23,
(CharIndex(' ',SUBSTRING(comments + space(1),24,2000)) )+ 0,
SUBSTRING(comments,17,((23-17) + 20))

FROM Casehistories
where Casehistories.caseid = 539560 and actionid in (9,10)--and casehistoryid = 4117029


cheers
 
Glad I could help w/ a template. Thanks for posting the overall solution for the rest of our reader too!

Thanks

J. Kusch
 
NEveritt, why didn't you go with the SBendBuckeye recommendation of sp_regexp (regular expression parser)? I ask because I would have assumed that the limitation/requirement that the email address be surrounded by spaces be too restrictive....but maybe not.

TJR
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top