Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Email address validation 1

Status
Not open for further replies.

Zeroanarchy

Technical User
Jun 11, 2001
630
AU

Hi,

I have been searching Tek-tips for a method to validate an email address that contains only numeric values before the @ symbol and after the @ symbol in a select query.

I was hoping that a method existed where by if only numeric values appeared before or after the @ symbol I could remove them from the list.

From what I understand Oracle does not have a method to check for only numeric.

I was wondering if instead one was able to sum the values before or after the @ symbol and if an alpha character existed within the sum then an error would occur thus returning '0' allowing me to eliminating the values > than '0'.

I have tried
length(translate(T_MEMBERSHIP_DETAILS.ADDR_EMAIL,'A0123456789','A'))is not null

but this only seems to work on values that do not contain the @ symbol.

Any ideas appreciated.


[afro]ZeroAnarchy
Experience is a wonderful thing. It enables you to recognize a mistake
when you make it again.

 
Zero,

You are very close. Here are sample data and code that does what you want (I believe):
Code:
select * from T_MEMBERSHIP_DETAILS;

ADDR_EMAIL
---------------
abcdefg@hijklmn
123456@hijklmn
123456@123456
abcdefg@123456

4 rows selected.

select addr_email invalid_emails
  from T_MEMBERSHIP_DETAILS
 where instr(addr_email,'@') = 0 -- invalid if no "@"
/* Next "OR" invalidates all-numeric strings preceding "@" */
    or length(translate(substr(addr_email,1,instr(addr_email,'@')-1),'A0123456789','A')) is null
/* Next "OR" invalidates all-numeric strings following "@" */
    or length(translate(substr(addr_email,instr(addr_email,'@')+1),'A0123456789','A')) is null;

INVALID_EMAILS
----------------
123456@hijklmn
123456@123456
abcdefg@123456

3 rows selected.
Let us know if this does what you wanted.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
Hey Mufasa and thanks again for your help.

If I revert the code you supplied, as follows to only display the values which are valid, I still get emails in the list which are numeric.

Example.
185646@lizzy.com.au
118350@lizzy.com.au
102-2102@hotmail.com
22222@2222.

I can certainly run extra filters in the Where Clause but I am wondering why these once have slipped by.

Thanks

Code:
select 	T_MEMBERSHIP_DETAILS.ADDR_EMAIL invalid_emails
  from T_MEMBERSHIP_DETAILS
 where 

		/* Next "OR" invalidates all-numeric strings preceding "@" */
		     length(translate(substr(T_MEMBERSHIP_DETAILS.ADDR_EMAIL,1,instr(T_MEMBERSHIP_DETAILS.ADDR_EMAIL,'@')-1),'A0123456789','A')) is not null
		/* Next "OR" invalidates all-numeric strings following "@" */
		    or length(translate(substr(T_MEMBERSHIP_DETAILS.ADDR_EMAIL,instr(addr_email,'@')+1),'A0123456789','A')) is not null

ORDER BY
	T_MEMBERSHIP_DETAILS.ADDR_EMAIL

[afro]ZeroAnarchy
Experience is a wonderful thing. It enables you to recognize a mistake
when you make it again.

 
Zero,

Remember, if you want to "reverse" logic then that typically means:

1) changing the "OR" to an "AND",
2) changing "is NULL" to "is NOT NULL" (as you have done)
3) changing "= 0" to "> 0":
Code:
select addr_email valid_emails
  from T_MEMBERSHIP_DETAILS
 where instr(addr_email,'@') > 0
   and length(translate(substr(addr_email,1,instr(addr_email,'@')-1),'A0123456789','A')) is not null
   and length(translate(substr(addr_email,instr(addr_email,'@')+1),'A0123456789','A')) is not null;

VALID_EMAILS
---------------
abcdefg@hijklmn

1 row selected.
Let us know if this is what you wanted.


[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
Thanks Mufasa works well. I do have a couple more questions.

Is it possible to remove all email addresses that are numeric with a character in them as well as email addresses with invalid charaters.

Example:
122343.45454@test.com
12.4565@test.com
abc%wesd@test.com
+1212dfd@test.com
[fesdsr343421@test.com

I have tried the following two methods without result. Is there a better way??

Thanks again.

Code:
	AND (INSTR(ADDR_EMAIL,'!') = 0
	OR INSTR(ADDR_EMAIL,'#') = 0
	OR INSTR(ADDR_EMAIL,'$') = 0
	OR INSTR(ADDR_EMAIL,'^') = 0
	OR INSTR(ADDR_EMAIL,'&') = 0
	OR INSTR(ADDR_EMAIL,'*') = 0
	OR INSTR(ADDR_EMAIL,'(') = 0
	OR INSTR(ADDR_EMAIL,')') = 0
	OR INSTR(ADDR_EMAIL,'+') = 0
	OR INSTR(ADDR_EMAIL,'?') = 0
	OR INSTR(ADDR_EMAIL,',') = 0
	OR INSTR(ADDR_EMAIL,';') = 0
	OR INSTR(ADDR_EMAIL,'{') = 0
	OR INSTR(ADDR_EMAIL,'}') = 0
	OR INSTR(ADDR_EMAIL,'\') = 0
	OR INSTR(ADDR_EMAIL,'`') = 0
	OR INSTR(ADDR_EMAIL,'~') = 0
	OR INSTR(ADDR_EMAIL,'=') = 0
	OR INSTR(ADDR_EMAIL,'<') = 0
	OR INSTR(ADDR_EMAIL,'>') = 0
	OR INSTR(ADDR_EMAIL,'}') = 0
	OR INSTR(ADDR_EMAIL,']') = 0
	OR INSTR(ADDR_EMAIL,'[') = 0)

Code:
AND	(ADDR_EMAIL 	NOT LIKE '%!%'
OR	ADDR_EMAIL 	NOT LIKE '@%'
OR	ADDR_EMAIL 	NOT LIKE '%#%'
OR	ADDR_EMAIL 	NOT LIKE '%$%'
OR	ADDR_EMAIL 	NOT LIKE '%^%'
OR	ADDR_EMAIL      NOT LIKE '%&%'
OR	ADDR_EMAIL 	NOT LIKE '%*%'
OR	ADDR_EMAIL 	NOT LIKE '%(%'
OR	ADDR_EMAIL 	NOT LIKE '%)%'
OR	ADDR_EMAIL 	NOT LIKE '%+%'
OR	ADDR_EMAIL 	NOT LIKE '%?%'
OR	ADDR_EMAIL 	NOT LIKE '%,%'
OR	ADDR_EMAIL 	NOT LIKE '%;%'
OR	ADDR_EMAIL 	NOT LIKE '%{%'
OR	ADDR_EMAIL 	NOT LIKE '%}%'
OR	ADDR_EMAIL 	NOT LIKE '%\%'
OR	ADDR_EMAIL 	NOT LIKE '%`%'
OR	ADDR_EMAIL 	NOT LIKE '%~%'
OR	ADDR_EMAIL 	NOT LIKE '%=%'
OR	ADDR_EMAIL 	NOT LIKE '.%'
OR	ADDR_EMAIL 	NOT LIKE '%.'
OR	ADDR_EMAIL 	NOT LIKE '%<%'
OR	ADDR_EMAIL 	NOT LIKE '%>%')

[afro]ZeroAnarchy
Experience is a wonderful thing. It enables you to recognize a mistake 
when you make it again.
 
Sure...

A much more efficient, simpler method is:
Code:
...AND length(translate(addr_email,
                        'A!#$^&*()+?,;{}\`~=<>[]','A')) =
       length(addr_email)...
Let us know if this meets with your satisfaction.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
I thought I would save this email filtering query in here for the next person who needs to clean up email addresses.

I had to come up with a method of filtering because the company I am working for is due to send out around 100,000 emails, each email costs the company 6C. So I guess the more dubious email addresses I get rid of the less it will cost the company.

Evidently the email addresses will be checked again after this with a finer filter but there is no point supplying the business with a list of 100,000 when it should be close to 80,000.

I discovered whilst looking through 100,000 emails that the corruption tended to be at the start of the list (Numeric and the letter A) and the end of the list (z) with the inclusion of the letter N.

Examples:
1212@test.com
23232-4454@test.com
23232.454@test.com
aaaabbbbddfdf@test.com
noemail@test.com
none@test.com

Some issues relate around symbols being used as well as white spaces.

Now this filter does not clean out all incorrect email addresses, but it should get rid of a few.

Code:
SELECT
		TRANSLATE(ADDR_EMAIL,'^ ','^')
FROM
		T_MEMBERSHIP_DETAILS
WHERE
	INSTR(ADDR_EMAIL,'@') > 0
	AND LENGTH(TRANSLATE(SUBSTR(ADDR_EMAIL,1,INSTR(ADDR_EMAIL,'@')-1),'A0123456789','A')) IS NOT NULL
    AND LENGTH(TRANSLATE(SUBSTR(ADDR_EMAIL,INSTR(ADDR_EMAIL,'@')+1),'A0123456789','A')) IS NOT NULL

	AND	(T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.co'
	OR	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.eu'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.tv'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.biz'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.id%'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.info%'
	OR	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.asn%'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.com%'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.net%'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.us'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.mobi'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.ws%'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.org%'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.edu%'
	OR	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.gov%'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.csiro%'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.info'
	OR  T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	LIKE '%.conf')

	AND	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE '.%'
	AND	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE '%.'
	AND	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE '%[%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%!%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '@%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL   	NOT LIKE '%#%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL  	NOT LIKE '%$%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%^%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%&%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%*%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%(%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%)%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%+%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%?%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%,%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%;%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%{%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%}%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%\%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%`%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%~%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%=%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL    	NOT LIKE '%<%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL  	NOT LIKE '%>%' 
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE '%none%'
	AND	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE 'a@%'
	AND	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE '%aaa@%'
	AND	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE 'azz@%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE '%none%'
	AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE '%noemail%'
	AND	T_MEMBERSHIP_DETAILS.ADDR_EMAIL 	NOT LIKE '%test%'

GROUP BY
	ADDR_EMAIL

ORDER BY
	ADDR_EMAIL

If the front end was a bit cleaner and prevented invalid characters or required email validation you would not need to do this. If you are like me, and you are just left to have to clean up the email lists, then this is the method should help.

If is far from perfect but it should cleanup a fair few of the records.

If you think you can add or improve this by all mean please do.

I guess one think that is needed is to strip all characters before the @ symbol and check to see if what remains is still numeric. I think you might find that this will be a small proportion of your list.

I need to finish by acknowledging Dave for his assistance with removing numeric values. Thanks Dave for all your help, greatly appreciated.

If this helped you let us know.

[afro]ZeroAnarchy
Experience is a wonderful thing. It enables you to recognize a mistake
when you make it again.

 
Zero,

The last few lines of your code rule out e-mail addresses that are not necessarily invalid:
Code:
...AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%none%'
    AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE 'a@%'
    AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%aaa@%'
    AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE 'azz@%'
    AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%none%'
    AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%noemail%'
    AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%test%'
Also, unless I am missing something, your code invalidates all valid e-mail address (since all e-mail addresses must have a period ('.') in front of the domain suffix.):
Code:
...AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '.%'
   AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%.'
...

Your original data did not have any e-mail suffixes, and I blythely wrote the original "all-numerics" filtering code without regard for your missing suffixes. So, your most recently posted code needs that correction.

In the code, below, I have consolidated all of the corrections and significantly simplified the filtering algorithms. (In the sample data, below, only the first e-mail and last two e-mails are valid.):
Code:
(Sample data)
select * from T_MEMBERSHIP_DETAILS;

ADDR_EMAIL
-------------------
abcdefg@hijklmn.com
123456@hijklmn.tv
123456@123456.org
abcdefg@123456.edu
xyz@xyz.xyz
xyz@xyz.csiro
xyz@xyz.net

7 rows selected.

select trim(addr_email) valid_emails
  from T_MEMBERSHIP_DETAILS
 where INSTR(ADDR_EMAIL,'@') > 0 -- ensures that e-mails have '@'
   /* next "and" ensures that string prior to '@' is not all numeric */
   and LENGTH(TRANSLATE(SUBSTR(ADDR_EMAIL,1,INSTR(ADDR_EMAIL,'@')-1),'A0123456789','A')) IS NOT NULL
   /* next "and" ensures that string following '@' 
      and prior to "." is not all numeric */ 
   and LENGTH(TRANSLATE(SUBSTR(ADDR_EMAIL,INSTR(ADDR_EMAIL,'@')+1,
       instr(addr_email,'.',-1)-instr(addr_email,'@')-1),'A0123456789','A')) IS NOT NULL
   /* next "and" ensures that no illegal characters are in e-mail address */
   and length(translate(addr_email,'A!#$^&*()+?,;{}\`~=<>[]','A')) = length(addr_email)
   /* next "and" ensures that e-mails have valid suffix */
   and ' co eu tv biz id info asn com net us mobi ws org edu gov csiro info conf ' like 
       '%'||lower(trim(substr(addr_email,instr(addr_email,'.',-1)+1)))||'%'
/

VALID_EMAILS
-------------------
abcdefg@hijklmn.com
xyz@xyz.csiro
xyz@xyz.net

3 rows selected.
Let us know if you approve these simplifications.



[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
Hi Dave,

Code:
...AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '.%'
   AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%.'
...

The concept of this code is to prevent the following types of email addresses.

.john@test.com
john@test.com.


Code:
...AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%none%'
    AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE 'a@%'
    AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%aaa@%'
    AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE 'azz@%'
    AND T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%noemail%'
    AND    T_MEMBERSHIP_DETAILS.ADDR_EMAIL     NOT LIKE '%test%'
In house these tend to be the options that people use when the customer doesn't have an email address or the customer does not want to supply their email address. This allows the staff to progress onto the next screen. Along with "Withheld" and "Invalid". It's not what they are told to do, but rather what most people have been putting into the system. Over 3,000 emails meet this criteria.

Thanks Dave for all your help. I hope others find this useful in the future if so please don't forget to give Dave a nice big star. :)

If you do plan on doing a big mail out that is costing you per email, I would suggest that you pick up an email validator. Most are under $50 and they may save you thousands. In my case from 100,000 email addresses I was down to 80,000 after using the above code. After passing the email addresses through a validator it knocked off a further 26,000. So leaving me with a total of 54,000 valid email addresses.

Cheers


[afro]ZeroAnarchy
Experience is a wonderful thing. It enables you to recognize a mistake
when you make it again.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top