RegExp

Status
Not open for further replies.

vbkris

Programmer
Jan 20, 2003
5,994
IN
Hi, this is my first try at RegExp.
I want to validate an email address.
The rules are:
1. There must be an @ sign.
2. There must be a '.' sign.
3. '@' and '.' cannot be next to each other.
4. '..' is not allowed.

$email="Asd@as1dcomin..s";
if(eregi("[a-z0-9]+[^.]+[^.]@[^.][a-z0-9.]+[^..]+",$email))
echo "Good Email";
else
echo "Stupid Email";

This checks for everything but '..', so what's wrong? Can you recommend a good tutorial (one that explains it in human language :) )?


Known is handfull, Unknown is worldfull
 
vbkris,

Just a few notes and hints:
[a-z0-9] matches only lowercase letters and digits. The [^.] that follows it is not needed: it is a separate class that matches any character except a period, so it only loosens the pattern.
What about people that type in their address in mixed case?

\w is equal to [a-zA-Z0-9_]: all alphanumeric characters plus the underscore, i.e., all characters considered "word" characters in Perl.

\w+ means: one or more "word" characters.
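A quick way to see what \w+ accepts is sketched below. It uses preg_match, since \w is a Perl-style shorthand; the sample strings and the $results array are made up for illustration.

```php
<?php
// ^ and $ anchor the pattern, so the whole string must be "word" characters.
$pattern = '/^\w+$/';

$results = [
    'as1d_COM' => preg_match($pattern, 'as1d_COM'), // letters, digits, underscore
    'as1d.com' => preg_match($pattern, 'as1d.com'), // contains a period
    ''         => preg_match($pattern, ''),         // + needs at least one character
];

// Only the first string matches (1); the other two give 0.
var_dump($results);
```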

Let's see if that helps! ;)
 
The regex would be
"\w+\@(\w+\.)+\w+"
It is much simpler, and performance should be better.
 
vbkris
It should be noted that . is a special character in regular expressions that matches any character (except newlines, if I'm not mistaken), so to match a literal period you should write \. instead. Inside a character class such as [^.], the period is already literal.

Additionally, I would use the preg_* family of functions. They use Perl-style regular expressions, which are easier to find support for on the web... and if I'm not mistaken, their performance is better.
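As a rough sketch of what that could look like with preg_match, here is the \w-based pattern from earlier in the thread. The / delimiters and the ^ and $ anchors are my additions, and check_email is a hypothetical helper name. Note that this pattern is stricter than real addresses need: it rejects a period in the part before the @, for instance.

```php
<?php
// Pattern from earlier in the thread, wrapped in / delimiters and anchored
// with ^ and $ so that the whole string has to match, not just a piece of it.
$pattern = '/^\w+@(\w+\.)+\w+$/';

// Hypothetical helper that mirrors the echo logic of the original post.
function check_email(string $email, string $pattern): string
{
    return preg_match($pattern, $email) === 1 ? 'Good Email' : 'Stupid Email';
}

echo check_email('Asd@as1dcomin..s', $pattern), "\n"; // Stupid Email
echo check_email('Asd@as1d.com', $pattern), "\n";     // Good Email
```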

DRJ
eregi is a case-insensitive search anyway

-Rob

 
skiflyer

Thanks for the note about eregi.
I thought about preg_ because I use that out of personal preference.
 
You know, one day I'll put everything I want into one post.

Anyway, the ereg functions use POSIX regexes, which are a kind of universal standard... the Perl-style regexes are extended in a lot of neat ways, including the ability to do non-greedy matching, which you'll eventually care about for some regular expression or another. Oh, and you can use capture groups and backreferences, so if you wanted to capture someone's email and easily refer back to the part before the @ symbol, it would be trivial.
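To make those two features concrete, here is a small sketch with preg_match. The sample strings and variable names are invented for the example.

```php
<?php
$email = 'Asd@as1d.com';

// Capture groups: each (...) saves the text it matched.
// $m[1] is the part before the @, $m[2] the part after it.
if (preg_match('/^([^@]+)@([^@]+)$/', $email, $m)) {
    echo $m[1], "\n"; // Asd
    echo $m[2], "\n"; // as1d.com
}

// Greedy vs. non-greedy: .+ grabs as much as it can, .+? as little as it can.
$text = '<b>one</b> and <b>two</b>';
preg_match('/<b>.+<\/b>/', $text, $greedy); // $greedy[0] is the whole string
preg_match('/<b>.+?<\/b>/', $text, $lazy);  // $lazy[0] is '<b>one</b>'
```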

Second... as always, the PHP manual is a great resource


And another page I use once in a while when I'm stuck on basic syntax

Lastly,
If you've learned about them before: one thing to keep in mind when designing complex regular expressions is that regexes are really just finite state machines (FSMs), so if you can break your problem down into one solvable by an FSM, you're golden... I find that approach is sometimes easier. For example, in your problem above you have three big states with different exits (each of which can be broken down into two or three states itself):

before the @ symbol, the @ symbol, and after the @ symbol
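One way to picture that breakdown is a hand-written state machine, sketched below. The state names, the classify and fsm_accepts helpers, and the choice to allow only \w characters and to require the period after the @ are all my assumptions, not something fixed by the rules in the original post.

```php
<?php
// Map one character to the input class the machine cares about.
function classify(string $ch): string
{
    if ($ch === '@') {
        return 'at';
    }
    if ($ch === '.') {
        return 'dot';
    }
    return preg_match('/^\w$/', $ch) === 1 ? 'word' : 'other';
}

function fsm_accepts(string $email): bool
{
    // state => [input class => next state]; a missing entry means "reject".
    $transitions = [
        'start'      => ['word' => 'user'],
        'user'       => ['word' => 'user', 'dot' => 'user_dot', 'at' => 'at'],
        'user_dot'   => ['word' => 'user'],                   // no '..' and no '.@'
        'at'         => ['word' => 'domain'],                 // no '@.'
        'domain'     => ['word' => 'domain', 'dot' => 'domain_dot'],
        'domain_dot' => ['word' => 'tld'],                    // no '..'
        'tld'        => ['word' => 'tld', 'dot' => 'domain_dot'],
    ];

    $state = 'start';
    foreach (str_split($email) as $ch) {
        $class = classify($ch);
        if (!isset($transitions[$state][$class])) {
            return false;
        }
        $state = $transitions[$state][$class];
    }

    // Accept only if we ended after at least one period in the domain.
    return $state === 'tld';
}
```

Each rule from the original question becomes a missing arrow in the table, which is often easier to reason about than one long pattern.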

And then the last thing I'll say, after a quick glance at your first approach: something like [^..] will usually not work for your problem. That particular approach fails for a lot of reasons (putting the same character in a list twice is meaningless), but more generally, saying NOT something when it's surrounded by pieces that allow that same something will likely just advance your FSM one state, and a neighboring piece will absorb the text you wanted to reject.

I don't think that's clear, so let me illustrate... You've told your machine you want any characters, including a period, one or more times, followed by characters that are not a period (your intended "no double periods") one or more times, followed by any characters, including a period, one or more times.

So now, it sees
ab..cd
It's going to match the d as the last chunk (any character one or more times)
the c as the second chunk (no double periods)
and ab.. as the first chunk (any character one or more times)
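The walk-through above can be checked with a small sketch. The pattern below is my reconstruction of the three chunks being described, not the pattern from the original post, and the lookahead version at the end is just one possible way to reject '..' with the preg_ functions.

```php
<?php
// [^..] is the same class as [^.]: one character that is not a period.
$pattern = '/^([a-z.]+)([^..]+)([a-z.]+)$/';

preg_match($pattern, 'ab..cd', $m);
// The '..' is swallowed by the first chunk, so the string still matches:
// $m[1] is 'ab..', $m[2] is 'c', $m[3] is 'd'
print_r($m);

// A negative lookahead says "no '..' anywhere ahead" before matching the rest.
$noDoubleDot = '/^(?!.*\.\.)[a-z.]+$/';
$bad  = preg_match($noDoubleDot, 'ab..cd'); // 0
$good = preg_match($noDoubleDot, 'ab.cd');  // 1
```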

Anyway, I hope that's helpful for you. Regexes are a ton of fun and amazingly useful.

-Rob
 
There is no rule that says you have to do it in a single regular expression.

Split the address at the "@", and make sure you get exactly two pieces.

Check the user part for invalid characters.

Check the domain part for the presence of at least one ".". It is also possible to match a domain part to a list of known top-level domains.
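The steps above could be sketched roughly as follows. This is not the code from the FAQ; looks_like_email is a hypothetical name, and the set of characters allowed in each part is an assumption you would want to adjust.

```php
<?php
// Hypothetical helper: validate an address piece by piece instead of
// with one big regular expression.
function looks_like_email(string $email): bool
{
    // Split at "@"; exactly two pieces means exactly one "@".
    $parts = explode('@', $email);
    if (count($parts) !== 2) {
        return false;
    }
    [$user, $domain] = $parts;

    // The domain needs at least one ".", i.e. at least two labels.
    $domainLabels = explode('.', $domain);
    if (count($domainLabels) < 2) {
        return false;
    }

    // Splitting on "." turns '..', a leading '.', or a trailing '.' into
    // an empty label, which the character checks below reject.
    foreach (explode('.', $user) as $label) {
        if (preg_match('/^[A-Za-z0-9_%+-]+$/', $label) !== 1) {
            return false;
        }
    }
    foreach ($domainLabels as $label) {
        if (preg_match('/^[A-Za-z0-9-]+$/', $label) !== 1) {
            return false;
        }
    }

    return true;
}

var_dump(looks_like_email('Asd@as1dcomin..s')); // bool(false)
var_dump(looks_like_email('Asd@as1d.com'));     // bool(true)
```

Matching the last domain label against a list of known top-level domains would be one more check at the end, left out here because the list itself is not given in the thread.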

I also have a FAQ on this subject with code: faq434-2408

Want the best answers? Ask the best questions: TANSTAAFL!
 
OK fellows, I am checking on it and will get back to you...

Known is handfull, Unknown is worldfull
 