regex - delete all after .com, .edu, .org, .net

Status
Not open for further replies.

webdev007

Programmer
Sep 9, 2005
168
I am not much of a regex-educated person (understatement).
I am trying to remove everything after .com. .net. .org. .edu.
or
after .com/ .net/ .org/ .edu/
that is, in cases where each of them could be followed by a dot or by a slash.

Here is my attempt:
$url="
$search_for = array("$del = array("");
$new_url2 = str_replace($search_for, $del, $url);

//$new_url2=rtrim($new_url, '*');
//$new_url=$new_url2;
echo"first echo: $new_url2<br>";

// the above works fine (it deletes the http:// part); the code below does not work

$regex ="( (.com\/|.net\/|.org\/|.edu\/|.com.|.net.|.org.|.edu.)?([^\/:]+)/ )";
if
(preg_replace($regex, "", $new_url2))
{
print $new_url2 . "<br>\n";
}
 
This code:

Code:
<?php

$url = array
(
	'http://www.stuffs.com.aaa/aaaa.php',
	'http://www.stuffs.net.aaa/aaaa.php',
	'http://www.stuffs.org.aaa/aaaa.php',
	'http://www.stuffs.edu.aaa/aaaa.php',
	'http://www.stuffs.co.uk/aaaa.php'
);

$url = str_replace ('http://www.', '', $url);
$url = preg_replace ('/(\.com|\.net|\.org|\.edu).*/','$1',$url);

print '<pre>';
print_r ($url);
print '</pre>';
?>

outputs:

Array
(
[0] => stuffs.com
[1] => stuffs.net
[2] => stuffs.org
[3] => stuffs.edu
[4] => stuffs.co.uk/aaaa.php
)

But notice that it only works for the four TLDs listed in the pattern (.com, .net, .org, .edu), not for anything else, such as .co.uk.



 
You would need something akin to a regular expression such as:
Code:
/(\.(com|net|org|edu)[\.\/]).*$/
The replacement string needs to put back the first captured group, since this is the part of the string that you wish to keep. Also, preg_replace returns the replaced string but does not modify the original. Thus:
Code:
print preg_replace("/(\.(com|net|org|edu)[\.\/]).*$/", "$1", $new_url2) . "<br />\n";
is more likely to do what you seem to want.
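
To illustrate the point about the return value, here is a minimal sketch. The sample value of $new_url2 is an assumption, since the URL from the original post was not preserved; the pattern is the one given above.

```php
<?php
// Hypothetical sample input; the URL from the original post is not available.
$new_url2 = 'www.stuffs.com/aaaa.php';

// preg_replace() returns the result; $new_url2 itself is left untouched.
$trimmed = preg_replace('/(\.(com|net|org|edu)[\.\/]).*$/', '$1', $new_url2);

print $trimmed . "\n";   // www.stuffs.com/
print $new_url2 . "\n";  // www.stuffs.com/aaaa.php
```

Note that the trailing slash is kept, because the [\.\/] class sits inside the first captured group.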
 
Thanks both.
Morax, how could I remove the dot or slash that is left after the domain name?
Example:
$url="
Your regex works fine,
but it does not remove the "." left at the right end.
I also need to remove a possible "/" after the domain name,
so it needs to test for both and delete the dot or slash together with whatever follows it.

As is, it results in
 
That appeared to be what you wanted, so that's what I set it to do. The following change to the regexp:
Code:
/(\.(com|net|org|edu))[\.\/].*$/
will bypass that by leaving the suffix character out of the part of the string that's being kept. Incidentally:
Code:
/^http:\/\/www\.(\w+\.(com|net|org|edu))[\.\/].*$/
will do the http:// stripping as well in a single substitution.
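
A small sketch of that single substitution follows. The sample URLs are assumptions modelled on the earlier reply in this thread, and the pattern is passed together with the '$1' replacement and the subject, the three arguments preg_replace expects.

```php
<?php
// Sample URLs modelled on the earlier reply (assumed for illustration).
$urls = array(
    'http://www.stuffs.com.aaa/aaaa.php',
    'http://www.stuffs.net/aaaa.php',
    'http://www.stuffs.co.uk/aaaa.php',
);

$pattern = '/^http:\/\/www\.(\w+\.(com|net|org|edu))[\.\/].*$/';

// Subjects that do not match (such as the .co.uk one) come back unchanged.
$result = preg_replace($pattern, '$1', $urls);

print_r($result);
```

With these inputs the first two entries should reduce to stuffs.com and stuffs.net, while the .co.uk URL is returned as is. A URL that ends right after the TLD, with no dot or slash following it, would not match this pattern either.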
 
MOrac,
thanks, and I feel bad about asking again,
but it results in a "wrong parameters" error.

Here is what I have, as per your regexp:
print preg_replace("/^http:\/\/www\.(\w+\.(com|net|org|edu))[\.\/].*$/", $url ) ;
 
Thanks again, I fixed it by removing [\.\/].
New code:
preg_replace("/((com|net|org|edu)).*$/", "$1", $new_url2) ."<br />\n";
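
One caveat worth checking with this last pattern: without the leading dot it can match "com", "net", "org" or "edu" anywhere in the string, including inside a host name. The sketch below compares it with a stricter variant; both the sample input and the stricter pattern are assumptions for illustration, not something posted in this thread.

```php
<?php
// Hypothetical input chosen so that "com" also occurs inside the host name.
$new_url2 = 'www.compute.org/aaaa.php';

// Pattern from the post above: "com" already matches inside "compute".
$loose = preg_replace('/((com|net|org|edu)).*$/', '$1', $new_url2);

// Assumed stricter variant: the TLD must follow a dot and be followed by
// a dot, a slash, or the end of the string.
$strict = preg_replace('/(\.(com|net|org|edu))([.\/].*)?$/', '$1', $new_url2);

print $loose . "\n";   // www.com
print $strict . "\n";  // www.compute.org
```

The stricter variant also leaves a string such as www.stuffs.com unchanged, since the optional group allows the TLD to sit at the very end.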

 