preg_replace autolink input 1

Status
Not open for further replies.

jasc2k

Programmer
Nov 2, 2005
113
GB
hi all,

my shortened function below simply takes user input and links any URLs or email addresses appropriately in HTML, so that posted links work as seen on Facebook etc

Code:
/* Convert all URL matches to appropriate HTML links */
			$message = preg_replace('#([\s|^])(www)#i', '$1http://$2', $message);
			$pattern = '#((http|https|ftp|telnet|news|gopher|file|wais):\/\/[^\s]+)#i';
			$replacement = '<a href="$1" target="_blank">$1</a>';
			$message = preg_replace($pattern, $replacement, $message);
		
			/* Convert all E-mail matches to appropriate HTML links */
			$pattern = '#([0-9a-z]([-_.]?[0-9a-z])*@[0-9a-z]([-.]?[0-9a-z])*\\.';
			$pattern .= '[a-wyz][a-z](fo|g|l|m|mes|o|op|pa|ro|seum|t|u|v|z)?)#i';
			$replacement = '<a href="mailto:\\1">\\1</a>';
			$message = preg_replace($pattern, $replacement, $message);

no matter what I do I always get the same basic problem: when two or more URLs are entered in a row (without spaces, or joined with another character), all of the links get joined into one

i.e user submits:
that whole string would be one link instead of three?

I know someone must have the answer - interestingly, on posting I realised the same happens here :(
thanks in advance
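
The joining described above comes from `[^\s]+` in the URL pattern: it keeps consuming characters until it hits whitespace, so a second protocol glued onto the first URL is swallowed into the same match. A minimal sketch of the effect and of one possible workaround follows. The input string and the lookaround pattern are assumptions for illustration, not something posted in the thread, and the workaround would also split a URL that legitimately carries another URL in its query string.

```php
<?php
// Hypothetical input: two URLs typed back to back with no space between them.
$message = 'see http://www.example.comhttp://www.example.org now';

// Same URL pattern as in the post: [^\s]+ only stops at whitespace.
$pattern = '#((http|https|ftp|telnet|news|gopher|file|wais):\/\/[^\s]+)#i';
preg_match_all($pattern, $message, $matches);

// A single match that covers both URLs.
print_r($matches[1]);

// Possible workaround (an assumption, not from the thread): insert a space
// wherever a protocol directly follows a non-whitespace character.
$separated = preg_replace('#(?<=\S)(?=(?:https?|ftp)://)#i', ' ', $message);
preg_match_all($pattern, $separated, $fixed);

// Two separate matches now.
print_r($fixed[1]);
```

Only `http`, `https` and `ftp` are handled in the lookahead here; the other protocols in the post's list would need adding in the same way.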
 
no need to blog it: this site is one of the most visited tech help sites on the net!

I'm in France these days. Have been here for a decade or so. Have not experienced UK broadband for some time. I get a rock solid 9Mb/1Mb connection. includes free calls to all major country landlines and TV for 29 euros. which isn't bad imo.

have fixed the email issue. it now creates a valid email link too

Code:
<?php
$text =<<<TEXT
string with a url in it http://www.boogy.com/withapath/file.php?aquery=something#fragment
more text with a half formed url www.microsoft.com
set of text with an elided set of urls http://www.domain.comhttp://www.domains.co.ukwww.halfdomain.edu
text with a hyperlink <a href='http://www.iamalink.com'>Link</a> 
image <img src='http://www.tek-tips.com/images/header-logo.gif'>
and a youtube video: http://www.youtube.com/watch?v=fllDB3FK7pI
email.address@domain.com
mailto:email.address@domain.com
TEXT;

echo (linkify($text));
function linkify($text){
	$protocols = array('http','https','ftp','file','gopher','mailto'); //add to if you need
	
	//domains
	$gTLDs = array('.info','.com','.edu','.org','.net','.mil'); //you should be able to add quite a few others so long as they do not overlap
	$newTLDs = array('.aero', '.biz', '.coop', '.info', '.museum', '.name', '.pro');
	$ukcc = array('.co.uk','.gov.uk','.ac.uk', 
					'.ltd.uk','.me.uk','.mod.uk','.net.uk',
					'.nhs.uk','.nic.uk', '.org.uk','.parliament.uk',
					'.plc.uk','.police.uk','.sch.uk', '.bl.uk','.icnet.uk',
					'.jet.uk','.nls.uk');
	$others = array('.tv','.eu');
	
	$domains = array_merge($gTLDs, $newTLDs, $ukcc, $others);
	//
	
	$_protocols = array_map('preg_quote', $protocols);
	$_domains = array_map('preg_quote', $domains);

	//first split the non word breaks
	$pattern = '/((?<!\'|\"|=| )(' . implode('|', $_protocols) . ')[^( |\.)])/imsu';
	$replace = ' \\1';
	$text = preg_replace($pattern, $replace, $text);
	$pattern = '/(?<! |\/|"|\'|=)(www\.)/ims';
	$text = preg_replace($pattern, $replace, $text);
	
	
	//now translate youtube links
	$pattern = '/\s(http\:\/\/www\.youtube\.com\/watch\?v\=(\w{11}))/imse';
	$text = preg_replace($pattern, "_youTubeEmbed('\\2')", $text);
	
	//by now we should have clean links
	//recognise links
	$pattern = '/([^(\s|\n)]*(' . implode ('|', $_domains) . ')((\?|\/|&|#)[^(\s|\n)]*)?)/imsue';
	$text = preg_replace($pattern, "_linkify('\\1')", $text);
	return $text;
}

function _youTubeEmbed($code){
	return <<<HTML
<object width="425" height="350" data="http://www.youtube.com/v/{$code}" type="application/x-shockwave-flash"><param name="src" value="http://www.youtube.com/v/{$code}" /></object>
HTML;
}

function isEmail($text){
	$pattern = ('/^[a-z0-9\!\#\$\%\&\'\*\+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/imsu');
	return preg_match($pattern, $text);
}

function _linkify($text){
	$text = str_replace('\"', '"', $text);
	if (!preg_match('/^(src|href|data|value)/ims', $text)):
		$protocols = array('http','https','ftp','file','gopher','mailto');
		$_protocols = '(' . implode('|',$protocols) . ')';
		if (!preg_match('/^('.$_protocols . ')/ims', $text)):
			
			if (!isEmail($text)):
				return <<<HTTP
<a href="http://{$text}" target="_blank">$text</a>
HTTP;
			else:
				return <<<HTTP
<a href="mailto:{$text}">$text</a>
HTTP;
			endif;
		else:
			$_text = str_replace('mailto:', '', $text);
			return <<<HTTP
<a href="{$text}" target="_blank">$_text</a>
HTTP;
		endif;
	else:
		return $text;
	endif;
}

?>
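
A note for anyone running this on a newer PHP: the `/e` modifier used in linkify() above was deprecated in PHP 5.5 and removed in PHP 7.0, so the `preg_replace($pattern, "_linkify('\\1')", $text)` style of call no longer evaluates the replacement. A minimal sketch of the same link-recognition step with `preg_replace_callback` might look like the following. `wrapLink()` is a hypothetical, simplified stand-in for `_linkify()`, the domain list is shortened, and `\S*` is used in place of the original `[^(\s|\n)]*` for brevity; none of these come from the thread.

```php
<?php
// Hypothetical stand-in for _linkify(): receives the match array from
// preg_replace_callback and returns the HTML for one link.
function wrapLink($match) {
    $url = $match[1];
    // Prefix http:// when the matched text carries no protocol of its own.
    $href = preg_match('#^(https?|ftp)://#i', $url) ? $url : 'http://' . $url;
    return '<a href="' . htmlspecialchars($href) . '" target="_blank">'
         . htmlspecialchars($url) . '</a>';
}

// Shortened domain list, quoted for use inside the pattern.
$domains = array_map('preg_quote', array('.com', '.co.uk', '.edu'));

// Same shape as the "recognise links" pattern, minus the /e modifier.
$pattern = '/(\S*(' . implode('|', $domains) . ')((\?|\/|&|#)\S*)?)/iu';

$html = preg_replace_callback($pattern, 'wrapLink', 'visit www.microsoft.com today');
echo $html;
```

Because the callback receives the raw match rather than a string pasted into evaluated code, the `str_replace('\"', '"', $text)` unescaping step should not be needed with this approach.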
 
lol have just finished setting the first version live and it is working well. will add the extra changes tomorrow - long day lol

you can't stop coding :) you're probably right, this is a popular forum

France, nice! realistically, on average my up and down speed is half yours lol.

cheers
 
surreally here I sit having just got back from work, watching three men in a boat in the episode in which they are all based in Fowey!
 
lol how random!

great work - have now implemented the email check and have found that
a collided domain and email string both get included in one mailto link :)

also had to alter some code in linkify function:
Code:
function _linkify($text){
		$text = str_replace('\"', '"', $text);
		if(!preg_match('/^(src|href|data|value)/ims', $text)):
			$protocols = array('http','https','ftp','file','gopher','mailto');
			$_protocols = implode('|',$protocols);
			//$_protocols = '('.implode('|',$protocols).')';
			if(!preg_match('/^('.$_protocols.')/ims', $text)):
				/* Check if the URL is an email */
				if (!$this->isEmail($text)):
					return '<a rel="nofollow" href="http://'.$text.'" target="_blank">'.$text.'</a>';
				else:
					$_text = str_replace('mailto:', '', $text);
					return '<a rel="nofollow" href="mailto:'.$_text.'">'.$_text.'</a>';
				endif;
			else:
				//$_text = str_replace('mailto:', '', $text);
				return '<a rel="nofollow" href="'.$text.'" target="_blank">'.$text.'</a>';
			endif;
		else:
			return $text;
		endif;
}

thanks
 
good stuff.

there is probably a way to fix the collision but how often will it really happen?
 
that's very true - I have been working on another bug tonight, and I would tend to agree: how often will it happen? I bet there are loads of bugs/exploits in my code lol

I am very happy with the code - again thank you! :)
 