stripping html file of only links 2

Status
Not open for further replies.

s0crates9

Technical User
Jun 18, 2005
70
US
I want to essentially strip out everything from an html document except the links. I want to place these links into an array as well.

I have this so far, but it is not placing each link on a separate line; instead, everything seems to end up on one long line:
Code:
$lines = file("index.html");
$html = join("",$lines);

// remove all line breaks
$html = str_replace("\n","",$html);
// and put in a new line break behind every anchor tag
$html = str_replace("</a>","</a>\n",$html);
// split the string into single lines
$lines = split("\n",$html);

Essentially, I figure the page would be loaded into a variable (an array) and split into elements, finding all the links and stripping everything in between them. Finally, the links would be separated onto different lines and used for output.

Thanks for everyone's help with this!

Business Identity and Web Development Services
 
By doing:
$html = str_replace("\n","",$html);

you cannot then do:
$lines = split("\n",$html);

because the first call removed every line break, so $lines ends up as one long string.

It would be easier to just pull out all the links rather than strip out the rest of the HTML:

Code:
$test=file_get_contents("http://www.msn.com");
preg_match_all("/\<a .*?\>.*?\<\/a\>/i", $test, $links);
print_r($links);
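
For reference, a minimal sketch of what preg_match_all() leaves in $links, run against a small inline string instead of a live page. The sample markup and the variable name $sample are made up for illustration. With the default flags, $links[0] is the array of complete matches, so that is the element to loop over to get one link per line:

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$sample = '<p>Intro <a href="one.html">One</a> and '
        . '<A HREF="two.html"><img src="two.gif"></A> end.</p>';

// Same pattern as in the post: lazy match from "<a " to the closing "</a>".
preg_match_all("/\<a .*?\>.*?\<\/a\>/i", $sample, $links);

// $links[0] holds the complete matches, one array element per anchor.
foreach ($links[0] as $link) {
    echo $link, "\n";
}
```

Viewed in a browser, the echoed anchors render as live links and images, and the "\n" breaks only show up in the page source.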
 
btaber,

That does indeed work, but it spits out images and active links. Is there any way to get the links to come out as plain markup, like:
Code:
<a href="blah.html"><img src="image.gif"></a>

I am trying different ways to get them to be text links only, but not having any luck...


I appreciate your help!


 
try:
preg_match_all("/<a[^>]+>.*?<\/a>/i", $test, $links);
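
A small sketch of where the two patterns can differ, assuming an anchor tag that is broken across lines (the sample string and the names $first and $second are hypothetical). The earlier pattern needs a literal space right after "<a", and "." does not match a newline by default, while [^>]+ accepts any character up to the closing ">":

```php
<?php
// Hypothetical markup: the first anchor has a line break inside the tag.
$sample = "<a\n   href=\"one.html\">One</a> <a href=\"two.html\">Two</a>";

// Earlier pattern: requires "<a" followed by a space.
preg_match_all("/\<a .*?\>.*?\<\/a\>/i", $sample, $first);

// Pattern from this post: [^>]+ also spans the newline inside the tag.
preg_match_all("/<a[^>]+>.*?<\/a>/i", $sample, $second);

echo count($first[0]), " match(es) with the first pattern\n";
echo count($second[0]), " match(es) with the second pattern\n";
```

One caveat: because [^>]+ only requires "<a" plus at least one more character, the second pattern can also start a match at other tags whose names begin with "a", such as <abbr>.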


Known is handfull, Unknown is worldfull
 
Thanks, everyone! The tag-matching regular expressions work well. I finally found out that there is a PHP function that takes the values returned by preg_match_all/preg_match and converts them to plain HTML text rather than active HTML.

Thanks for the help! You guys deserve a star.
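
The post does not name the function. One built-in that behaves as described is htmlspecialchars(), which escapes <, >, & and quotes so that a browser shows the markup as text. Treating it as the function meant here is an assumption, and the sample $link value is made up:

```php
<?php
// Hypothetical match, shaped like one result from preg_match_all().
$link = '<a href="blah.html"><img src="image.gif"></a>';

// Escape the HTML special characters so a browser prints the tag
// as literal text instead of rendering a clickable image link.
$escaped = htmlspecialchars($link, ENT_QUOTES);

echo $escaped, "\n";
```

htmlentities() would work in much the same way for this purpose; it additionally converts other characters that have named entities.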

 