
Need help with a regular expression


Foamcow

Programmer
Nov 14, 2002
6,092
GB
I'm using preg_replace() to try to remove the following from a larger string (a file that I'm reading with file_get_contents()).
I can't work out a regular expression that picks out the bit I want to get rid of. I'm sure it's easy, I just have no clue with regexps!

This is what I want to match/remove:

Code:
<style type="text/css">
.sitedistribute {font-family:; font-size:px; color:#; font-weight:;}
A.sitedistribute {font-family:; font-size:px; color:#; font-weight:;}
.copydistribute {font-family:; font-size:px; color:#; font-weight:;}
A.copydistribute {font-family:; font-size:px; color:#; font-weight:;}
</style>

I've been trying to match on the instances of "<style" and "style>" and what's between them.

I've tried googling and have been messing with this for a while. I've got a headache now, so can you help!?

Foamcow Heavy Industries - Web design and ranting
Toccoa Games - Day of Defeat gaming community
Target Marketing Communications - Advertising, Direct Marketing and Public Relations
"I'm making time
 
try this. the captured output will be in the second element of the $matches variable ($matches[1]).


Code:
$filename = "path/to/file.php";
$text = file_get_contents($filename);
// the "s" modifier lets "." match newlines, because the style block spans several lines
$regex = "'<style[^>]*>(.*?)</style>'s";
preg_match_all($regex, $text, $matches);
echo "<pre>";
print_r($matches);
echo "</pre>";
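
If the aim is to delete the block rather than capture it, the same pattern can be handed to preg_replace() with an empty replacement. This is only a minimal sketch: the sample string is made up for illustration and stands in for the real file contents.

```php
<?php
// Hypothetical sample input standing in for the file contents
$text = "<p>before</p>\n<style type=\"text/css\">\n.sitedistribute {font-family:;}\n</style>\n<p>after</p>";

// "s" lets "." cross newlines, "i" ignores tag case,
// and the lazy ".*?" stops at the first closing </style>
$clean = preg_replace('~<style[^>]*>.*?</style>~si', '', $text);

echo $clean;
```

Without the "s" modifier the pattern only matches when the opening and closing tags sit on the same line, which is a common reason for a style-stripping regex to appear to do nothing.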
 
No, that didn't do what I wanted.

Hmm maybe I'm going about this the wrong way.

I will explain further:
I am redeveloping a small site for a client that uses a news feed supplied by a National Newspaper in the UK. The newspaper gave us permission to use the feed.
When they sent me the info to get the feed, it turned out to be a bit of javascript that used document.write to output the headlines.

The problem is that they are REALLY badly formatted and don't fit in with my neat web standards, xhtml and css approach.

I'm waiting for contact info for their IT department to see if I can get the feed in another format, but I would like to get this done asap.

So I figured if I could get the resulting code from the javascript and strip out the nasty bits I could reformat it a little.

This is what I came up with to get the file...

Code:
$filename = "<PATH TO FILE - CAN'T TELL YOU THAT THOUGH>";

$contents = file_get_contents($filename);

echo $contents;

This does the job and gets the following:

Code:
document.write('<style type="text/css">');
document.write('.sitedistribute {font-family:; font-size:px; color:#; font-weight:;}');
document.write('A.sitedistribute {font-family:; font-size:px; color:#; font-weight:;}');
document.write('.copydistribute {font-family:; font-size:px; color:#; font-weight:;}');
document.write('A.copydistribute {font-family:; font-size:px; color:#; font-weight:;}');
document.write('</style>');


document.write('<span class="sitedistribute">Headlines from <a class="sitedistribute" href="http://www.mirror.co.uk" target="icnetwork">Mirror</a>:</span><br>');

document.write('<br>');

document.write('<a class="copydistribute" href="http://www.mirror.co.uk/news/frontpagebottom/page.cfm?objectid=15298062&method=full&siteid=50143">SAVE OUR SEMIS</a><br>');

document.write('<a class="copydistribute" href="http://www.mirror.co.uk/news/frontpagebottom/page.cfm?objectid=15298044&method=full&siteid=50143">JADE GOODY QUIZZED BY BARBADOS COPS</a><br>');

document.write('<a class="copydistribute" href="http://www.mirror.co.uk/news/frontpagebottom/page.cfm?objectid=15298078&method=full&siteid=50143">SURANNE TRAILER TRASHED</a><br>');

document.write('<a class="copydistribute" href="http://www.mirror.co.uk/news/frontpagebottom/page.cfm?objectid=15298060&method=full&siteid=50143">HOW DID BECKS CHOOSE CRUZ?</a><br>');

document.write('<a class="copydistribute" href="http://www.mirror.co.uk/news/frontpagebottom/page.cfm?objectid=15298043&method=full&siteid=50143">BUDGET TAX BOOST FOR OAPS</a><br>');

I want to end up with just the links in an unordered list.
I managed to get rid of the "document.write", "<br>" etc. but I am stuck with the style declarations at the top.
So what I want to do is get rid of that style sheet at the top and the contents of that <span> tag.
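
For reference, stripping both pieces from the raw feed could be sketched roughly as below. The sample is a shortened copy of the feed with a placeholder headline URL (example.com is not part of the real feed), and both patterns assume the tags are not nested.

```php
<?php
// Hypothetical, shortened sample of the feed; the headline URL is a placeholder
$contents = <<<'JS'
document.write('<style type="text/css">');
document.write('.sitedistribute {font-family:; font-size:px; color:#; font-weight:;}');
document.write('</style>');
document.write('<span class="sitedistribute">Headlines from <a class="sitedistribute" href="http://www.mirror.co.uk" target="icnetwork">Mirror</a>:</span><br>');
document.write('<a class="copydistribute" href="http://example.com/1">SAVE OUR SEMIS</a><br>');
JS;

// preg_replace() accepts an array of patterns and applies them in order
$patterns = array(
    '~<style[^>]*>.*?</style>~si', // the style sheet, across several lines
    '~<span[^>]*>.*?</span>~si',   // the span and everything inside it
);
$contents = preg_replace($patterns, '', $contents);

echo $contents;
```

The "document.write" wrappers and "<br>" tags are left in place here, since those are already being removed separately.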

 
it should be possible - just extract what's between the anchor tags and abandon the rest of the document.

i have taken your screen dump and put it into a text file on my own system. have a look at
 
did the code i posted (on the page that i linked to) not work for you? i must have misunderstood- i thought all you wanted were the links without the styles.
 
Yeah i do want the links without all the style stuff, but it's easier for me to just remove the stuff I don't want and replace the <a> and </a> with <li><a> and </a></li>

Thanks anyway though.
Most of the regexps I found did not work as expected. I think it's probably down to how I was using them though. :)

 
honestly it would be simpler to extract what you do want and ignore the rest.

this code will extract the links and add li tags. it's pretty simple too:

Code:
<?php
$contents = file_get_contents("c:/text.txt");
// match every anchor tag together with its link text
$regex = "'<a[^>]*>(.*?)</a>'";
preg_match_all($regex, $contents, $matches);
$result = array();
// $matches[0] holds the full <a ...>...</a> matches
foreach ($matches[0] as $key => $val)
{
	$result[$key] = str_replace("<a", "<li><a", $val);
	$result[$key] = str_replace("</a>", "</a></li>", $result[$key]);
}

foreach ($result as $key => $val)
{
	echo $val;
}

?>
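
One caveat: a pattern that matches every anchor also picks up the "Mirror" link inside the span. A possible variation, sketched here under the assumption that headline links always carry class="copydistribute" followed by href (as in the feed shown earlier), keeps only those and wraps them in an unordered list. The sample input and example.com URLs are placeholders.

```php
<?php
// Hypothetical sample; the real input would be the fetched feed
$contents = '<span class="sitedistribute">Headlines from <a class="sitedistribute" href="http://www.mirror.co.uk">Mirror</a>:</span><br>'
    . '<a class="copydistribute" href="http://example.com/1">SAVE OUR SEMIS</a><br>'
    . '<a class="copydistribute" href="http://example.com/2">BUDGET TAX BOOST FOR OAPS</a><br>';

// Group 1 captures the href, group 2 the link text, for headline links only
$regex = '~<a\s+class="copydistribute"\s+href="([^"]*)"[^>]*>(.*?)</a>~si';
preg_match_all($regex, $contents, $matches, PREG_SET_ORDER);

// Rebuild each link without the class attribute, one list item per headline
$list = "<ul>\n";
foreach ($matches as $m) {
    $list .= '  <li><a href="' . $m[1] . '">' . $m[2] . "</a></li>\n";
}
$list .= "</ul>\n";

echo $list;
```

PREG_SET_ORDER groups the results per match, so each $m carries one link's href and text together.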
 