Meta Keyword Generator

phpmine

Programmer
Jun 29, 2008
5
US
Hello, PHP community! I am looking to make an automatic meta keyword generator based on an array of phrases. It is for my site, which is under construction. Go to the "Q and A" section and you'll see a series of questions there, five in total. Right now the system simply grabs all the words there, filters out common words, and then spits out the remaining words as keywords. I would like some advice on how to set it up in this way:

puts all the words in an array and counts their frequency
creates an array of two words phrases and counts their frequency
does the same for 3 word phrases
spits out the top 25 keywords from the array generated

maybe someone has already cooked up this nasty script. thanks in advance guys!
 
this should get you started. you will need to work out how to grab real text rather than html. you will also need to work out the business rules for determining what a two-word phrase is. ditto three-word phrase.

Code:
<?php
$url = 'http://www.phpmine.com/PHP_Questions_and_Answers/';
$words = getWords($url);	//get words into array
$results = calculateFrequency($words, 25); //calculate frequency and return the xx highest results
echo "<pre>".print_r($results, true) . "</pre>";

function calculateFrequency($words, $top){
	$ignorewords = array(" ", "\r\n", "\n");	//include words to ignore here
	$frequency = array(); // holding array
	foreach($words as $word){
		if (!in_array($word, $ignorewords) && (!empty($word))){
			if(isset($frequency[$word])){
				$frequency[$word]++;
			} else {
				$frequency[$word] = 1;
			}
		}
	}
	//now sort them in h->l numeric order
	arsort($frequency, SORT_NUMERIC);
	return array_slice($frequency, 0, $top, true);
}

function getWords($url){
	$contents = file_get_contents($url);
	$array = explode(" ",htmlspecialchars( $contents));
	return $array;
}
?>
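
On the point about grabbing real text rather than HTML, here is a minimal sketch of one way to do it. It assumes the page is reasonably well-formed HTML and that only ASCII letters, digits and apostrophes count as word characters; extractWords() is a hypothetical name, not part of the code above.

```php
<?php
// Hypothetical helper: turn an HTML string into a list of lowercase words.
function extractWords(string $html): array
{
    // Drop <script> and <style> blocks, whose contents strip_tags() would keep.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Put a space before each tag so adjacent blocks do not fuse into one word.
    $text = strip_tags(str_replace('<', ' <', $html));
    // Turn entities such as &amp; back into characters, then lowercase.
    $text = strtolower(html_entity_decode($text, ENT_QUOTES, 'UTF-8'));
    // Split on anything that is not a letter, digit or apostrophe (assumed rule).
    return preg_split("/[^a-z0-9']+/", $text, -1, PREG_SPLIT_NO_EMPTY);
}

$sample = '<html><head><style>p{color:red}</style></head>'
        . '<body><h1>PHP Questions</h1><p>What is PHP &amp; why use it?</p></body></html>';
print_r(extractWords($sample));
```

The returned array could be passed to calculateFrequency() in place of the output of getWords(), e.g. extractWords(file_get_contents($url)).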
 
I actually have this already done. that is how i have the keywords there now. but how do i calculate the frequency of phrases?
 
getting the frequency of words is not the same as getting the frequency of phrases. yes, you helped me on the frequency of words part, but like i said before... i already have the frequency of words, not phrases.
Code:
//$keywordsarray is the array that contains said keywords

$twowordphrases = array();
//generate two word phrases; step by one so every overlapping pair ("I am", "am a", ...) is captured
for($i = 0; $i < count($keywordsarray); $i++)
{
	if(isset($keywordsarray[$i + 1]))
	{
		$twowordphrases[] = $keywordsarray[$i] . " " . $keywordsarray[$i + 1];
	}
}

$threewordphrases = array();
//generate three word phrases; step by one so every overlapping triple is captured
for($i = 0; $i < count($keywordsarray); $i++)
{
	if(isset($keywordsarray[$i + 2]))
	{
		$threewordphrases[] = $keywordsarray[$i] . " " . $keywordsarray[$i + 1] . " " . $keywordsarray[$i + 2];
	}
}

this, in combination with what you wrote, should do the trick
 
getting frequency of anything is done in the same way as I showed (or an equivalent). You needed to explain what business rules you were imposing on the creation of two and three word phrases. how do you decide what is a valid phrase and what is not? etc.

anyway, if you have worked it out for yourself, that is good.
 
I know, but there are 2 parts to my request. your code satisfies the part about frequency. but i also asked how to extract all the 2 and 3 word phrases, which is what the code i pasted does. i posted it for the sake of anyone else who finds this thread.
 
i meant that you have not yet told us what business rules you have selected for determining what is a two word or a three word phrase.

for example if you have the following
i am a test sentence

are these two word phrases:

I am
am a
a test
test sentence

or are only these two word phrases
I am
a test

this is just one of the business rules which you would have needed to provide to help us help you. Without this any answer we could have given would have been based on assumed requirements.
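
To make the two readings concrete, here is a minimal sketch that builds phrases under either rule. It assumes the words are already in a plain array; buildPhrases() and its $overlap flag are hypothetical names introduced only for illustration.

```php
<?php
// Hypothetical helper: build $n-word phrases from a list of words.
// $overlap = true  : slide one word at a time ("I am", "am a", "a test", ...)
// $overlap = false : consume $n words per phrase ("I am", "a test")
function buildPhrases(array $words, int $n, bool $overlap = true): array
{
    if ($n < 1) {
        return []; // nothing sensible to build
    }
    $words = array_values($words);   // make sure keys run 0..count-1
    $step = $overlap ? 1 : $n;
    $phrases = [];
    for ($i = 0; $i + $n <= count($words); $i += $step) {
        $phrases[] = implode(' ', array_slice($words, $i, $n));
    }
    return $phrases;
}

$words = explode(' ', 'I am a test sentence');
print_r(buildPhrases($words, 2, true));   // overlapping pairs
print_r(buildPhrases($words, 2, false));  // non-overlapping pairs
```

With the overlapping rule the leftover word "sentence" still ends up in a phrase; with the non-overlapping rule it is dropped because no partner remains.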
 
Ah, I see, cool. yeah, the script i posted takes in ALL the two word phrases, so for "i am a test sentence"

it takes

I am
am a
a test
test sentence

then i would throw it in your snippet and see how many times they come up.
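
As a rough sketch of that last step, the phrase building and the counting can be combined in one function. This assumes the overlapping rule described in this thread and uses PHP's built-in array_count_values() instead of a hand-written counting loop; topPhrases() is a hypothetical name, and the sample sentence is made up for illustration.

```php
<?php
// Hypothetical helper: count every run of $n consecutive words and
// return the $top most frequent ones as phrase => count.
function topPhrases(array $words, int $n, int $top): array
{
    $words = array_values($words);
    $phrases = [];
    for ($i = 0; $i + $n <= count($words); $i++) {
        $phrases[] = implode(' ', array_slice($words, $i, $n));
    }
    $frequency = array_count_values($phrases); // phrase => number of occurrences
    arsort($frequency, SORT_NUMERIC);          // highest count first, keys preserved
    return array_slice($frequency, 0, $top, true);
}

// Made-up sample text; real input would be the filtered words from the page.
$words = explode(' ', 'what is php and what is mysql and what is apache');
foreach ([1, 2, 3] as $n) {
    echo "Top {$n}-word phrases:\n";
    print_r(topPhrases($words, $n, 25));
}
```

Whether one-, two- and three-word results should be merged into a single top-25 list or kept as separate lists is another business rule the thread leaves open, so the sketch keeps them separate.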
 