Guest_imported
Hi Guys
I'm new to PHP and could do with a hand.
I am trying to write a meta-tag grabbing script, using the function below, for a text-based
search engine.
(The user enters a URL and submits it, the script grabs the page's meta tags, and the database is updated.)
The text database looks like this:
2000----html----title----description----url----keywords,delimited,like,so----(none given)
For example:
2000----html----StarDeveloper.com----A web site dedicated to Active Server Pages.-Info, Articles and Tutorials on Microsoft Active Server Pages.---- dll,isapi----(none given)
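Because every field sits on one line between `----` separators, a title containing a newline or a run of dashes would break the record. One way to guard against that is to clean each field before joining it. This is a minimal sketch, assuming the seven-field layout shown above; `make_record()`, `parse_record()` and `clean_field()` are hypothetical helper names, not part of the original script:

```php
<?php
// Sketch of the "----"-delimited record format (7 fields per line):
// year ---- type ---- title ---- description ---- url ---- keywords ---- extra
// make_record(), parse_record() and clean_field() are hypothetical helpers.

const FIELD_SEP = '----';

// Make one value safe to store: no line breaks, no embedded "----",
// and no leading/trailing dash that could merge with a separator.
function clean_field(string $value): string
{
    $value = preg_replace('/\s+/', ' ', $value);    // newlines/tabs -> single space
    $value = preg_replace('/-{4,}/', '-', $value);  // collapse any run of 4+ dashes
    return trim($value, " -");
}

// Build one database line; the year, type and trailing field are fixed as in the post.
function make_record(string $title, string $description, string $url, string $keywords): string
{
    $fields = array_map('clean_field', [$title, $description, $url, $keywords]);
    return implode(FIELD_SEP, array_merge(['2000', 'html'], $fields, ['(none given)'])) . "\n";
}

// Split one database line back into its named fields.
function parse_record(string $line): array
{
    $parts = explode(FIELD_SEP, rtrim($line, "\r\n"));
    return [
        'title'       => $parts[2] ?? '',
        'description' => $parts[3] ?? '',
        'url'         => $parts[4] ?? '',
        'keywords'    => $parts[5] ?? '',
    ];
}
```

Trimming edge dashes is a deliberate trade-off: a field ending in `---` next to the `----` separator would otherwise produce a longer dash run that `explode()` splits in the wrong place.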
The script below is not working properly and I can't understand why. I would like to use the
metaengine() function at the bottom of the page but am not sure how.
I would also like to somehow validate the $tags[title], $tags[description], $URL and $tags[keywords]
variables to ensure that they all contain extracted meta tag data.
If they do, the database should be updated; if the meta data is incomplete, the script
should redirect to an HTML page (say, manual_add.html).
Best Regards
Greg@sublite.co.uk
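<?php
$data_file = "data.txt";
// Form fields arrive in $_POST (register_globals no longer exists).
$s   = $_POST['s'] ?? '';
$URL = trim($_POST['URL'] ?? '');
if ($s == "submit") {
    // get_meta_tags() returns only <meta name=... content=...> values, with
    // lower-cased keys; the <title> element is not included.
    $tags = get_meta_tags($URL);
    if ($tags === false) {
        die("Could not read $URL");
    }
    // Make sure the keys used below exist, even if the page lacks those tags.
    $tags += array('title' => '', 'description' => '', 'keywords' => '');
    $line = "2000----html----$tags[title]----$tags[description]----$URL----$tags[keywords]----(none given)\n";
    $fd = fopen($data_file, "a") or die("Could not open data file!");
    fputs($fd, $line);
    fclose($fd);
    echo "Done!";
}
else {
    echo "<form method=\"post\"><input type=\"hidden\" name=\"s\" value=\"submit\">URL: <input type=\"text\" name=\"URL\" size=\"35\"><br><input type=\"submit\" value=\"submit\"></form>";
}
?>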
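Part of why the script above misbehaves is that `get_meta_tags()` only returns `<meta name=... content=...>` pairs and never the `<title>` element, so `$tags[title]` stays empty unless the page also has a `<meta name="title">` tag. A minimal sketch of a fallback that reads the title separately; `get_tags_with_title()` is a hypothetical helper, and it reads the source twice purely for simplicity:

```php
<?php
// Sketch: fill in the title that get_meta_tags() does not provide.
// get_tags_with_title() is a hypothetical helper name.
function get_tags_with_title(string $source): array
{
    $tags = @get_meta_tags($source);   // keys are lower-cased meta "name" values
    if ($tags === false) {
        $tags = [];
    }
    if (empty($tags['title'])) {
        $html = @file_get_contents($source);
        // Non-greedy match of the first <title>...</title>, across line breaks.
        if ($html !== false && preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m)) {
            // Collapse whitespace so the title fits on one database line.
            $tags['title'] = trim(preg_replace('/\s+/', ' ', $m[1]));
        }
    }
    // Guarantee the three keys the script interpolates into $line.
    foreach (['title', 'description', 'keywords'] as $key) {
        $tags[$key] = $tags[$key] ?? '';
    }
    return $tags;
}
```

In the script this would replace the `get_meta_tags($URL)` call; the `metaengine()` function below takes a similar approach with its own patterns.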
<?php // metaengine.php3
/***********************************************************************************
FILE: tags.php3
FUNCTION: metaengine();
INPUT: Any URL value, ($url);
OUTPUT: An array of meta tag information, where the name
attribute of the meta tag is the key value of the array.
(e.g. $meta[title], $meta[keywords], $meta[description])
DESCRIPTION: This function was created because the get_meta_tags()
function does not properly return meta information if the
requested file is not formatted correctly or if newline,
return, and tab characters are present within the meta tag.
NOTES: This function will only return meta information for title,
keywords, and description.
************************************************************************************/
function metaengine($url)
{
    // PCRE patterns for preg_match(). The ~ delimiter avoids escaping the "/"
    // in "</head>", and the "i" modifier makes matching case-insensitive.
    // Pattern for meta title
    $p_title[0] = '~(<title>)([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)~i';
    $p_title[1] = '~(<meta)([[:space:]]+)(name="title")([[:space:]]+)(content=")([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)~i';
    $p_title[2] = '~(<meta)([[:space:]]+)(name=title)([[:space:]]+)(content=)([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)~i';
    // Pattern for meta description
    $p_description[0] = '~(<meta)([[:space:]]+)(name="description")([[:space:]]+)(content=")([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)~i';
    $p_description[1] = '~(<meta)([[:space:]]+)(name=description)([[:space:]]+)(content=)([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)~i';
    // Pattern for meta keywords
    $p_keywords[0] = '~(<meta)([[:space:]]+)(name="keywords")([[:space:]]+)(content=")([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)~i';
    $p_keywords[1] = '~(<meta)([[:space:]]+)(name=keywords)([[:space:]]+)(content=)([a-zA-Z_0-9@!%-;&`,\'\+\$\.\n\t\r ]+)~i';
    $p_keywords[2] = '~(</head>)(.+)~i';
    // Fetch file into an array (file()'s second argument is an int flag, not a mode)
    if (!($file = @file($url)))
    {
        $keywords = 'Not Available';
        $description = 'Not Available';
        $title = 'Not Available';
    }
    else
    {
        // Turn array into a string using a space as the delimiter.
        $target = implode(" ", $file);
        // Replace tab, return, and newline characters with spaces.
        $target = str_replace(array("\n", "\t", "\r"), " ", $target);
        // Evaluate string with regular expression and find match for title.
        if (preg_match($p_title[0], $target, $match))
        {
            $title = $match[2];
        }
        elseif (preg_match($p_title[1], $target, $match))
        {
            $title = $match[6];
        }
        elseif (preg_match($p_title[2], $target, $match))
        {
            $title = $match[6];
        }
        else
        {
            $title = 'Not Available';
        }
        // Evaluate string with regular expression and find match for description.
        if (preg_match($p_description[0], $target, $match))
        {
            $description = $match[6];
        }
        elseif (preg_match($p_description[1], $target, $match))
        {
            $description = $match[6];
        }
        else
        {
            $description = 'Not Available';
        }
        // Evaluate string with regular expression and find match for keywords.
        if (preg_match($p_keywords[0], $target, $match))
        {
            $keywords = $match[6];
        }
        elseif (preg_match($p_keywords[1], $target, $match))
        {
            $keywords = $match[6];
        }
        // If no meta tag content is present for keywords, use the document text
        // after the </head> tag as keywords.
        elseif (preg_match($p_keywords[2], $target, $match))
        {
            // Remove HTML and PHP tags
            $match[2] = strip_tags($match[2]);
            // Strip white space before and after the string
            $match[2] = trim($match[2]);
            // Keep at most 1000 characters, skipping the first 100
            $match[2] = substr($match[2], 100, 1000);
            $keywords = $match[2];
        }
        else
        {
            $keywords = 'Not Available';
        }
    }
    $metatag = array();
    $metatag['title'] = $title;
    $metatag['description'] = $description;
    $metatag['keywords'] = $keywords;
    return $metatag;
}
?>
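For the validation question, here is a minimal sketch. It assumes `metaengine()` supplies the data, so its `'Not Available'` placeholder counts as missing, and that `filter_var()` with `FILTER_VALIDATE_URL` is an acceptable URL check. `meta_is_complete()` is a hypothetical helper name. The handler lines are left commented out because they need `metaengine()` from the page and because `header()` must run before any output is sent:

```php
<?php
// Sketch: decide whether extracted meta data is complete enough to store.
// meta_is_complete() is a hypothetical helper; an empty value or the
// 'Not Available' placeholder from metaengine() counts as missing.
function meta_is_complete(array $meta, string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    foreach (['title', 'description', 'keywords'] as $key) {
        $value = trim($meta[$key] ?? '');
        if ($value === '' || $value === 'Not Available') {
            return false;
        }
    }
    return true;
}

// Intended use inside the "submit" branch of the form handler:
//
// $meta = metaengine($URL);
// if (!meta_is_complete($meta, $URL)) {
//     header("Location: manual_add.html");  // must run before any output
//     exit;
// }
// $line = "2000----html----{$meta['title']}----{$meta['description']}----$URL----{$meta['keywords']}----(none given)\n";
// file_put_contents($data_file, $line, FILE_APPEND | LOCK_EX);
// echo "Done!";
```

Keeping the check in its own function means the same rule can later be reused on the manual_add.html form submission.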