Hi all,
I've designed a simple spider for a search engine; the first code block below shows how it works.
It gets the description, keywords and title for me, but I also want to index the content of the page, i.e. everything between the body tags. My attempt at that is the second code block below, and it won't work.
When I try to write the result to the database, I get nothing.
Can anyone help me with this please? Thank you!
Code:
use LWP::Simple; # assumption: get() below is LWP::Simple's get
# $q is assumed to be a CGI object created earlier in the script
$url = $q->param("url");
$sp_url = $url;
$content = get($url);
$modifylink = 'new';
if ($content) {
    #Get the title (list assignment gives undef on no match, not a stale $1)
    ($sp_title) = $content =~ /<title>(.*?)<\/title>/is;
    $sp_title = '' unless defined $sp_title;
    $sp_title =~ s/\"//g; #remove double quotes
    $sp_title =~ s/\'//g; #remove single quotes
    #Get the description
    ($sp_desc) = $content =~ /<META name=\"description\" content=\"(.*?)\">/is;
    $sp_desc = '' unless defined $sp_desc;
    $sp_desc =~ s/\"//g; #remove double quotes
    $sp_desc =~ s/\'//g; #remove single quotes
    #Get the keywords
    ($sp_keys) = $content =~ /<META name=\"keywords\" content=\"(.*?)\">/is;
    $sp_keys = '' unless defined $sp_keys;
    $sp_keys =~ s/\"//g; #remove double quotes
    $sp_keys =~ s/\'//g; #remove single quotes
}
Code:
#Get the body
$content =~ /<body>(.*)<\/body>/ig;
$sp_body = $1;
$sp_body =~ s/\"//g; #remove double quotes
$sp_body =~ s/\'//g; #remove single quotes
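One possible cause, offered as a sketch rather than a confirmed diagnosis: without the /s modifier, . does not match a newline, so /<body>(.*)<\/body>/ can only match when the whole body sits on a single line. A literal <body> also misses tags that carry attributes, such as <body bgcolor=...>. The version below allows for both. The helper name extract_body and the sample page are made up for illustration and are not part of the original spider.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original spider): returns the text
# between <body ...> and </body>, or an empty string if there is none.
sub extract_body {
    my ($html) = @_;
    # /s lets . match newlines; [^>]* allows attributes on the body tag;
    # list assignment avoids picking up a stale $1 when nothing matches.
    my ($body) = $html =~ /<body\b[^>]*>(.*?)<\/body\s*>/is;
    return '' unless defined $body;
    $body =~ tr/"'//d;    # remove double and single quotes, as in the spider
    return $body;
}

# Made-up sample page, with attributes on the body tag and line breaks inside
my $content = "<html><head><title>Test</title></head>\n"
            . "<body bgcolor=\"#ffffff\">\nHello,\n'world'\n</body></html>";

my $sp_body = extract_body($content);
print length($sp_body), " characters captured\n";
```

If $sp_body is still empty after a change like this, the problem may be in the database write itself, which is not shown in the post.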