I'm trying to find / create a search engine that will index all the pages of a website. My issue is that most of the pages depend heavily on MySQL for their content. Any ideas on how I can achieve this?
The same way as if they were static... when you "read" the pages (I assume you'll be doing this with an fopen?) you should get the dynamic content in your output.
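Something along these lines, for example (a minimal sketch; the URL is a placeholder and it assumes allow_url_fopen is enabled on your server, so PHP and MySQL run server-side and you only receive the rendered HTML):

```php
<?php
// Fetch the page over HTTP so the server executes the PHP/MySQL
// and we receive the finished HTML rather than the source code.
$url = 'http://www.example.com/index.php';   // placeholder URL

$fd = fopen($url, 'r');
if ($fd === false) {
    die("Could not open $url\n");
}

$html = '';
while (!feof($fd)) {
    $html .= fread($fd, 8192);
}
fclose($fd);

echo $html;   // this is what you would hand to your indexer
```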
"r"
i get :: Warning: php_hostconnect: connect failed in /web/index.php on line 2. but i can use it like this ::$fd=fopen("/index.php", "r". problem is when i do it like that php doesn't process the file. i also would really like to set this up as a shell script and schedule it in cron.
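Whatever is behind the connect failure, the local-path fopen() won't get you there: it reads the file straight off the disk, so it never goes through the web server and the PHP is never executed. To get the rendered output you have to request the page over HTTP (localhost is fine), and that request can come from a command-line script that cron runs. A rough sketch, with a hypothetical script name and placeholder URLs:

```php
<?php
// indexer.php (hypothetical name) -- run from the command line / cron.
// Requesting through the web server, even via localhost, is what makes
// PHP execute the page and pull its content out of MySQL.

$pages = array(                          // placeholder list of pages to index
    'http://localhost/index.php',
    'http://localhost/about.php',
);

foreach ($pages as $url) {
    $html = file_get_contents($url);
    if ($html === false) {
        fwrite(STDERR, "failed to fetch $url\n");
        continue;
    }
    // ... hand $html to whatever does the actual indexing ...
    echo "fetched $url (" . strlen($html) . " bytes)\n";
}
```

A crontab entry along the lines of `0 3 * * * /usr/bin/php /path/to/indexer.php` (path and schedule are just examples) would then run it nightly.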
Have you considered trying to wget the files (which of course will yield all the content as displayed in the browser) and storing that in a table called staticcontent with two columns, pagename and content? Then, whenever the user does a search, search staticcontent.content for the search terms as well. Depending on the depth of your site, you could even full-text index it, which should help the engine a whole lot.
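For reference, a rough sketch of that table and a full-text query, using mysqli; the database credentials and the search term are placeholders, and the table/column names are just the ones suggested above. The FULLTEXT index is what makes the MATCH ... AGAINST search work:

```php
<?php
// Assumed credentials -- adjust to your setup.
$db = new mysqli('localhost', 'user', 'password', 'mysite');

// One row per page: the page name and its fetched content.
$db->query("
    CREATE TABLE IF NOT EXISTS staticcontent (
        pagename VARCHAR(255) NOT NULL PRIMARY KEY,
        content  TEXT NOT NULL,
        FULLTEXT KEY ft_content (content)
    )
");

// Search it the same way your engine searches everything else.
$stmt = $db->prepare(
    "SELECT pagename FROM staticcontent WHERE MATCH(content) AGAINST (?)"
);
$search = 'whatever the user typed';     // placeholder search term
$stmt->bind_param('s', $search);
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
    echo $row['pagename'] . "\n";
}
```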
If you're worried about the HTML tags ending up in the table, you could regex out all the HTML tags before inserting the content.
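PHP's built-in strip_tags() covers that without a hand-rolled regex, e.g. (assuming $html is the fetched page from the earlier sketch):

```php
<?php
// Reduce the fetched page to plain text before storing it.
$text = strip_tags($html);
// Optionally collapse leftover whitespace as well.
$text = trim(preg_replace('/\s+/', ' ', $text));
```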
I've never tried this, but it seems like it would work well. The only drawback would be the network traffic it generates. Assuming that you run the cron job late at night, once a day or something, it shouldn't be too bad.