Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations derfloh on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Searching SQL database and ignoring the HTML 3

Status
Not open for further replies.

timtom

Programmer
Jul 12, 2001
78
GB
Hi,
I'm doing a fully dynamic website with all my pages inside an SQL database and the time has come to think about a search facility.

I'm familiar with the LIKE sql method which I imagine I should be using, but the data in the page content field contains a lot of html formatting (this is as a result of letting the web content people using ie's document.design "thing" to input and edit their pages).

I think the HTML is pretty vital and I really have to keep it.

So, have I messed up? Or is there a way of ignoring html tags while searching the data?

Any help you can give will be much appreciated.
Cheers
Sarah
User_timtom.jpg
WASN'T FIT ENOUGH FOR THE POLICE FORCE
 
My suggestion would be,

Maintain a master list of all possible HTML tags(its going to be a lengthy but simple task indeed.) and use a query similar to this

Code:
mSQL = "SELECT Description 
FROM tbl_SoAndSoTable
WHERE Description LIKE '%" &  REQUEST.QUERYSTRING('criteria')  & " %' AND Description NOT IN (SELECT HTMLTags FROM tbl_HTML_TAGMASTER)"
Thank you,
RR.
__________________________________
The best is yet to come.
 
Oops I think the query isn't perfect. Thank you,
RR.
__________________________________
The best is yet to come.
 
as a speaking language, HTML is very uncommon. When people search the fields, I don't envision them hitting any words like IMG or HREF (titles and alt tags not withstanding).

There are a couple ways you can do this:
1) have a mirror table, where in the table you have no formatting, but it's fully searchable. (I don't like this method too much... seems like you're duplicating space for no reason)
2) deadline's suggestion, but his query isn't perfect, since any record that has HTML will automatically be removed from the RS. Right idea though... daunting task, but cool idea nonetheless :)
3) (my recommendation) create a keyword column in the table, and generate keywords for each article. Hopefully you don't have a lot of pages to go through, since this can be an extremely daunting task. It seems to be the best method though, IMHO. Just index said column, and you're on your way.

I can't think of anything else...

hth leo

------------
Leo Mendoza
lmendoza@garbersoft.net
 
Thanks for your help deadline and vasah - you've given me a lot of ideas and made me think about the actual problem in different ways.

I think I'm firstly going to try nothing - just search on the data and if the word turns up then display the page title and link from the record. You're right vashah, I don't envisage anyone searching for IMG or HREF - and if they're weird enough to do that then they'll just get loads of results but still not see any HTML. Plus, if I did a response.write on part of the data - the html would get processed anyway so there may not even be a problem.

If that method isn't succesful then I'll probably add an extra field in the records to keep the unformatted data like you said Vasah. That would likely be easier to set up than the master list of HTML - I don't think I have the stomach to do that! But it was a helpful suggestion anyway because it made me realise I would rather tell the search page WHAT to look for rather than what NOT to look for.

Thanks again
Sarah
User_timtom.jpg
WASN'T FIT ENOUGH FOR THE POLICE FORCE
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top