
Analysing database content to extract keyword phrases.

Status
Not open for further replies.

OC4me

Technical User
Dec 20, 2009
3
TH
I've been Googling this for a while but can only come up with tools that extract keyword phrases from websites. A friend directed me to this website, so I thought I'd post about my situation here.

I've got a Microsoft Access database table with thousands of memo field records that contain written text that I publish to my website.

I want to mine all that text and extract keywords and keyword phrases into an Excel file (for subsequent use). I have noticed quite a few online tools that mine websites and generate a list of keyword phrases found on websites, but I want to do the same thing to my database here on my PC. Technically, it would be possible to export all the text fields to a temporary web page online and then have the process done remotely, but this would be impractical and this operation is something I want to automate and do on my own PC on a regular (daily) basis.

What do I mean by keyword phrases? Keyword phrases are words and phrases that appear two or more times in the text (excluding common stop words such as 'the', 'and', 'or', etc.). Most keyword phrases would be one to four words in length. Longer phrases would be quite rare.

Ideally, I hope to find a 3rd-party off-the-shelf solution to take care of this task or some useful VB code (or logic that I can turn into code) that I can modify and run from within MS Access.

It is a long shot, and I don't expect anyone to have particular expertise in this area, but one never knows. Thanks!
 
The only way I can think of to do this is to create a function that splits all of the words (except your stop words and any punctuation) into an array, then builds phrases from the array and compares them to count the repeats.
It will probably be a lot of looping and won't be quick, but it's possible.
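
A minimal sketch of that split-and-count logic, written in Python only to show the steps (it would need porting to VBA to run inside Access). Everything named here is an assumption for illustration: the `keyword_phrases` function, the short `STOP_WORDS` list, and the idea that the memo fields have already been read into a list of strings. One interpretation choice differs slightly from the description above: stop words are kept in the array, and a phrase is skipped only when it starts or ends with one.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real one would be longer.
STOP_WORDS = {"the", "and", "or", "a", "an", "of", "to", "in", "is", "it", "with", "i"}

def keyword_phrases(texts, max_len=4, min_count=2):
    """Count 1..max_len word phrases across all texts; keep those seen min_count+ times."""
    counts = Counter()
    for text in texts:
        # Break on punctuation first so a phrase never spans a comma or full stop.
        for chunk in re.split(r"[^\w\s']+", text.lower()):
            words = re.findall(r"[a-z0-9']+", chunk)
            for n in range(1, max_len + 1):
                for i in range(len(words) - n + 1):
                    gram = words[i:i + n]
                    # Skip phrases that begin or end with a stop word.
                    if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                        continue
                    counts[" ".join(gram)] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}

sample = ["Red wine pairs with cheese.", "I like red wine and cheese."]
result = keyword_phrases(sample)
```

The returned dictionary maps each phrase to its count, so it could be written out row by row with Python's `csv` module to a file that Excel opens. The looping is as heavy as predicted: every text is scanned once per phrase length.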

hth

Ben

----------------------------------------------
Ben O'Hara
David W. Fenton said:
We could be confused in exactly the same way, but confusion might be like Nulls, and not comparable.
 
Thanks, but I think that arrays and all that looping might be a bit beyond my capabilities at the moment.

It occurred to me that perhaps I could mark up the keyword phrases in my memo fields by enclosing them in brackets (or some other tag method). That leads me to wonder whether it is possible to program Access to enclose a highlighted text string within a longer string in beginning and ending brackets. That way I could tag all keywords and keyword phrases in the database and then extract them at will. This would also be much more accurate than having a program 'guess' what constitutes a keyword or keyword phrase.
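
As a rough illustration of the bracket idea, here is a sketch in Python (logic only; it would need translating to VBA for use in Access). Both function names are hypothetical. `wrap_selection` takes a start position and a length, modeled on the `SelStart` and `SelLength` properties that Access text box controls expose for the highlighted text; `extract_tagged` assumes tags are plain, non-nested square brackets.

```python
import re
from collections import Counter

def wrap_selection(text, start, length):
    """Return text with the selected span enclosed in square brackets."""
    end = start + length
    return text[:start] + "[" + text[start:end] + "]" + text[end:]

def extract_tagged(texts):
    """Count every [bracketed] phrase, ignoring case and extra spaces."""
    counts = Counter()
    for text in texts:
        # Match the shortest run between '[' and ']' with no nested brackets.
        for phrase in re.findall(r"\[([^\[\]]+)\]", text):
            counts[" ".join(phrase.lower().split())] += 1
    return counts

tagged = wrap_selection("Serve red wine now", 6, 8)
found = extract_tagged(["Serve [red wine] with [Cheese].", "More [red  wine] here."])
```

Because the phrases are tagged by hand, no stop-word list or minimum count is needed; the extraction step only has to find the brackets.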

As an alternative to brackets, is there a way to have text in memo fields bolded (or italicized, underlined, etc.)? If so, and if those attributes survive intact after a later export into a Word file, then it would be possible to do a Word Find/Replace operation to remove all non-bold text, leaving me with a list of keyword phrases as the final result. That would be ideal. But I don't know how to enable rich-text formatting inside an Access memo field, nor do I know for sure whether such formatting will survive export outside of Access (although I presume it should).
 
Oh, by the way, I am running Access 2002 SP3. I tried upgrading recently to Access 2007 but it caused me problems, so I reverted to Access 2002. Maybe there is a plug-in that would allow me to mark up text with formatting in Access 2002?
 