Can HTML code be stripped from an ACCESS Memo field?

ORIONM42 · Jun 8, 2006

Hi everyone

I have an ACCESS file containing text extracted from a web page that still includes html markup tags, for example: <BR>, <font etc>, and all the other normal html markup codes. This information is stored inside an ACCESS MEMO field. The rest of my access database contains regular numeric and date data. Does anyone please know a way to automatically strip out all of the html markup tags from the single column that I have with these held in in, whilst leaving the rest of the worksheet unaffected?

This can not be done by a simple search and replace operation based on searching and removing <*> fields. The problem is that the text contained in the MEMO field can contain up to 1500 - 2000 characters distributed into several hundred words. The overall ACCESS .mdb file is about 10 MB size, and contains about 3000 rows with up to 17 columns. Trying the replace <*> operation just on the column that I am trying to strip the html markup from in EXCEL gives an ACCESS error: 'There isn't enough free memory to continue the search' – yet I am running XP on a pentium laptop with 1 GB of physical RAM, and about 600 MBytes free of other running processes (and hence available to ACCESS in principal), and 2.2 GBytes of free page file space. It would seem surprising if this amopunt of free memory and page file swap space was not able to deal with a 10 MByte file ???

So, if anyone would kindly be able to help with the generic problem of how to strip all the html formatting code within an access MEMO field, given the size of the data I have and the error message I am getting, I would be very grateful. I could in principle do this with a vba .bas file that does a selection.Match.something operation and then a selection.Find.Replacement.Text operation, but it is not a very elegant solution to have to do this for every single html combination, and in any case would probably not work with markup tags that contained additional qualifiers like <font size=3> etc

Many thanks

Glenn White

CautionMP · Jun 9, 2006

ORIONM42,
Asssuming "<" denotes the start of a tag and ">" denotes the start of actual text, would you consider a function similar to below? You could run it from an Update query to do all the records in one shot.

Code:

Function StripHTMLTags(HTMLText As String) As String
Dim blnText As Boolean
Dim lngChar As Long
Dim strOutput As String
For lngChar = 1 To Len(HTMLText)
  If Mid(HTMLText, lngChar, 1) = "<" Then
    blnText = False
  End If
  If blnText Then
    StripHTMLTags = StripHTMLTags & Mid(HTMLText, lngChar, 1)
  End If
  If Mid(HTMLText, lngChar, 1) = ">" Then
    blnText = True
  End If
Next lngChar
End Function

Hope this helps,
CMP

(GMT-07:00) Mountain Time (US & Canada)

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Can HTML code be stripped from an ACCESS Memo field?

ORIONM42

Technical User

CautionMP

Programmer

Similar threads

Part and Inventory Search

Sponsor