Character or Word Count on an RTF Field

USInc · Jul 8, 2010

A table in our database used to be stored as plain text, but it was just converted to RTF. Before this I was using len({MyField}) to get a character count. That formula no longer works because the RTF formatting codes are counted along with the text.

So I started using UBound(Split({MyField})), which was great until I realized that some of the data contained more than 1000 words, and Crystal arrays are limited to 1000 elements.

So, can anyone help me come up with a way of counting either characters or words (or both)?

Thanks very much!

lbass · Jul 8, 2010

Assuming there are spaces between words, a word count could be:

local stringvar x := {table.field};
local numbervar i;
local numbervar cnt;
for i := 1 to len(x) do (
if x[i] = " " then //test the i-th character, not the whole string
cnt := cnt + 1
);
cnt + 1

For a character count, you could use something like:

local stringvar x := {table.field};
local numbervar i;
local numbervar cnt;
for i := 1 to len(x) do (
if not(x[i] in ["<",">"]) then //add other possible rtf code characters
cnt := cnt + 1
);
cnt

-LB

IanWaterman · Jul 9, 2010

But what are you counting?

RTF docs contain a load of header and footer details about font, size etc.

If you do not want to count the header characters you will need to split text before you start counting.

Also within the doc there are other formatting characters which you may not want to count either, things like indent and new para commands.

Ian

lbass · Jul 9, 2010

Yes, I think my suggestions probably wouldn't work quite right after all. I couldn't quite remember what the RTF coding looks like. It seems like you should be able to use nested splits to avoid the 1000-element array limit.

-LB

USInc · Jul 9, 2010

I am trying to count characters or words within a Progress Note in our electronic medical record. For example, someone might enter "Initial Assessment Completed" into their text field, however when it gets saved to the database, all of the formatting comes with it. So it would look like this:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}{\f1\fswiss\fprq2\fcharset0 System;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\cf1\fs20 Initial Assessment Completed\cf0\b\f1
\par }

I am trying to find a way to count the characters in this string so I would get 28, or count words so I would get 3. Since the formatting isn't the same (anyone using it can change font size, style, color, etc) I can't think of any way of stripping off the formatting short of building a very long stored procedure that will look for every possible RTF formatting string.

UBound(Split({MyField})) does work, but again, only for notes of fewer than 1000 words, and we have some that are well over 1000.

Thanks again.

fisheromacse · Jul 9, 2010

You can try the following to remove all of the RTF formatting information.
I had some fields where I needed to do something similar, and although this worked for me, it feels clunky.

//{@RTF1}
REPLACE({table.field},(exactstring({table.field},"{\r","\fs20")),"")

//{@RTF2}
REPLACE({@RTF1},"{\r\fs20","")

//{@RTF3}
REPLACE({@RTF2},"\par","")

//{@RTF4}
REPLACE({@RTF3},"}","")

Then you could do a len on {@RTF4}.

IanWaterman · Jul 9, 2010

Another possibility is to use InStr to find the first space and then count from there using LB's formula until you encounter a \.

Or, if all RTF footers start with \cf, use Split to leave you with the remaining text.

Ian

fisheromacse · Jul 9, 2010

Ian,
I don't think that would work. The first space in RTF comes right before the font name, as below:

{\rtf1\ansi{\fonttbl\f0\fcharset0 Helvetica;}\f0\pard
BODY TEST TEXT.\par }

Also, I have not tested this, but I am fairly certain that if users insert bold or other word- or character-specific tags into the middle of the RTF field, my implementation would have some issues.

IanWaterman · Jul 9, 2010

In the words of Private Frazer of Dad's Army (a 1970s British sitcom about the Home Guard during the war):

"I think you're dooooomed"

Ian

lbass · Jul 9, 2010

Can you clarify whether there are always consistent tags around the relevant words?

-LB

USInc · Jul 9, 2010

Ian, I think you're right... but I can always hold out hope!

LB, they are not consistent. If the user decides to use anything different, the tags will change. And they can make formatting changes within the text, so there could be tags dispersed throughout the string.

Not an easy task, but I appreciate everyone's help!

fisheromacse · Jul 9, 2010

just curious, what electronic medical record software do you use?

USInc · Jul 9, 2010

We are using Profiler - it's made by Unicare. We just upgraded this past weekend to the newest version and the RTF change was implemented then. The users are happy they can make text bold and pink, but it's giving me a headache and there is no way to disable the feature... **sigh**

lbass · Jul 9, 2010

US Inc,

Can you provide a few more examples of the field?

-LB

USInc · Jul 9, 2010

Sure thing...

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}{\f1\fswiss\fprq2\fcharset0 System;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\cf1\fs20 Non-Billable 2400 Only\cf0\b\f1
\par }

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Times New Roman;}{\f1\fswiss\fprq2\fcharset0 System;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\cf1\f0\fs16
\par Problems : Problem 02 getting nervous and low self-confidence
\par Problems : Problem 01 trouble adjusting to all the new changes in the family
\par D: individual session with cl, who excitedly talked about her vacation with her grandmother.
\par A: cl was somewhat anxious, twisting her hair and fidgeting in her chair.
\par P: continue to work on reducing anxiety, helping cl adjust to changes
\par next apt: two weeks\cf0\b\f1\fs20
\par }

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}{\f1\froman\fcharset0 Times New Roman;}{\f2\fswiss\fprq2\fcharset0 System;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\sb100\sa100\cf1\fs20 t/c with conservator regarding expired releases and cl lack of participation in programming. Conservator informed me that cl's sister has put home on the market and is hoping to close before the end of August leaving cl without a place to live on Sept 1. conservator reports telling sister that cl has no benefits and will not be paying sister rent. I told conservator that SSA has sent a bill for cl but that he has failed all appts with me to go to SSA/DSS and has not yet returned my calls about going to SSA with rep payee. conservator is concerned about cl discharge and I assured her that we are willing to ride his wave of engagement right now but cl might force our hand in closing his case if he continues lack of engagement. Conservator requested meeting Tuesday July 20\super th\nosupersub at 10am with treatment team. \cf0\f1\fs24
\par \pard\b\f2\fs20
\par }

lbass · Jul 9, 2010

It looks like your second example contains multiple sections that you would want to extract?

Also, I'm not familiar with RTF code, but it appears that the "fs20" in a couple of your examples is followed immediately by text you want to extract. I'm surprised this isn't set off in some way to distinguish it from the desired text.

-LB

USInc · Jul 9, 2010

The second example is still one string, however those \par tags are carriage returns.

I haven't noticed any pattern that I can easily discern to indicate where a string starts and/or ends.

IdoMillet · Jul 10, 2010

What is the core objective? In other words, why do you need to get the length or number of words?

Would you consider a solution using a UFL?

- Ido

view, email, export, burst, distribute, and schedule Crystal Reports.

http://www.MilletSoftware.com

USInc · Jul 10, 2010

The core objective is to try to identify improper progress notes for services provided. The software prevents you from saving a service with no progress note, but that doesn't stop a user from entering "progress note" into the note just to get around this. The users' intent is to go back later and finish the note, but they often forget. So I have several reports that look for attended services that have short progress notes.

Conversely, we have users who will mark a service as canceled when it really should be attended. So I have other reports that look for long progress notes on services that aren't marked as attended.

They are a useful tool for locating user error. Those same reports already look for key words, like failed, canceled, attended, as well as a soundex lookup. So I really need something that simply counts characters or words. I'm open to any suggestion.

IdoMillet · Jul 10, 2010

Since your objective can be satisfied by an approximate count of words, you could still use the approach of counting spaces.
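
To give a feel for how rough the space-counting estimate is, here is a minimal sketch in Python (used purely for illustration, not Crystal syntax). The function name approx_word_count is hypothetical, and SAMPLE is the three-word note posted earlier in the thread with its formatting left in.

```python
def approx_word_count(raw_rtf: str) -> int:
    """Spaces plus one, counted over the raw RTF with its markup left in."""
    return raw_rtf.count(" ") + 1


# The "Initial Assessment Completed" note from earlier in the thread,
# exactly as stored (formatting included).
SAMPLE = (
    r"{\rtf1\ansi\ansicpg1252\deff0\deflang1033"
    r"{\fonttbl{\f0\fnil\fcharset0 Arial;}{\f1\fswiss\fprq2\fcharset0 System;}}" "\n"
    r"{\colortbl ;\red0\green0\blue0;}" "\n"
    r"\viewkind4\uc1\pard\cf1\fs20 Initial Assessment Completed\cf0\b\f1" "\n"
    r"\par }"
)

# The note has 3 real words; the markup contributes the extra spaces.
print(approx_word_count(SAMPLE))  # prints 8
```

The header and footer add a handful of spaces to every note, so a "short note" threshold would probably need to be set a few words higher than it was for plain text. For long notes the overhead is small relative to the total.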

If you need a more precise solution, perhaps a UFL is the way to go.

hth,
- Ido
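
For reference, the kind of logic a UFL (or a database-side function) would need can be sketched as follows, again in Python purely for illustration. The function name rtf_visible_text is hypothetical. The sketch assumes the note text always sits directly inside the outermost RTF group, as it does in the samples posted in this thread, so it skips nested groups such as the font and color tables. Text wrapped in its own nested formatting group would be missed, and it has not been tested against real Profiler data.

```python
def _is_letter(c: str) -> bool:
    return "a" <= c <= "z" or "A" <= c <= "Z"


def rtf_visible_text(rtf: str) -> str:
    """Return the top-level text of an RTF string with formatting removed."""
    out = []
    depth = 0  # current brace nesting level
    i, n = 0, len(rtf)
    while i < n:
        ch = rtf[i]
        if ch == "{":
            depth += 1
            i += 1
        elif ch == "}":
            depth -= 1
            i += 1
        elif ch == "\\":
            i += 1
            if i >= n:
                break
            if _is_letter(rtf[i]):
                # Control word: letters, optional signed number, optional space.
                start = i
                while i < n and _is_letter(rtf[i]):
                    i += 1
                word = rtf[start:i]
                if i + 1 < n and rtf[i] == "-" and rtf[i + 1] in "0123456789":
                    i += 1
                while i < n and rtf[i] in "0123456789":
                    i += 1
                if i < n and rtf[i] == " ":
                    i += 1  # a single space only delimits the control word
                if depth == 1 and word in ("par", "line"):
                    out.append("\n")
                elif depth == 1 and word == "tab":
                    out.append("\t")
            elif rtf[i] == "'":
                # \'hh is one escaped character; keep a placeholder for it.
                i += 3
                if depth == 1:
                    out.append("?")
            else:
                # Control symbol; \\ \{ \} are literal characters.
                if depth == 1 and rtf[i] in "\\{}":
                    out.append(rtf[i])
                i += 1
        elif ch in "\r\n":
            i += 1  # raw line breaks in RTF are not part of the text
        else:
            if depth == 1:
                out.append(ch)
            i += 1
    return "".join(out).strip()


SAMPLE = (
    r"{\rtf1\ansi\ansicpg1252\deff0\deflang1033"
    r"{\fonttbl{\f0\fnil\fcharset0 Arial;}{\f1\fswiss\fprq2\fcharset0 System;}}" "\n"
    r"{\colortbl ;\red0\green0\blue0;}" "\n"
    r"\viewkind4\uc1\pard\cf1\fs20 Initial Assessment Completed\cf0\b\f1" "\n"
    r"\par }"
)

text = rtf_visible_text(SAMPLE)
print(len(text), len(text.split()))  # prints 28 3
```

Walking the string one character at a time avoids both the 1000-element array limit and the need to list every possible formatting tag, since any backslash-plus-letters sequence is treated as formatting.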

