Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Finding the length of a text string 2

Status
Not open for further replies.

wilgrant

MIS
May 18, 2006
8
US
I have a need to find the length of a text string - i.e. how long is the string before the start of the trailing spaces.

The code I've written does the job, but unfortunately the program, when executed, spends 40% of it's time with this section of code, and I want to make it perform more efficiently, if possible.

The program needs to evaluate several text fields of varying lengths. In order to do so, a working storage area called Text-Field PIC X(32760) is defined, with each text field to be evaluated moved there prior to the length check. Also known is the length of each field to be tested.

The code I'm using is below:

Compute WS-Pos = WS-Field-Length
Perform Varying WS-Pos from WS-Pos by -1
Until WS-Pos < 0
Or Text-Field(WS-Pos:1) Not = Space
End-Perform

The result I need will then be found in WS-Pos, which will be equal to the length of the data in the field, minus any trailing spaces.

I will note that before entering this code, the entire field is checked to see if it is equal to Spaces, so this code will not be executed if the field is all Spaces.

Another attempt at this used the Function Reverse to move the data into a separate string and then tally leading spaces to arrive at the number of spaces to subtract from the possible field length, but this proved less efficient that the code above.

Any suggestions would be appreciated!
 
Have a look here: thread209-1133572

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Try using the FUNCTION REVERSE. Not all compilers support it.

MOVE 0 TO FIELD-LENGTH.
INSPECT FUNCTION REVERSE
TALLYING FIELD-LENGTH FOR LEADING " ".

JIH
 
as we are talking about a big string I am also going to post a modified version of the sample I mentioned on thread209-281365 (mentioned on the thread PHV mentioned above)

Code:
       IDENTIFICATION DIVISION.
       PROGRAM-ID. "STRSIZE".
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 W01-MIN       pic 9(9) BINARY.
       01 W01-MAX       pic 9(9) BINARY.
       01 W01-CHECKSIZE pic 9(9) BINARY.
       01 W01-MIDDLE    pic 9(9) BINARY.
       01 W01-STRING    pic X(32000).
       01 W01-SIZE      pic 9(9) BINARY.
       PROCEDURE DIVISION.
       MAIN.
           MOVE ZERO TO W01-SIZE.
           MOVE 1 TO W01-MIN.
           MOVE 32000 TO W01-MAX.
           IF W01-STRING NOT = SPACES
              PERFORM GET-SIZE
                UNTIL W01-MIN > W01-MAX
           END-IF.
           COMPUTE W01-SIZE = W01-MIN - 1.
           GOBACK.
       GET-SIZE.
           COMPUTE W01-MIDDLE = (W01-MIN + W01-MAX) / 2
           COMPUTE W01-CHECKSIZE = W01-MAX - W01-MIDDLE + 1
           IF W01-STRING(W01-MIDDLE:W01-CHECKSIZE) = SPACES
             COMPUTE W01-MAX = W01-MIDDLE - 1
           ELSE
              COMPUTE W01-MIN = W01-MIDDLE + 1
           END-IF.

Run some testing and tell us what was your final method and timming differences if possible.

Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
Do you know any statistics about the length of the string? I.e., does it tend to be long or short, etc. What are the data-types of your pointer variables?
 
wilgrant,

Many suggestions here are reasonable, but there is nothing like actually knowing something about your data to get the initial guess better.

I presume your data has embedded spaces, or you wouldn't be going through this exercise. However, does it have very many instances of embedded double- or triple-spaces (i.e. two or three spaces together)? If not, the a simple [tt]INSPECT ... TALLYING BEFORE INITIAL " "[/tt] might be quite efficient at discovering the string length in one INSPECT and one IF. With some ingenuity and reference modification, you can place the INSPECT and IF in a PERFORM. You can find example code in my posting near the bottom of thread209-281365.

The point of this is that it may be more efficient to use the internal looping mechanism implicit in the INSPECT rather than build an explicit, character-at-a-time loop in the COBOL source.

Tom Morrison
 
Thanks for all of the suggestions to this point.

Regarding the data attributes, there can be pretty much anything in the data. There could be double spaces, there could be 500 bytes of data, followed by 100 spaces, followed by more data - it's pretty much wide open. The data we are analyzing is not controlable in this program, we only use it, as opposed to generating it.

Regarding the binary search routine, I've desk-checked it based one small example, and it didn't seem to me that it would work. However, I may still code it up and see what happens, since computers seem to be so much smarter than I am. One thing I will change when I code it is to change this IF:

IF W01-STRING(W01-MIDDLE:W01-CHECKSIZE) = SPACES

to this:

IF W01-STRING(W01-MIDDLE:) = SPACES

I found in desk-checking the binary routine that there is the possibility that you could end up with a very small value in W01-CHECKSIZE, such as 2, and if you had 2 imbedded spaces in the data the code would falsely indicate the wrong length.
 
wilgrant,

One more question, if you don't mind.

Do you wish to preserve the (multiple) spaces in the data, using it exactly as is, or would you like to remove redundant spaces?

Tom Morrison
 
Tom,

The data will be preserved as it was presented to the program - no removal of redundant spaces.

Thanks.
 
Okay, this code goes through the long string one 'cluster' of characters at a time. Try this and see if it does better or worse on your actual data than does the binary search. At the risk of offending my friend Francisco, I would assert that it easier to understand than the binary search, but it may not execute faster.
Code:
       identification division.
       program-id.  string-length.
       data division.
       working-storage section.
       01  text-area                  pic x(32000)
           value "a b  c   d     e                 f".
      *                    1111111111222222222233333
      *           1234567890123456789012345678901234
       01  last-nonblank-char         pic 9(8) binary.
       01  current-char               pic 9(8) binary.

       procedure division.
       a.       
           move 0 to last-nonblank-char.
           move 1 to current-char.
           inspect text-area tallying current-char for leading spaces.
           perform until current-char > 
                        length of text-area
               inspect text-area(current-char:) 
                       tallying current-char for characters 
                                before initial space
               subtract 1 from current-char giving last-nonblank-char
               inspect text-area(current-char:) 
                      tallying current-char for leading spaces
           end-perform.
           display current-char, last-nonblank-char.

Tom Morrison
 
wilgrant,

Your assumption that the use of "W01-CHECKSIZE" will give you wrong results is incorrect.
I know that at first sight it may be confusing, but this variable will always contain the correct number of characters till the end of the string you are searching.
Main benefit of it is really with big variables, not small ones (e.g. a PIC X(5000) will result in slower performance if you use the W01-CHECKSIZE than if you remove it).

You have to take in consideration that this is a binary search, and as such there is no need to look at bytes that we have already looked at, although this is not always possible to ignore.

The following output may help you see what I mean.
This was created by calling the program with a "X" in position 21000 of the string, and all the others set to spaces.
Code:
 MIN=1         MID=16000     MAX=32000     CKS=16001     LASTPOS=32000
 MIN=16001     MID=24000     MAX=32000     CKS=8001      LASTPOS=32000
 MIN=16001     MID=20000     MAX=23999     CKS=4000      LASTPOS=23999
 MIN=20001     MID=22000     MAX=23999     CKS=2000      LASTPOS=23999
 MIN=20001     MID=21000     MAX=21999     CKS=1000      LASTPOS=21999
 MIN=21001     MID=21500     MAX=21999     CKS=500       LASTPOS=21999
 MIN=21001     MID=21250     MAX=21499     CKS=250       LASTPOS=21499
 MIN=21001     MID=21125     MAX=21249     CKS=125       LASTPOS=21249
 MIN=21001     MID=21062     MAX=21124     CKS=63        LASTPOS=21124
 MIN=21001     MID=21031     MAX=21061     CKS=31        LASTPOS=21061
 MIN=21001     MID=21015     MAX=21030     CKS=16        LASTPOS=21030
 MIN=21001     MID=21007     MAX=21014     CKS=8         LASTPOS=21014
 MIN=21001     MID=21003     MAX=21006     CKS=4         LASTPOS=21006
 MIN=21001     MID=21001     MAX=21002     CKS=2         LASTPOS=21002
SIZE 21000

The following was with a X on position 3000, and all others set to spaces.
Code:
 MIN=1         MID=16000     MAX=32000     CKS=16001     LASTPOS=32000
 MIN=1         MID=8000      MAX=15999     CKS=8000      LASTPOS=15999
 MIN=1         MID=4000      MAX=7999      CKS=4000      LASTPOS=7999
 MIN=1         MID=2000      MAX=3999      CKS=2000      LASTPOS=3999
 MIN=2001      MID=3000      MAX=3999      CKS=1000      LASTPOS=3999
 MIN=3001      MID=3500      MAX=3999      CKS=500       LASTPOS=3999
 MIN=3001      MID=3250      MAX=3499      CKS=250       LASTPOS=3499
 MIN=3001      MID=3125      MAX=3249      CKS=125       LASTPOS=3249
 MIN=3001      MID=3062      MAX=3124      CKS=63        LASTPOS=3124
 MIN=3001      MID=3031      MAX=3061      CKS=31        LASTPOS=3061
 MIN=3001      MID=3015      MAX=3030      CKS=16        LASTPOS=3030
 MIN=3001      MID=3007      MAX=3014      CKS=8         LASTPOS=3014
 MIN=3001      MID=3003      MAX=3006      CKS=4         LASTPOS=3006
 MIN=3001      MID=3001      MAX=3002      CKS=2         LASTPOS=3002
SIZE 3000

By looking at the above one can see that the number of chars we are looking for spaces will reduce on each interaction, and this increases performance as its less "compares" the runtime needs to do.
Now the overhead of having another compute, and another variable may be bigger than not having it at all, but this will depend mainly on the size of the variable used. The bigger it is the better will performance be using this "checksize" variable. You have to test it and see if it is better to use it or to leave it out on your particular variable size and compiler.

Also note that I am using BINARY variables. It may be possible that by using other variable types performance will increase. This will obviously be dependent on your compiler and on your compiling options used.

Sample code to test my previous program for "wrong values" (you need to change it so it uses linkage variables.)
Code:
       IDENTIFICATION DIVISION.
       PROGRAM-ID.  testing.
       ENVIRONMENT DIVISION.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  W01-STRING PIC X(32000).
       01  W01-I PIC 9(9) BINARY.
       01  W01-SIZE PIC 9(9) BINARY.
       01  W01-MAX PIC 9(9) BINARY VALUE 32000.
       PROCEDURE       DIVISION .
       MAIN.
           PERFORM VARYING W01-I FROM W01-MAX
                BY -1 UNTIL W01-I = 0
                MOVE SPACES TO W01-STRING
                MOVE "X" TO W01-STRING(W01-I:1)
                CALL "BB" USING W01-STRING W01-SIZE
                IF W01-SIZE NOT = W01-I
                   DISPLAY "ERROR" W01-I CONVERT "=="
                           W01-SIZE CONVERT
                           "=="
                END-IF
           END-PERFORM.
           ACCEPT W01-I.
           GOBACK.


And as with all things relating to programming you need to try several options until you find the best one for your circumstances.


(and Tom, I don't get offended that easily.

Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top