Using UNSTRING to parse text

Status
Not open for further replies.

razalas

Programmer
Apr 23, 2002
237
US
Last week I had to parse some text and extract values from a string. I did it with the UNSTRING verb, but I was less than satisfied with my solution. My dissatisfaction was due to the fact that UNSTRING always looks for all the occurrences of the delimiter(s) in the input. I would like to suggest a simple additional feature for the UNSTRING statement that I feel would make it more powerful, but I wanted to get some feedback before submitting it through channels.

Ok, here is the deal. BTW, my environment is RM/Cobol 7.5 on SCO Unix 5.0.7, but this should really be platform independent.

I receive a text string with the following format:

uid=234(myname) gid=50(mis) groups=50,60(mis,prod)

I had to parse this and return the individual elements of the string in separate COBOL fields. For my purposes, I was able to treat all the elements as simple character strings. In this case it was pretty straightforward, since I knew in advance which sub-fields would occur and in what order. So my code was as follows:
Code:
UNSTRING text-string DELIMITED BY ALL SPACES
    INTO uid-string
         gid-string
         group-string
UNSTRING uid-string DELIMITED BY "=" OR "(" OR ")"
    INTO uid-literal
         uid-value
         uid-name
UNSTRING gid-string DELIMITED BY "=" OR "(" OR ")"
    INTO gid-literal
         gid-value
         gid-name
UNSTRING group-string DELIMITED BY "=" OR "(" OR ")"
    INTO group-literal
         group-list
         group-names
My dissatisfaction is this: if I did not know the order of the sub-fields in advance, I would want to parse the string from left to right, stop at the first delimiter, decide my next action based on the leftmost token, and then process the remainder of the text in the same way. In other words, I would want to do something like:
Code:
01 text-string              PIC X(100).
01 remainder-string         PIC X(100).
01 throwaway-string         PIC X(100).
01 keyword-string           PIC X(08).
01 keyword-table.
   03 keywords      OCCURS 5 TIMES
                    ASCENDING KEY keyword-entry
                    INDEXED BY i.
      05 keyword-entry      PIC X(08).
      05 keyword-value      PIC X(08).
      05 keyword-value-desc PIC X(16).
...
INITIALIZE keyword-table
MOVE "gid"       TO keyword-entry (1)
MOVE "groups"    TO keyword-entry (2)
MOVE "iq"        TO keyword-entry (3)
MOVE "shoesize"  TO keyword-entry (4)
MOVE "uid"       TO keyword-entry (5)
PERFORM WITH TEST AFTER UNTIL text-string = SPACES
    UNSTRING text-string DELIMITED BY "="
        INTO keyword-string
             remainder-string
    SEARCH ALL keywords
        AT END
            UNSTRING remainder-string
                DELIMITED BY ALL SPACES
                INTO throwaway-string
                     text-string
        WHEN keyword-entry (i) = keyword-string
            UNSTRING remainder-string
                DELIMITED BY "("
                INTO keyword-value (i)
                     text-string
            UNSTRING text-string DELIMITED BY ")"
                INTO keyword-value-desc (i)
                     remainder-string
            MOVE remainder-string TO text-string
    END-SEARCH
END-PERFORM

But, none of the UNSTRING statements in the PERFORM loop above would achieve their desired effect, because the string they are parsing may contain additional occurrences of the delimiter being sought! In other words, the remainder-string would be truncated at the second occurrence of the delimiter. Which means I have to know ahead of time how many times that delimiter will occur in the input and then allow for n+1 INTO fields in my code. Not to mention putting all the n+1 INTO fields back together (which ok, would not be hard to do with the STRING command, but seems like a lot of wasted effort).

So, I would like to suggest allowing a FIRST option in the DELIMITED BY clause of an UNSTRING statement, which would tell COBOL to stop parsing after the first occurrence of the delimiter and put all the remaining input into the second INTO field.

Does anyone else see the value of this, or have I wandered off into left field again? Or worse, is this already available? Or is there a better way to do this?

TIA for the feedback!

"Code what you mean,
and mean what you code!
But by all means post your code!"

Razalas
 
Use the following construct.
Code:
MOVE 1 TO string-pointer
PERFORM WITH TEST AFTER
        UNTIL text-string (string-pointer:) = SPACES
    UNSTRING text-string DELIMITED BY "="
        INTO keyword-string
        POINTER string-pointer
    SEARCH ALL keywords
        AT END
            CONTINUE
        WHEN keyword-entry (i) = keyword-string
            UNSTRING text-string
                DELIMITED BY "("
                INTO keyword-value (i)
                POINTER string-pointer
            UNSTRING text-string
                DELIMITED BY ")"
                INTO keyword-value-desc (i)
                POINTER string-pointer
    END-SEARCH
END-PERFORM


Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
Hi Razalas,

COBOL already has all the tools to accomplish that. What is not worth the effort is creating a new "verb" for it: that would be too application-specific.

Dimandja
 
Frederico,

Thanks for your prompt reply. It sounds like you have solved my dilemma!

I have used the POINTER clause on the STRING verb a lot. I'm surprised it didn't occur to me to investigate its use with the UNSTRING verb.

If I understand correctly, you are saying that
Code:
UNSTRING text-string
    DELIMITED BY "="
    INTO uid-string
         remainder-string
         POINTER string-pointer

is equivalent to the following (if the following were legal)
Code:
UNSTRING text-string (string-pointer:)
    DELIMITED BY "="
    INTO uid-string
         remainder-string
keeping in mind that the object of an UNSTRING statement CANNOT be reference modified and that using the POINTER clause in this way renders the need for the remainder-string obsolete!

If so, then that would indeed accomplish exactly what I needed. I will go back and try that out!

Thank you very much!

Razalas
 
Dimandja,

I was not suggesting a whole new verb, just an additional keyword in the DELIMITED BY clause.

I was thinking of a construction like the following:
Code:
UNSTRING text-string
    DELIMITED BY FIRST "="
    INTO keyword-string
         remainder-string

Unfortunately, I was not very clear in my post (as oft happens when one stays up too late to try and get things done).

Razalas
 
Razalas,

You were perfectly clear to me, but what you were asking is already there.

Any UNSTRING ... DELIMITED BY stops each receiving field at the very first occurrence of the delimiter in the string, so there is no need for a FIRST keyword.

As for the pointer: yes, the remainder field is no longer needed, which is why I removed it in my post.

Basically the pointer is saying to the unstring verb where to start the operation on the string. That is why we need to initialize the pointer to 1 before starting the unstring.
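
For anyone who wants to try the pointer approach end to end, here is a minimal fixed-format sketch. It is not the code from the posts above: the program name PARSEID, the fields value-string, desc-string and throwaway-string, and the final "skip the blanks" UNSTRING are assumptions added for illustration. The skip step is there because, in the sample input, each ")" is followed by a space that would otherwise end up at the front of the next keyword. The sketch also assumes every entry has the form keyword=value(description) and that entries are separated by at least one space.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PARSEID.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  text-string         PIC X(100).
       01  string-pointer      PIC 9(4).
       01  keyword-string      PIC X(08).
       01  value-string        PIC X(08).
       01  desc-string         PIC X(16).
       01  throwaway-string    PIC X(01).
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE "uid=234(myname) gid=50(mis) groups=50,60(mis,prod)"
               TO text-string
      *    The pointer is a 1-based character position in text-string.
           MOVE 1 TO string-pointer
      *    100 is the declared length of text-string.
           PERFORM UNTIL string-pointer > 100
      *        Each UNSTRING resumes where the previous one stopped.
               UNSTRING text-string DELIMITED BY "="
                   INTO keyword-string
                   WITH POINTER string-pointer
               END-UNSTRING
               UNSTRING text-string DELIMITED BY "("
                   INTO value-string
                   WITH POINTER string-pointer
               END-UNSTRING
               UNSTRING text-string DELIMITED BY ")"
                   INTO desc-string
                   WITH POINTER string-pointer
               END-UNSTRING
      *        Skip the blank(s) between entries; after the last
      *        entry this consumes the trailing spaces, which moves
      *        the pointer past the field and ends the loop.
               UNSTRING text-string DELIMITED BY ALL SPACES
                   INTO throwaway-string
                   WITH POINTER string-pointer
               END-UNSTRING
               DISPLAY keyword-string " " value-string " "
                       desc-string
           END-PERFORM
           STOP RUN.
```

In a real parser the DISPLAY would be replaced by the SEARCH ALL lookup from the earlier posts, and unknown keywords would need their own handling.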





Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
Frederico,

I have just implemented the revised version of my parse using your suggestion and it works great!

My trouble was the misconception that the UNSTRING would always process all the characters from the input field, when in fact, it will only process as many characters as are needed to satisfy the number of INTO fields specified [or until it runs out of input].
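
To make that concrete, here is a small sketch of how the "stop at the first delimiter and give me the rest" behaviour can be had with what is already in the language. The program name FIRSTDLM and the sample input are assumptions for illustration only. The idea is to give UNSTRING a single receiving field plus a POINTER, then copy the rest of the source with a reference-modified MOVE.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FIRSTDLM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  text-string         PIC X(100).
       01  remainder-string    PIC X(100).
       01  keyword-string      PIC X(08).
       01  string-pointer      PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE "uid=234(myname) gid=50(mis)" TO text-string
           MOVE 1 TO string-pointer
      *    One receiving field: scanning stops at the first "=".
           UNSTRING text-string DELIMITED BY "="
               INTO keyword-string
               WITH POINTER string-pointer
           END-UNSTRING
      *    The pointer now addresses the character after that "=".
      *    100 is the declared length of text-string; guard against
      *    the case where the scan used up the whole field.
           IF string-pointer > 100
               MOVE SPACES TO remainder-string
           ELSE
               MOVE text-string (string-pointer:)
                   TO remainder-string
           END-IF
           DISPLAY "keyword:   " keyword-string
           DISPLAY "remainder: " remainder-string
           STOP RUN.
```

The later "=" in the input is never examined by this UNSTRING, so the remainder comes through untouched.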

My thanks to you with a second star this morning!

Razalas
 
An addendum that some perusing this thread might find useful:

If you find yourself allowing several delimiters, it might be nice to know which delimiter terminated the field. This may be obtained using the DELIMITER IN clause.

If you want to know how many characters were actually moved to the receiving field, check out the COUNT IN clause (n.b. COUNT IN requires the DELIMITED BY clause).

And don't forget ON OVERFLOW to keep things under control.
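
A short sketch of those clauses working together, applied to the uid sub-field from the original post. The program name UNSTRX and the delim-n, count-n and fields-filled names are made up for this example. The source field is deliberately declared exactly as long as its contents; with trailing spaces left over after the third receiving field, the overflow condition would be raised.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTRX.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  uid-string          PIC X(15) VALUE "uid=234(myname)".
       01  uid-literal         PIC X(08).
       01  uid-value           PIC X(08).
       01  uid-name            PIC X(08).
       01  delim-1             PIC X(01).
       01  delim-2             PIC X(01).
       01  delim-3             PIC X(01).
       01  count-1             PIC 9(02).
       01  count-2             PIC 9(02).
       01  count-3             PIC 9(02).
       01  fields-filled       PIC 9(02).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    TALLYING adds to the counter, so it must start at zero.
           MOVE 0 TO fields-filled
           UNSTRING uid-string DELIMITED BY "=" OR "(" OR ")"
               INTO uid-literal DELIMITER IN delim-1 COUNT IN count-1
                    uid-value   DELIMITER IN delim-2 COUNT IN count-2
                    uid-name    DELIMITER IN delim-3 COUNT IN count-3
               TALLYING IN fields-filled
      *        Overflow: the receiving fields were used up while
      *        unexamined characters remained in the source.
               ON OVERFLOW
                   DISPLAY "more input than receiving fields"
               NOT ON OVERFLOW
                   DISPLAY "fields filled: " fields-filled
           END-UNSTRING
      *    DELIMITER IN holds the delimiter that ended each field;
      *    COUNT IN holds the number of characters found before it.
           DISPLAY uid-literal " ended by [" delim-1 "] length "
                   count-1
           DISPLAY uid-value   " ended by [" delim-2 "] length "
                   count-2
           DISPLAY uid-name    " ended by [" delim-3 "] length "
                   count-3
           STOP RUN.
```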

Tom Morrison
 