Last week I had to parse some text and extract values from a string. I did it with the UNSTRING verb, but I was less than satisfied with my solution, because UNSTRING always looks for every occurrence of the delimiter(s) in the input. I would like to suggest a simple additional feature for the UNSTRING statement that I feel would make it more powerful, but I wanted to get some feedback before submitting it through channels.
Ok, here is the deal. BTW, my environment is RM/Cobol 7.5 on SCO Unix 5.0.7, but this should really be platform independent.
I receive a text string with the following format:
uid=234(myname) gid=50(mis) groups=50,60(mis,prod)
I had to parse this and return the individual elements in the string in separate COBOL fields. For my purposes, I was able to treat all the elements as simple character strings. In this case, it was pretty straight-forward since I knew in advance the sub-fields that would occur and their order. So my code was as follows:
Code:
           UNSTRING text-string DELIMITED BY ALL SPACES
               INTO uid-string
                    gid-string
                    group-string
           END-UNSTRING
           UNSTRING uid-string DELIMITED BY "=" OR "(" OR ")"
               INTO uid-literal
                    uid-value
                    uid-name
           END-UNSTRING
           UNSTRING gid-string DELIMITED BY "=" OR "(" OR ")"
               INTO gid-literal
                    gid-value
                    gid-name
           END-UNSTRING
           UNSTRING group-string DELIMITED BY "=" OR "(" OR ")"
               INTO group-literal
                    group-list
                    group-names
           END-UNSTRING
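My dissatisfaction is this: if I did not know the order of the sub-fields in advance, I would want to parse the string from left to right, stopping at the first delimiter, deciding my next action from the leftmost token, and then processing the remainder of the text accordingly. In other words, I would want to do something like this:
Code: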
      * The index i is defined by INDEXED BY below, so it must not
      * also be declared as a separate 77-level item.
       01  text-string          PIC X(100).
       01  remainder-string     PIC X(100).
       01  throwaway-string     PIC X(100).
       01  keyword-string       PIC X(08).
       01  keyword-table.
           03  keywords OCCURS 5 TIMES
                   ASCENDING KEY keyword-entry
                   INDEXED BY i.
               05  keyword-entry       PIC X(08).
               05  keyword-value       PIC X(08).
               05  keyword-value-desc  PIC X(16).
       ...
      * SEARCH ALL needs the keys loaded in ascending order.
           INITIALIZE keyword-table
           MOVE "gid"      TO keyword-entry (1)
           MOVE "groups"   TO keyword-entry (2)
           MOVE "iq"       TO keyword-entry (3)
           MOVE "shoesize" TO keyword-entry (4)
           MOVE "uid"      TO keyword-entry (5)
           PERFORM WITH TEST AFTER UNTIL text-string = SPACES
               UNSTRING text-string DELIMITED BY "="
                   INTO keyword-string
                        remainder-string
               END-UNSTRING
               SEARCH ALL keywords
                   AT END
      * Unknown keyword: skip its value and keep the rest.
                       UNSTRING remainder-string
                           DELIMITED BY ALL SPACES
                           INTO throwaway-string
                                text-string
                       END-UNSTRING
                   WHEN keyword-entry (i) = keyword-string
                       UNSTRING remainder-string
                           DELIMITED BY "("
                           INTO keyword-value (i)
                                text-string
                       END-UNSTRING
                       UNSTRING text-string DELIMITED BY ")"
                           INTO keyword-value-desc (i)
                                remainder-string
                       END-UNSTRING
                       MOVE remainder-string TO text-string
               END-SEARCH
           END-PERFORM
But none of the UNSTRING statements in the PERFORM loop above would achieve the desired effect, because the string they are parsing may contain additional occurrences of the delimiter being sought! In other words, the remainder-string would be truncated at the second occurrence of the delimiter. That means I have to know ahead of time how many times the delimiter will occur in the input and then allow for n+1 INTO fields in my code, not to mention putting all n+1 INTO fields back together (which, OK, would not be hard to do with the STRING statement, but seems like a lot of wasted effort).
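To make the truncation concrete, here is a minimal, self-contained sketch that runs just the first UNSTRING of the loop against the sample input. The program name UNSTR1 and the paragraph name are placeholders made up for this post, and I am assuming standard COBOL 85 UNSTRING behavior (I have not checked it against every compiler). Under that assumption I would expect remainder-string to come back as "234(myname) gid", not as everything after the first "=".

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  text-string       PIC X(100).
       01  keyword-string    PIC X(08).
       01  remainder-string  PIC X(100).
       PROCEDURE DIVISION.
       main-para.
           MOVE "uid=234(myname) gid=50(mis) groups=50,60(mis,prod)"
             TO text-string
           MOVE SPACES TO keyword-string remainder-string
      * Two INTO fields: the second one is filled only up to the
      * NEXT "=", so the tail of the input never reaches it.
           UNSTRING text-string DELIMITED BY "="
               INTO keyword-string
                    remainder-string
           END-UNSTRING
           DISPLAY "keyword   = [" keyword-string "]"
           DISPLAY "remainder = [" remainder-string (1:30) "]"
           STOP RUN.
```

The text after the second "=" is simply not transferred, because there is no third INTO field to receive it.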
So, I would like to suggest allowing a FIRST option in the DELIMITED BY clause of an UNSTRING statement, which would tell COBOL to stop parsing after the first occurrence of the delimiter and put all the remaining input into the second INTO field.
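To show the result I am after, here is a rough sketch that emulates the proposed FIRST behavior in two steps, using the standard WITH POINTER phrase plus reference modification. It is only an illustration under assumptions: next-pos and the program name UNSTR2 are names made up for this sketch, the literal 100 is the declared size of text-string, and I am assuming COBOL 85 pointer semantics (the pointer ends up just past the first delimiter when there is a single INTO field).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR2.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  text-string       PIC X(100).
       01  keyword-string    PIC X(08).
       01  remainder-string  PIC X(100).
      * next-pos is a made-up name: where the next scan would start.
       01  next-pos          PIC 9(4) COMP.
       PROCEDURE DIVISION.
       main-para.
           MOVE "uid=234(myname) gid=50(mis) groups=50,60(mis,prod)"
             TO text-string
           MOVE SPACES TO keyword-string remainder-string
           MOVE 1 TO next-pos
      * Only one INTO field, so scanning stops at the first "=".
           UNSTRING text-string DELIMITED BY "="
               INTO keyword-string
               WITH POINTER next-pos
           END-UNSTRING
      * If no "=" was found, next-pos is 101 and the remainder
      * is left blank.
           IF next-pos NOT > 100
               MOVE text-string (next-pos:) TO remainder-string
           END-IF
           DISPLAY "keyword   = [" keyword-string "]"
           DISPLAY "remainder = [" remainder-string (1:50) "]"
           STOP RUN.
```

With the sample input I would expect keyword-string to hold "uid" and remainder-string to hold everything after the first "=", later "=" characters included. A FIRST option would give the same result in a single statement, without the extra pointer field.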
Does anyone else see the value of this, or have I wandered off into left field again? Or worse, is this already available? Or is there a better way to do this?
TIA for the feedback!
"Code what you mean,
and mean what you code!
But by all means post your code!"
Razalas