Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

TRIM Leading spaces on import data field 4

Status
Not open for further replies.

SiouxCityElvis

Programmer
Jun 6, 2003
228
US
I am Unstringing a comma delimited file and writing it to an Indexed file.

On one of the fields, we are expecting only three characters.

field1,2,3,abc,field5,etc...

The problem I'm having is that the people responsable for creating the commadelimited files are screwing up the "abc" field sometimes.

field1,2,3, abc ,field5,etc...

You'll notice that they have a leading space before "abc" and afterwards. Since I am moving this into a PIC X(3) fields before writing to my Indexed file, I am getting inaccurate values stored in my PIC X(3) field.

Is there a way to "trim" the leading spaces? Otherwise we have to "scrub" our raw data before reading it in and writing it to our indexed file.

I was wondering if the JUSTIFY clause would help? If so, how?

Thanks.
-David
 
JUSTIFY won't help.

To pick up where we last left off, I will assume you are using POINTER in your UNSTRING. If it is also the case that you wish to trim leading spaces on all fields you may do something like this:
Code:
01  INPUT-FIELD  PIC X(123).
78  INPUT-FIELD-SIZE value LENGTH OF INPUT-FIELD.

01  .
    02  FIELD-HOLDER PIC X(50) OCCURS 40.

MOVE 1 to I.
MOVE 1 TO J.
PERFORM UNTIL I > INPUT-FIELD-SIZE
    INSPECT INPUT-FIELD (I:) TALLYING I FOR LEADING SPACE
    IF I NOT > INPUT-FIELD-SIZE
        UNSTRING INPUT-FIELD DELIMITED BY ","
            INTO FIELD-HOLDER (J)
            POINTER I
            ON OVERFLOW  <handle the error case>
        END-UNSTRING
        ADD 1 to J
    END-IF
END-PERFORM.
SUBTRACT 1 FROM J.
At this point J contains the number of fields, each of which is stored, left-justified, in the FIELD-HOLDER array elements.


Tom Morrison
 
FWIW, another approach:

Define a 4 byte pic x work field for abc and unstring abc to the work field instead of the rec field, as part of the full unstring.

After the unstring of all fields into the rec, test the abc work field for a leading space.

YES- move wrk-abc(2:3) to rec-abc
No - move wrk-abc to rec-abc

This will work if you're confident that the leading spaces will always be 1 or 0. If you're not sure, use inspect tallying to find the number of leading spaces.

Regards, Jack.
 
Thanks. I am not currently using POINTER, but I will look at POINTER in the manual and see what is said about it.
 
As far as the suggestions so far, how certain are you that the &quot;worst&quot; case situation will be a SINGLE space before or after the &quot;real&quot; data? If you can get bad data, is it possible that you will get 2 (or more) initial/trailing spaces? If so, the &quot;solution&quot; will be (a little) more complex.

If you are POSITIVE that you will never get more than one initial/trailing space, it should be pretty easy to handle.

Bill Klein
 
Bill,

My solution doesn't presume any number of leading spaces. (I did put actual numbers in the PIC clauses, but just for illustration.) And -- trailing spaces in COBOL should be no problem!

Tom Morrison
 
Sorry Tom,
Yes you handle any number of leading spaces. The only step &quot;missing&quot; from your solution is the (implicit) MOVE somewhere else in the code to a 3-byte field to get rid of the trailing spaces.

(And you use the &quot;arbitray&quot; 50 bytes for each receiving field of the UNSTRING. My PERSONAL preference - if space in working-storage isn't at a premium is to make each receiving field - in the table - big enough to hold the ENTIRE/MAXIMUM sending field. Again, assuming this is known at compile-time.)

Bill Klein
 
Thanks for all the suggestions. Basically, I was told to &quot;scrub&quot; the data that will be sent to us in comma delimited format. The first thing for me to work on as far as &quot;scrubbing&quot; that has become blatantly obvious is taking care of leading space(s) in a field that would look something like:

good,bad , ugly , even uglier ,good,bad ,etc..

So, I think my first step on this project is to handle the raw data and &quot;clean&quot; it up as if the clients fat fingered the data, or reformated it in Word or Excel and some how there data got skewed with leading space(s) or the like.

Any other scenarios you can think of that should be handled in &quot;data scrubbing&quot; job steps, please let me know.

I was diverted back to another project, but I will implement the code given by Tom above to see if I can work with it. Thanks Tom!
Thanks.
-David
 
Tom:

I keep getting a data ref error at runtime stating:

COBOL data reference error 104 at line 205 in program FP_IMPORTA.COB

which pertains to
UNSTRING INPUT-FIELD DELIMITED BY &quot;,&quot;

should the UNSTRING statement have &quot;INPUT-FIELD(I:)&quot; instead of just &quot;INPUT-FIELD&quot;?

Thanks.
-David
 
I also get in trouble parsing this raw data if they ever have an empty field value:

good, bad, ugly ,,nothinginlastfield,etc.

For some reason, when I DISPLAY my FIELD-HOLDER(CTR-J) on each iteration after the UNSTRING statement, I get...

&quot;good&quot;
&quot;bad&quot;
&quot;ugly&quot;

but no blank and no display of &quot;nothinginlastfield&quot;

It actually ends skipping the rest of the record I think and goes to a different record in the file.

 
David,

We don't know what kind of data is in those fields. Dates, names, addresses, comments, etc. You may want to entertain the possibility that one of those strings may contain a comma (,) and ask what would that string look like.

You may even go so far as to ask your client to supply:

1) The rules used for creating the file, or
2) Supply enough &quot;live&quot; data for you to &quot;eyeball&quot; and hopefully determine the rules yourself.

Regards, Jack.


 
Amarillo

Point 1.
The usage of Pointer means that you don´t need to use &quot;INPUT-FIELD(I:)&quot; see Thread209-654374 for an explanation.

As you are saying you have a &quot;104&quot; error and that the output is not the one you expect I suspect that you have a minor error on your program and/or data definition variables.

Following is a working example that will not give you that error and will indeed display
---------
good
bad
ugly

nothinginlastfield
------

If you still have problems would you please post your full program along with a sample test data that issues you the error so that we can have a look at it.


-------
IDENTIFICATION DIVISION.
PROGRAM-ID. DEMO.
ENVIRONMENT DIVISION.
DATA DIVISION.
WORKING-STORAGE SECTION.

01 input-field pic x(200)
value &quot;good, bad, ugly ,,nothinginlastfield&quot;.
01 input-field-size pic 999 value 200.
01 I PIC 999.
01 J PIC 999.
01 VAR1.
03 field-holder pic x(200) occurs 20.



PROCEDURE DIVISION.
BEGIN.
MOVE 1 to I.
MOVE 1 TO J.
PERFORM UNTIL I > INPUT-FIELD-SIZE
INSPECT INPUT-FIELD (I:) TALLYING I FOR LEADING SPACE
IF I NOT > INPUT-FIELD-SIZE
UNSTRING INPUT-FIELD DELIMITED BY &quot;,&quot;
INTO FIELD-HOLDER (J)
POINTER I
END-UNSTRING
ADD 1 to J
END-IF
END-PERFORM.
SUBTRACT 1 FROM J.
PERFORM VARYING J FROM 1 BY 1 UNTIL J > 20
DISPLAY FIELD-HOLDER(J)(1:60)
END-PERFORM.

accept I.
GOBACK.


Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
Okay I'll try that. The INPUT-FIELD will be populated with an sequential file record that I'm reading in, so the only thing I'll need to add is a MOVE IN-REC TO INPUT-FIELD statement before doing the Perform loop.
 
Frederico,

Okay. This works. However when I put a DISPLAY statement under an ON OVERFLOW to see if I get overflow on each iteration, the display statement comes up. Tom, previously mentioned to <handle the error case> when ON OVERFLOW occurs. Being that my FIELD-HOLDER(J) fields are all being populated I'm not sure exactly why I'm getting overflow.

Any obvious reasons you can think of that OVERFLOW would occur?

Thanks for the help.
-David
Catch a star Frederico and Tom!
 
I wourld just write a quick preprocessor.

01 input-record pic x(500) value spaces.
01 fixed-record pic x(500) values spaces.
01 sub pic 999 value zeroes.
01 sub2 pic 999 value zeroes.
move the input record to input-record.
inspect input-record replaceing all ', ' by &quot;,$&quot;.
move zeroes to sub sub2.
perform loop 500 times go to cont.

loop.
if input-record(sub:1) = '$' go to loop.
add 1 to sub2.
move input-record(sub:1) to fixed-record(sub2:1)

cont.

record is fixed and you don't need to fix the trailing
space because you can just truncate when you move it to the fixed field.

Enjoy Bob
 
David,

My bad. [blush]

One of the conditions for causing an overflow on an UNSTRING is if, after all receiving data fields have been acted upon, there remain unexamined characters in the sending field.

Since the act of placing the UNSTRING in a loop naturally results in have unexamined characters on each loop iteration except the last, each of those iterations will cause an overflow condition to exist.

My advice (caution!!! [bigsmile]) is to remove the ON OVERFLOW since the other condition that causes overflow (bad value in POINTER variable) is managed by the PERFORM and IF statements that control the execution of the UNSTRING

Tom Morrison
 
Great! I'm glad to hear your latest advice is consistent with what I tried - I simply commented out the ON OVERFLOW code, and I didn't seem to have any problems.

Slade brought up a good suggestion also about communication with the client about their formatting.

I know that if they ever put a comma in let's say a Company name field, that it's going to cause problems as is. Somewhere along the way before I process their csv file, it will need quotes around each individual field so that I don't delimit based off a comma that is intended as part of a string value i.e. &quot;Company Name, Inc.&quot;

Thanks again everyone.
 
David,

Quoted strings are considerably more complicated than simple comma-separated data (though certainly not impossible).

If the client data supplier has any ability to change the formatting of the data, I would suggest the use of a more rare character (such as tilde, vertical bar, backslash) in place of the comma field separator, and go to quoted strings as a last resort.

Also, as an integrity check for fields separated by a character (comma or otherwise), you should probably use something like:
Code:
MOVE 0 to I
INSPECT INPUT-FIELD TALLYING I FOR &quot;,&quot;
and then make sure the value in I is what you expect before you start UNSTRINGing.

Tom Morrison
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top