TRIM Leading spaces on import data field 4

SiouxCityElvis · Sep 26, 2003

I am Unstringing a comma delimited file and writing it to an Indexed file.

On one of the fields, we are expecting only three characters.

field1,2,3,abc,field5,etc...

The problem I'm having is that the people responsible for creating the comma-delimited files are screwing up the "abc" field sometimes.

field1,2,3, abc ,field5,etc...

You'll notice that they have a space before "abc" and another after it. Since I am moving this into a PIC X(3) field before writing to my Indexed file, I am getting inaccurate values stored in my PIC X(3) field.

Is there a way to "trim" the leading spaces? Otherwise we have to "scrub" our raw data before reading it in and writing it to our indexed file.

I was wondering if the JUSTIFY clause would help? If so, how?

Thanks.
-David

SiouxCityElvis · Oct 1, 2003

Yes, that INSPECT would do the trick as far as getting a count of the fields separated by commas and seeing whether it matches the count of fields that I expect. If the INSPECT yields a count that is 1 greater than I expect, for example, that probably means one of the field values is supposed to include a comma, such as a company name like "Petfoods, Inc."

When that count is off, it's a matter of determining which field has the comma in it, I guess.

Good suggestion.
Thanks.
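
For reference, the comma count described in this post can be sketched with INSPECT ... TALLYING. This is only a minimal sketch, not code from the thread: the program name, WS-RECORD, WS-COMMA-COUNT and WS-EXPECTED-FIELDS are made-up names, and the expected count of 5 is an assumption for the sample record.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CNTCOMMA.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical record and expected field count.
       01  WS-RECORD          PIC X(80)
               VALUE "field1,2,3,Petfoods, Inc.,field5".
       01  WS-COMMA-COUNT     PIC 9(4) VALUE ZERO.
       01  WS-EXPECTED-FIELDS PIC 9(4) VALUE 5.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * TALLYING adds to the counter, so reset it first.
           MOVE ZERO TO WS-COMMA-COUNT
           INSPECT WS-RECORD TALLYING WS-COMMA-COUNT
               FOR ALL ","
      * N fields are separated by N - 1 commas.
           IF WS-COMMA-COUNT + 1 = WS-EXPECTED-FIELDS
               DISPLAY "Field count OK"
           ELSE
               DISPLAY "Unexpected comma count: "
                       WS-COMMA-COUNT
           END-IF
           STOP RUN.
```

The sample record has one comma too many because of the company name, so it takes the ELSE branch. The count alone does not say which field holds the extra comma.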

webrabbit · Oct 1, 2003

I always code my own unstringer when I have to extract delimited fields.

The process is like this (using reference modification):

1. Find the length of the string.

2. Initialize the field counter.

3. Initialize the string pointer.

4. If the current character is a quote,
increment the string pointer and look for a quote;
otherwise,
look for a comma or the end of the string.
In either case, set a second pointer to the character
found.

5. Increment the first pointer until a non-blank
character is found.

If the first pointer becomes equal to the second
pointer, the field is blank, skip to step 9.

6. Determine the length of the field by subtracting
the first pointer from the second pointer.

7. Move the input field to the target field, using
reference modification (pointer-1:pointer-2).

8. Add the second pointer to the first pointer.

9. If there are no more fields to extract, quit.

10. If the current character is a quote, increment the
first pointer.

11. Increment the first pointer.

12. Increment the field counter and return to step 4.
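
A rough sketch of a hand-coded unstringer along these lines is given here. It follows the steps only loosely and is not webrabbit's code: every name and size is made up, leading blanks are skipped before the quote test, and a sentinel comma after the line stands in for the end-of-string check. It makes no attempt to handle a quote character embedded inside a quoted field.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * All names and sizes are hypothetical. The extra comma after
      * WS-LINE is a sentinel that terminates the last field.
       01  WS-BUFFER.
           05  WS-LINE       PIC X(80) VALUE
               "field1,2,3, abc ,""Petfoods, Inc."",x".
           05  FILLER        PIC X     VALUE ",".
       01  WS-MAX            PIC 9(4) COMP VALUE 81.
       01  WS-START          PIC 9(4) COMP VALUE 1.
       01  WS-END            PIC 9(4) COMP.
       01  WS-FLD-LEN        PIC 9(4) COMP.
       01  WS-COUNT          PIC 9(4) COMP VALUE ZERO.
       01  WS-IDX            PIC 9(4) COMP.
       01  WS-STOP-CHAR      PIC X.
       01  WS-FIELDS.
           05  WS-FIELD      PIC X(20) OCCURS 10.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM UNTIL WS-START > WS-MAX OR WS-COUNT = 10
      *        Skip leading blanks; the sentinel stops the scan.
               PERFORM UNTIL WS-BUFFER(WS-START:1) NOT = SPACE
                   ADD 1 TO WS-START
               END-PERFORM
      *        A quoted field ends at the next quote, not a comma.
      *        The literal """" is a single quote character.
               IF WS-BUFFER(WS-START:1) = """"
                   ADD 1 TO WS-START
                   MOVE """" TO WS-STOP-CHAR
               ELSE
                   MOVE "," TO WS-STOP-CHAR
               END-IF
      *        Find the end of the field, never past the sentinel.
               PERFORM VARYING WS-END FROM WS-START BY 1
                       UNTIL WS-END >= WS-MAX
                          OR WS-BUFFER(WS-END:1) = WS-STOP-CHAR
                   CONTINUE
               END-PERFORM
               ADD 1 TO WS-COUNT
               COMPUTE WS-FLD-LEN = WS-END - WS-START
               IF WS-FLD-LEN > 0
                   MOVE WS-BUFFER(WS-START:WS-FLD-LEN)
                     TO WS-FIELD(WS-COUNT)
               ELSE
                   MOVE SPACES TO WS-FIELD(WS-COUNT)
               END-IF
      *        After a closing quote, advance to the next comma.
               IF WS-STOP-CHAR = """"
                   PERFORM VARYING WS-END FROM WS-END BY 1
                           UNTIL WS-END >= WS-MAX
                              OR WS-BUFFER(WS-END:1) = ","
                       CONTINUE
                   END-PERFORM
               END-IF
               COMPUTE WS-START = WS-END + 1
           END-PERFORM
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > WS-COUNT
               DISPLAY WS-IDX ": [" WS-FIELD(WS-IDX) "]"
           END-PERFORM
           STOP RUN.
```

With the sample line, the fourth field comes out with its leading blank removed and the quoted company name stays in one field despite the comma inside it. Trailing blanks are left alone, since moving the result to a shorter PIC X field truncates them anyway.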

k5tm · Oct 2, 2003

It does not appear that webrabbit's algorithm provides a mechanism for embedding the quoting character within the quoted string.

For example, standard COBOL syntax for embedding the quoting character is to use two contiguous quotes:

Code:

   MOVE "A string containing the quote ("") character"
     TO A-FIELD.

The P*rl language uses the backslash character:

Code:

   $aField = "A string containing the quote (\") character";

Tom Morrison

http://www.liant.com

webrabbit · Oct 2, 2003

Tis true, I never thought of that. The algorithm could easily look for a quote in the next byte or a backslash in the previous byte.

BTW the COBOL syntax mentioned is for source code only!

k5tm · Oct 2, 2003

The examples were given as...examples. [smile]

The point I wanted to make is that real-world solutions get a bit more complicated than a simple comma delimited example.

Tom Morrison

http://www.liant.com

theotyflos · Nov 21, 2003

A different "TRIM LEADING SPACES" approach, just for the record. Slower, but it uses no Working-Storage and needs no bounds-checking.

Code:

       01  Field-A     Pic X(..) Value "   ABCD".
:
:
           Perform Until Field-A(1:1) Not = Spaces
             Move  Field-A(2:)         To Field-A
           End-Perform

Theophilos.

webrabbit · Nov 21, 2003

theotyflos' "solution" will work on most computers, but it may lead to destruction of the data depending on the internal processing of data movement.

WMK · Nov 21, 2003

To clarify the last post, that solution uses "overlapping" moves. This is "explicitly" undefined (when not "illegal") in the ANSI/ISO COBOL Standard.

Many (most?) compilers work as "expected" for right to left moves, but definitely NOT for left to right moves.

Bill Klein
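
One way to get the same trim without any overlapping MOVE is to count the leading blanks with INSPECT and copy through a separate work field. The following is a minimal sketch under that assumption; WS-RAW, WS-WORK, WS-CODE and WS-LEAD are hypothetical names, and the 10-byte size is arbitrary.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRIMLEAD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical field names and sizes.
       01  WS-RAW        PIC X(10) VALUE " abc ".
       01  WS-WORK       PIC X(10).
       01  WS-CODE       PIC X(3).
       01  WS-LEAD       PIC 9(4) COMP VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Count the leading blanks; TALLYING does not reset itself.
           MOVE ZERO TO WS-LEAD
           INSPECT WS-RAW TALLYING WS-LEAD FOR LEADING SPACES
      * An all-blank field (10 blanks here) has nothing to shift.
           IF WS-LEAD < 10
      *        Copy through a separate work field so that source
      *        and target never overlap.
               MOVE WS-RAW(WS-LEAD + 1:) TO WS-WORK
               MOVE WS-WORK TO WS-RAW
           END-IF
           MOVE WS-RAW TO WS-CODE
           DISPLAY "[" WS-CODE "]"
           STOP RUN.
```

This costs one work field in Working-Storage, but it does a single shift instead of one MOVE per leading blank, and the all-blank case cannot loop.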

Crox · Nov 21, 2003

Hi Theophilos,

I used this method a lot and it works ok on IBM mainframes and pc's with CA-REALIA and MF. I don't know about other compilers. When you move the other way around, the first character in the string will be repeated in the whole string.

Did you ever experiment with moving a COMP-5 field on the PC to a BINARY or COMP which is a redefines of the COMP-5 field? It reverses all the characters in that string.

Regards,

Crox

k5tm · Nov 21, 2003

Theophilos' solution is useful on RM/COBOL. In the case of overlapping MOVEs, the compiler chooses a runtime routine that determines which type of move (left-to-right or right-to-left) produces the expected result. Overlapping MOVEs, while quite useful, should always be thoroughly tested however.

Tom Morrison

http://www.liant.com

theotyflos · Nov 23, 2003

Yes, it works only for right-to-left moves. I use it very often and have never encountered a problem. I never even tried it for left-to-right.

Bill, do you know of a compiler that doesn't work correctly with this?

Crox, no, I didn't experiment with a move like this. RM/Cobol's compiler didn't support comp-5 until v8 which I don't have. I think it will reverse the field only on LSB first (intel) machines.

Tom, you mean that in RM/Cobol an overlapping left-to-right move will work as expected?

Crox · Nov 23, 2003

In Holland there was a company that taught us how to use these overlapping moves to initialize a table with, for example, COMP-3 or PACKED-DECIMAL fields, back when the INITIALIZE statement did not exist yet. It was called the 'nulsteltruc' or in English the 'make zero trick'. It was taught because it was very efficient on the IBM mainframe.

       01  THE-TABLE.
           03  FIRST-ELEMENT.
               05  FE-COUNTER-1  PIC S9(5) COMP-3 VALUE ZERO.
               05  FE-COUNTER-2  PIC S9(5) COMP-3 VALUE ZERO.
               05  FE-TEXT       PIC X(10) VALUE SPACE.
           03  THE-TABLE-REST.
               05  REST-ELEMENT OCCURS 99.
                   07  RE-COUNTER-1  PIC S9(5) COMP-3.
                   07  RE-COUNTER-2  PIC S9(5) COMP-3.
                   07  RE-TEXT       PIC X(10).
       01  THE-TABLE-RED REDEFINES THE-TABLE.
           03  THE-TABLE-ELEMENT OCCURS 100.
               05  TT-COUNTER-1  PIC S9(5) COMP-3.
               05  TT-COUNTER-2  PIC S9(5) COMP-3.
               05  TT-TEXT       PIC X(10).

      ***** INITIALIZE THE-TABLE WITH ONLY ONE MOVE *******
           MOVE THE-TABLE TO THE-TABLE-REST.

I have noticed that on the mainframe, 20% of the CPU of some on-line transactions was used by INITIALIZE statements. Using this method, this percentage goes to zero.

Regards,

Crox
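
For compilers where the overlapping MOVE cannot be trusted, a variant that keeps the idea of a pre-initialized element but moves it to each occurrence separately might look like the sketch here. This is an assumption-laden alternative, not the 'nulsteltruc' itself: WS-EMPTY-ELEMENT, the EE- names and WS-IDX are hypothetical, and it will probably give up some of the speed of the single MOVE.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INITTAB.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Prototype element holding the initial values (hypothetical).
       01  WS-EMPTY-ELEMENT.
           05  EE-COUNTER-1       PIC S9(5) COMP-3 VALUE ZERO.
           05  EE-COUNTER-2       PIC S9(5) COMP-3 VALUE ZERO.
           05  EE-TEXT            PIC X(10)        VALUE SPACE.
       01  THE-TABLE.
           03  THE-TABLE-ELEMENT OCCURS 100.
               05  TT-COUNTER-1   PIC S9(5) COMP-3.
               05  TT-COUNTER-2   PIC S9(5) COMP-3.
               05  TT-TEXT        PIC X(10).
       01  WS-IDX                 PIC 9(4) COMP.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * One group MOVE per element; source and target never overlap.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 100
               MOVE WS-EMPTY-ELEMENT TO THE-TABLE-ELEMENT(WS-IDX)
           END-PERFORM
           DISPLAY TT-COUNTER-1(100) " [" TT-TEXT(100) "]"
           STOP RUN.
```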

k5tm · Nov 24, 2003

Theophilos,

I'll need to check that. While I wrote the original code which indeed detected either type of overlap, it has been reengineered (e.g. from assembly code to C) since then and I haven't looked at that particular code in more than a decade.

Tom

Tom Morrison

http://www.liant.com

theotyflos · Nov 24, 2003

Hi Tom,

I checked it, it doesn't work. Results are quite funny:

Code:

:
       1   Str1    Pic x(15) Value "abcdef         ".
:
           Perform Until Str1(15:1) Not = Spaces
             Move  Str1(1:14)        To Str1(2:14)
           End-Perform
:

gives:

Code:

       1   Str1    Pic x(15) Value "aaaaaaaabbcdfff".

I assume the compiler tries to optimize the move using movsd/movsw/movsb for every 4/2/1 bytes (thus the 3 f's at the end?).

Theophilos.

Crox · Nov 25, 2003

Hi,

Why the until?

What is the result after one

MOVE Str1 TO Str1 (2:).

Regards,

Crox

theotyflos · Nov 25, 2003

Hi Crox,

The until is because there might be more than one "space" byte at the beginning (or at the end, in the case of left-to-right) of the string.

after one move I get (in RM/Cobol)

right to left:
original string:

Code:

"      abcdef   "

result string:

Code:

"     abcdef    "

left to right:
original string:

Code:

"      abcdef   "

result string:

Code:

"       abbdefff"

Theophilos.

webrabbit · Nov 25, 2003

If the entire field is blank, the perform will be an endless loop. It should be tested first:

Code:

           If Field-A not = Space
             Perform Until Field-A(1:1) not = Space
               Move Field-A(2:) to Field-A
             End-Perform
           End-If

theotyflos · Nov 25, 2003

Hi webrabbit,

Correct! I just thought it was a pleonasm to mention.

Theophilos.

Crox · Nov 25, 2003

Hi,

On all the other compilers I worked with, you get the first byte repeated until the end of the string. Dangerous stuff that RM compiler.

Regards,

Crox
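
The "first byte repeated" behaviour is what a strictly byte-at-a-time copy from the left produces. The sketch here imitates that with single-byte moves, each of which is well defined on its own; it is only an illustration of the effect, not what any particular compiler generates, and STR1 and WS-I are made-up names.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVERLAP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  STR1          PIC X(15) VALUE "abcdef".
       01  WS-I          PIC 9(4) COMP.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Imitates MOVE STR1(1:14) TO STR1(2:14) done strictly one
      * byte at a time from the left: each byte that is read was
      * already overwritten by the previous step.
           PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 14
               MOVE STR1(WS-I:1) TO STR1(WS-I + 1:1)
           END-PERFORM
           DISPLAY "[" STR1 "]"
           STOP RUN.
```

After the loop every byte of STR1 holds the original first character, "a".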

webrabbit · Nov 25, 2003

It is a matter of both the compiler and the hardware. Most computers handle the left-to-right overlap properly, some may not. Some compilers compensate for the hardware constraints, and again, others do not.

IBM-360 mainframes did the move one byte at a time, so the left-to-right overlapping move was always handled properly. The right-to-left overlapping move was destructive, but could be used to initialize tables as shown by Crox. But with the IBM-370, they introduced the MOVE LONG (MVCL) instruction for moves in excess of 256 bytes. The early COBOL compilers implemented this instruction for all moves in excess of 256 bytes, but it fails on a destructive overlapping move. (It issues an Overflow condition.) So later compilers were modified to detect the destructive overlapping move and implement it with successive Move (MVC) instructions. Also, the 370 optimized moves to eight bytes at a time, but the microcode de-optimized the moves to prevent the "weird" results shown by theotyflos.
