TRIM Leading spaces on import data field 4

Status
Not open for further replies.

SiouxCityElvis

Programmer
Jun 6, 2003
228
US
I am Unstringing a comma delimited file and writing it to an Indexed file.

On one of the fields, we are expecting only three characters.

field1,2,3,abc,field5,etc...

The problem I'm having is that the people responsible for creating the comma-delimited files are screwing up the "abc" field sometimes.

field1,2,3, abc ,field5,etc...

You'll notice that they have a space before "abc" and another after it. Since I am moving this into a PIC X(3) field before writing to my Indexed file, I am getting inaccurate values stored in that field (" ab" instead of "abc").

Is there a way to "trim" the leading spaces? Otherwise we have to "scrub" our raw data before reading it in and writing it to our indexed file.

I was wondering if the JUSTIFY clause would help? If so, how?

Thanks.
-David
 
Yes, that INSPECT would do the trick for getting a count of the fields separated by commas and checking whether it matches the count I expect. If the INSPECT yields a count that is 1 greater than I expect, for example, that probably means one of the field values is supposed to include a comma, such as a company name like "Petfoods, Inc."

When that count is off, it's a matter of determining which field has the comma in it, I guess.
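
A minimal sketch of that check, assuming a hypothetical 80-byte record and five expected fields (all data names here are made up for illustration). With the sample value shown, the tally is 5 commas where 4 are expected, so the record is flagged for review:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COUNTCOMMA.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical record layout and expected field count.
       01  WS-INPUT-REC        PIC X(80)
               VALUE "field1,2,3,Petfoods, Inc.,field5".
       01  WS-COMMA-COUNT      PIC 9(4) VALUE ZERO.
       01  WS-EXPECTED-FIELDS  PIC 9(4) VALUE 5.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * INSPECT adds to the tally, so reset it for every record.
           MOVE ZERO TO WS-COMMA-COUNT
           INSPECT WS-INPUT-REC
               TALLYING WS-COMMA-COUNT FOR ALL ","
      * N fields are separated by N - 1 commas.
           IF WS-COMMA-COUNT + 1 = WS-EXPECTED-FIELDS
               DISPLAY "Field count OK"
           ELSE
               DISPLAY "Unexpected field count, review record"
           END-IF
           STOP RUN.
```

The count only says that something is off; it does not say which field holds the extra comma.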

Good suggestion.
Thanks.
 
I always code my own unstringer when I have to extract delimited fields.

The process is like this (using reference modification):

1. Find the length of the string.

2. Initialize the field counter.

3. Initialize the string pointer

4. If the current character is a quote,
increment the string pointer and look for a quote;
otherwise,
look for a comma or the end of the string.
In either case, set a second pointer to the character
found.

5. Increment the first pointer until a non-blank
character is found.

If the first pointer becomes equal to the second
pointer, the field is blank, skip to step 9.

6. Determine the length of the field by subtracting
the first pointer from the second pointer.

7. Move the input field to the target field, using
reference modification (pointer-1:length).

8. Add the field length to the first pointer, so that it
equals the second pointer.

9. If there are no more fields to extract, quit.

10. If the current character is a quote, increment the
first pointer.

11. Increment the first pointer.

12. Increment the field counter and return to step 4.
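
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: quotes are balanced, there are no embedded quote characters, there are at most 10 fields of up to 20 characters, and the compiler accepts apostrophe-delimited literals. All data names and the sample line are hypothetical. Instead of a hand-written scan loop, the sketch finds the end of each field with INSPECT ... TALLYING on a reference-modified substring, and it remembers the delimiter in WS-DELIM so that step 10 does not have to re-test the current character.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTRGR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * All names, sizes and the sample line are hypothetical.
       01  WS-LINE         PIC X(80)
               VALUE 'f1,2,3, abc ,"Petfoods, Inc.",last'.
       01  WS-LINE-LEN     PIC 9(4) COMP.
       01  WS-P1           PIC 9(4) COMP.
       01  WS-P2           PIC 9(4) COMP.
       01  WS-FLD-LEN      PIC 9(4) COMP.
       01  WS-FLD-COUNT    PIC 9(4) COMP VALUE ZERO.
       01  WS-DELIM        PIC X.
       01  WS-FIELD-TABLE.
           05  WS-FIELD    PIC X(20) OCCURS 10 TIMES.
       PROCEDURE DIVISION.
       MAIN-PARA.
           IF WS-LINE = SPACES
               DISPLAY "Blank line, nothing to extract"
               STOP RUN
           END-IF
      * Step 1: length of the line, ignoring trailing blanks.
           MOVE FUNCTION LENGTH(WS-LINE) TO WS-LINE-LEN
           PERFORM UNTIL WS-LINE(WS-LINE-LEN:1) NOT = SPACE
               SUBTRACT 1 FROM WS-LINE-LEN
           END-PERFORM
      * Steps 2 and 3: field counter (zero) and string pointer.
           MOVE 1 TO WS-P1
      * Steps 9 and 12 are the loop control.
           PERFORM UNTIL WS-P1 > WS-LINE-LEN OR WS-FLD-COUNT = 10
               ADD 1 TO WS-FLD-COUNT
               MOVE SPACES TO WS-FIELD(WS-FLD-COUNT)
      * Step 4: a quoted field ends at the next quote, any
      * other field at the next comma or at the end of the line.
               IF WS-LINE(WS-P1:1) = '"'
                   MOVE '"' TO WS-DELIM
                   ADD 1 TO WS-P1
               ELSE
                   MOVE "," TO WS-DELIM
               END-IF
               MOVE ZERO TO WS-FLD-LEN
               INSPECT WS-LINE(WS-P1:WS-LINE-LEN - WS-P1 + 1)
                   TALLYING WS-FLD-LEN
                   FOR CHARACTERS BEFORE INITIAL WS-DELIM
               COMPUTE WS-P2 = WS-P1 + WS-FLD-LEN
      * Step 5: skip leading blanks inside the field.
               PERFORM UNTIL WS-P1 = WS-P2
                       OR WS-LINE(WS-P1:1) NOT = SPACE
                   ADD 1 TO WS-P1
               END-PERFORM
      * Steps 6 and 7: copy the field unless it is blank.
      * Values longer than 20 characters are truncated.
               IF WS-P1 < WS-P2
                   COMPUTE WS-FLD-LEN = WS-P2 - WS-P1
                   MOVE WS-LINE(WS-P1:WS-FLD-LEN)
                     TO WS-FIELD(WS-FLD-COUNT)
               END-IF
      * Step 8: continue from the delimiter that ended the field.
               MOVE WS-P2 TO WS-P1
      * Step 10: step over a closing quote.
               IF WS-DELIM = '"'
                   ADD 1 TO WS-P1
               END-IF
      * Step 11: step over the comma.
               ADD 1 TO WS-P1
           END-PERFORM
      * Show what was extracted.
           PERFORM VARYING WS-P1 FROM 1 BY 1
                   UNTIL WS-P1 > WS-FLD-COUNT
               DISPLAY "[" WS-FIELD(WS-P1) "]"
           END-PERFORM
           STOP RUN.
```

Trailing blanks inside a field are kept; they only disappear when the target field is short enough to truncate them, as with a PIC X(3) target for " abc ".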
 
It does not appear that webrabbit's algorithm provides a mechanism for embedding the quoting character within the quoted string.

For example, standard COBOL syntax for embedding the quoting character is to use two contiguous quotes:
Code:
   MOVE "A string containing the quote ("") character"
     TO A-FIELD.

The P*rl language uses the backslash character:
Code:
   $aField = "A string containing the quote (\") character";

Tom Morrison
 
Tis true, I never thought of that. The algorithm could easily look for a quote in the next byte or a backslash in the previous byte.

BTW the COBOL syntax mentioned is for source code only!
 
The examples were given as...examples. [smile]

The point I wanted to make is that real-world solutions get a bit more complicated than a simple comma delimited example.

Tom Morrison
 
A different "TRIM LEADING SPACES" approach, just for the record. It is slower, but it uses no Working-Storage and needs no bounds checking.

Code:
       01  Field-A     Pic X(..) Value "   ABCD".
:
:
           Perform Until Field-A(1:1) Not = Spaces
             Move  Field-A(2:)         To Field-A
           End-Perform
Theophilos.
 
theotyflos' "solution" will work on most computers, but it may lead to destruction of the data depending on the internal processing of data movement.
 
To clarify the last post, that solution uses "overlapping" moves. This is "explicitly" undefined (when not "illegal") in the ANSI/ISO COBOL Standard.

Many (most?) compilers work as "expected" for right to left moves, but definitely NOT for left to right moves.
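
For comparison, a minimal sketch of a trim that avoids overlapping operands altogether, assuming the value can be moved to a separate target field. The names WS-RAW, WS-CODE and WS-LEAD are hypothetical. It counts the leading spaces with INSPECT and then moves from the first non-blank character onward:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRIMLEAD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical fields: WS-RAW holds the unstrung value,
      * WS-CODE is the three-character target.
       01  WS-RAW          PIC X(10) VALUE " abc ".
       01  WS-CODE         PIC X(3).
       01  WS-LEAD         PIC 9(4) COMP.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * INSPECT adds to the tally, so reset it first.
           MOVE ZERO TO WS-LEAD
           INSPECT WS-RAW TALLYING WS-LEAD FOR LEADING SPACES
           IF WS-LEAD = FUNCTION LENGTH(WS-RAW)
      * The whole field is blank.
               MOVE SPACES TO WS-CODE
           ELSE
      * Source and target are different items: no overlap.
               MOVE WS-RAW(WS-LEAD + 1:) TO WS-CODE
           END-IF
           DISPLAY "[" WS-CODE "]"
           STOP RUN.
```

With the sample value " abc " this displays [abc]; the trailing space is dropped by truncation into the PIC X(3) target.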

Bill Klein
 
Hi Theophilos,

I have used this method a lot, and it works OK on IBM mainframes and on PCs with CA-REALIA and MF. I don't know about other compilers. When you move the other way around, the first character of the string is repeated through the whole string.

Did you ever experiment with moving a COMP-5 field on the PC to a BINARY or COMP field that redefines the COMP-5 field? It reverses all the characters in that string.

Regards,

Crox


 
Theophilos' solution is useful on RM/COBOL. In the case of overlapping MOVEs, the compiler chooses a runtime routine that determines which type of move (left-to-right or right-to-left) produces the expected result. Overlapping MOVEs, while quite useful, should always be thoroughly tested however.

Tom Morrison
 
Yes, it works only for right-to-left moves. I use it very often and have never encountered a problem. I have never even tried it left-to-right.

Bill, do you know of a compiler that doesn't work correctly with this?

Crox, no, I didn't experiment with a move like this. RM/Cobol's compiler didn't support comp-5 until v8, which I don't have. I think it will reverse the field only on LSB-first (Intel) machines.

Tom, you mean that in RM/Cobol an overlapping left-to-right move will work as expected?
 
In Holland there was a company that taught us how to use these overlapping moves to initialize a table with, for example, COMP-3 or PACKED-DECIMAL fields, back when the INITIALIZE statement did not exist yet. It was called the 'nulsteltruc', or in English the 'make zero trick'. It was taught because it was very efficient on the IBM mainframe.

       01  THE-TABLE.
           03  FIRST-ELEMENT.
               05  FE-COUNTER-1      PIC S9(5) COMP-3 VALUE ZERO.
               05  FE-COUNTER-2      PIC S9(5) COMP-3 VALUE ZERO.
               05  FE-TEXT           PIC X(10) VALUE SPACE.
           03  THE-TABLE-REST.
               05  REST-ELEMENT OCCURS 99.
                   07  RE-COUNTER-1  PIC S9(5) COMP-3.
                   07  RE-COUNTER-2  PIC S9(5) COMP-3.
                   07  RE-TEXT       PIC X(10).
       01  THE-TABLE-RED REDEFINES THE-TABLE.
           03  THE-TABLE-ELEMENT OCCURS 100.
               05  TT-COUNTER-1      PIC S9(5) COMP-3.
               05  TT-COUNTER-2      PIC S9(5) COMP-3.
               05  TT-TEXT           PIC X(10).

      ***** INITIALIZE THE-TABLE WITH ONLY ONE MOVE *******
           MOVE THE-TABLE TO THE-TABLE-REST.

I have noticed that on the mainframe, INITIALIZE statements used 20% of the CPU time of some on-line transactions. With this method, that percentage goes to zero.
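
For compilers where the overlapping MOVE cannot be relied on, a minimal sketch of an initialization that never overlaps, assuming a separate pattern element. WS-EMPTY-ELEMENT and WS-IDX are hypothetical names, and no claim is made that this matches the CPU savings of the single MOVE:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INITTAB.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical pattern element with the initial values.
       01  WS-EMPTY-ELEMENT.
           05  EE-COUNTER-1      PIC S9(5) COMP-3 VALUE ZERO.
           05  EE-COUNTER-2      PIC S9(5) COMP-3 VALUE ZERO.
           05  EE-TEXT           PIC X(10) VALUE SPACE.
       01  THE-TABLE.
           03  THE-TABLE-ELEMENT OCCURS 100.
               05  TT-COUNTER-1  PIC S9(5) COMP-3.
               05  TT-COUNTER-2  PIC S9(5) COMP-3.
               05  TT-TEXT       PIC X(10).
       01  WS-IDX                PIC 9(4) COMP.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * One group MOVE per element; source and target are
      * separate items, so the moves never overlap.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 100
               MOVE WS-EMPTY-ELEMENT TO THE-TABLE-ELEMENT(WS-IDX)
           END-PERFORM
           DISPLAY TT-COUNTER-1(100) " [" TT-TEXT(100) "]"
           STOP RUN.
```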

Regards,

Crox
 
Theophilos,

I'll need to check that. While I wrote the original code which indeed detected either type of overlap, it has been reengineered (e.g. from assembly code to C) since then and I haven't looked at that particular code in more than a decade.

Tom

Tom Morrison
 
Hi Tom,

I checked it, it doesn't work. Results are quite funny:

Code:
:
       1   Str1    Pic x(15) Value "abcdef         ".
:
           Perform Until Str1(15:1) Not = Spaces
             Move  Str1(1:14)        To Str1(2:14)
           End-Perform
:
gives:
Code:
       1   Str1    Pic x(15) Value "aaaaaaaabbcdfff".

I assume the compiler tries to optimize the move using movsd/movsw/movsb for every 4/2/1 bytes (thus the 3 f's at the end?).

Theophilos.
 
Hi,

Why the until?

What is the result after one

MOVE Str1 TO Str1 (2:).

Regards,

Crox
 
Hi Crox,

The until is there because there may be more than one space byte at the beginning of the string (or at the end, in the left-to-right case).

After one move I get (in RM/Cobol):

right to left:
original string:
Code:
"      abcdef   "
.
result string:
Code:
"     abcdef    "
.

left to right :
original string:
Code:
"      abcdef   "
.
result string:
Code:
"       abbdefff"
.

Theophilos.
 
If the entire field is blank, the perform will be an endless loop. It should be tested first:
[tt]
If Field-A not = Space
    Perform Until Field-A(1:1) not = Space
        Move Field-A(2:) to Field-A
    End-Perform
End-If
[/tt]
 
Hi webrabbit,

Correct! I just thought it was a pleonasm to mention.

Theophilos.
 
Hi,

On all the other compilers I worked with, you get the first byte repeated until the end of the string. Dangerous stuff that RM compiler.

Regards,

Crox
 
It is a matter of both the compiler and the hardware. Most computers handle the left-to-right overlap properly; some may not. Some compilers compensate for the hardware constraints, and again, others do not.

IBM-360 mainframes did the move one byte at a time, so the left-to-right overlapping move was always handled properly. The right-to-left overlapping move was destructive, but could be used to initialize tables as shown by Crox. But with the IBM-370, they introduced the MOVE LONG (MVCL) instruction for moves in excess of 256 bytes. The early COBOL compilers implemented this instruction for all moves in excess of 256 bytes, but it fails on a destructive overlapping move. (It issues an Overflow condition.) So later compilers were modified to detect the destructive overlapping move and implement it with successive Move (MVC) instructions. Also, the 370 optimized moves to eight bytes at a time, but the microcode de-optimized the moves to prevent the "weird" results shown by theotyflos.

 