Defining The Tab Character As A Boolean Or Hexadecimal Literal

waycon · Dec 26, 2002

I have a need to examine the contents of a tab delimited text file.

To find the start and end of each field I need to look at each character in the string to determine if it's a tab character.

My question is in regard to how to define a tab character for comparison. I'm thinking that I can define the tab character using a hexadecimal or boolean literal, but can't find the values on any code charts.

Any suggestions?

Dimandja · Dec 26, 2002

The ASCII character known as 'Horizontal Tab' is generally considered as TAB. Hex = 09, Decimal = 9.

Dimandja

Dimandja · Dec 28, 2002

To use a TAB value in your program, you need something like:

Code:

  01  DATA-RECORD.
      02  DATA-CHAR PIC X OCCURS 2048 TIMES.
 
  01 tab-char-grp.
     05 tab-char-comp   PIC 9(04) COMP VALUE 09.
     05 tab-char-x REDEFINES tab-char-comp.
        10 FILLER       PIC X(01).
        10 TAB-CHAR     PIC X(01).

     IF DATA-CHAR(DATA-SUBSCRIPT) = TAB-CHAR 
        ADD 1 TO TAB-COUNT.

Dimandja

3gm · Dec 28, 2002

I prefer the SYMBOLIC phrase of SPECIAL-NAMES for this stuff:

Code:

SPECIAL-NAMES.
    SYMBOLIC TAB-CHAR 10
             ESC-CHAR 28
    .

Note that the numbers are ordinals and, therefore, are one greater than their binary equivalent. Also note that the values shown are ASCII dependent. NB: if you're using EBCDIC, a tab character is 6, not 10.

Glenn
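
For readers who want to try the SYMBOLIC approach in isolation, here is a minimal sketch of a complete program. It assumes an ASCII native character set (so ordinal 10 is binary 9) and a compiler that accepts fixed-format source. The names SYMTAB and SAMPLE-LINE are made up for illustration and are not from the thread. It builds the test string in memory, so file handling does not affect the result.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SYMTAB.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
      *    Ordinal 10 is the tenth character of the native set,
      *    i.e. binary value 9 (horizontal tab) on ASCII systems.
           SYMBOLIC CHARACTERS TAB-CHAR IS 10.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    SAMPLE-LINE is a hypothetical 10-character test record.
       01  SAMPLE-LINE         PIC X(10).
       01  TAB-COUNT           PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
      *    Build 12345678, one tab, then 9 (10 characters).
           STRING "12345678" TAB-CHAR "9" DELIMITED BY SIZE
               INTO SAMPLE-LINE
      *    Count every occurrence of the symbolic character.
           INSPECT SAMPLE-LINE TALLYING TAB-COUNT FOR ALL TAB-CHAR
           DISPLAY "TAB-COUNT " TAB-COUNT
           STOP RUN.
```

If this reports one tab while the same test against a record read from a file reports none, that suggests the tab is being altered during the READ rather than in the comparison.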

gphuber · Dec 29, 2002

I use either ; or TAB as the delimiter to separate fields with the UNSTRING statement. I have defined a PIC X field called Trennfeld, and when the file is delimited by TAB I move X"09" to it and UNSTRING the record with Trennfeld as the delimiter. It works.

regards

Goetz
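
A rough sketch of the UNSTRING approach described above, assuming the compiler accepts X"09" hexadecimal literals. TRENNFELD comes from the post. INPUT-LINE, FIELD-1 to FIELD-3 and FIELD-COUNT are hypothetical names, and the sample record is built in memory instead of being read from a file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TABSPLIT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    TRENNFELD holds the delimiter, as in the post above.
       01  TRENNFELD           PIC X.
      *    The remaining names are hypothetical.
       01  INPUT-LINE          PIC X(40).
       01  FIELD-1             PIC X(10).
       01  FIELD-2             PIC X(10).
       01  FIELD-3             PIC X(10).
       01  FIELD-COUNT         PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
           MOVE X"09" TO TRENNFELD
      *    Build a tab-separated sample: alpha, beta, gamma.
           MOVE SPACES TO INPUT-LINE
           STRING "alpha" TRENNFELD "beta" TRENNFELD "gamma"
               DELIMITED BY SIZE INTO INPUT-LINE
      *    Split on the tab; FIELD-COUNT receives the field count.
           UNSTRING INPUT-LINE DELIMITED BY TRENNFELD
               INTO FIELD-1 FIELD-2 FIELD-3
               TALLYING IN FIELD-COUNT
           DISPLAY "FIELDS: " FIELD-COUNT
           DISPLAY "1=" FIELD-1
           DISPLAY "2=" FIELD-2
           DISPLAY "3=" FIELD-3
           STOP RUN.
```

This only works if the tab is still present in the record when UNSTRING runs, which depends on how the runtime reads the file.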

Dimandja · Dec 29, 2002

If you work with TABs, be aware of how your software/environment/editor interprets 'TAB'. It could make a big difference in how you code for it.

If one creates a file using 'TABs', the application may simply skip so many columns - instead of actually inserting the TAB character in the data.

Some manipulation of the data could implicitly replace TAB with SPACE - COPY/PASTE, utilities, etc...

For more details, see

http://www.jwz.org/doc/tabs-vs-spaces.html.

Dimandja

gphuber · Dec 29, 2002

Hello Dimandja,

I work in Germany and use the tab when transferring data between several applications. In the target software, from a third party, the tab is defined as hex 09, so I have never had problems. I thought this information could help.

Regards

Goetz

waycon · Dec 31, 2002

Thanks for all the suggestions. I've tried them all, but I am still having a problem identifying tabs.

My test file is a notepad txt file with the character string : 12345678>9 where ">" represents the horizontal tab.

A hex dump of the text file shows: 0 31 32 33 34 35 36 37 38 09 39 eof. My program reads the record and examines each character looking for a non-space or a horizontal tab and keeps a count of each. The program finds 9 non-spaces and 0 tabs. It should find 10 non-spaces and 1 tab.

I must be overlooking something simple that's causing this result. Any other suggestions?

Thanks again for all the replies.

Wayne

3gm · Dec 31, 2002

Wayne -

I've not used Fujitsu in a long while. Is there, perhaps, an environment variable that causes TAB expansion to spaces?

W/o seeing your code, it's hard to make too many suggestions. One possible suggestion is to insert a display statement where you are testing for TAB. You might, for example, try:

Code:

DISPLAY INPUT-CHAR(IDX:1)
EVALUATE INPUT-CHAR(IDX:1)
    WHEN TAB-CHAR
        DISPLAY "TAB  "
    WHEN "0" THRU "9"
        DISPLAY "NUMERIC"
    WHEN OTHER
        DISPLAY "OTHER"
END-EVALUATE

By adding other WHEN conditions, you should be able to figure out how your code is seeing the TAB (assuming that it is).

Regards,

Glenn

Dimandja · Dec 31, 2002

Wayne,

Also, it looks like your search loop is ending just before '09' (The program finds 9 non-spaces and 0 tabs). Your program should be finding at least 10 non-space characters:

1. Check the subscript in the loop.
2. Is your code interpreting '09' as EOF?
3. Does your READ bring back all the data for the record (does it get truncated)?
4. How many records can you successfully READ (without looking for TAB)?

Dimandja

waycon · Jan 1, 2003

Dimandja,

Thanks for sticking with me on this.

Here's the answer to your questions by way of a detailed explanation.

The TXT file I am using contains the following characters:
12345678>912345678>912345678>912345678>912345678>9 where > is a tab. That's a total of 50 characters including 5 tabs.

The program defines the record as a 20 character alphanumeric field. Counts are setup for the number of records, characters found, tabs found and spaces found.

The program found:
3 records. CORRECT
45 char. WRONG should be 50, the tabs were not counted.
0 tabs. WRONG there were 5.
15 Spaces. WRONG there were 10 spaces.

It looks like the tabs are being counted as spaces.

I've tried everyone's suggestions and the results are the same. I am looking for a clue in the environment variables as suggested by 3gm. Here's what the program currently looks like:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. TESTPROG.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
      *
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT DATA-FILE
               ASSIGN TO INFILE
               ORGANIZATION IS LINE SEQUENTIAL.
      *
       DATA DIVISION.
       FILE SECTION.
       FD  DATA-FILE.
       01  DATA-RECORD.
           02 DATA-CHAR PIC X OCCURS 20 TIMES.
      *
       WORKING-STORAGE SECTION.
       01  tab-char-grp.
           05 tab-char-comp PIC 9(04) COMP VALUE 09.
           05 tab-char-x REDEFINES tab-char-comp.
              10 FILLER PIC X(01).
              10 TAB-CHAR PIC X(01).

       77  DATA-SUBSCRIPT PIC 9(4).
       77  TAB-COUNT PIC 9(4) VALUE IS 0.
       77  CHAR-COUNT PIC 9(6) VALUE IS 0.
       77  SPACE-COUNT PIC 9(2) VALUE IS 0.
       77  RECORD-COUNT PIC 9(2) VALUE IS 0.
      *
       PROCEDURE DIVISION.
           OPEN INPUT DATA-FILE.
       READ-FILE.
           READ DATA-FILE AT END GO TO END-PROG.
           ADD 1 TO RECORD-COUNT.
           MOVE 0 TO DATA-SUBSCRIPT.
      *
       FIND-TAB-CHARACTER.
           IF DATA-SUBSCRIPT = 20 GO TO READ-FILE.
           ADD 1 TO DATA-SUBSCRIPT.
           DISPLAY DATA-CHAR(DATA-SUBSCRIPT).
           IF DATA-CHAR(DATA-SUBSCRIPT) NOT = SPACE ADD 1 TO CHAR-COUNT.
           IF DATA-CHAR(DATA-SUBSCRIPT) = SPACE ADD 1 TO SPACE-COUNT.
           IF DATA-CHAR(DATA-SUBSCRIPT) = TAB-CHAR ADD 1 TO TAB-COUNT.
           GO TO FIND-TAB-CHARACTER.
      *
       END-PROG.
           CLOSE DATA-FILE.
           DISPLAY "RECORD COUNT", RECORD-COUNT.
           DISPLAY "TAB-COUNT", TAB-COUNT.
           DISPLAY "CHAR-COUNT", CHAR-COUNT.
           DISPLAY "SPACE COUNT" SPACE-COUNT.
       END PROGRAM TESTPROG.

Dimandja · Jan 1, 2003

Quick questions:

01 DATA-RECORD.
02 DATA-CHAR PIC X OCCURS 20 TIMES.
If your record contains 50 characters, why do you define only 20?

77 DATA-SUBSCRIPT PIC 9(4).
ADD 1 TO DATA-SUBSCRIPT.
This variable was never initialized before its use. Try:
77 DATA-SUBSCRIPT PIC 9(4) VALUE 0.

Dimandja

waycon · Jan 2, 2003

Dimandja.

The 20 character record size was just an experiment to see if the program would do three reads for a 50 character text file, and indeed it does. I changed that to a 100 character record and initialized the DATA-SUBSCRIPT to zero. Tabs were still not found.

I now know what's going on, but don't understand why. Here's what I did. I modified the program to include an output record. Immediately after I read the input record of 100 characters I move it to a 100 character output record and write it.

Looking at the output record with either Word (showing non-printable characters) or a hex viewer, it becomes clear that the 5 tab characters have been transformed into 3 spaces each. Fujitsu COBOL does the transformation when it reads the record.

I am totally convinced that there is some sort of environment or runtime setting that controls this, and that's the path I am taking now.

Does that make sense to you?

Dimandja · Jan 2, 2003

Wayne,

It makes sense. Some compilers do give you the option to describe how to handle TAB characters found on INPUT or on OUTPUT.

If your compiler documentation does not address this issue, I can only assume that the vendor has already made those decisions for you - it happens.

However, LINE SEQUENTIAL files are expected to contain only printable characters and the record terminator. For example, ORGANIZATION IS LINE SEQUENTIAL will convert carriage return/line feed to SPACES - it must be doing the same with TAB: try ORGANIZATION IS SEQUENTIAL.

Dimandja

gphuber · Jan 2, 2003

Hi

Line sequential is defined as being terminated by CR+LF, not TAB. I have defined my line sequential file with variable length depending on a variable. Then I unstring the text string into several fields with the delimiter X'09', and it works with several files. I use Power COBOL on Windows 98 and Windows XP and it works.

Goetz

Dimandja · Jan 2, 2003

Hi Goetz,

LINE SEQUENTIAL translates CR+LF into SPACE, doesn't it? Upon READing a CRLF terminated record, the CRLF is changed to SPACES, I thought.

In your case, the TAB remains TAB - as it should. For Wayne, it seems the TAB becomes SPACES.

My reasoning is that since LINE SEQUENTIAL is prone to translations of special characters, maybe Wayne should try READing his file with a different ORGANIZATION.

Dimandja

gphuber · Jan 4, 2003

Hi Dimandja,

When I read a line sequential file, each record is terminated with CR+LF. The compiler stops reading when it gets to CR+LF. Example: if you have a text file with 80 characters + CR+LF, you get a length of 80 characters. If you read the file with sequential organization, your record must have a length of 82 characters, so I don't believe the compiler changes CR+LF into spaces. In my file section I have described:
FILE:
FD EINLESEDATEI is GLOBAL record is varying in size CHARACTERS DEPENDING ON KLEINSTELLENZAEHLER.
01 IMPORTSATZ.
05 IMPORTZEILE PIC X(32000).
WORKING:
01 KLEINSTELLENZAEHLER PIC 9(05).

When I have read the file I use an UNSTRING statement to separate the fields with the delimiter Trennfeld. Trennfeld is defined:

01 TRENNFELD PIC X.

With a MOVE statement I move x'09' to Trennfeld and then I can use it.

Goetz

gphuber · Jan 4, 2003

@ all,

I am sorry, but it will never work in Power COBOL. The compiler replaces TAB characters with 4 or 8 spaces. The compiler option TAB(4) or TAB(8) controls this. The default setting of the TAB option is to replace with 4 spaces. I am sorry, but the old program I used under DOS worked in a different way.

Goetz

erba · Jan 13, 2003

Try reading the file as BINARY SEQUENTIAL. This will not translate the TAB character into spaces. When you find the CR+LF sequence, you know you have reached a new record.
hi
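
A sketch of the byte-by-byte idea, under several assumptions that should be checked against your compiler manual. It assumes that a fixed-length file with plain ORGANIZATION IS SEQUENTIAL and a one-byte record returns each raw byte with no record headers and no tab expansion. The exact clause for a binary file on Fujitsu may differ. The file name INFILE.TXT and all data names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RAWSCAN.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    Assumption: fixed-length SEQUENTIAL records carry no
      *    headers, so a 1-byte record yields each raw byte.
           SELECT RAW-FILE ASSIGN TO "INFILE.TXT"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  RAW-FILE.
       01  RAW-BYTE            PIC X.
       WORKING-STORAGE SECTION.
       01  EOF-FLAG            PIC X VALUE "N".
       01  PREV-BYTE           PIC X VALUE SPACE.
       01  TAB-COUNT           PIC 9(4) VALUE 0.
       01  RECORD-COUNT        PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
           OPEN INPUT RAW-FILE
           PERFORM UNTIL EOF-FLAG = "Y"
               READ RAW-FILE
                   AT END
                       MOVE "Y" TO EOF-FLAG
                   NOT AT END
                       EVALUATE TRUE
                           WHEN RAW-BYTE = X"09"
                               ADD 1 TO TAB-COUNT
      *                    LF directly after CR ends a record.
                           WHEN RAW-BYTE = X"0A"
                                AND PREV-BYTE = X"0D"
                               ADD 1 TO RECORD-COUNT
                       END-EVALUATE
                       MOVE RAW-BYTE TO PREV-BYTE
               END-READ
           END-PERFORM
           CLOSE RAW-FILE
           DISPLAY "TAB-COUNT " TAB-COUNT
           DISPLAY "RECORD-COUNT " RECORD-COUNT
           STOP RUN.
```

Note that a final line with no trailing CR+LF is not counted as a record here. A real program would also collect the bytes between terminators into a work area before splitting on the tab.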

waycon · Jan 13, 2003

Erba,

That sounds like the way to go! I'll try it out and let you know. If it works it will save me the extra step of converting tabs to something else.
