Defining The Tab Character As A Boolean Or Hexadecimal Literal

Status
Not open for further replies.

waycon

Technical User
Sep 4, 2002
17
US
I have a need to examine the contents of a tab delimited text file.

To find the start and end of each field I need to look at each character in the string to determine if it's a tab character.

My question is in regard to how to define a tab character for comparison. I'm thinking that I can define the tab character using a hexadecimal or boolean literal, but can't find the values on any code charts.

Any suggestions?
 
The ASCII character known as 'Horizontal Tab' is generally considered as TAB. Hex = 09, Decimal = 9.

Dimandja
 
To use a TAB value in your program, you need something like:
Code:
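  01  DATA-RECORD.
      02  DATA-CHAR PIC X OCCURS 2048 TIMES.

* The binary value 9 occupies two bytes; the REDEFINES exposes
* the low-order byte as a one-character field (this assumes the
* compiler stores COMP items high-order byte first).
  01 tab-char-grp.
     05 tab-char-comp   PIC 9(04) COMP VALUE 09.
     05 tab-char-x REDEFINES tab-char-comp.
        10 FILLER       PIC X(01).
        10 TAB-CHAR     PIC X(01).

     IF DATA-CHAR(DATA-SUBSCRIPT) = TAB-CHAR
        ADD 1 TO TAB-COUNT.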
Dimandja
 
I prefer the SYMBOLIC phrase of SPECIAL-NAMES for this stuff:
Code:
SPECIAL-NAMES.
    SYMBOLIC TAB-CHAR 10
             ESC-CHAR 28
    .
Note that the numbers are ordinals and, therefore, are one greater than their binary equivalent.  Also note that the values shown are ASCII dependent.  NB if you're using EBCDIC, a tab character is 6, not 10.
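
As a rough illustration of the SYMBOLIC approach, here is a minimal sketch of a complete program that builds a test string containing one tab and counts it. The program name SYMTAB and the data names SAMPLE-TEXT and TAB-COUNT are made up for this example, and the ordinal 10 assumes an ASCII native character set.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SYMTAB.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
      * Ordinal 10 is the 10th character of the native set,
      * i.e. code value 9 (horizontal tab) on an ASCII system.
           SYMBOLIC CHARACTERS TAB-CHAR IS 10.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SAMPLE-TEXT      PIC X(10).
       01  TAB-COUNT        PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Build 12345678<tab>9 without typing a literal tab.
           STRING "12345678" TAB-CHAR "9" DELIMITED BY SIZE
               INTO SAMPLE-TEXT
           END-STRING
      * Count every occurrence of the symbolic character.
           INSPECT SAMPLE-TEXT TALLYING TAB-COUNT
               FOR ALL TAB-CHAR
           DISPLAY "TABS FOUND: " TAB-COUNT
           STOP RUN.
```

Because the test data is built in memory, this only shows that the comparison itself works; it says nothing about what the runtime does to tabs while reading a file.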

Glenn
 
I use either ";" or TAB as the delimiter to separate fields with the UNSTRING statement. I have defined one PIC X field called Trennfeld; when the file is TAB-delimited, I move X"09" to it and UNSTRING the record with Trennfeld as the delimiter. It works.
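
A minimal sketch of that technique might look like the following. Only the name TRENNFELD comes from the description above; the program name TABSPLIT, the field names, and the sample data are assumptions made for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TABSPLIT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * TRENNFELD holds whichever delimiter is in use.
       01  TRENNFELD        PIC X.
       01  INPUT-LINE       PIC X(40).
       01  FIELD-1          PIC X(10).
       01  FIELD-2          PIC X(10).
       01  FIELD-3          PIC X(10).
       01  FIELD-COUNT      PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE X"09" TO TRENNFELD
      * Build a sample record: ALPHA<tab>BETA<tab>GAMMA
           MOVE SPACES TO INPUT-LINE
           STRING "ALPHA" TRENNFELD "BETA" TRENNFELD "GAMMA"
               DELIMITED BY SIZE INTO INPUT-LINE
           END-STRING
      * Split the record at each delimiter and count the fields.
           UNSTRING INPUT-LINE DELIMITED BY TRENNFELD
               INTO FIELD-1 FIELD-2 FIELD-3
               TALLYING IN FIELD-COUNT
           END-UNSTRING
           DISPLAY "FIELDS: " FIELD-COUNT
           DISPLAY "1=[" FIELD-1 "] 2=[" FIELD-2 "] 3=[" FIELD-3 "]"
           STOP RUN.
```

Keeping the delimiter in a data item rather than a literal means the same UNSTRING can serve both the ";" and the TAB case by moving a different value to TRENNFELD.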

regards

Goetz
 
If you work with TABs, be aware of how your software/environment/editor interprets 'TAB'. It could make a big difference in how you code for it.

If one creates a file using 'TABs', the application may simply skip so many columns - instead of actually inserting the TAB character in the data.

Some manipulation of the data could implicitly replace TAB with SPACE - COPY/PASTE, utilities, etc...

For more details, see
Dimandja
 
Hello Dimandja,

I work in Germany and use the tab when transferring data between several applications. In the target software, a third-party product, tab is defined as hex 09, so I have never had problems. I thought this information could help.

Regards

Goetz
 
Thanks for all the suggestions. I've tried them all, but I am still having a problem identifying tabs.

My test file is a notepad txt file with the character string : 12345678>9 where ">" represents the horizontal tab.

A hex dump of the text file shows: 0 31 32 33 34 35 36 37 38 09 39 eof. My program reads the record and examines each character looking for a non-space or a horizontal tab and keeps a count of each. The program finds 9 non-spaces and 0 tabs. It should find 10 non-spaces and 1 tab.

I must be overlooking something simple that's causing this result. Any other suggestions?

Thanks again for all the replies.

Wayne

 
Wayne -

I've not used Fujitsu in a long while. Is there, perhaps, an environment variable that causes TAB expansion to spaces?

W/o seeing your code, it's hard to make too many suggestions. One possible suggestion is to insert a display statement where you are testing for TAB. You might, for example, try:
Code:
DISPLAY INPUT-CHAR(IDX:1)
EVALUATE INPUT-CHAR(IDX:1)
    WHEN TAB-CHAR
        DISPLAY "TAB  "
    WHEN "0" THRU "9"
        DISPLAY "NUMERIC"
    WHEN OTHER
        DISPLAY "OTHER"
END-EVALUATE

By adding other WHEN conditions, you should be able to figure out how your code is seeing the TAB (assuming that it is).
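
Another way to see exactly what the program receives is to display the numeric code of every character. The following is a minimal sketch, assuming the compiler supports the ORD intrinsic function; the program name SHOWCODE and the data names are hypothetical, and the test string stands in for a record that would normally come from a READ.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SHOWCODE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-LINE       PIC X(10).
       01  IDX              PIC 9(4).
       01  CODE-VALUE       PIC 9(3).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Stand-in for a record just read: 12345678<tab>9
           STRING "12345678" X"09" "9" DELIMITED BY SIZE
               INTO INPUT-LINE
           END-STRING
           PERFORM VARYING IDX FROM 1 BY 1 UNTIL IDX > 10
      * ORD returns the 1-based ordinal, so subtract 1
      * to get the character's code value.
               COMPUTE CODE-VALUE =
                   FUNCTION ORD(INPUT-LINE(IDX:1)) - 1
               DISPLAY "POS " IDX " CODE " CODE-VALUE
           END-PERFORM
           STOP RUN.
```

On an ASCII system a tab shows up as code 009 and a space as 032, so running the same loop over a real input record makes it obvious whether tabs arrived intact or were turned into spaces.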

Regards,

Glenn
 
Wayne,

Also, it looks like your search loop is ending just before '09' (The program finds 9 non-spaces and 0 tabs). Your program should be finding at least 10 non-space characters:

1. Check the subscript in the loop.
2. Is your code interpreting '09' as EOF?
3. Does your READ bring back all the data for the record (does it get truncated)?
4. How many records can you successfully READ (without looking for TAB)?

Dimandja
 
Dimandja,

Thanks for sticking with me on this.

Here's the answer to your questions by way of a detailed explanation.

The TXT file I am using contains the following characters:
12345678>912345678>912345678>912345678>912345678>9 where > is a tab. That's a total of 50 characters including 5 tabs.

The program defines the record as a 20-character alphanumeric field. Counts are set up for the number of records, characters found, tabs found and spaces found.

The program found:
3 records. CORRECT
45 char. WRONG should be 50, the tabs were not counted.
0 tabs. WRONG there were 5.
15 Spaces. WRONG there were 10 spaces.

It looks like the tabs are being counted as spaces.

I've tried everyone's suggestions and the results are the same. I am looking for a clue in the environment variables as suggested by 3gm. Here's what the program currently looks like:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. TESTPROG.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
      *
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT DATA-FILE
               ASSIGN TO INFILE
               ORGANIZATION IS LINE SEQUENTIAL.
      *
       DATA DIVISION.
       FILE SECTION.
       FD  DATA-FILE.
       01  DATA-RECORD.
           02  DATA-CHAR PIC X OCCURS 20 TIMES.
      *
       WORKING-STORAGE SECTION.
       01  tab-char-grp.
           05  tab-char-comp PIC 9(04) COMP VALUE 09.
           05  tab-char-x REDEFINES tab-char-comp.
               10  FILLER PIC X(01).
               10  TAB-CHAR PIC X(01).

       77  DATA-SUBSCRIPT PIC 9(4).
       77  TAB-COUNT PIC 9(4) VALUE IS 0.
       77  CHAR-COUNT PIC 9(6) VALUE IS 0.
       77  SPACE-COUNT PIC 9(2) VALUE IS 0.
       77  RECORD-COUNT PIC 9(2) VALUE IS 0.
      *
       PROCEDURE DIVISION.
           OPEN INPUT DATA-FILE.
       READ-FILE.
           READ DATA-FILE AT END GO TO END-PROG.
           ADD 1 TO RECORD-COUNT.
           MOVE 0 TO DATA-SUBSCRIPT.
      *
       FIND-TAB-CHARACTER.
           IF DATA-SUBSCRIPT = 20 GO TO READ-FILE.
           ADD 1 TO DATA-SUBSCRIPT.
           DISPLAY DATA-CHAR(DATA-SUBSCRIPT).
           IF DATA-CHAR(DATA-SUBSCRIPT) NOT = SPACE
               ADD 1 TO CHAR-COUNT.
           IF DATA-CHAR(DATA-SUBSCRIPT) = SPACE
               ADD 1 TO SPACE-COUNT.
           IF DATA-CHAR(DATA-SUBSCRIPT) = TAB-CHAR
               ADD 1 TO TAB-COUNT.
           GO TO FIND-TAB-CHARACTER.
      *
       END-PROG.
           CLOSE DATA-FILE.
           DISPLAY "RECORD COUNT", RECORD-COUNT.
           DISPLAY "TAB-COUNT", TAB-COUNT.
           DISPLAY "CHAR-COUNT", CHAR-COUNT.
           DISPLAY "SPACE COUNT" SPACE-COUNT.
       END PROGRAM TESTPROG.



 
Quick questions:

01 DATA-RECORD.
02 DATA-CHAR PIC X OCCURS 20 TIMES.
If your record contains 50 characters, why do you define only 20?

77 DATA-SUBSCRIPT PIC 9(4).
ADD 1 TO DATA-SUBSCRIPT.
This variable is declared without an initial VALUE. Try:
77 DATA-SUBSCRIPT PIC 9(4) VALUE 0.

Dimandja


 
Dimandja.

The 20-character record size was just an experiment to see if the program would do three reads for a 50-character text file, and indeed it does. I changed that to a 100-character record and initialized DATA-SUBSCRIPT to zero. Tabs were still not found.

I now know what's going on, but don't understand why. Here's what I did. I modified the program to include an output record. Immediately after I read the 100-character input record, I move it to a 100-character output record and write it.

Looking at the output record with either Word (showing non-printable characters) or a hex viewer, it becomes clear that the 5 tab characters have been transformed into 3 spaces each. Fujitsu COBOL does the transformation when it reads the record.

I am totally convinced that there is some sort of environment or runtime setting that controls this, and that's the path I am taking now.

Does that make sense to you?
 
Wayne,

It makes sense. Some compilers do give you the option to describe how to handle TAB characters found on INPUT or on OUTPUT.

If your compiler documentation does not address this issue, I can only assume that the vendor has already made those decisions for you - it happens.

However, LINE SEQUENTIAL files are expected to contain only printable characters and the record terminator. For example, ORGANIZATION IS LINE SEQUENTIAL will convert carriage return/line feed to SPACES - it must be doing the same with TAB: try ORGANIZATION IS SEQUENTIAL.

Dimandja
 
Hi

LINE SEQUENTIAL records are terminated by CR+LF, not by TAB. I have defined my LINE SEQUENTIAL file with variable-length records whose length depends on a variable. Then I UNSTRING the text string into several fields with the delimiter X'09', and it works with several files. I use PowerCOBOL on Windows 98 and Windows XP, and it works.

Goetz
 
Hi Goetz,

LINE SEQUENTIAL translates CR+LF into SPACE, doesn't it? Upon READing a CRLF terminated record, the CRLF is changed to SPACES, I thought.

In your case, the TAB remains TAB - as it should. For Wayne, it seems the TAB becomes SPACES.

My reasoning is that since LINE SEQUENTIAL is prone to translations of special characters, maybe Wayne should try READing his file with a different ORGANIZATION.

Dimandja
 
Hi Dimandja,

When I read a LINE SEQUENTIAL file, each record is terminated by CR+LF, and reading stops when the CR+LF is reached. For example, if a text file has lines of 80 characters plus CR+LF, you get a record length of 80 characters. If you read the same file with SEQUENTIAL organization, your record must be 82 characters long, so I don't believe the compiler changes CR+LF into spaces. In my FILE SECTION I have described:
FILE:
FD EINLESEDATEI is GLOBAL record is varying in size CHARACTERS DEPENDING ON KLEINSTELLENZAEHLER.
01 IMPORTSATZ.
05 IMPORTZEILE PIC X(32000).
WORKING:
01 KLEINSTELLENZAEHLER PIC 9(05).

When I have read the file, I use an UNSTRING statement to separate the fields with the delimiter Trennfeld. Trennfeld is defined as:

01 TRENNFELD PIC X.

With a MOVE statement I move X'09' to Trennfeld, and then I can use it.

Goetz
 
@ all,

I am sorry, but it will never work in PowerCOBOL. The compiler replaces TAB characters with 4 or 8 spaces; the compiler option TAB(4) or TAB(8) controls this. The default setting of the TAB option replaces each tab with 4 spaces. I am sorry, but the old program I used under DOS worked differently.

Goetz
 
Try reading the file as BINARY SEQUENTIAL. This will not translate the TAB characters into spaces. When you find the CR+LF sequence, you know a new record begins.
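
The record-splitting half of that idea can be sketched as follows. How a file is declared for untranslated reading is compiler-specific and is not shown here; RAW-BUFFER simply stands in for bytes that arrived unchanged, and the program name CRLFSPLT and all data names are made up for this example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CRLFSPLT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * RAW-BUFFER stands in for bytes read without translation.
       01  RAW-BUFFER       PIC X(24).
       01  CRLF             PIC X(2) VALUE X"0D0A".
       01  ONE-RECORD       PIC X(20).
       01  BUF-PTR          PIC 9(4) VALUE 1.
       01  TAB-COUNT        PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Two records, each 12345678<tab>9 followed by CR+LF.
           STRING "12345678" X"09" "9" CRLF
                  "12345678" X"09" "9" CRLF
               DELIMITED BY SIZE INTO RAW-BUFFER
           END-STRING
      * Peel off one record per pass; BUF-PTR is advanced
      * past each CR+LF by the UNSTRING itself.
           PERFORM UNTIL BUF-PTR > 24
               UNSTRING RAW-BUFFER DELIMITED BY CRLF
                   INTO ONE-RECORD
                   WITH POINTER BUF-PTR
               END-UNSTRING
               INSPECT ONE-RECORD TALLYING TAB-COUNT
                   FOR ALL X"09"
           END-PERFORM
           DISPLAY "TABS FOUND: " TAB-COUNT
           STOP RUN.
```

With real data the buffer would be filled from the file in chunks, and a record that straddles two chunks would need extra handling that this sketch leaves out.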
hi
 
Erba,

That sounds like the way to go! I'll try it out and let you know. If it works it will save me the extra step of converting tabs to something else.

 