Questions About COBOL And Its Data Types


Dimandja

Programmer
Apr 29, 2002
Questions about COBOL data types arise all too regularly on this forum. In fact, even seasoned COBOL programmers get confused about some of the answers. I think this is the single most misunderstood area of COBOL, the most controversial, and the most "improvised" by compiler designers. I think a thorough FAQ is in order.

Another thing: are all these "exotic" data types helpful to the COBOL language, or are they mostly contributing to the isolation of COBOL from other computer languages?

Some COBOL compilers have been designed with data types that match the types used by most modern languages. For example, Nonstop COBOL programmers use the type NATIVE to duplicate the type INTEGER, which is used by most other languages - eliminating puzzling questions about how much memory is used by data elements, and allowing data to be passed to and from COBOL with fewer headaches.
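To make the puzzle concrete, here is a rough sketch (the program and field names are invented, and the byte counts you get back depend entirely on your compiler - which is exactly the problem):

       IDENTIFICATION DIVISION.
       PROGRAM-ID. SIZEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * How many bytes does each of these occupy?  The Standard only
      * requires that the digits fit; different compilers are free to
      * (and do) allocate different amounts of storage.
       01  WS-COMP-SMALL    PIC S9(4) COMP.
       01  WS-COMP-ITEM     PIC S9(6) COMP.
       PROCEDURE DIVISION.
           DISPLAY "S9(4) COMP BYTES: " FUNCTION LENGTH(WS-COMP-SMALL)
           DISPLAY "S9(6) COMP BYTES: " FUNCTION LENGTH(WS-COMP-ITEM)
           GOBACK.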

Isn't it about time that COBOL got rid of all these COMPs et al? Do those types really add anything to the language, or do they hurt it?

What do you think?

Remember, most other languages were largely designed by the programmers who use them. Must we leave the design of COBOL in the hands of committees and vendors?

Dimandja
 
The 2002 COBOL Standard introduces a number of "new" data types (USAGE) - including

Bit and Boolean
Floating-Point (Short, long, and extended)
"True" binary (no Picture - allocated by Usage)
Signed and unsigned (char, short, long, and double)

Although vendors may (when/if they implement the 2002 Standard) "equate" these with existing types (compare Packed-Decimal and COMP-3), they will become more "portable" when and if 2002 implementations become available.

P.S. Personally, coming from an IBM mainframe background, one of my favorites is Binary-Char which will FINALLY make it easy to deal with 1-byte binary fields in COBOL.
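A rough sketch of what the declarations might look like once a vendor implements them (the spellings are as I read the draft Standard; the names are invented, and your current compiler may not accept any of this yet):

       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * "True" binary - the size comes from the USAGE, not a PICTURE.
       01  WS-ONE-BYTE      USAGE BINARY-CHAR SIGNED.
       01  WS-HALFWORD      USAGE BINARY-SHORT SIGNED.
       01  WS-FULLWORD      USAGE BINARY-LONG UNSIGNED.
       01  WS-DOUBLEWORD    USAGE BINARY-DOUBLE SIGNED.
      * Floating point in three lengths.
       01  WS-SHORT-FP      USAGE FLOAT-SHORT.
       01  WS-LONG-FP       USAGE FLOAT-LONG.
       01  WS-EXTENDED-FP   USAGE FLOAT-EXTENDED.
      * Bit and Boolean data.
       01  WS-FLAG-BYTE     PIC 1(8) USAGE BIT.
       01  WS-ONE-SWITCH    PIC 1    USAGE BIT.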

Bill Klein
 
Dimandja,
I have wondered for years why some people hold on to that "comp" stuff. As soon as bandwidths and storage got big enough that it was not a problem, I quit writing programs that used comp. God, the effort we used to go through to save 1K on the size of a file! And those horrible old editors...

Anyway, I'm with you. The new compilers need to at least make provision for the "new" data types. Most programmers I know would probably rather use Int, Long, Single, Double, etc., than mess with little- and big-endian, comp-3, sign is leading separate character, etc., and all of the pain that goes into exchanging COBOL files with VB, C++, etc.

Maybe somebody like Microfocus could add some proprietary extensions to get the ball rolling. That's how about half of the features got into present-day COBOL.

Tranman
 
Even if you never use COMP(-x) in files, you should use it in programs for processing speed. One of the advantages of COBOL over other languages is that it is fast! At my last job, I wrote some COBOL programs to replace some VB programs. The VB programs took about 45 minutes to run. The COBOL programs took about 5 seconds.
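A quick sketch of the classic case - a counter or subscript that gets exercised millions of times (names invented):

       IDENTIFICATION DIVISION.
       PROGRAM-ID. SPEEDDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Binary counter: no decimal conversion on each ADD or compare.
       01  WS-SUB           PIC S9(8) COMP  VALUE 0.
      * The same counter as USAGE DISPLAY forces conversions (or
      * slower decimal arithmetic) on every single pass.
       01  WS-SUB-DISP      PIC S9(8)       VALUE 0.
       PROCEDURE DIVISION.
           PERFORM VARYING WS-SUB FROM 1 BY 1
                   UNTIL WS-SUB > 10000000
               CONTINUE
           END-PERFORM
           DISPLAY "FINISHED AT " WS-SUB
           GOBACK.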

Size can still be a consideration for some files. Back in 1969 I worked on a system where one file had over 16 million records. Each byte saved in the record equated to over 16 megabytes saved in the file. There was a sequential file on tape that exceeded 16 reels. By compressing the data, it could be stored on three reels. At TRW Credit Data (before they became Experian) the credit history files had to be split into three files to avoid exceeding the maximum file size (about ten terabytes). Without compressing the data by using COMP and other methods it probably would have required about 50 files.
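To put rough numbers on it (the field names are invented; the sizes shown are the usual mainframe allocations):

      * A 9-digit amount three ways:
      *   DISPLAY = 9 bytes, COMP-3 = 5 bytes, COMP = 4 bytes
       01  AMT-DISPLAY      PIC S9(9).
       01  AMT-PACKED       PIC S9(9) COMP-3.
       01  AMT-BINARY       PIC S9(9) COMP.
      * Across 16,000,000 records, moving this one field from DISPLAY
      * to COMP-3 saves 4 * 16,000,000 = 64,000,000 bytes.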
 
If you are using COMP-? fields to "micro-optimize" your application, make certain that you understand exactly how your SPECIFIC compiler works with each "comp" usage.

For some, using a "simple" USAGE COMP will produce mediocre to POOR performance while COMP-5 will work better. On some implementations COMP-3 works better than COMP or binary for some, but not all, operations.

Features, compiler options, and/or directives can improve OR hurt performance for
- truncation (binary or packed fields)
- sign handling
- little/big-endian

etc.

If you ARE in a "highly" performance-sensitive application (such as a nightly batch run of multiple-hour programs or an online transaction requiring sub-second response time), then it becomes quite important to UNDERSTAND exactly how your compiler works with your operating system to maximize performance/speed.
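As a rough illustration of the kind of thing to check (names invented; which of these is fastest is precisely the compiler-specific question):

      * Standard binary - subject to decimal truncation rules, so the
      * generated code depends heavily on options such as TRUNC(...).
       01  WS-CTR-COMP      PIC S9(8) COMP.
      * "Native" binary - typically truncated only at the capacity of
      * the binary field, which on some compilers avoids extra code.
       01  WS-CTR-COMP5     PIC S9(8) COMP-5.
      * Packed decimal - on some operations this can beat binary.
       01  WS-CTR-PACKED    PIC S9(8) COMP-3.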

Bill Klein
 
The COMP-3 field is very significant in the IBM Mainframe world when it comes to performance. There are machine instructions for performing calculations using values stored in the COMP-3 format. The compiler does not have to translate your numbers into another format prior to adding, subtracting, multiplying or dividing and then translate the results back. If you house your numbers in a format that best fits the platform you are running on, performance is generally improved. If you are processing millions of records, that little bit of enhanced performance can mean a lot, especially if you are paying for each processing cycle on a mainframe, and many of us do.
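For example (as I remember the layout - check the Principles of Operation for your machine):

       01  WS-PACKED        PIC S9(5) COMP-3.
      * MOVE +12345 TO WS-PACKED stores X'12345C': three bytes, two
      * decimal digits per byte, with the sign (C = positive, D =
      * negative) in the low half of the last byte.  The hardware's
      * decimal instructions (AP, SP, MP, DP) work on this format
      * directly, so no conversion is needed before the arithmetic.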

Again, as mentioned above, reducing the size of your stored data also increases the speed of your program by reducing the amount of data to be transferred to a hardware device or transmitted over a transmission line.

At my job we spend a significant amount of time finding ways to process the same data at a reduced cost. Many of our clients write these requirements into their contracts. When you are paying for the volume transmitted, stored and processed, reducing the size of data can go a long way in reaching your contractual goals.

etom
 
On the IBM mainframes, COMP is much faster than COMP-3 up through 9 digits. After 9 digits, COMP fields are processed with a subroutine, which is very much slower.
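In storage terms (the usual IBM mapping, as I understand it):

      * 1 to 4 digits   -> halfword  (2 bytes)
       01  WS-HALF      PIC S9(4)  COMP.
      * 5 to 9 digits   -> fullword  (4 bytes) - fast register work
       01  WS-FULL      PIC S9(9)  COMP.
      * 10 to 18 digits -> doubleword (8 bytes) - arithmetic may be
      * handed to a runtime routine, which is where the slowdown is
       01  WS-DOUBLE    PIC S9(18) COMP.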
 
The view is alluring: media costs have plummeted, saving a few bytes here and there is passé, so belly up to the bar and enjoy. But, because of the lower cost, we're seeing an unprecedented spurt in consumption.

In another forum someone claims to have a file containing 400 billion records!!! That's right 400,000,000,000! And, no, I don't think it's a prank. He seems on the level.

Think of what a "few bytes here and there" will return in terms of $ in this situation. Even putting this extreme case aside, the "common sense" efforts toward conservation of datacenter resources, whatever they may be, should be part of the definition of a "good pgmr".

What does it take to put a sign in a PIC or use COMP/COMP-3 where appropriate? Even remembering to code a COMP-3 field with an odd number of 9s is not asking too much of a "professional". When viewed from a single pgm's perspective these kinds of things look trivial, but in the aggregate, they can add up to big savings for the organizations we work for.
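A worked example, using the usual packed-decimal allocation of (digits + 1) / 2 bytes, rounded up (field names invented):

      * Packed decimal always ends in a sign nibble, so a field with
      * an EVEN number of 9s wastes half a byte:
       01  WS-EVEN    PIC S9(6) COMP-3.
      * (6 digits + sign) / 2 = 3.5 -> 4 bytes, one nibble unused
       01  WS-ODD     PIC S9(7) COMP-3.
      * (7 digits + sign) / 2 = 4   -> 4 bytes, nothing wasted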

End of rant. :)

Regards, Jack.



 
Concerning the statement above,

"On the IBM mainframes, COMP is much faster than COMP-3 up through 9 digits. After 9 digits, COMP fields are processed with a subroutine, which is very much slower."

That is true when TRUNC(OPT) is used. If, however, TRUNC(BIN) (for example) is used, the comparison changes considerably, as the figures below show.

The following is from the Enterprise COBOL V3R1 "performance tuning" paper:

"Comparing Data Types

When selecting your data types, it is important to understand the performance characteristics of them before
you use them. Shown below are some performance considerations of doing several ADDs and SUBTRACTs
on the various data types of the specified precision.

Performance considerations for comparing data types (using ARITH(COMPAT)):

Packed decimal (COMP-3) compared to binary (COMP or COMP-4) with TRUNC(STD)
| using 1 to 9 digits: packed decimal is 30% to 60% slower than binary
| using 10 to 17 digits: packed decimal is 55% to 65% faster than binary
| using 18 digits: packed decimal is 74% faster than binary

Packed decimal (COMP-3) compared to binary (COMP or COMP-4) with TRUNC(OPT)
| using 1 to 8 digits: packed decimal is 160% to 200% slower than binary
| using 9 digits: packed decimal is 60% slower than binary
| using 10 to 17 digits: packed decimal is 150% to 180% slower than binary
| using 18 digits: packed decimal is 74% faster than binary

Packed decimal (COMP-3) compared to binary (COMP or COMP-4) with TRUNC(BIN) or COMP-5
| using 1 to 8 digits: packed decimal is 130% to 200% slower than binary
| using 9 digits: packed decimal is 85% slower than binary
| using 10 to 18 digits: packed decimal is 88% faster than binary

DISPLAY compared to packed decimal (COMP-3)
| using 1 to 6 digits: DISPLAY is 100% slower than packed decimal
| using 7 to 16 digits: DISPLAY is 40% to 70% slower than packed decimal
| using 17 to 18 digits: DISPLAY is 150% to 200% slower than packed decimal

DISPLAY compared to binary (COMP or COMP-4) with TRUNC(STD)
| using 1 to 8 digits: DISPLAY is 150% slower than binary
| using 9 digits: DISPLAY is 125% slower than binary
| using 10 to 16 digits: DISPLAY is 20% faster than binary
| using 17 digits: DISPLAY is 8% slower than binary
| using 18 digits: DISPLAY is 25% faster than binary

DISPLAY compared to binary (COMP or COMP-4) with TRUNC(OPT)
| using 1 to 8 digits: DISPLAY is 350% slower than binary
| using 9 digits: DISPLAY is 225% slower than binary
| using 10 to 16 digits: DISPLAY is 380% slower than binary
| using 17 digits: DISPLAY is 580% slower than binary
| using 18 digits: DISPLAY is 35% faster than binary

DISPLAY compared to binary (COMP or COMP-4) with TRUNC(BIN) or COMP-5
| using 1 to 4 digits: DISPLAY is 400% to 440% slower than binary
| using 5 to 9 digits: DISPLAY is 240% to 280% slower than binary
| using 10 to 18 digits: DISPLAY is 70% to 80% faster than binary"
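(If you want to reproduce that kind of measurement, here is a rough sketch of how I would set it up with Enterprise COBOL - the TRUNC and ARITH options go on a CBL/PROCESS statement, or on the PARM of the compile step; the program and field names are invented:)

       CBL TRUNC(BIN),ARITH(COMPAT)
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRUNCDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Recompile with TRUNC(STD), TRUNC(OPT) and TRUNC(BIN) and time
      * the same loop to see figures like the ones quoted above.
       01  WS-BIN           PIC S9(9) COMP    VALUE 0.
       01  WS-PACK          PIC S9(9) COMP-3  VALUE 0.
       PROCEDURE DIVISION.
           PERFORM 10000000 TIMES
               ADD 1 TO WS-BIN
               ADD 1 TO WS-PACK
           END-PERFORM
           DISPLAY WS-BIN " " WS-PACK
           GOBACK.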

Bill Klein
 
Thank you to all who cared to comment on this subject.

But, let me bring the discussion back to the original point:

1. COMP fields serve COBOL well, but the interpretation of these fields is not universal among COBOL compilers.

2. Even if COBOL compilers agree on the meanings of COMP-x, the types are still confined to COBOL applications.

3. Unless you work in a COBOL-only environment (a rarity nowadays), communication with non-COBOL applications - when using COMP-x - is difficult at best.

I am in sync with Tranman about the wisdom of introducing "universal" data types in COBOL. And I can't wait for The 2002 COBOL Standard mentioned by WMK.

Dimandja

 
Even other languages leave the interpretation of numeric data types up to the implementers. "Long" may be 4 or 8 bytes, "short" may be 1, 2 or 4 bytes. This is the same confusion as in COBOL. It is because of the history of the relationship between hardware and software. Early computers had different word sizes: 4 bits, 8 bits, 9 bits, 16 bits, 32 bits, 36 bits. I probably missed a few. Some early IBM machines used strictly decimal arithmetic. There were various implementations of floating-point fields. IBM's mainframe floating-point format was originally designed for a FORTRAN compiler written at the University of the Witwatersrand in South Africa in the 1950's. Or maybe they got it from someone else. I think IBM invented the current format for packed decimal, but other companies' computers copied it, and I suspect that some allowed packed-decimal fields without signs.

In the interests of run-time efficiency (speed and space), it is important to program for the target hardware. Small systems do not require run-time efficiency, but large or critical-response systems do. But maintainability must never be sacrificed for efficiency.
 
Jack -

Amen and catch a star!

<rant>
It has always irked me that anyone who has managed to throw together a few lines of code that happens to work (even sort of work) will call him/herself a programmer. Many of us have spent (and continue to spend) much time studying the "art" of programming. Paying attention to the sometimes picayune details like choosing appropriate data-types helps us produce first-class code that meets our customers' needs and runs as reliably and efficiently as is reasonably possible. Most of these Johnny-come-latelys don't know or care about these issues as long as they can somehow get their code to work (at least this one time on this one platform and O/S).
</rant>

That said, having more "universal" datatypes is a laudable goal, but one I don't believe we'll reach unless we can somehow remove the "implementor defined" parts of the language standard as well as vendor extensions. And I don't see that happening anytime soon - it's just not practical.

Glenn
 
Glenn,

I think what's impractical is removing the current datatypes. It is very attainable to add datatypes now missing from COBOL. I work with a COBOL compiler which supports both COMP-x and Integer-type datatypes.

Dimandja
 
I forgot to mention that I am intimately familiar with COMP-x fields - having worked with COBOL for 25 years or so on several platforms, and having written COBOL pre-compilers.

This issue has to do with advancing COBOL, not degrading it.

Dimandja
 
I should have added that it is very difficult to add universal data types to COBOL if there are none. I have never seen any.
 
webrabbit,

This is an odd statement. Why do you think it's difficult?

I thought I mentioned the fact that I am working with one right now. You must have missed it. It's called "Nonstop COBOL85". Also, as Bill mentioned: the 2002 COBOL Standard introduces a number of "new" data types.

Are you against this, webrabbit? If so, why?

Dimandja
 
I think the point was that as there are no UNIVERSAL datatypes OUTSIDE of COBOL, it will be difficult to add something that doesn't exist *to* COBOL.

For example, the following is the definition of the new "true" binary USAGEs (and this pretty well "corresponds" to my understanding of the C definitions):

"11) The usages binary-char, binary-short, binary-long, and binary-double define integer numeric data items that are held in a binary format suitable for the machine on which the runtime module is to run. The minimum range requirements for each of these usages are shown below. Any integer value of n within the specified range shall be expressible in the given usage. The implementor may allow a wider range. However, for signed items, any number that may be expressed as BINARY-CHAR shall also be expressible as BINARY-SHORT, any number that may be expressed as BINARY-SHORT shall also be expressible as BINARY-LONG, and any number that may be expressed as BINARY-LONG shall also be expressible as BINARY-DOUBLE.

The minimum ranges are:

BINARY-CHAR SIGNED -128 [-2**7] < n < 128 [2**7]
BINARY-CHAR UNSIGNED 0 ≤ n < 256 [2**8]
BINARY-SHORT SIGNED -32768 [-2**15] < n < 32768 [2**15]
BINARY-SHORT UNSIGNED 0 ≤ n < 65536 [2**16]
BINARY-LONG SIGNED -2147483648 [-2**31] < n < 2147483648 [2**31]
BINARY-LONG UNSIGNED 0 ≤ n < 4294967296 [2**32]
BINARY-DOUBLE SIGNED -2**63 < n < 2**63
BINARY-DOUBLE UNSIGNED 0 ≤ n < 2**64"

***

Calling something like this "universal" is a stretch of the definition of that term.

Bill Klein
 
Thanks Bill, for the fine explanation of binary and machine dependencies. Another thing: the interpretation of "Long" on different machines is not comparable to the interpretation of COMP-x between machines and between compilers.

As for the usage of the word universal, I made sure to use quotation marks around it in the appropriate context. This, of course, got lost in the translation. But, I'm sure we all know what was meant: "common/trans-lingual datatypes". This sort of thing always slows down a good discussion - let's move on, please.

Dimandja
 