
Signed COMP-3 fields in COBOL 4

Status
Not open for further replies.

bigironer

Programmer
Nov 11, 2005
US
Everything I've ever read or heard says that a COMP-3 field used in arithmetic should be signed, because otherwise an extra instruction is generated to force the sign to X'0F'. However, I have run several tests that showed an unsigned field was more efficient. I generated the Assembler listing to see what was going on, and here is what I found: with the unsigned field there is indeed an extra "OI 14(8),X'0F'" generated to force the sign to "F", but with the signed receiving field I am getting an extra "ZAP 10(5,8),10(5,8) BUCKET BUCKET", which takes much longer than the "OI" does.

What's going on? This is with IBM Enterprise COBOL on a mainframe.
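For anyone wanting to reproduce this, here is a minimal sketch (the data names and PICTUREs are illustrative, not from the original program); compile with the LIST option to see the generated Assembler:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SIGNTEST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Unsigned receiver: expect an extra OI to force the F sign.
       01  WS-UNSIGNED-BUCKET   PIC 9(09)  COMP-3 VALUE ZERO.
      * Signed receiver: expect an extra ZAP of the field into itself.
       01  WS-SIGNED-BUCKET     PIC S9(09) COMP-3 VALUE ZERO.
       01  WS-AMOUNT            PIC S9(09) COMP-3 VALUE 1.
       PROCEDURE DIVISION.
           ADD WS-AMOUNT TO WS-UNSIGNED-BUCKET
           ADD WS-AMOUNT TO WS-SIGNED-BUCKET
           GOBACK.
```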
 
Who cares what the compiler does!

How fast is one way compared to the other way when accessing 1,000,000 records?

Also if you are doing computations what happens if you get a negative value? Do the values become corrupt when nothing is signed?

Who has time to look at what the compiler is actually doing?

This may have something to do with compiler options also. Different compilers manipulate numbers using different methods.

Is there a difference between ADD and SUBTRACT, EVALUATE, and COMPUTE? Different verbs may generate different code. COMPUTE, for instance, has the ON SIZE ERROR option and may be doing extra work and taking longer to run. There may be distinct differences between the verbs you are using.

If you do not like my post feel free to point out your opinion or my errors.
 
Exactly how much longer is the ZAP taking compared to the OI? Why bother to find out? How much does it cost? Why is it important? Are you on a project to cut computer costs?

If you want performance, code in Assembler. COBOL addresses business problems.

My two cents worth: your time is much more expensive than ZAP's v. OI's. Code it, test it and move on to the next business problem. The life of an application programmer is to solve business problems.

Dave
 
bigironer -

ZAP (Zero and Add Packed) zeroes the receiving field and moves the sending field into it, guaranteeing an appropriate decimal result with an operational sign (or an appropriate exception). Without seeing your code, it's hard to know why the compiler generated this specific ZAP instruction.

As for the other folks' comments:

1. I don't advocate going through existing code adding signs everywhere. I do think that some effort expended learning how to write code that performs well is a worthy endeavor. You basically amortize a small investment in time spent figuring these things out over many years of using what you've learned. You may also do some myth-busting. It sounds like in this case you might be doing that vis-à-vis signed COMP-3 fields (at least under certain circumstances).

The tools and techniques you use to do this sort of learning can also be very useful in production downs etc. I still find myself, even in these days of automated tools, having to read a dump or decode some instructions to figure out the root of a problem.

2. What optimizes your code at your shop MAY not optimize the same code elsewhere, so don't blindly assume that what you learn this way is universally true. Certainly different architectures (e.g. big endian vs little endian) make a big difference, but so do compiler option switches and even processor model.

3. Anyone ever hear of or complain about software bloat? Ever gripe that your system is sluggish or slow to come up? I'm convinced that too many times we (software developers) spend most of our time just getting it working. Rarely do we take the time (or are given the time in many cases) to "do it right". We save a few hours of programmer time at the expense of what may turn out to be many hours of user time waiting for our slow/sluggish programs to run when a little algorithm tuning or a better algorithm choice would have resulted in great cost savings in the long run. I'm not suggesting that use of unsigned comp-3 fields is in this league, but I think the discussion is symptomatic of the larger issue.

Regards.

Glenn
 
bigironer -

Here's an explanation of what is happening. When we were converting from VS COBOL to COBOL II/COBOL 370, I found the book "COBOL/370 FOR VS COBOL AND COBOL II PROGRAMMERS by Harvey Bookman. Here's what he says about your question....

"One change from VS COBOL to COBOL/370 that may seem quite baffling occurs when a constant or another COMP-3 data field is added or subtracted to or from a signed COMP-3 data field. The VS COBOL compiler produced only an Add Packed (AP) or Subtract Packed (SP) instruction. COBOL/370 still produces the same instruction but then issues a Zero and Add Packed (ZAP) of the resulting field into itself. The ZAP is executed for a very subtle reason. When two negative numbers are added together and an overflow occurs, the overflow flag in the condition code is turned on while the resulting sign is set as if overflow did not occur. This means that if two numbers each defined as PIC S9(03) COMP-3 were added together, and they had values of -1 and -999 before the addition, the resulting field would be zero (the digit that overflowed was truncated) with its sign negative. The ZAP will preserve the sign of all other computations but will change a zero with a negative sign to a zero with a positive sign."
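To make Bookman's example concrete, here is the packed-decimal arithmetic step by step (byte values shown for two-byte PIC S9(03) COMP-3 fields):

```
  -1   = X'001D'        (D = negative sign nibble)
  -999 = X'999D'

  AP   result,operand   digits: 001 + 999 = (1)000, the carry is lost
                        result = X'000D'  -- zero with a NEGATIVE sign,
                                             decimal-overflow condition
  ZAP  result,result    re-add the field into a zeroed receiver
                        result = X'000C'  -- zero with the preferred
                                             POSITIVE sign
```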

"Ensuring a positive sign in a zero result was not necessary in the VS COBOL compiler. This is because the VS COBOL compiler used a Compare Packed (CP) instruction to compare numeric fields. It is interesting that COBOL/370 now often produces a Compare Logical (CLC) instruction to compare numeric fields. While this is a faster instruction than CP, it requires the same sign value for an equal condition to occur. For example, a CP instruction will find fields with hexadecimal values '0F' and '0C' equal, while the CLC will not."

The same instructions are generated in Enterprise COBOL as were for COBOL/370.

Hope this explains the background. I'd still say use signed fields to ensure that you get the right result and not worry about the efficiency lost.
 
If you can find COB2JOB on the Internet, you will find a way to determine the exact amount of CPU used by the current job. So you take the time, do your research by performing method 1 10,000 times, take the time again, perform method 2 10,000 times, and so on. COB2JOB was written by Gilbert Saint-Flour. You may have to update the MVS addresses; you can find them in some manuals with a description of the MVS data areas (MVSDATA).

Have fun!

Regards,

Crox
 
EXAMPLE of how to determine the CPU time based on Gilbert's approach:

Code:
       CBL TRUNC(OPT)
       CBL NOSSRANGE
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CPUTIME.
       AUTHOR. CROX.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 CPU-TIME-DEF-WORKING-STORAGE.
          03 MICRO-SECONDS                 PIC S9(15) COMP-3.
          03 FW1                           PIC S9(08) COMP.
          03 PTR4      REDEFINES FW1 POINTER.
          03 FW2                           PIC S9(08) COMP.
          03 PTR5      REDEFINES FW2 POINTER.
          03 FW2-ALFA  REDEFINES FW2       PIC  X(04).
          03 BFW2                          PIC S9(18) COMP VALUE ZERO.
          03 BFW2-ALFA REDEFINES BFW2      PIC  X(08).
       LINKAGE SECTION.
       01 CB1.
          03 PTR1 POINTER OCCURS 256.
       PROCEDURE DIVISION.
       MAIN SECTION.
       0000.
           PERFORM GET-MICRO-SECONDS.
       0000-EXIT.
           GOBACK.

       GET-MICRO-SECONDS SECTION.
       GMS-01.
           SET ADDRESS OF CB1    TO NULL.
           SET ADDRESS OF CB1    TO PTR1(136).
           SET PTR4              TO PTR1(80).
           SET PTR5              TO PTR1(81).
           MOVE FW2-ALFA         TO BFW2-ALFA (5:).
           COMPUTE MICRO-SECONDS  = FW1 * 1048576 +
                                    BFW2 / 4096.
           DISPLAY 'MICRO-SECONDS = ' MICRO-SECONDS.
       GMS-99.
           EXIT.
 
What about EVALUATE?
What about not using COMPUTE?
What happens when you break down the steps of the computation?
Is there a difference between using stored values and literals?
I was at one site that claimed that if you used COMP-5 and forced pure binary computations it was faster.

I have seen instances where COMP and COMP-3 were both used depending on the size of the variable, i.e., using binary for an even number of digits like a 4-digit or 8-digit number, and COMP-3 for an odd number of digits like "PIC 9(05)V99 COMP-3". In days gone by, storage cost more than processing power, so the reasoning was sometimes different. As a rule of thumb, I never use an even number of digits so I don't waste space; however, if I don't use the extra digit it is wasted anyway, but I can't stand that the compiler can't use that extra nibble.

No one ever bothered to bring up the fact that a sign makes a computation faster when I was in school, but I find it interesting. Since I don't have a huge file system to experiment with, it can be difficult to determine a need for such a thing. I wonder what techniques databases use to make them run faster. We are in the middle of a switch from COBOL on the mainframe (emulated) to an IBM database system that is not even relational (UniData). It seems kind of convoluted. The math used in a file system for a school is not that fancy; hrs * Rate is not that complicated to figure out. We are not doing calculus. So maybe if I make Linecount a signed comp-3 or a binary it will run faster. Probably not worth messing with in the long run. Maybe accounting programs would be faster, but we already use signed fields there. It is a real pain to display a COMP-3 field to look at it.

I find this an interesting subject. We do this procedure, which I find to be a pain in the neck: we take the last four digits and put them in front of the social security number, put that result in a COMP-3 field, and then move that to a PIC X(05). It is supposed to make the ID (SSN) spread out over the storage space more evenly and make it easier to find the right number using random access. You see, an SSN starts with 12345 more often than other numbers, so they bunch up and don't sort well. It has been done this way a long time, so it is just a learned technique. File access may take longer than a computation, so what takes the most time and what is more important? In our new system we just use numbers for person IDs, so that is completely different.
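A sketch of the rearrangement described above (the data names are hypothetical, and the packing step follows the description rather than any actual source):

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SSNKEY.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SSN              PIC 9(09) VALUE 123456789.
       01  WS-SCRAMBLED        PIC 9(09).
       01  WS-PACKED           PIC 9(09) COMP-3.
       01  WS-PACKED-X  REDEFINES WS-PACKED  PIC X(05).
       01  WS-KEY              PIC X(05).
       PROCEDURE DIVISION.
      * Last four digits to the front: 123456789 -> 678912345
           MOVE WS-SSN (6:4)     TO WS-SCRAMBLED (1:4)
           MOVE WS-SSN (1:5)     TO WS-SCRAMBLED (5:5)
      * Pack the rearranged number (9 digits + sign = 5 bytes),
      * then treat those 5 bytes as the X(05) key.
           MOVE WS-SCRAMBLED     TO WS-PACKED
           MOVE WS-PACKED-X      TO WS-KEY
           GOBACK.
```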

If you do not like my post feel free to point out your opinion or my errors.
 
ceh4702 said:
So maybe if I make Linecount a signed comp-3 or a binary it will run faster.

Will it ever, ever regain the cost of the recompilation?

Now, regarding the SSN 'algorithm': since the final product is X(5), you aren't using the result for direct access (e.g., as a relative key), so I think in most modern indexed file systems you are plainly wasting CPU cycles to optimize something that disappeared a long time ago.

Tom Morrison
 
Tom -

Part of my point above was that one should develop good habits so that code you write is basically well-performing at the level we're discussing. For example, I routinely define a line counter as a COMP/COMP-5/BINARY value (depending on what my compiler supports). I don't know of any situations where that's a BAD thing to do and it's probably better than DISPLAY or COMP-3 in most situations.
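For example, a hypothetical declaration along those lines:

```cobol
      * A line counter is incremented and compared constantly, so a
      * binary usage is a reasonable default where the compiler
      * supports it:
       01  WS-LINE-COUNT    PIC S9(04) COMP   VALUE ZERO.
      * rather than a packed or display definition such as:
      *01  WS-LINE-COUNT    PIC S9(04) COMP-3 VALUE ZERO.
```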

As for the SSN algorithm, most analyses of data structures and algorithms assume that keys are equally likely to occur. In the case of SSNs, that's definitely not true. SSNs are issued in local blocks (e.g. 239-xx-xxxx through 24?-xx-xxxx were issued in N Carolina; 004-xx-xxxx were issued in Maine). Switching the SSN around MAY improve the distribution of keys and enable some indexed file systems to handle inserts/retrievals more efficiently depending on the data structures and algorithms they use. Of course, inverting the key may not improve things at all.

Many hospital medical records departments do a version of key inversion with patient numbers in a process called terminal digit filing (e.g. reorder 12345678 to 78563412). It means insertions are more evenly distributed and you don't have to move records around from cabinet to cabinet or shelf to shelf as often.

All of that said, YMMV and one has to exercise judgment when designing systems. In most cases, inverting the key is likely more trouble than it's worth. However, in SOME cases it may be absolutely necessary in order to get the performance the application requires.

Regards.

Glenn
 
Glenn,

Yes, I understand what the algorithm is doing, and even why it is doing it. I am that old! My point is, knowing the inside of a couple of indexed file systems, the need for this type of arcana simply disappeared a couple of decades ago (unless, of course, the computer record is tied physically to a filing cabinet...).

Tom Morrison
 
