Code challenge! :-)

wahlers · Jun 8, 2004

PROGRAM CHALLENGE:

Can somebody shorten the following algorithm?
(source program included)

Note: with the emphasis on 'shorten' or 'shorter' or 'less complex'.
I already know you can do this in 10,000 different ways.
I just want a smarter algorithm (in COBOL-85 of course!).

Restriction:
Only COBOL-85 allowed.
No tricky vendor supplied extensions allowed.

Following is the source code:
(algorithm is explained within this source code)

###############################################

IDENTIFICATION DIVISION.
PROGRAM-ID. getProductCode.
*/**
* ---> Interface contract:
*
* getProductCode will return a valid product code
* when and only when a valid product string is
* passed to this routine.
* Also, the validityIndicator is set to 'codeValid'
* when and only when a valid product code is returned.
* (n.b. a valid returned product code is padded with
* leading zeros).
*
* In all other events the validityIndicator is set to
* 'codeInvalid' and the contents of the returned product
* code field is undetermined.
*
* ---> Call structure:
*
* CALL getProductCode USING
* , passProductString X(50) INPUT (free format)
* , returnProductCode 9(18) OUTPUT
* , validityIndicator X(01) OUTPUT (valid = '0')
* END-CALL
*
* ---> valid product codes are (by example):
*
* '1-23-4505-4-321' <--- any number separated
* ' 123-450543-21 ' by 1 or more single(!)
* '12345-0-54321 ' hyphens,
* ' 12345-0-54321' with or without leading
* or trailing spaces. Or:
* ' 12345054321 ' <--- numeric digits only
* '12345054321 ' with or without leading
* ' 12345054321' or trailing spaces
*
* N.B. leading zeros are allowed but leading spaces
* are not allowed after leading zeros (by example):
*
* ' 00012345054321' <--- valid!
* '0012345-0-54321' <--- valid!
* '1 ' <--- valid! single digit number!
* ' 2 ' <--- valid! single digit number!
* ' 3' <--- valid! single digit number!
* ' 004 ' <--- valid! single digit number (=4)!
*
* ---> invalid product codes are (by example):
*
* '00 12345054321' <--- invalid! Spaces after leading zeros!
* ' 12345—-054321 ' <--- invalid! Consecutive hyphens!
* ' 12345054321- ' <--- invalid! Trailing hyphen!
* ' –12345054321 ' <--- invalid! Leading hyphen!
* ' 12345X54321 ' <--- invalid! Invalid character!
*
* N.B. all production codes are greater than zero
* therefore: productCode = ZERO is invalid!
*
**/

DATA DIVISION.
WORKING-STORAGE SECTION.
01 workFields.
05 firstPosition PIC 9(03) BINARY.
05 lastPosition PIC 9(03) BINARY.
05 sourcePosition PIC 9(03) BINARY.
05 targetPosition PIC 9(03) BINARY.

LINKAGE SECTION.

*// IN: User supplied product string
01 passProductString.
05 productString PIC X(50).

*// OUT: product code number
01 returnProductCode.
05 productCode PIC 9(18).

*// OUT: validityIndicator
01 validityIndicator PIC X(01).
88 codeValid VALUE 0.
88 codeInvalid VALUE 1.

PROCEDURE DIVISION USING passProductString
returnProductCode
validityIndicator
.
PERFORM getProductCode
GOBACK
.

getProductCode.
*-----------------------------------------------------
*
* Extract the product code as numeric value
* with mandatory test on validity!
*
*-----------------------------------------------------

*// Position at first non space position
*// (could be done with inspect statement but loop used here!)
PERFORM WITH TEST BEFORE
, VARYING firstPosition
, FROM 1
, BY 1
, UNTIL (NOT productString(firstPosition:1) IS SPACE)
, OR firstPosition = LENGTH OF productString
END-PERFORM

*// Position at first non leading zero position
PERFORM WITH TEST BEFORE
, VARYING firstPosition
, FROM firstPosition
, BY 1
, UNTIL (NOT productString(firstPosition:1) IS ZERO)
, OR firstPosition = LENGTH OF productString
END-PERFORM

*// Determine the last valid productString position.
*// only trailing spaces are allowed!
PERFORM WITH TEST BEFORE
, VARYING lastPosition
, FROM LENGTH OF productString
, BY -1
, UNTIL (NOT productString(lastPosition:1) IS SPACE)
, OR lastPosition = 1
END-PERFORM

IF productString(firstPosition:1) IS NUMERIC AND
, productString(lastPosition :1) IS NUMERIC AND
, productString(firstPosition:1) > ZERO
,
*// initially assume the code to be extracted is valid
*// and initialise target field (=productCode) and
*// position the pointer at the end of the target field
, SET codeValid TO TRUE
, MOVE LENGTH OF productCode TO targetPosition
, MOVE ZEROS TO productCode
,
ELSE
*// Invalid product code because product code must
*// at least begin with a digit greater zero (1 thru 9)
*// and at least end with a digit (0 thru 9).
, SET codeInvalid TO TRUE
,
END-IF

PERFORM WITH TEST BEFORE
, VARYING sourcePosition
, FROM lastPosition
, BY -1
, UNTIL sourcePosition < firstPosition
, OR codeInvalid
,
, EVALUATE TRUE
, ,
*// copy the digit when the source character is a digit
, , WHEN productString(sourcePosition:1) IS NUMERIC
, , IF targetPosition > ZERO
, , , MOVE productString(sourcePosition:1)
, , , TO productCode (targetPosition:1)
, , , SUBTRACT 1 FROM targetPosition
, , ELSE
*// numeric code in source (productString) exceeds
*// maximum value possible in target (productCode)
*// result: not all digits copied, therefore invalid!
*// (Option: code invalid or define a larger target field)
, , , SET codeInvalid TO TRUE
, , END-IF
, ,
*// ignore hyphen between two digits.
, , WHEN productString(sourcePosition :1) IS '-'
, , AND productString(sourcePosition - 1:1) IS NUMERIC
, , AND productString(sourcePosition + 1:1) IS NUMERIC
, , CONTINUE
, ,
*// Invalid character in input, therefore:
*// the whole product code is invalid!
, , WHEN OTHER
, , SET codeInvalid TO TRUE
, ,
, END-EVALUATE
END-PERFORM
.

END PROGRAM. getProductCode.

deveshjogal · Jun 9, 2004

Hi

Why not try using all validation in single loop
u will have to put a few If conditions to trap all invalid cases

I just build one but u may want to simplify the IF structure if sounds complicated.(I dont feel so as I had worked with far more complex structures in 74 ). offcourse I didnt test it. I have not defined all variables.

Code:

01 input-string x(50).
01 input-string-red redefines input-string.
   05 cur-pos pic x(01) occurs 50 times.

01 valid-return-code pic 9(18).
01 valid-return-code-x pic x(18) redefines valid-return-code.

01 target-return-value.
   05 target pic X(01) occurs 50 times.

01 valid-number pic x(01).
   88 isnumber values '0' thru '9'.

01 x pic 9(02) value 0.
01 I pic 9(02) value 0.
01 max-string-length 9(02) value 50.
01 max-digit-can-accomodate 9(02) value 18.

====================================================
define all flags here...

other-than-space-char-encounter
other-than-space-char-not-encounter


is-leading-zero
is-not-leading-zero

these-r-not-traling-spaces
these-r-trailing-spaces


last-char-not-hypen
last-char-hypen


dont-stop-process
stop-process
=====================================================
procedure division. 

200-validation-para.
====================
set other-than-space-char-not-encounter to true
set these-r-not-traling-spaces to true
set dont-stop-process to true
set last-char-not-hypen to true
set is-leading-zero to true
move 0 to I.

[COLOR=red]
perform until x > max-string-length 
or stop-process


if curpos(x) <> space
   if these-r-trailing-spaces
      set stop-process to true
   else          
      if curpos(x) = '-' 
*check if it not the first one or a consecutive one
         if  last-char-not-hypen and i > 0
            set last-char-hypen to true
         else          
            set stop-process to true
         end-if
      else 
         if curpos(x) = isnumber
            set last-char-not-hypen to true
            if  curpos(x) not= '0'
               set is-not-leading-zero to true   
               target(i+1) = curpos(x)
               compute i = i + 1
            else 
*check if it is leading zero
               if curpos(x) = '0' and is-not-leading-zero
                  target(i+1) = curpos(x)
                  compute i = i + 1
               end-if   
            end-if 
         else
*if it is not space,not number or not hypen then invalid
* add everything valid before this else
            set stop-process to true
         end-if 
     end-if
   end-if
*this flag will indicate there are no more leading spaces
   set other-than-space-char-encounter to true         
else
*since we had already encounter char, any spaces encounterd *are trailing space
   if other-than-space-char-encounter 
      set these-r-trailing-spaces to true         
   end-if   
end-if

*check if number extracted > max length of receiving str
if i > max-digit-can-accomodate or x = max-string-length
   set-stop-processs to true
end-if
end-perform.[/color]


if dont-stop-process
   compute x = i + 1
   move all '0' to  valid-return-code-x
  move target-return-value(1: i) to valid-return-code-x(x: )
else
   process for invalid
end-if

wahlers · Jun 10, 2004

To DEVESHJOGAL:

Thanks for your time and effort, I greatly appreciate it.

You said: "...I had worked with far more complex structures..."
reply: So have I. But that was not the intention of the excercise. Actually...I intentionally used a simple validation function.

You said: " define all flags here..."
reply: My code only uses one flag. This flag is mandatory and returned to the caller with only one boolean meaning: 'the product code is either good or bad'.
You are using multiple flags for multiple conditions. Mutiple flags increases the (mental) complexity. The challenge was to reduce the complexity.

To me it seems that your code ends when the first space is encountered after the product code (however it could be that I interpret the logic wrongly!).
But the following product code is also incorrect because of the '6' following after '12345' and the space:
' 12345 6 '

My code only has 2 IF statements.
Neither of them is nested with other IF statements, but(!) the second IF statement is nested within a simple EVALUATE statement.
Thus I only have one level-2 nested condition construction(*).
Neither of the IF statements has any logic in the ELSE (other then setting the return condition to 'code invalid').
Both IF statements have simple condition logic (eventhough the first IF statement has 3 conditions).
Both IF statements can be mentally evaluated at a glance (respectively 9 and 7 executable coding lines).
And both IF statements can immediately mentally(!) be discarded after a mental evaluation.

(*) I mean I only have a level-2 nested condition construction on paper!.
The EVALUATE statement itself is, at compile time, 'translated' into nested IF logic (see compile output).

Of course the EVALUATE statement used is another way of coding a complex if statement.
However, this EVALUATE statement only contains one simple nested IF statement.
This whole EVALUATE statement, including comments, fits on one screen (so no mental jumps required).

In contrast:
Your logic contains deeply nested IF statements controlled by multiple flags. I tend to say that your logic is more complex than the original program.

Having said this:
Your following ideas may be a good alternative:
1. Use tables instead of substrings.
(however I consider this a matter of preferrence. So, all considering I do not think this reduces the complexity).
2. First building the (numeric) product code into an alphanumeric string. And, at the end, move the value to the numeric return field.
(In the original logic the product code number must be built using a right-to-left logic, which is the opposite of the normal left-to-right logic. This may actually decrease the complexity(*)).

(*) Actually there was a reason for the right-to-left logic!
This had to do with the COBOL numeric limitation of 9(18). It was possible that a product number could grow passed that limitation and defintions like PIC 9(20) were not possible.
For any bigger definitions the (interface field) would simply have been changed from PIC 9(18) to PIC X(20) without changing the logic!.
Of course it is known that this has consequeces for all calling programs. However, the definition(s) are kept in copybooks. And, since the logic is not changed, only a recompile of the programs involved is needed (and testing of course!).
Originally it was also an option to define this field as PIC X(20) to start with. But this did not solve the problem! Because it could not be excluded that future product codes are even bigger than that.
In the end it was a compromise to use the biggest numeric field definition available and code a robust algorithm that can cope with future changes (see also the usage of the 'LENGTH OF' phrase).

With the COBOL-2000 standards it is now possible to define an integer up to PIC 9(31).
Micro-Focus can use up to PIC 9(38). However, this is a vendor extension and therefore I discourage its usage.

3gm · Jun 10, 2004

wahlers -

Your algorithm is straight-forward and fairly easy to follow. I doubt it can be made logically simpler. One might argue that it could be speeded up (e.g. IF NUMERIC on a single byte field might be faster if you check >= 0 and <=9).

In the interest of removing unnecessary clutter, the WITH TEST BEFORE would appear to be unnecessary on those PERFORMs that have no body.

I did find that it doesn't seem to produce the correct result for product numbers that begin with leading zeros and a hyphen (e.g. 0-0100). Perhaps you've simply not given us the full definition of a valid product number

Just for fun, I implemented this on my HP-3000. You suggested "No tricky vendor supplied extensions allowed". Well, you seem to have used some extensions, albeit fairly innocuous ones.

E.g.:

1. 88-levels codeValid and codeInvalid must be alphanumeric values, not numeric.

2. LENGTH OF is, I believe for COBOL 85, a vendor extension although a widely implemented and very useful one.

3. PROCEDURE DIVISION must begin with a paragraph name before the first sentence.

4. The body of each PERFORM must contain at least one imperative statement (e.g. CONTINUE).

5. IS SPACE, IS ZERO, and IS '-' should be EQUAL SPACE, EQUAL ZERO, and EQUAL '-'.

6. END PROGRAM must be followed directly by the program name, not a period.

Nitpicking certainly, but useful to be aware of if you're going to write portable code.

Regards.

Glenn

wahlers · Jun 10, 2004

Reply to: Doods

His answer can be viewed at forum thread:

http://mvshelp.com/cgi-bin/ubb6/ultimatebb.cgi?ubb=get_topic;f=1;t=003239

Thanks for your effort and time.

Here are some of my comments:

Nice use of INSPECT to find consecutive hyphens!
However, the inspect statements as used are resource intensive. In general not a problem, but might be when extracting millions of product codes.

The product string in the passProductString field does not have to start at position 1. As far as I can judge your code you assume it starts at position 1.

The 'JUSTIFIED RIGHT' clause in the receiving field does not help. It only means that the receiving right justified alphanumeric field will be left truncated (depending on the contents and size of the sending field).
I don't know why you used this field (Assume it is my lack of knowledge. As far as I know I never coded this myself, though I did encounter it sometimes while maintaining other programs).

You used an alphanumeric field to built the number and thus avoiding the right-to-left logic as used in the original program. I already explained in the previous reply on this thread why I used the right-to-left logic.
You used this alphanumeric field, test it to be numeric and then move it to the receiving numeric field.
However, the original code is more flexibel when the returnProductCode field has to be increased beyond the numeric limitations restricted by COBOL (as explained in the previous reply).

Conclusion:

It looks to me that the original and your alternative code are more or less of the same complexity
(ignoring the logic error of not starting at the start position).

Regards, Wim Ahlers.

k5tm · Jun 10, 2004

wahlers said:
However, the inspect statements as used are resource intensive.

I somehow doubt that INSPECT is more resource intensive than the original looping constructs. Could you be more specific about the platform, if contributors are to reviewed based upon 'efficiency' metrics?

Tom Morrison

http://www.liant.com

wahlers · Jun 10, 2004

To: 3gm

Thanks for your time and effort!

Your comments are NOT nitpicking but all valid!
Here is my response:

1. 88-levels codeValid and codeInvalid must be alphanumeric values, not numeric.
Answer: Oops! In the shop where I used to work we used 0 and 1 for boolean field values. But the field themselves were always alphanumeric! I thought this could be confusing, so I did a last minute change. Stupid, stupid, stupid!!! (only excuse: They were in a copybook, I had to copy the code anyway...)

2. LENGTH OF is, I believe for COBOL 85, a vendor extension although a widely implemented and very useful one.
answer: Really! I did not know that! I Thought this was included as of COBOL-85 standards.

3. PROCEDURE DIVISION must begin with a paragraph name before the first sentence.
answer: Last time I used ENTRY's a lot (which is definitely NOT a standard!). The compiler never complaint when starting sentences immediately after ENTRY. This code was extracted from a program using entries. In the original program every ENTRY implements a method (by the way...contained programs were used also...but that is another issue).

4. The body of each PERFORM must contain at least one imperative statement (e.g. CONTINUE).
answer: OK! valid criticism. Was inserted by compiler...
It is more correct to insert CONTINUE statement in the code as used to determine the true beginning and end of the product string.

5. IS SPACE, IS ZERO, and IS '-' should be EQUAL SPACE, EQUAL ZERO, and EQUAL '-'.
answer: Could be true. Damn...it tells what a creature of habits you (in the meaning of I) become after a while...!

6. END PROGRAM must be followed directly by the program name, not a period.
answer: OK! Again...last minute name change as with the 88 fields...looked innocent at the time...sorry...

Thanks for pointing out the non-standards. Regards, Wim.

Dimandja · Jun 10, 2004

wahlers said:
However, the inspect statements as used are resource intensive.

This quote is resource intensive [bigsmile]

. COBOL is distributed by many vendors and runs on many platforms. I doubt there is any such thing as a resource intensive COBOL statement, without taking all those factors into consideration.

wahlers · Jun 10, 2004

To: 5ktm

No the INSPECT statement is not more or less efficient then a loop (at least not that I know of). But this was not the issue!

The loop only goes once(!) through the whole string were in the code mentioned the inspect was used multiple(!) times for the same string.
It is this what I called less efficient!

Regards, Wim Ahlers.

k5tm · Jun 10, 2004

wahlers said:
It is this what I called less efficient!

Wim,

I understand the instinct, but you might be surprised if you actually were to measure the various solutions. The initial solution you proposed has five PERFORM loop setups, all with complex termination conditions. INSPECT, however, is a specialized loop and as such might allow the compiler to generate better code.

This is just something to consider, not an absolute. That is why I asked for you to be more specific about the platform.

Tom Morrison

http://www.liant.com

wahlers · Jun 10, 2004

To: k5tm (concerning efficiency)

I see what you mean and you might be right, but...
...yes, there is always an irritating but

I am not interested in the amount of assembler instruction it produces but only interested in the number of effective instruction executed.

I haven't code assembler very much (thank got!) and what I coded was very primitive but I know that an inspect with tallying will process all the 50 bytes (see example programs discussed before). And I know that the individual loops stop at the moment a condition is met (notice the reverse logic reflected in the word NOT in the original program).

Now it might be the case that the compile optimizer does a much better job for the inspect statement in (for instance) using full-word boundary, and some efficient form of indexing, and some other tricky stuff. But I doubt the compiler can 'guess' that it should stop after the first hit count of 2 consecutive hyphens (to name one condition).

In both cases (loop or inspect) there will be a conditional branch (loop at first hit and for the inspect at the end of the string).
All in all, regardless which logic you prefer (loop versus inspect) they both need to process the same conditions.
In both cases (as far as I can judge) the conditions are tested once (and that is optimal!) (*).
But, to my knowledge, the inspect statement will go through the whole string. And the inspect is used more then once.
In contrast each perform picks up were the other left passing each byte only once.

(*) were I say conditions are only tested once is true but in the source the same conditions are coded more then once. however, in such case, it would be more precise to say that each condition is only tested once for each individual byte it has to process (which is necessary condition!).

Conclusion:
Just a long excuse for saying I don't know...or, actually, I don't know for sure!

The whole issue about performance of inspect versus loop in respect to the assembler instructions processed(*) is for the assembler technical knowledgable person who actually likes to verify the bold statement I just made!

(*) note: instructions processed is something else then assembler instruction generated!
More instructions may be generated but depending on the condition logic actually fewer statements may be processed.

Regards, Wim Ahlers.

k5tm · Jun 10, 2004

wahlers said:
But I doubt the compiler can 'guess' that it should stop after the first hit count of 2 consecutive hyphens (to name one condition).

Wim,

If running efficiency is to be considered, then you should not care that this happens, because efficiency when extracting "millions of product codes" must be viewed through the lens of the successful path, not the error path. So, in the case of INSPECT to tally instances of "--", the "millions of product codes" approach is to optimize the case where the result of the inspect = ZERO.

Tom Morrison

http://www.liant.com

Dimandja · Jun 10, 2004

Wim, that's a heck of a lot of buts and ifs...

I don't want to get into compiler design issues, but what you described sounds more like a guess.

We have two COBOL compilers here. An ANSI COBOL85 and a Native Mode COBOL. The same instructions produce completetly different results in the hands of each compiler. You see, the ANSI compiler follows ANSI rules, while the Native compiler follows the NonStop architecture rules very intimately.

Once again, this discussion will make sense only when taking the vendor and the platform into consideration.

Dimandja

wahlers · Jun 10, 2004

To: 3gm

I noticed I forgot to reply on your following remark...

Your remark:
I did find that it doesn't seem to produce the correct result for product numbers that begin with leading zeros and a hyphen (e.g. 0-0100). Perhaps you've simply not given us the full definition of a valid product number

Answer:
I fail to see the problem!
Using your example ---> ' 0-0100 '

The first perform loop positions on the first NON-space character which is the first zero character
(normal direction, this is the most left zero).

The second perform loop positions on the first NON-zero
character which is the hyphen character.

The third perform loop positions (another pointer of course!) on the first NON-space character counting backwards from the end.

Result the 50 byte string is now effectively reduced (guarded by the first- and lastPointer) to:
'-0100'

After the above 3 loops it is tested if both(!) first and last pointer positions have numeric values.
In this specific case that is not true because the first position contains a hyphen character which is not a numeric.
Result: program ends with codeInvalid condition.

I might be blind...but where did I go wrong?

I extracted the damn code from within another construction kind of simulated object-based like code)...and yes I did do some editing (mainly to remove the included copybook and some name changing).

Regards, Wim.

wahlers · Jun 10, 2004

To: k5tm and Dimandja

your Remark (Dimandja):
I don't want to get into compiler design issues, but what you described sounds more like a guess.

To start: I agree with Dimandja, that is:
I don't want to get into compiler design is...
I suggest to stop this line of discussion here and now.

But still some final reaction:

Answer:
Yes! It is guessing...as I described in my conclusion.
And yes I am aware that some code perform a zillion percent better after changing a single compiler option.
I am also aware that different platforms (software+hardware) produce different results.

I am also aware that tweaking source line instructions (being COBOL or any other language) usually is a fruitless job and a waste of time
(...unless...pareto principle...80/20 rule...or whatever you call it...).

I consider the logical algorithm far more important then any tricky triviality. Maintenance, portibility, usability, stability...all characteristics that are far more important.
In general I could not care less which vendor product is used, with which set of options used, on which operating system and on which hardware (unless...pareto principle).

Standards and portability (and the other characteristics I mentioned before) are important.
And performance?
Well, not at all important!
Unless...the users are complaining. Then and only then I will see what I can do.

I once solved a performance question (note: question, NOT problem) asked by a manager in 2 seconds!
It went something like this:

Manager: How about performance?
Me: The system works and the user can use the system in a normal way even during top peak hours!
Manager: OK...but can't we make it faster...I mean even more responsive?
Me: Sure no problem!
Manager: When do you think you will work on this issue?
Me: I just did...and I am already finished!
Manager: What do you mean?
Me: Well...in the past 5 years there have been numerous hardware and software upgrades at fixed date intervals.
Each time the performance team concluded that the system performance increased by about 12% in that year.
I can pretty much guarantee that my (software)system will be about 12% faster by next year!!!!!
(and this ended the discussion with the manager...by the way this is not a made up story. This incident truly happened!)

Your remark (k5tm):
If running efficiency is to be considered, then you should not care that this happens, because efficiency when extracting "millions of product codes" must be viewed through the lens of the successful path, not the error path. So, in the case of INSPECT to tally instances of "--", the "millions of product codes" approach is to optimize the case where the result of the inspect = ZERO.

answer:
I already answered that in a similar fashion. But I formulated it as follows (see some previous reply):
"I am not interested in the amount of assembler instruction it produces but only interested in the number of effective instruction executed."

I may not understand you correctly but I meant the following:
(For the complete source code involved see:

http://mvshelp.com/cgi-bin/ubb6/ultimatebb.cgi?ubb=get_topic;f=1;t=003239)

Given the instruction:
inspect product-string tallying counter for all '--'.

Is it not true that:
The whole product-string is ALWAYS tested
(per 2 bytes, this is 25 compares per passage)
placing the total of the '--' strings found into the variable counter (which may turn out to be zero).

Is it not true that:
The above statement happens regardless the platform or COBOL vendor involved.

Is it not true that:
The original code does NOT test for the '--' condition. That is, it assumes it does not occur (optimistic approach) but only when it encounters a '-' character, which is a valid character, then and only then it will check if the left and right character are numeric (as they should be!).
Is this not exactly what you suggested, namely (and I quote):
...must be viewed through the lens of the successful path, not the error path...

Anyway, as I said before, from my side, and also suggested by Dimandja, I will end this part of the discussion (by the way...by now it is midnight in the Netherlands!).

Finally, thanks for the discussion, I enjoyed it!

Regards, Wim.

k5tm · Jun 10, 2004

wahlers said:
"I am not interested in the amount of assembler instruction it produces but only interested in the number of effective instruction executed."

I may not understand you correctly but ...

You did not understand me correctly, so I will attempt again, just to clarify...

By "running efficiency" I mean exactly "number of effective instructions executed." My specific point is this: It may seem "efficient" that a loop stops when it finds "--", but not if this is the unusual case.

If "millions of product codes" are to be processed, I think one must assume that the number of correctly formatted product codes, plus the number of product codes that violate another rule but not this one, exceeds 98 percent (to pick a number) just because if there are more failures than that your problem of loop efficiency is quite small compared to the problem you will have correcting your input data -- or deciding that a different algorithm is appropriate (2% failure means 20000 failures/million).

Therefore, I must code my algorithm to be efficient for the correct case. Now, stopping a loop when it discovers the first error may be delightful, but perhaps not if it causes the normal (correct) case to be less efficient (by only a small amount if you accept my conjecture). It may be better for the overall performance to treat errors very inefficiently.

Return again to the INSPECT which tallies all the times it finds two adjacent hyphens, rather than stopping when it finds the first. Given that this INSPECT will run to completion without tallying any adjacent hyphens on the vast majority of the input, I don't care if it runs a few extra instructions on the error case. All I care about is this: will the INSPECT be more efficient for the normal, nonerror case. My (very well educated by years of experience) guess is that, for the normal, non-error case, the INSPECT is more efficient than the PERFORM loop because it communicates more information to the compiler about the desired operation, thereby permitting the compiler to generate much more efficient/specialized code.

QED

Okay, as midnight appraoches in Texas, I will join the truce and abandon this particular line.

Tom Morrison

http://www.liant.com

wahlers · Jun 11, 2004

To: k5tm
Sorry I kept you awake...good night...Wim.

3gm · Jun 11, 2004

Wim -

Re: 0-0100 product code

The subroutine as coded DOES flag the code as invalid, however my point was that it doesn't seem to be invalid per your (brief) description. It doesn't begin or end with a hyphen, contains no embedded spaces, is otherwise numeric, no hyphens appear together, and it's not zero.

As I alluded to in my original response, I'm not sure your definition of a valid product code is sufficiently rigorous given this is supposed to be an invalid code. Perhaps you might have said "no hyphen may appear before the first significant digit"?

Glenn

k5tm · Jun 11, 2004

I decided to provide a 'non-looping' version. Have at it, Wim. [bigsmile]

Code:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. getProductCodeNew.
      
       DATA DIVISION.
       WORKING-STORAGE SECTION.

       01  working-string      pic X(50).
       01  working-num         redefines working-string
                               pic z(18)b(32).
       01  c-string            pic X(50).

       01  usage binary.
           02  i                   pic s9(3).
           02  string-length       pic s9(3).
           02  initial-nonspace    pic s9(3).
           02  r                   pic s9(3).
    
       
       LINKAGE SECTION.
      
      *// IN: User supplied product string
       01  passProductString.
         05  productString     PIC  X(50).
      
      *// OUT: product code number
       01  returnProductCode.
         05  productCode       PIC  9(18).
      
      *// OUT: validityIndicator
       01  validityIndicator   PIC  9(01).
           88  codeValid       VALUE    0.
           88  codeInvalid     VALUE    1.
      
       PROCEDURE DIVISION USING  passProductString
                                 returnProductCode
                                 validityIndicator
                                 .
       the-main section.
       a.
           PERFORM  getProductCode
           GOBACK
           .
      
       getProductCode section.
       a.
           set codeInvalid to true.

           move productString to working-string, c-string.
           move 0 to i, string-length, r.
           move 1 to initial-nonspace.
           inspect working-string tallying i for all "--".
           if i > 0 go exit-section.
      *    the string contains no contiguous hyphens
           inspect working-string tallying initial-nonspace 
                                       for leading spaces.
           if initial-nonspace > 50 
           or working-string (initial-nonspace: 1) not numeric
               go exit-section
           end-if.
      *    the string is not all spaces
      *    the leftmost nonspace character is numeric
           inspect c-string (initial-nonspace:)
                            converting "0123456789-" to all "-"
                            before initial space.
           inspect c-string (initial-nonspace:)
                            tallying string-length for leading "-".
           add string-length to initial-nonspace giving r.
           if r < 51
               if c-string (r:) not = spaces
                   go exit-section
               end-if
           end-if.
      *    the string contains no embedded spaces or invalid chars
           subtract 1 from r.
           if working-string (r: 1) not numeric
               go exit-section
           end-if.
      *    the rightmost nonspace character is numeric
           if c-string (initial-nonspace: string-length) not = all "-" 
               go exit-section
           end-if.         
      *    the string contains only hyphens and numeric digits

           inspect working-string converting "-" to space.
           move working-num to productCode.
           set codeValid to true.
           
       exit-section.
           exit.

Results are identical to Wim's for the example strings provided. RM/COBOL, of course!

Tom Morrison

http://www.liant.com

wahlers · Jun 12, 2004

To: 3gm

Sorry, I misunderstood you. You are right!
It is documented in the code (must start numeric > zero) but not at the top of the program.
The documentation grew during development. May be the documentation should have been replaced by something like the following:

#############################################

The getProductCode routine copies the productCode
from the passProductString input field and returns
it into the returnProductCode output field.
In the process it is strips the productCode from
hyphens.

Also, the validityIndicator is set to 'codeValid'
when and only when a valid product code is returned.
(n.b. a valid returned productCode is padded with
leading zeros).

In all other events the validityIndicator is set to
'codeInvalid' and the contents of the returned product
code field is undetermined.

---> Call structure:

CALL getProductCode USING
, passProductString X(50) INPUT (productCode)
, returnProductCode 9(18) OUTPUT
, validityIndicator X(01) OUTPUT (valid = '0')
, (invalid = '1')
END-CALL

The passProductString field contains a productCode.
A productCode is valid only when:

1. It must start with a digit.
2. It must end with a digit.
3. Consecutive digits may be separated
by one hyphen and one hyphen only.
4. A hyphen character is not allowed when
all preceding digits are zeros only.
5. A productCode may have one or more leading
zeros.
6. A productCode only contains hyphens and digits,
space or other characters are not allowed.
7. productCode zero does not exists and is not allowed.

The productCode may start on any position within the productString field.

Some examples of valid product codes are:

in: ' 1-234-56-7 ' -> out: '00000000001234567'
in: ' 00001234-5 ' -> out: '00000000000012345'
in: ' 008 ' -> out: '00000000000000008'
in: ' 800 ' -> out: '00000000000000800'

Some examples of invalid productCodes are:

00000000000 <- productCode zero
-1-234-45-7 <- leading hyphen
1-234-45-7- <- trailing hyphen
1-234--45-7 <- consecutive hyphens
00-00124345 <- hyphen within leading zeros
1-234 45-7 <- space included
1-234@45-7 <- other illegal character

#############################################

...I think the above description is clear (enough) and complete (I hope!).

Documentation

1. What every programmer should do.
2. What most programmers hate to do.
3. What some programmer don't do.
4. Well written (authors opinion).
5. Loosy written (audience opinion).

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Code challenge! :-)

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor