Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Sorting algorithms 1

Status
Not open for further replies.

RonaldB

Programmer
Nov 28, 2000
196
0
0
US
Greetings, Earthlings,

Does anyone have ideas on good sorting algorithms, like sorting an internal table (in Working-Storage) ? My knowledge basically ends with bubble sort; i forgot about brick sort, i just recently learned about bi-directional bubble sort (interesting too, by the way), and am curious about an implementation of Quicksort in COBOL. I found a site adressing some algorithms, but with rather general descriptions.

I don't have a specific problem; just being curious here.

Thanks,
Ronald.
 
Ronald, I'm sure that you are familiar with programs such as the IBM utility ICEMAN which do external sorts. And I assume that you are familiar with COBOL internal sorts where you define a "SD", a sort file, release the records for sorting, then have them returned.

I had to do a sort routine for an online CICS program for a table consisting only of 1 to 8 records. So my project leader and I decided that using external sort programs would use up too many resources (to call or transfer control to another program). Each time the table of 1 to 8 records was fed to my update/modify program, the records had to be sorted according to Initial (primary key) and Reconsideration (secondary key) dates before the final write/rewrite of the VSAM file involved.

So I wrote a sort of modified bubble sort. It involves putting the primary and secondary keys into a holding field, then comparing each incoming record with what was in the holding field.

Let me do some abbreviated pseudo-code to show how it worked:

The first paragraph is a sort of "mainline" or "driver" for the sort routines; "ASANCREC" was the name of the file which was to be eventually written or updated:

7000-SORT-ROUTINE.

PERFORM 7010-MOVE-INTO-WS-TABLE THRU 7010-EXIT.

PERFORM 7020-COMPARE-AND-SORT THRU 7020-EXIT.

PERFORM 7040-MOVE-TO-ASANCREC THRU 7040-EXIT.

Each record is moved from an incoming indexed table into a subscripted working storage table. I specifically used subscripts rather than an index because I wind up switching subscripts in the middle of the sort routine.

In the 7010 paragraph, I do the following inline PERFORM:

[initialize all indexes, subcripts and the counter]
PERFORM UNTIL [incoming table reaches the maximum occurences]

MOVE INIT-DATE (IDX) TO WS-INIT-DATE (SUB1)
MOVE RECON-DATE (IDX) TO WS-RECON-DATE (SUB1)
[Add +1 to subscript, set index up by 1, add +1 to CTR]

Then back to the 7000 "driver" paragraph, then go to the 7020 paragraph. The initial date is the primary key and the reconsideration date is the secondary key. And the first time a record comes down the pike, the holding areas will contain spaces. So each first record goes through this process, it will do the first "when" commands. The subsequent times that a record comes down the pike, the holding areas will contain values:


MOVE +1 TO SUB1
PERFORM WITH TEST AFTER
UNTIL SUB1 > WS-OCCUR-CTR
EVALUATE TRUE
WHEN WS-INIT-DATE (SUB1) > WS-HOLD-INIT-DATE
MOVE WS-INIT-DATE (SUB1) TO WS-HOLD-INIT-DATE
MOVE WS-RECON-DATE (SUB1) TO WS-HOLD-RECON-DATE
WHEN WS-INIT-DATE (SUB1) < WS-HOLD-INIT-DATE
[we're now going to have to do some sorting]
MOVE WS-INIT-DATE (SUB1)
TO WS-COMPARE-INIT-DATE
[move to another &quot;holding&quot; section&quot;]
MOVE WS-RECON-DATE (SUB1)
TO WS-COMPARE-RECON-DATE
MOVE SUB1 TO INSERT-SUB
[we want the SUB1 subcript to hold our place, so we need to move its value to a different subscript for the insertion to follow. We're going to go to the place of insertion and then keep subtracting until we find a record (an INSERT-SUB - +1) whose key is less than the one being inserted. This is how we know when to insert.]
PERFORM 7030-INSERT-RTN THRU 7030-EXIT
UNTIL WS-COMPARE-INIT-DATE >
WS-INIT-DATE (INSERT-SUB - +1)
[using primary key]
OR
(WS-COMPARE-INIT-DT =
WS-INIT-DATE (INSERT-SUB - +1)
AND
WS-COMPARE-RECON-DT >
WS-RECON-DATE (INSERT-SUB - +1))
[using secondary key]
OR
(INSERT-SUB - +1 < +1)
[don't want to exceed the bounds of our table]

WHEN WS-INIT-DATE (SUB1) = WS-HOLD-INIT-DATE
IF WS-RECON-DATE (SUB1) < WS-HOLD-RECON-DATE
[using secondary key to sort. Perform insert routines as described above.]
ELSE
[move fields to holding areas]
END-IF
END-EVALUATE
ADD +1 TO SUB1
END-PERFORM.

Here is the actual inserting paragraph. We're going to kick everyone up a notch:

7030-INSERT-RTN.
MOVE WS-INIT-DATE (INSERT-SUB - +1)
TO WS-INIT-DATE (INSERT-SUB).
MOVE WS-AS-RECON-DATE (INSERT-SUB - +1)
TO WS-AS-RECON-DATE (INSERT-SUB).
[Now we're going to insert the fields which are currently sitting in the &quot;COMPARE&quot; fields.]
MOVE WS-COMPARE-INIT-DATE
TO WS-INIT-DATE (INSERT-SUB - +1)
MOVE WS-COMPARE-RECON-DATE
TO WS-RECON-DATE (INSERT-SUB - +1)
[Now we subtract +1 to go down to compare with the next record down. We keep doing this iteration. As soon as the record before this one is smaller than this one, or we reach the number 1 occurrence, the processing stops.]
SUBTRACT +1 FROM INSERT-SUB.

Now back to our &quot;driver&quot; where we will move the now-sorted working storage table into our record description so we can do the write or re-write.

I have tested this algorithm a zillion times, with all sorts of combinations. It has passed through our validation quality control tests, and is now in final testing before it gets moved to production. And this sort routine works like a champ every time. :)

Hope this was the sort of thing you were looking for (and please excuse the bad pun.. :-D)
Nina Too




 
Don't sweat it if you do not have a real problem. The next COBOL Standard will have Table Sorts as a regular feature.

Stephen J Spiro
Member, J4 COBOL Standards Committee
 
Ronald, by the way, what is a &quot;bi-directional bubble sort?&quot; Just curious.

Nina Too
 
It is easier to sort a file before you make a table outside of the program.
 
Hi,

I clipped this out of a program that contains an internal table which is sorted by a bubble sort with not so many statements.

I hope this is helpful.

Greetings,

Crox


Code:
001900 01  HULPVELDEN.
002000     03  SUB-FIRST                   PIC S9(4) COMP-5.
002100     03  SUB-LAST                    PIC S9(4) COMP-5.
002200     03  H-B-PARM-ELEMENT.
002300         05  H-B-PARM                PIC 9(09).
002400         05  H-PERCENT-VULLING       PIC 9(09).


023300     03  TAB-B-PARM-ELEMENT OCCURS 63.
023400         05  TAB-B-PARM              PIC 9(02).
023500         05  TAB-PERCENT-VULLING     PIC 9(02)9(03).


129800 PERCENT-VOLGORDE SECTION.
129900     PERFORM BEREKEN-B-PARM.
130000     PERFORM VARYING SUB-FIRST FROM +1 BY +1 UNTIL SUB-FIRST > 63
130100               AFTER SUB-LAST  FROM 63 BY -1 UNTIL SUB-LAST <
130200                     SUB-FIRST + 1
130300         IF TAB-PERCENT-VULLING (SUB-FIRST) <
130400            TAB-PERCENT-VULLING (SUB-LAST)
130500
130700              MOVE TAB-PERCENT-VULLING (SUB-FIRST) TO
130800                     H-PERCENT-VULLING
130900              MOVE TAB-PERCENT-VULLING (SUB-LAST)  TO
131000                   TAB-PERCENT-VULLING (SUB-FIRST)
131100              MOVE   H-PERCENT-VULLING             TO
131200                   TAB-PERCENT-VULLING (SUB-LAST)
131300
131500              MOVE TAB-B-PARM (SUB-FIRST) TO
131600                     H-B-PARM
131700              MOVE TAB-B-PARM (SUB-LAST)           TO
131800                   TAB-B-PARM (SUB-FIRST)
131900              MOVE   H-B-PARM                      TO
132000                   TAB-B-PARM (SUB-LAST)
132100     END-PERFORM.
 
All,

thanks for all the help. I could have known sorting was one of those subjects everybody encountered one time or another.
I realize my question was a bit vague; basically, a collegue of mine is working on a called module that has to maintain an internal (WS) table with some values. On the initial call, this table can be filled in sequence, but during subsequent calls the table can be extended with new values, to be added at the rear.
Values are searched in this table, and since search works much more efficient on a sorted table and the table can contain up to 9999 elements, sorting it sounded like a plan. Of course, that shouldn't take up more time then searching an unsorted table, hence my interest in a fast sorting algorithm.
Sorting files is not an issue, since we use Syncsort for external sorts, and no COBOL sort can beat this sucker at speed.

Nina,

a bidirectional bubble sort (so i recently discovered) works basically like a normal bubble sort, but in both directions (hence the name):

First, you work through your set of values from beginning to end, swapping elements that are in the wrong order. This will take larger values back towards the end, but leaves smaller elements - practically - in place. Next, you perform the same action, but now from end to beginning, leaving larger values practically in place, but carrying smaller values straight to the right place.
Since you go back and forth, this way of sorting is much more efficient and thus faster then single direction bubble sort, especially when the set of values is mainly in right sorting order, except for a couple of values at the end. Exactly the situation where an already sorted table is extended with new values !!! Hmmm, interesting ...

Regards,
Ronald.
 
The reason I did my sort the way I did, even though it has more paragraphs and more statements, is that, in the end, I found it to actually be more efficient because less actual moves get done. The less actual moves that have to be done, the less processing has to be done, and the more resources you save.

You only have to go in one direction (downward) with my algorithm, and only if the incoming record is less than the biggest record in the already-sorted portion of the WS table; then you find the place to insert it.

Saving resources might not seem important these days, but actually it is, when a company or agency has to process a huge number of records and files each day.

Nina Too
 
Ronald,

When you want to maintain a sorted table in a called module, it can be smart to use a binairy search to find the place for the new element. After that you use reference modification to break up the old table and insert the new element.

The best thing is to organize the table from the backsite to the front so that you can do overlapping moves from the back to the front when you want to make a place for the new element.

I use the same technique.

I hope this is helpful :)

Regards,

Crox
 
Crox,

i did wonder about a technique to insert elements in an already sorted table while maintaining sort order; i had moving all elements after the desired position back to the end one by one in mind, which would likely mean a lot of moves, but the ref mod idea is a brilliant one !
Once you figured out where to start and how long it should be, it's just one move and your done !

Thanks,
Ronald.
 
Crox,

Can you supply just a bit of sample pseudo-code to show us how the binary search and reference-modification method work?

Just curious.

Thanks, Nina Too
 
Hi,

to explain how a binairy search works, I give you my source of a program that converts hexadecimal text into the binairy value, using a coded binairy search. You can also use the search all statement in COBOL that should do the same, but sometimes it is nice to have your own code. The program is developed to use with CA-REALIA COBOL 4.2.

Regards,

Crox

Code:
000100*$CALL
000200 IDENTIFICATION DIVISION.
000300 PROGRAM-ID. HEX2BIN. 
000301*copyright(c) Ronald Wouterson alias Crox :-)
000400 ENVIRONMENT DIVISION.
000500 CONFIGURATION SECTION.
000600 SOURCE-COMPUTER. IBM-PC.
000700 OBJECT-COMPUTER. IBM-PC.
000800 SPECIAL-NAMES.
000900     DECIMAL-POINT IS COMMA.
001000 INPUT-OUTPUT SECTION.
001100 FILE-CONTROL.
001200 DATA DIVISION.
001300 FILE SECTION.
001400 WORKING-STORAGE SECTION.
001500 01  HULPVELDEN.
001600     03  SUB-STRING                  PIC S9(4) COMP-5.
001700     03  SUB-FIRST                   PIC S9(4) COMP-5.
001800     03  SUB-LAST                    PIC S9(4) COMP-5.
001900     03  H-HEX-TEKEN                 PIC XX.
002000     03  TABEL-HEX.
002100         05  FILLER                  PIC X(32) VALUE
002200     '000102030405060708090A0B0C0D0E0F'.
002300         05  FILLER                  PIC X(32) VALUE
002400     '101112131415161718191A1B1C1D1E1F'.
002500         05  FILLER                  PIC X(32) VALUE
002600     '202122232425262728292A2B2C2D2E2F'.
002700         05  FILLER                  PIC X(32) VALUE
002800     '303132333435363738393A3B3C3D3E3F'.
002900         05  FILLER                  PIC X(32) VALUE
003000     '404142434445464748494A4B4C4D4E4F'.
003100         05  FILLER                  PIC X(32) VALUE
003200     '505152535455565758595A5B5C5D5E5F'.
003300         05  FILLER                  PIC X(32) VALUE
003400     '606162636465666768696A6B6C6D6E6F'.
003500         05  FILLER                  PIC X(32) VALUE
003600     '707172737475767778797A7B7C7D7E7F'.
003700         05  FILLER                  PIC X(32) VALUE
003800     '808182838485868788898A8B8C8D8E8F'.
003900         05  FILLER                  PIC X(32) VALUE
004000     '909192939495969798999A9B9C9D9E9F'.
004100         05  FILLER                  PIC X(32) VALUE
004200     'A0A1A2A3A4A5A6A7A8A9AAABACADAEAF'.
004300         05  FILLER                  PIC X(32) VALUE
004400     'B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF'.
004500         05  FILLER                  PIC X(32) VALUE
004600     'C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF'.
004700         05  FILLER                  PIC X(32) VALUE
004800     'D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF'.
004900         05  FILLER                  PIC X(32) VALUE
005000     'E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF'.
005100         05  FILLER                  PIC X(32) VALUE
005200     'F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'.
005300     03  FILLER REDEFINES TABEL-HEX.
005400         05  FILLER                  PIC XX.
005500         05  HEX-OMSCHRIJVING OCCURS 255
005600                                     PIC XX.
005700
005800     03  TEMP.
005900         05  FILLER                  PIC X VALUE LOW-VALUE.
006000         05  TESTWAARDE              PIC X.
006100     03  SUB REDEFINES TEMP          PIC S9(4) COMP.
006200
006300 LINKAGE SECTION.
006400 01  LNK-LENGTE-STRING               PIC S9(4) COMP-5.
006500
006600 01  LNK-TXT-STRING.
006700     03  LNK-NORMAAL-TEKEN  OCCURS 1 TO 16000
006800         DEPENDING ON LNK-LENGTE-STRING
006900                                     PIC X.
007000
007100 01  LNK-HEX-STRING.
007200     03  LNK-HEX-TEKEN      OCCURS 1 TO 32000
007300         DEPENDING ON LNK-LENGTE-STRING
007400                                     PIC XX.
007500
007600 PROCEDURE DIVISION.
007700 HOOFD SECTION.
007800     ENTRY 'USR_HEX2BIN' USING LNK-LENGTE-STRING
007900                               LNK-TXT-STRING
008000                               LNK-HEX-STRING.
008100 HOO-01.
008200     PERFORM VARYING SUB-STRING FROM +1 BY +1 UNTIL
008300                     SUB-STRING > LNK-LENGTE-STRING
008400          MOVE LNK-HEX-TEKEN     (SUB-STRING)
008500            TO   H-HEX-TEKEN
008600          PERFORM ZOEK-H-HEX-OMSCHRIJVING
008700          MOVE TESTWAARDE TO LNK-NORMAAL-TEKEN (SUB-STRING)
008800     END-PERFORM.
008900 HOO-99.
009000     GOBACK.
009100
009200
009300 ZOEK-H-HEX-OMSCHRIJVING SECTION.
009400 ZHH-01.
009500     MOVE ZERO TO SUB-FIRST.
009600     MOVE +255 TO SUB-LAST.
009700     PERFORM UNTIL SUB-FIRST > SUB-LAST
009800         COMPUTE SUB = (SUB-FIRST + SUB-LAST) / 2
009900         IF  H-HEX-TEKEN >
010000             HEX-OMSCHRIJVING (SUB)
010100              COMPUTE SUB-FIRST = SUB + 1
010200         ELSE
010300         IF  H-HEX-TEKEN <
010400             HEX-OMSCHRIJVING (SUB)
010500              COMPUTE SUB-LAST = SUB - 1
010600         ELSE
010700             GO TO ZHH-99
010800         END-IF
010900     END-PERFORM.
011000 ZHH-03.
011100     MOVE +9999 TO RETURN-CODE.
011200 ZHH-99.
011300     EXIT.

 
Hi Nina,

If you want to do the reference modification trick, you just count out which positions in the table you want to move.

It is the best to organize the table in a way that it is build up from the end to the front.

One example of a table with 5 elements, 3 are filled, one comes:

<start of table>
1) blank
2) blank
3) a
4) b
5) d
<end of table>

The new element comes in, with value 'c'
so you count out that it should be on position 4. So you need to shift elements 3 and 4 to the places 2 and 3. You can do that by a statement like:

MOVE THE-TABLE (3:2) TO THE-TABLE (2:2).
MOVE NEW-ELEMENT TO TABLE-ELEMENT(4).

The table is defined as something like:

01 THE-TABLE.
03 TABLE-ELEMENT OCCURS 5 PIC X.

That is the general idea. This works ok because you move from the end to the front. If you do it the other way around, you will find out that some characters will be copied! Experiment with it, it is funny although some compilers do understand this and warn you for it!

I hope this is helpful and gives some fun! :)
 
Nina,

on the risk of telling something you already know:

a binary search on a sorted table works as follows:
First, you regard the whole table, pick the element in the middle, and check that value against your search value:
- if equal: beginners luck; you're done.
- if greater, next step regard the bottom half of your table;
- if smaller, next step regard the top half of your table.

Now, in the half you picked as described above, pick the middle element, and proceed as described in the first step, until you've found your search value, or your search area is reduced to 1 or 2 elements (a not-found situation).

Regards,
Ronald.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top