Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SSN SEARCH(RMCOBOL-85) 4

Status
Not open for further replies.

SiouxCityElvis

Programmer
Jun 6, 2003
228
US
Hello. I'm using RMCOBOL-85 on a Linux box.

I need to find a social security number field that would potentially be in a random place in a flat file record.

This data scrubbing is insane to do in COBOL(in my opinion) when Java is touching the file first before sending it to me. Java, from what I understand, has way better features for scrubbing data. But, my reality dictates me to scrub in COBOL.

Example Record.

Code:
Field One name=Bob  address: unknown 111-22-3333 amarillo,tx

Here's what I'm thinking about doing...
Code:
WORKING-STORAGE.

01 CTR-I  PIC 999.

01 SW-END-SSN-LINE PIC X VALUE "N".

01 SW-SSN-FOUND PIC X VALUE "N".

01 WS-SCRUB-SSN.
   05 WS-SCRUB-SSN3                PIC X(3).
   05 WS-SCRUB-SSN-DASH-1          PIC X(1).
   05 WS-SCRUB-SSN2                PIC X(2).
   05 WS-SCRUB-SSN-DASH-2          PIC X(1).
   05 WS-SCRUB-SSN4                PIC X(4).

PROCEDURE DIVISION.
...
...
            
SSN-LINE-PROCESS.
debug      DISPLAY "ON SSN LINE!".
debug      DISPLAY "WS-REC: " WS-SAMPLE-REC.
           MOVE "N" TO SW-SSN-FOUND, SW-END-SSN-LINE.
           MOVE 0 TO CTR-I.
           MOVE SPACES TO WS-SCRUB-SSN.
           PERFORM UNTIL SW-END-SSN-LINE = "Y"
              ADD 1 TO CTR-I
              MOVE WS-SAMPLE-REC(CTR-I:11) TO WS-SCRUB-SSN
              IF WS-SCRUB-SSN-DASH-1 = "-" AND
                 WS-SCRUB-SSN-DASH-2 = "-" 
                 MOVE "Y" TO SW-SSN-FOUND, SW-END-SSN-LINE
           END-PERFORM.
debug **** ACCEPT MENU-PROMPT.
then do rest of program using flag set for ssn-found or not found.

I'm not sure how I'm going to handle the scenario on each iteration to see if I've reached the end of the line(error handling).

Any suggestions?
Thanks.
-David
 
01 SW-SSN-FOUND PIC X VALUE "N".
88 sw-SSN-FOUND-YES VALUE "Y".
....

PERFORM VARYING CRT-I FROM 1 BY 1
UNTIL CRT-I > (LENGH_OF_RECORD - 11)
OR SW-SSN-FOUND-YES
your code here
if valid SSN found
set SW-SSN-FOUND-YES TO TRUE
end-if

END-PERFORM.

Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
One of the things AmarilloElvis asked for was how to determine the end of the line. Perhaps this code will do:
Code:
Perform Varying LENGTH-OF-REC from Length of WS-REC by -1 Until LENGTH-OF-REC < 11 or WS-REC(LENGTH-OF-REC:1) not = Space
    Continue
End-Perform
 
David,

This is similar to thread209-856792. Deep in that thread I provide a "non-looping" variation of the original program. The following uses the same technique:[ul][li]copy the string;[/li][li]normalize the characters contributing to the pattern; and [/li][li]search for the normalized pattern as a constant.[/li][/ul]The benefit of all this is that the case wherein no SSN is discovered is much more easily discerned in the structure of the code, unobscured by a bunch of PERFORM stuff.


In this case, the pattern sought is ddd-dd-dddd, where d = a numeric digit. What the following code segment illustrates is the process of normalizing the pattern to 000-00-0000 so it may be more easily discovered.
Code:
* data definitions left to the imagination.
       MOVE WS-SAMPLE-REC  TO  WS-SAMPLE-REC-COPY
* normalize all digits to 0
       INSPECT WS-SAMPLE-REC-COPY 
           CONVERTING "123456789" TO ALL "0"
* find the normalized pattern 000-00-0000
       MOVE 1 TO I.
       INSPECT WS-SAMPLE-REC-COPY TALLYING I
           FOR CHARACTERS BEFORE INITIAL "000-00-0000".
       IF I > length of WS-SAMPLE-REC
           DISPLAY "No SSN in: " WS-SAMPLE-REC
       ELSE
* the SSN is in the original where the normalized
* pattern exists in the modified copy
           MOVE WS-SAMPLE-REC (I:11) TO WS-SCRUB-SSN
       END-IF

Tom Morrison
 
Tom Morrison's code is cleaner, simpler, and removes the one error that I had seen but not wanted to comment on. It eliminates those condition when there are two "-" characters two bytes apart but the other characters are not numeric. The earlier code would have picked up "-----------" as an SSN! There might be other code to determine that there are non-numeric characters (or the end of the record) on either side of the SSN.
 
Tom,

I understand the basic concept of your solution.
I don't understand why we move 1 to I before our inspect...is it because we want our substring to be correct after our tally of all characters before initial 000-00-0000?
In other words, if the SSN is at the 20th position, we have to have 1 in I before we tally the 19 characters so that we correctly substring with I=20 later, correct?

Second question..
IF I > length of WS-SAMPLE-REC
DISPLAY "No SSN in: " WS-SAMPLE-REC

is this code for the vent that no SSN is found on that line? Thanks.

Code:
* find the normalized pattern 000-00-0000
       MOVE 1 TO I.
       INSPECT WS-SAMPLE-REC-COPY TALLYING I
           FOR CHARACTERS BEFORE INITIAL "000-00-0000".
       IF I > length of WS-SAMPLE-REC
           DISPLAY "No SSN in: " WS-SAMPLE-REC
       ELSE

-David
 
AmarilloElvis said:
is it because we want our substring to be correct after our tally of all characters before initial 000-00-0000?

Correct. INSPECT ... TALLYING does not initialize the tallying variable; it merely increments the value of the tallying variable.

AmarilloElvis said:
Is this code for the event that no SSN is found on that line?

Correct again. If there is no match, the tallying variable will be incremented once for each character in WS-SAMPLE-REC-COPY, thereby making its value greater than the length of WS-SAMPLE-REC-COPY.

Tom Morrison
 
Excellent! That's a great solution. I just tested it and it works just as you explained.
Thanks a lot.
-David
 
David,

Another way to find the SSN in the record would be to use the LIKE condition. This is a RM/COBOL extension that was included in version 7.50. Below is sample with some test data, give it a try.

Code:
       IDENTIFICATION DIVISION.
       PROGRAM-ID.     SSN.
       AUTHOR.         RAH.
       REMARKS.        Find SSN Number in string.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SOURCE-COMPUTER.  RMCOBOL.
       OBJECT-COMPUTER.  RMCOBOL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       77 WS-YN                    PIC X(01)  VALUE SPACES.
       77 WS-SUB                   PIC 9(02)  VALUE 0.
       77 X                        PIC 9(02)  VALUE 0.
      * Pattern is 3 digits, a -, two digits, a -, and 4 digits.
       78 SSN-PATTERN              VALUE "[0-9]{3}\-[0-9]{2}\-[0-9]{4}".

       01 WS-SSN-FLAG              PIC X(01)  VALUE "Y".
          88 SSN-FOUND             VALUE "Y" WHEN FALSE "N".

       01 WS-TEST-DATA.
          03                       PIC X(50)  VALUE
             "3Fred C. Dobbs, 123 Sierra Madre Blvd5123-45-78903".
          03                       PIC X(50)  VALUE
             "456-12-12345 Richard Blaine 1234 CasaBlanca Street".
          03                       PIC X(50)  VALUE
             "a    ABSamuel Spade , 1234 Maltese St.,123-45-7890".
          03                       PIC X(50)  VALUE
             "No SSN Is in this record 213-213-123abc 39-45-7890".
       01 WS-TEST-DATA-RED REDEFINES WS-TEST-DATA.
          03 WS-RECORD OCCURS 4 TIMES PIC X(50).

       01 WS-SSN-FIELD              PIC X(11)  VALUE SPACES.
           
       PROCEDURE DIVISION.
       MAIN-START.
           DISPLAY SPACE ERASE.
      
      * Attempt to get SSN using the LIKE condition
          
           PERFORM VARYING WS-SUB FROM 1 BY 1 UNTIL WS-SUB > 4
               SET SSN-FOUND TO FALSE
               MOVE SPACES TO WS-SSN-FIELD
               PERFORM VARYING X FROM 1 BY 1 UNTIL X > 50 OR SSN-FOUND
                  IF WS-RECORD(WS-SUB)(X:11) IS LIKE SSN-PATTERN
                       SET SSN-FOUND TO TRUE
                       MOVE WS-RECORD(WS-SUB)(X:11) TO WS-SSN-FIELD
                  END-IF
               END-PERFORM
               IF SSN-FOUND
                   DISPLAY "SSN Found = " Line 10 Col 1 WS-SSN-FIELD
               ELSE
                   DISPLAY "No SSN Found  " Line 10 Col 1 ERASE EOL
               END-IF
               ACCEPT WS-YN
           END-PERFORM.
           STOP RUN.
       END PROGRAM SSN.

[\code]

-Robert Heady
Liant Software Corp.
 
BTW, Robert, you got the TGML terminater flag wrong. It is "/", not "\". You should Preview your post before Subitting it.
 
Webrabbit you are right!

"You should Preview your post before Subitting [sic] it." [smile]

Yeah, I noticed I had the slash wrong. I was previewing the post when I pushed the wrong button and submited the post instead of editing it. I hate it when that happens.

Tom's code is likely faster, my intent was to demonstrate the LIKE condition as an option for pattern matching.

Robert Heady
Liant Software Corp.



 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top