RICHINMINN
Programmer
- Dec 31, 2001
Does anyone have a recommendation on how to check an input file for duplicate data? I've got a file that caused a ton of problems this past week because the correct set of records appeared in it six times. (The company sending the file was having FTP problems and ended up sending the file six times: the first five copies were 99.6% complete, at 281,010 records out of 282,186, followed by one complete copy.) The resulting file contained 1,687,236 records (5 x 281,010 + 282,186). The file is fixed length, with a record length of 320 bytes.
(This is on a large, corporate Amdahl system, with tons of storage, running IBM COBOL II on OS/390.)
Question:
How can I pre-process this file so that, if any duplicate data is sent, I can bypass the duplicate records? There are no handy fields that are unique to each record. I was thinking about generating a checksum for each record and writing that checksum value out to a VSAM file. A duplicate record should generate an identical checksum value, which would show up as already having been written to the VSAM file.
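Here's a rough sketch of the sort of thing I had in mind, just to make the idea concrete (the program, field, and paragraph names are made up for illustration). It treats the 320-byte record as 160 binary halfwords and accumulates a position-weighted sum. I realize a simple sum like this is a weak checksum compared to something like a CRC, but it shows the general shape:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. RECCHKSM.
      * Illustrative only: reduce one 320-byte record to a numeric
      * checksum by treating it as 160 binary halfwords and summing
      * them, weighted by position so byte order affects the result.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RECORD                      PIC X(320).
       01  WS-RECORD-R REDEFINES WS-RECORD.
           05  WS-HALF OCCURS 160 TIMES   PIC S9(4) COMP.
       01  WS-CHECKSUM                    PIC S9(9) COMP.
       01  WS-IDX                         PIC S9(4) COMP.
       PROCEDURE DIVISION.
       0000-MAINLINE.
      * In the real program WS-RECORD would be each input record.
           MOVE ALL 'SAMPLE RECORD DATA ' TO WS-RECORD
           PERFORM 1000-COMPUTE-CHECKSUM
           DISPLAY 'CHECKSUM = ' WS-CHECKSUM
           GOBACK.
       1000-COMPUTE-CHECKSUM.
           MOVE 0 TO WS-CHECKSUM
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > 160
               COMPUTE WS-CHECKSUM =
                   WS-CHECKSUM + WS-HALF (WS-IDX) * WS-IDX
           END-PERFORM.

The idea would be to use the resulting value as the key of a VSAM KSDS: write one key record per input record, and treat a duplicate-key condition on the WRITE (file status 22) as the signal to bypass that input record.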
Does anyone have a sample of a better checksum algorithm? Or any other ideas?
Rich (in Minn.)