RM/COBOL data extraction

Status
Not open for further replies.

foxmuldr3

Programmer
Jul 19, 2012
170
US
A while back I posted about a tool I found to extract RM/COBOL data files. It doesn't work in all cases, and I've been working on writing a replacement for it. It also mostly works, but is clunky. :)

Does anybody have experience working with RM/COBOL files and the format of their internal structures? I've identified the header structure from the rm-decoder app above:

Code:
// u8=char (8-bit unsigned), u16=short (16-bit unsigned); the _be suffix indicates big-endian format
struct SRmFileHeader
{                           // offset,length
    u8   fill0;             //  0,1
    u8   page_id;           //  1,1
    u8   fill1[4];          //  2,4
    u8   signature[4];      //  6,4
    u8   fill2[6];          // 10,6
    u16  minRecord_be;      // 16,2
    u16  maxRecord_be;      // 18,2
    u8   fill3;             // 20,1
    u8   spaceCode;         // 21,1
    u8   numberCode;        // 22,1
    u8   compression;       // 23,1
    u8   keyNumberCode;     // 24,1
    u8   fill4;             // 25,1
    u16  blockSize;         // 26,2
    u16  blockIncrement_be; // 28,2
    u8   blockContains;     // 30,1
    u8   fill5[13];         // 31,13
    u16  indexBlocks_be;    // 44,2
    u8   fill6[6];          // 46,6
    u16  numRecords_be;     // 52,2
    u8   integrityFlag;     // 54,1
};
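
One way to avoid depending on compiler struct layout or host byte order is to read the fields straight out of the raw page bytes. The following is only a sketch: the offsets are copied from the struct comments above and are unverified, and RmHeaderInfo, read_be16 and rm_parse_header are hypothetical names, not part of rm-decoder or RM/COBOL. blockSize is left out because its byte order is not stated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian value from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Hypothetical summary of selected header fields (illustrative names). */
typedef struct {
    uint8_t  page_id;
    uint16_t min_record;
    uint16_t max_record;
    uint8_t  space_code;
    uint8_t  number_code;
    uint8_t  compression;
    uint16_t block_increment;
    uint16_t index_blocks;
    uint16_t num_records;
} RmHeaderInfo;

/* The struct above ends at offset 54 (integrityFlag), so 55 bytes are needed. */
enum { RM_HEADER_MIN_BYTES = 55 };

/* Returns 0 on success, -1 if the buffer is too short.
 * Offsets are ASSUMED from the struct comments in the post. */
static int rm_parse_header(const uint8_t *buf, size_t len, RmHeaderInfo *out)
{
    assert(buf != NULL && out != NULL);
    if (len < RM_HEADER_MIN_BYTES)
        return -1;

    out->page_id         = buf[1];
    out->min_record      = read_be16(buf + 16);
    out->max_record      = read_be16(buf + 18);
    out->space_code      = buf[21];
    out->number_code     = buf[22];
    out->compression     = buf[23];
    out->block_increment = read_be16(buf + 28);
    out->index_blocks    = read_be16(buf + 44);
    out->num_records     = read_be16(buf + 52);
    return 0;
}
```

If a real file does not agree with these offsets, only the numeric constants in rm_parse_header need to change.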

Between multiple records on a page, I see content like this:

Code:
0x00FADCD8  .. .. .. .. .. .. .. .. c1 03 35 30 7b c3 01 7b  .[data].Á.50{Ã.{
0x00FADCE8  c3 01 7b c3 01 7b c3 01 7b c3 01 7b c3 01 7b 00  Ã.{Ã.{Ã.{Ã.{Ã.{.
0x00FADCF8  02 00 00 06 35 00 00 06 5f 00 00 06 69 00 00 06  ....5..._...i...
0x00FADD08  55 00 00 06 3c 00 00 06 4b 00 00 06 af 00 00 06  U...<...K...¯...
0x00FADD18  70 00 00 18 fa 00 00 16 79 00 00 0e ca 00 d7 07  p...ú...y...Ê.×.
0x00FADD28  .. .. .. .. .. .. .. 95 11 .. .. .. .. .. .. ..  .[data]...[data]

I've figured out the codes there, like 0xd7,0x07 and 0x95,0x11 ... they indicate a code for how many characters to copy. The logic goes like this (in C):

Code:
// If it's encoded, then process it special
if (dataIn[lnI] > 127)
{
    if (dataIn[lnI] > 231)
    {
        // Fill with the indicated character
        lnFillCount = (u32)dataIn[lnI] - 230;
        lcFillChar  = dataIn[++lnI];

    } else if (dataIn[lnI] > 207) {
        // Fill with NULLs
        lnFillCount = (u32)dataIn[lnI] - 210;
        lcFillChar  = 0;

    } else if (dataIn[lnI] > 191) {
        // Fill with '0' characters
        lnFillCount = (u32)dataIn[lnI] - 190;
        lcFillChar  = '0';

    } else {
        // Fill with spaces
        lnFillCount = (u32)dataIn[lnI] - 126;
        lcFillChar  = ' ';
    }

    // Populate the fill portion
    for (lnCount = 0; lnCount < lnFillCount; ++lnCount)
        dataOut[lnO++] = lcFillChar;

} else {
    // Copy characters
    lnCount = dataIn[lnI];
    for (++lnI; lnCount > 0; --lnCount)
        dataOut[lnO++] = dataIn[lnI++];
}
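
For anyone who wants to try the rules above on a buffer, here is a self-contained sketch of the same logic wrapped in a loop with bounds checks. It assumes the input is nothing but a sequence of control bytes and their payloads, which is not true of the per-record header bytes shown in the dumps. The name rm_expand and the error handling are mine (hypothetical). Note that the subtract-210 rule gives a count below 1 for codes 208 to 210; I don't know what the real format does there, so the sketch simply rejects them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand one run-length encoded byte stream using the thresholds from the
 * post. Returns the number of bytes written to out, or -1 on truncated
 * input, output overflow, or a fill count below 1 (ASSUMED to be invalid). */
static long rm_expand(const uint8_t *in, size_t in_len,
                      uint8_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < in_len) {
        uint8_t code = in[i++];

        if (code > 127) {
            int     count;
            uint8_t fill;

            if (code > 231) {            /* fill with the next input byte */
                if (i >= in_len)
                    return -1;
                count = code - 230;
                fill  = in[i++];
            } else if (code > 207) {     /* fill with NULLs */
                count = code - 210;
                fill  = 0;
            } else if (code > 191) {     /* fill with '0' characters */
                count = code - 190;
                fill  = '0';
            } else {                     /* fill with spaces */
                count = code - 126;
                fill  = ' ';
            }
            if (count < 1 || (size_t)count > out_cap - o)
                return -1;
            memset(out + o, fill, (size_t)count);
            o += (size_t)count;
        } else {                         /* literal run of `code` bytes */
            size_t count = code;

            if (count > in_len - i || count > out_cap - o)
                return -1;
            memcpy(out + o, in + i, count);
            i += count;
            o += count;
        }
    }
    return (long)o;
}
```

Fed the bytes c1 03 35 30 7b c3 01 7b from the dump above, this writes three '0' characters, the literal "50{", five more '0' characters and a final "{".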

After the last record on a page, I see content like this:

Code:
0x00FADDD8  .. .. .. .. .. .. .. .. .. .. .. c1 03 35 30 7b  ..[data]...Á.50{
0x00FADDE8  c1 03 35 30 7b c3 01 7b c3 01 7b c3 01 7b c3 01  Á.50{Ã.{Ã.{Ã.{Ã.
0x00FADDF8  7b c1 03 35 30 7b c3 01 7b c3 01 7b c3 01 7b c3  {Á.50{Ã.{Ã.{Ã.{Ã
0x00FADE08  01 7b c3 01 7b c3 01 7b 31 c0 01 7b 01 31 c0 01  .{Ã.{Ã.{1À.{.1À.
0x00FADE18  7b c3 01 7b c3 01 7b c3 01 7b c3 01 7b c0 01 31  {Ã.{Ã.{Ã.{Ã.{À.1
0x00FADE28  c0 01 7b 30 7b 30 7b 01 7b ff ff ff ff ff ff ff  À.{0{0{.{ÿÿÿÿÿÿÿ
0x00FADE38  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
0x00FADE48  ff ff ff ff ff ff -- -- -- -- -- -- -- -- -- --  ÿÿÿÿÿÿ

I can't figure out how to parse the portions between the encoded data blocks. If anyone can help, I'd greatly appreciate it.

--
Rick C. Hodgin
 
don't reinvent the wheel. have a look at Record Editor. while RM-COBOL is not mentioned as supported, using Fujitsu Cobol as the file type will work for most RM-COBOL file formats (for info, Fujitsu bought a license to use the RM filesystem format, hence the compatibility)


Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Hi Frederico. I tried Record Editor, but I can't get it to recognize the file format. I do see from the Fujitsu file formats some information about how the file is organized internally. It appears similar to, but not the same as, the files I have.

--
Rick C. Hodgin
 
what is the exact cobol runtime version used to generate these files?
and can you create a new set of files with dummy data and upload for us to look at?

and do you have a valid runtime installation with all executables? there are utilities within it that may help "dumping" the data into a format which is easier to read/process.

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
I don't know. The company that was servicing these data files slowed and eventually stopped communicating with the people working with the data. This went on for about a year. As such, they have no contacts in the company, and their software won't run because of the license issue.

They have a backup of the data files, which is what we have to work with.

With the help of the rm-decode app I mentioned, I've managed to figure out how field data is stored internally. The issues I'm having now are the header portion of each record on a page, and why pages of type 6 are valid while types 7 and 8 are not, even though they seem to hold valid data. That assessment is based on some printed reports we have to corroborate against.

--
Rick C. Hodgin
 
see if you can find where the application resided (e.g. where the executable was installed) and see if there are further executables and other files there - namely some called recovery* - and if you do find an exe called runcobol, try to execute it - it should output the version.
if that won't work but you do have a file called run.msg, that will also contain the version.

depending on what you have I may be able to help further (for free).

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
First 32 bytes of the file:
Code:
00000:  01 00 00 00 01 52 4d 4b 46 00 02 00 00 00 02 02  --  .....RMKF.......
00016:  11 02 11 02 20 30 02 20 0c 02 4e 02 4e 00 01 83  --  .... 0. ..N.N...

--
Rick C. Hodgin
 
assuming that is what you have on your installation, and you have the full install, it does help.

if by any chance the vendor also left behind the file definitions, either in a file or in documentation, that would also help, as interpreting the files after they are "converted" to a usable format would be easier for you.

reason why I asked for the recover files if you had them is that there is a set of utilities supplied with the runtime (and compiler as well) which can process the files and "extract" them to a sequential file - removing compression and leaving raw data just as it is defined in the program (with both alpha and numeric datatypes as defined, so you will get comp, comp-1, comp-2, comp-5, comp-6 and binary data in them)

so the way we work with these is as follows:
first set env variable PRINTER to a local file - this is so that print output from the programs goes to the file instead of a physical printer (or pdf)

2 commands to run for each file
Code:
rmmapinx - this will give important information about the files, mainly number of keys and their size and position on the record
    parms are inputfile, detail, printer
    runcobol ..\rmc85\rmmapinx K a="C:\facsys\USR\cli01,detail,printer"
recover2 - this will convert from an indexed file to a sequential file (NOT line sequential)
    parms are inputfile, output file, NOSUB
    runcobol ..\rmc85\recover2 K a="C:\facsys\USR\cli01,C:\facsys\USR\cli01_recover2,NOSUB"

output of the programs above will contain some useful information as I said - one of them is also the minimum and maximum record size written to the output file. and few other bits...

once recover2 is run, the output file defined will normally contain 4 bytes (this is the record size, little endian) + record data + 4 bytes (which I don't remember what they are and don't have documentation on at the moment)

with the above information it's easy to split and parse the file into its individual records - and then the fun begins with figuring out what the definition of each block of record means (easy if you have the file/record definition from the vendor, trial and error otherwise)
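
As a rough illustration of that split, here is a sketch that walks a buffer holding the recover2 output. It assumes the layout described above is exact (4-byte little-endian length, record data, 4 trailing bytes that are simply skipped since their meaning is unknown). The names read_le32 and next_record are hypothetical, and the layout should be checked against a real drop file before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Step to the next record in a buffer laid out as
 *   [4-byte LE length][record data][4 unknown bytes] ...
 * (layout ASSUMED from the description in the post).
 * Returns 1 and sets *rec / *rec_len when a record was found,
 * 0 at a clean end of buffer, and -1 if the data is truncated. */
static int next_record(const uint8_t *buf, size_t len, size_t *pos,
                       const uint8_t **rec, uint32_t *rec_len)
{
    uint32_t n;

    assert(buf != NULL && pos != NULL && rec != NULL && rec_len != NULL);
    if (*pos == len)
        return 0;
    if (*pos > len || len - *pos < 8)
        return -1;

    n = read_le32(buf + *pos);
    if (n > len - *pos - 8)
        return -1;

    *rec     = buf + *pos + 4;
    *rec_len = n;
    *pos    += 4 + (size_t)n + 4;   /* skip length, data, and the 4 unknown bytes */
    return 1;
}
```

Call it in a loop starting with pos = 0 until it returns 0; a return of -1 suggests the assumed layout does not match the file.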

sample output of rmmapinx and recover2 over same file
Code:
output of RMMAPINX
 RM/COBOL Map Key Utility - 6.1    03-04-2024  14:11:10         Page     1

File Information:
  C:\facsys\USR\cli01 is an Indexed File.
  Records are fixed length = 496 Bytes.
  Disk Block Size = 512 Bytes, User Block Size = not specified.
  Data Records are compressed, Keys are compressed.
    Data Block Space Character Value = 32.
    Data Block Zero Character Value = 48.
    Key Block Space Character Value = 32.
  File has 2 Keys and 2 Segments.
  File contains 20 Records and occupies 16 Blocks.
  There are 7 empty Blocks.
  6 empty Blocks may be needed for a write.

Detail Information:
  File version number = 0.
  Minimum read version number = 0.
  Minimum write version number = 0.
  Disk Block Increment Size = 512 Bytes.
  Allocation Increment = 16 Blocks.
  Recoverability/Performance Strategy:
    Data are forced to the system only when necessary.
      Force Write Data Blocks = No.
      Force Write Index Blocks = No.
      Force to Disk = No.
      Force File Closed = No.

Key Information:
     Key   Segment  Starting  Segment    Key    Tree   Duplicates
   Number  Number   Position  Length   Length  Height  Permitted?
     ---     ---      -----     ---      ---     --        ---
    Prime      1         42       4        4      1        No
       1       1          1      45       45      2        No



output of RECOVER2
Index File Recovery Utility
Copy all data records to dropped record file
Index File:                           C:\facsys\USR\cli01
Drop File:                            C:\facsys\USR\cli01_recover2
Option:                               NOSUB
Disk Block Size:                            512
Disk Block Increment:                       512
Maximum Record Length:                      496
Minimum Record Length:                      496
Data Record Compression (y/n)?                Y
SPACE Character Value:                       32
ZERO Character Value:                        48
Number of Keys that allow Duplicates:         0
Block being Processed 17
Records Written to Drop File 20






Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Just wanted to keep you up to date. We've asked the company for information, and they have not responded yet. We were told it had to go through their IT department, as our contacts are just office staff / end-users.

I've been able to download a copy of RM/COBOL-85 5.36.00 for DOS 2.00+ from WinWorld, and it comes with a source file called PACETEST.CBL, which compiles to PACETEST.COB, which can be run with RUNCOBOL PACETEST.COB, and it generates two output files which are in the same RMKF table formats. Examining internally with a hex editor, I can see the 05, 06, and 08 pages. I can also look back at the source code of the PACETEST.CBL to see how the fields are laid out logically, and how they are stored mechanically in the file.

I'm thinking that if I can parse out the fields on each record, then I could create a series of "field1", "field2", ... fields and generate an actual COBOL file structure which matches what we actually see in the data. If so, we could do the data extract that way, by having a dump.cbl program generated for each file.

Does that kind of solution seem reasonable?

I find it interesting that the RMKF files do not contain the structure definition for themselves internally.

--
Rick C. Hodgin
 
Hi,
not really feasible except on the most simple cases.

if you have the full install you should have the above mentioned programs, mainly rmmapinx.cob - and if you do have a copy of the original client files you can run it over their files and see if they have compression enabled.

If not, then it is possible to extract data the way you tried - if it is enabled, then it's impossible to do it that way.
that is just one of the hurdles.

next is parsing the records themselves - within COBOL it will be extremely common to have multiple record types on the same file, each with its own definition.
it is also likely that some records, even if of the same logical type (e.g. clients vs vendors on a client file), have parts of the record defined differently - which adds yet another layer of processing to the data.
This bit you will need to do regardless of the way you first extract each record - even by using the recover2.cob method above, which I highly recommend in your case as it removes one layer of complexity.

if the client does end up having the file and records definitions then this work becomes easier as you will then know what each record should look like and can code the numeric conversions accordingly.



Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
If compression is specified by this byte in the page 0 header portion of each file:
Code:
    u8   compression;       // 23,1

Then I can tell you the 97 files we have all have the value 2 in that field. And, when I open them in a text editor, they are straightforwardly readable, albeit compressed in the way the block of code above indicates:

Code:
// If it's encoded, then process it special
if (dataIn[lnI] > 127)
{
         if (dataIn[lnI] > 231)  { /* count = dataIn[lnI] - 230, fill with the indicated character */ }
    else if (dataIn[lnI] > 207)  { /* count = dataIn[lnI] - 210, fill with NULLs */ }
    else if (dataIn[lnI] > 191)  { /* count = dataIn[lnI] - 190, fill with '0' characters */ }
    else                         { /* count = dataIn[lnI] - 126, fill with spaces */ }

    // Populate the fill portion

} else {
    // Copy characters by the dataIn[lnI] count
}

--
Rick C. Hodgin
 
We have received full folders of their programs and data. I only see compiled COB files, but we do have access to a copy of their system and data. I should be able to begin accessing it starting this week.

--
Rick C. Hodgin
 