A while back I posted about a tool I found to extract RM/COBOL data files. It doesn't work in all cases, so I've been writing a replacement for it. My replacement also mostly works, but it is clunky.
Does anybody have experience working with RM/COBOL files and the format of their internal structures? I've identified the header structure from the rm-decoder app above:
Code:
// u8 = char (8-bit unsigned), u16 = short (16-bit unsigned); the _be suffix indicates big-endian format
struct SRmFileHeader
{                               // offset,length
    u8  fill0;                  // 0,1
    u8  page_id;                // 1,1
    u8  fill1[4];               // 2,4
    u8  signature[4];           // 6,4
    u8  fill2[6];               // 10,6
    u16 minRecord_be;           // 16,2
    u16 maxRecord_be;           // 18,2
    u8  fill3;                  // 20,1
    u8  spaceCode;              // 21,1
    u8  numberCode;             // 22,1
    u8  compression;            // 23,1
    u8  keyNumberCode;          // 24,1
    u8  fill4;                  // 25,1
    u16 blockSize;              // 26,2
    u16 blockIncrement_be;      // 28,2
    u8  blockContains;          // 30,1
    u8  fill5[13];              // 31,13
    u16 indexBlocks_be;         // 44,2
    u8  fill6[6];               // 46,6
    u16 numRecords_be;          // 52,2
    u8  integrityFlag;          // 54,1
};
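Since the _be fields are big-endian, they have to be byte-swapped on a little-endian machine before use. A minimal sketch of one way to do that, reading straight from the raw header bytes by offset rather than through the struct; the names rm_read_u16_be and the RM_OFF_* constants are made up for illustration, and the offsets are just the ones from the offset column above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets copied from the offset column of SRmFileHeader.
   The names are hypothetical; only the numbers come from the struct. */
enum {
    RM_OFF_MIN_RECORD   = 16,
    RM_OFF_MAX_RECORD   = 18,
    RM_OFF_BLOCK_INCR   = 28,
    RM_OFF_INDEX_BLOCKS = 44,
    RM_OFF_NUM_RECORDS  = 52,
    RM_HEADER_BYTES     = 55    /* last listed field is 1 byte at offset 54 */
};

/* Read a 16-bit big-endian value starting at buf[off].
   The caller must make sure buf holds at least off + 2 bytes. */
uint16_t rm_read_u16_be(const uint8_t *buf, size_t off)
{
    assert(buf != NULL);
    return (uint16_t)(((unsigned)buf[off] << 8) | buf[off + 1]);
}
```

Reading by offset sidesteps any padding the compiler might add to the struct, and gives the same result regardless of the host's byte order.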
Between multiple records on a page, I see content like this:
Code:
0x00FADCD8 .. .. .. .. .. .. .. .. c1 03 35 30 7b c3 01 7b .[data].Á.50{Ã.{
0x00FADCE8 c3 01 7b c3 01 7b c3 01 7b c3 01 7b c3 01 7b 00 Ã.{Ã.{Ã.{Ã.{Ã.{.
0x00FADCF8 02 00 00 06 35 00 00 06 5f 00 00 06 69 00 00 06 ....5..._...i...
0x00FADD08 55 00 00 06 3c 00 00 06 4b 00 00 06 af 00 00 06 U...<...K...¯...
0x00FADD18 70 00 00 18 fa 00 00 16 79 00 00 0e ca 00 d7 07 p...ú...y...Ê.×.
0x00FADD28 .. .. .. .. .. .. .. 95 11 .. .. .. .. .. .. .. .[data]...[data]
I've figured out the codes there, such as 0xd7,0x07 and 0x95,0x11: each byte is a code for how many characters to fill or copy. The logic goes like this (in C):
Code:
// If it's encoded, then process it special
if (dataIn[lnI] > 127)
{
    if (dataIn[lnI] > 231)
    {
        // Fill with the indicated character
        lnFillCount = (u32)dataIn[lnI] - 230;
        lcFillChar  = dataIn[++lnI];
    } else if (dataIn[lnI] > 207) {
        // Fill with NULLs
        lnFillCount = (u32)dataIn[lnI] - 210;
        lcFillChar  = 0;
    } else if (dataIn[lnI] > 191) {
        // Fill with '0' characters
        lnFillCount = (u32)dataIn[lnI] - 190;
        lcFillChar  = '0';
    } else {
        // Fill with spaces
        lnFillCount = (u32)dataIn[lnI] - 126;
        lcFillChar  = ' ';
    }
    // Populate the fill portion
    for (lnCount = 0; lnCount < lnFillCount; ++lnCount)
        dataOut[lnO++] = lcFillChar;
} else {
    // Copy characters
    lnCount = dataIn[lnI];
    for (++lnI; lnCount > 0; --lnCount)
        dataOut[lnO++] = dataIn[lnI++];
}
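For anyone who wants to try this against their own data, here is the same logic wrapped into a standalone function with bounds checks. Treat it as a sketch rather than a verified decoder: the name rm_expand and its signature are made up for illustration, the thresholds and constants are the ones from the fragment above, and nothing here has been checked against RM/COBOL documentation. One thing to note is that the NUL branch subtracts 210 while its threshold is 207, so codes 208 through 210 would give a non-positive count; this version rejects those instead of guessing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Expand a run of encoded bytes using the thresholds from the fragment
 * above. Returns the number of bytes written to out, or -1 if the input
 * is truncated, out is too small, or a code yields a non-positive count.
 */
long rm_expand(const uint8_t *in, size_t inLen, uint8_t *out, size_t outCap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);

    while (i < inLen) {
        uint8_t code = in[i++];
        size_t  count;

        if (code <= 127) {
            /* Literal run: copy the next `code` bytes unchanged. */
            count = code;
            if (count > inLen - i || count > outCap - o)
                return -1;
            memcpy(out + o, in + i, count);
            i += count;
        } else {
            int fill, n;

            if (code > 231) {           /* fill with the byte that follows */
                if (i >= inLen)
                    return -1;
                n    = code - 230;
                fill = in[i++];
            } else if (code > 207) {    /* fill with NULs */
                n    = code - 210;      /* constant kept as posted */
                fill = 0;
            } else if (code > 191) {    /* fill with '0' characters */
                n    = code - 190;
                fill = '0';
            } else {                    /* fill with spaces */
                n    = code - 126;
                fill = ' ';
            }
            if (n <= 0 || (size_t)n > outCap - o)
                return -1;
            count = (size_t)n;
            memset(out + o, fill, count);
        }
        o += count;
    }
    return (long)o;
}
```

Feeding it the bytes c1 03 35 30 7b c3 01 7b from the dump above produces the 12 characters 00050{00000{ (three '0' fills, the literal 50{, five '0' fills, then a literal {). The function only expands the span it is handed; it does not know where a record ends, which is the part I still can't work out.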
At the end of the last record on a page, I see content like this:
Code:
0x00FADDD8 .. .. .. .. .. .. .. .. .. .. .. c1 03 35 30 7b ..[data]...Á.50{
0x00FADDE8 c1 03 35 30 7b c3 01 7b c3 01 7b c3 01 7b c3 01 Á.50{Ã.{Ã.{Ã.{Ã.
0x00FADDF8 7b c1 03 35 30 7b c3 01 7b c3 01 7b c3 01 7b c3 {Á.50{Ã.{Ã.{Ã.{Ã
0x00FADE08 01 7b c3 01 7b c3 01 7b 31 c0 01 7b 01 31 c0 01 .{Ã.{Ã.{1À.{.1À.
0x00FADE18 7b c3 01 7b c3 01 7b c3 01 7b c3 01 7b c0 01 31 {Ã.{Ã.{Ã.{Ã.{À.1
0x00FADE28 c0 01 7b 30 7b 30 7b 01 7b ff ff ff ff ff ff ff À.{0{0{.{ÿÿÿÿÿÿÿ
0x00FADE38 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
0x00FADE48 ff ff ff ff ff ff -- -- -- -- -- -- -- -- -- -- ÿÿÿÿÿÿ
I can't figure out how to parse the portions between the encoded data blocks. If anyone can help, I'd greatly appreciate it.
--
Rick C. Hodgin