Extracting Data from COBOL produced .DAT file

Status
Not open for further replies.

doktord

Technical User
Sep 26, 2006
24
US
Hey guys,

I'm working on a program to run in conjunction with some software that was written in COBOL. The software produces and stores information in .DAT and .KEY files located on a server.

I need to access the data stored in these files to use with a program I'm writing in VB.

I've been searching around the forums and there really aren't too many threads out there about decrypting .DAT files and even fewer are able to successfully do it.

I've tried just simply opening the .DAT files in notepad and all I get are random numbers and symbols. I also tried opening the files in a free trial of DataViewer, which worked to a certain extent but still butchered some of the data.

I can provide pretty much any information that would be helpful in figuring this out. I can provide files to tinker with, some small programs, and I'm waiting to hear back from the software people as to what compiler was used to compile the COBOL program that makes the .DAT files.

I really appreciate your time and patience. I'm just a moderate level (if that) hobby programmer trying to get this program to work in conjunction with this software.
 
Sounds as though you may have a VSAM file. Try searching for any interfaces for that. Perhaps the simplest way is just to create another COBOL program to extract the data you want - or change the original program to output another file with the data you want.


Nic
 
I did see that thread (which is what led me to this forum). The only thing is, it is pretty far over my head. How would I use the hex editor to figure out how to decode the information?
 
The "hex editor" would allow viewing of the actual content rather than the "random numbers and symbols". If you post the first few records in hex, someone here may recognize what kind of records these are.

The hex editor would not "decode" these (that is a completely different exercise), but it could be used to identify exactly what they are. Sometimes a .dat file is simply data; other times it is some proprietary or other custom format.

Can you get a copy of the COBOL source that created the file (or at least the data layout(s))?
 
I should be able to get a copy of the COBOL source. I'll email the software guys and should have it by Monday, probably (I don't think they'll be there over the weekend).

As far as the hex records go, I'll post what I can. I'm not sure which part you would like posted.


20 20 20 20 20 20 20 48 49 47 48 20 50 4F 49 4E
54 20 20 20 20 20 20 20 4E 43 20 32 37 32 36 34
20 20 20 20 20 02 00 70 10 5F 02 00 70 10 5F 54
57 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 00 E6 30 30 30 30 31 32 20 20 4D 41
49 4C 56 47 41 54 45 57 41 59 20 50 41 49 4E 54
20 26 20 43 48 45 4D 49 43 41 4C 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20


This came from a file that contains vendor addresses. If that's not what you wanted copied and pasted, let me know.

Thanks!
 
Yes, that's it. Suggest you bookmark this link - I believe you will find it handy. . .

This:
20 20 20 20 20 20 20 48 49 47 48 20 50 4F 49 4E
54
is '       HIGH POINT' (7 spaces followed by the words "HIGH POINT"; the x'20' between 48 and 50 is the space separating them).

You can "decode" more by using the table from the link. Be aware that there are also embedded control characters (values below x'20').
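
For anyone who would rather not translate byte by byte from a table, the same lookup can be sketched in a few lines. This is only an illustration in Python (the logic ports directly to VB). The name hex_to_text and the "." placeholder are made up for this example and are not part of any tool mentioned in this thread. Bytes outside the printable ASCII range, such as the embedded control characters, are shown as "." so they stand out.

```python
# Hex values copied from the dump posted earlier in the thread.
HEX_DUMP = "20 20 20 20 20 20 20 48 49 47 48 20 50 4F 49 4E 54"


def hex_to_text(hex_string, placeholder="."):
    """Turn space-separated hex byte values into printable ASCII text."""
    raw = bytes.fromhex(hex_string)
    # Printable ASCII runs from 0x20 (space) through 0x7E (~);
    # everything else is replaced so control bytes are easy to spot.
    return "".join(chr(b) if 0x20 <= b <= 0x7E else placeholder for b in raw)


print(repr(hex_to_text(HEX_DUMP)))                      # '       HIGH POINT'
print(repr(hex_to_text("20 20 02 00 70 10 5F 54 57")))  # '  ..p._TW'
```

The second call shows why a text view alone "butchers" some of the data: the bytes 02 00 70 10 5F are not text at all, so they come out as a mix of placeholders and unrelated characters.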
 
The only thing is, it is pretty far over my head. How would I use the hex editor to figure out how to decode the information?

You'd have to use some knowledge of data storage to figure out what the record definition is, then experiment with it and see if you can read the data properly.

Assuming this file wasn't created in a mainframe environment (it looks like it wasn't), you try to decipher the hex values until you get what looks like repeating data, and then test the record layout in your program until you get it right.

Of course, having the record layout from the COBOL program that created this will be useful too. Be aware, you might find some weird types supported in COBOL but not in VB that you might have to convert.

This will help for character values if you don't already have it handy.

It is not possible for anyone to acknowledge truth when their salary depends on them not doing it.
 
The "02 00 70 10 5F" which is repeated in the example could be 020070105 packed decimal, with a possible assumed decimal point somewhere in there.
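
A quick way to test that guess is to split each byte into its two 4-bit halves (nibbles), treat the last nibble as the sign and the rest as decimal digits. The sketch below is Python for illustration only, and unpack_comp3 is a hypothetical helper name. It assumes the common convention that a sign nibble of D (or B) means negative and any other value from A to F means positive or unsigned. Whether the field really is packed decimal remains a guess until the record layout is known.

```python
def unpack_comp3(raw):
    """Decode packed-decimal (COMP-3) bytes into a Python int."""
    if not raw:
        raise ValueError("empty field")
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)    # high nibble
        nibbles.append(byte & 0x0F)  # low nibble
    sign = nibbles.pop()             # the last nibble is the sign, not a digit
    if sign < 0x0A or any(n > 9 for n in nibbles):
        raise ValueError("not valid packed decimal: " + raw.hex())
    value = 0
    for digit in nibbles:
        value = value * 10 + digit
    # Assumption: D (and B) mean negative; C, F and the rest mean
    # positive or unsigned.
    return -value if sign in (0x0B, 0x0D) else value


print(unpack_comp3(bytes.fromhex("02 00 70 10 5F")))  # 20070105
```

Text bytes such as 20 20 fail the check and raise ValueError, which makes the function usable as a rough "could this be packed decimal?" probe while exploring a file.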
 
Any guess is wrong by default.
Get the original record definitions, and transform the data into what it was supposed to be.
The only sensible solution presented above was to use a COBOL program to extract the data into some generic format (plain text?) to make it readable by VB or any other development language.
 
You are writing in VB. What we don't know is (1) operating system/hardware on which the data reside, (2) connectivity you have to that system, and (3) identification of the COBOL compiler being used. Can you supply this information, please?

Tom Morrison
Micro Focus
 
Thanks for the replies, guys.

I'm going to be running it on XP and the server that holds the files is shown on the computer as another disk drive (S:). If push comes to shove (although I'd like to not do it this way) I could probably stumble my way through making a COBOL program to convert to a .CSV file or something to read into VB. (LAST resort).
 
The COBOL program to convert to all text input is the ideal for mainframe situations. However, since this appears to be on the PC by the hex values you posted (there are clearly ASCII character values present), it would be possible to write something with VB alone if you have the record layout.

Having access to the COBOL source that generated this file will give you the record layout, which can then be translated for use in VB. This would be ideal.

The questions that Mr. Morrison asked will go a long way toward determining what we can do to help you here.

It is not possible for anyone to acknowledge truth when their salary depends on them not doing it.
 
I'm working on getting the COBOL source code as we speak, so hopefully I will have it in a couple days at the latest.
 
This is the reply I got from the software people regarding the files I am most interested in using. We're working on a non-compete agreement that will give me more access to the code itself:



"Those two dat files are correct. Also look at the ITEMSTF.dat (item store file). This may provide you with some more information...
From ITEMFL.ws (grabbing useful bits of code)
03 ITEM-STAT PIC XX.
03 ITEM-FILE-NAME PIC X(8) VALUE "ITEMFL".
03 ITEM-NAME PIC X(30) VALUE SPACES.
03 ITEM-OPEN-SW PIC X VALUE SPACES.
03 ITEM-LOOKUP-KEY.
    05 ILK-DEPT PIC XX.
    05 ILK-CLASS PIC XXX.
    05 ILK-FNLN PIC X(4).
    05 ILK-ITEM-NO PIC X(15).
03 ITEM-VNDR-ALTKEY.
    05 IVAK-VNDR-NO PIC X(6).
    05 IVAK-VNDR-SUB-NO PIC XX.
    05 IVAK-ITEM-NO PIC X(15).

ITEMFL.fd

FD ITEM-FILE
LABEL RECORDS STANDARD.

01 ITEM-RCD.
    03 ITEM-KEY.
        05 ITEM-NO PIC X(15).

    03 ITEM-MFG-NO PIC X(15).
    03 ITEM-DESC PIC X(30).

    03 ITEM-DCF.
        05 ITEM-DEPT PIC XX.
        05 ITEM-CLASS PIC XXX.
        05 ITEM-FNLN PIC X(4).

    03 ITEM-STATUS PIC X.
        88 ITEM-ACTIVE VALUE "A".
        88 ITEM-DISCONTINUED VALUE "D" "C".

    03 ITEM-FLAGS.
        05 ITEM-NON-TAX-FLAG PIC X.
        05 ITEM-NON-RETURN PIC X.
        05 ITEM-NON-DISCOUNT PIC X.
        05 ITEM-NET-ITEM PIC X.
        05 ITEM-GROUP-ITEM PIC X.

    03 ITEM-CORP-CONTROL.
        05 ICC-BUY-FLAG PIC X.
            88 ICC-CORP-BUY-ITEM VALUE "Y".
        05 ICC-CORP-BUYER-ID PIC XXXX.
        05 ICC-CURR-VNDR PIC X(6).
        05 ICC-CURR-VNDR-SUB PIC XX.

        05 ICC-PRICE-FLAG PIC X.
            88 ICC-CORP-PRICED-ITEM VALUE "Y".

        05 ICC-COSTS.
            07 ICC-CURR-COST PIC 9(4)V999 COMP-3.
            07 ICC-AVG-COST PIC 9(4)V999 COMP-3.
        05 ICC-RETAILS.
            07 ICC-MFG-SUGGRTL PIC 9(4)V99.
            07 ICC-UNIT-MEASURE PIC X(4).
            07 ICC-MATRIX-NO PIC XX.
            07 ICC-RETAIL1 PIC 9(4)V99.
            07 ICC-RETAIL2 PIC 9(4)V99.
            07 ICC-RETAIL3 PIC 9(4)V99.
            07 ICC-RETAIL4 PIC 9(4)V99.
            07 ICC-RETAIL5 PIC 9(4)V99.
        05 ICC-PRICE-BREAK-QTY.
            07 ICC-PB-QTY1 PIC 9(5) COMP-3.
            07 ICC-PB-QTY2 PIC 9(5) COMP-3.
            07 ICC-PB-QTY3 PIC 9(5) COMP-3.
            07 ICC-PB-QTY4 PIC 9(5) COMP-3.
        05 ICC-PRICE-CHANGE-DATE PIC 9(9) COMP-3.
        05 ICC-MFG-VNDR-NO PIC X(6).
        05 ICC-MFG-VNDR-SUB PIC XX.
        05 ITEM-USER-CODE PIC X(6).
        05 FILLER PIC X(5).

    03 ITEM-SCALE-ITEM PIC X.
    03 ITEM-DATE-ADDED PIC 9(9) COMP-3.
    03 ITEM-DATE-CHANGED PIC 9(9) COMP-3.
    03 ITEM-CHANGE-ID PIC X(7).
    03 ITEM-RETURN-RECEIPT PIC X.
    03 ITEM-RETURN-AUTH-ONLY PIC X.
    03 ITEM-COMMENT-REQ PIC X.
    03 ITEM-WEIGHT PIC 9(4)V999 COMP-3.
    03 ITEM-FREIGHT-CLASS PIC XXX.
    03 ITEM-HAZMAT-CLASS PIC X(7).
    03 ITEM-PRICE-LEVEL PIC X.
    03 ITEM-SERIALIZED PIC X.
    03 ITEM-PRINT-INVOICE PIC X.
    03 ITEM-DIMENSION-ITEM PIC X.
    03 ITEM-NON-RETURN-DAMAGED PIC X.
    03 ITEM-CAPTURE-SIGNATURE PIC X.
    03 ITEM-EXTEND-DESC PIC X.
    03 ITEM-CALC-RETAIL PIC X.
    03 ITEM-LUMBER-ITEM PIC X.
    03 ITEM-REQUIRES-QTY PIC X.
    03 ITEM-PRINT-PACK-SLIP PIC X.

ITEMHI.ws

01 FILLER.
    03 IH-STAT PIC XX.
    03 IH-FILE-NAME PIC X(8) VALUE "ITEMHI".
    03 IH-NAME PIC X(30) VALUE SPACES.
    03 IHK-ITEM-STORE-PROM.
        05 IHK-IS-ITEM-NO PIC X(15).
        05 IHK-IS-STORE-NO PIC 999.
        05 IHK-IS-PROM-ID PIC X(4).
        05 IHK-IS-YEAR PIC 9(4).
        05 IHK-IS-MONTH PIC 9(4).
        05 IHK-IS-TYPE-RCD PIC X.
    03 IHK-PROM-ITEM-STORE.
        05 IHK-PIS-PROM-ID PIC X(4).
        05 IHK-PIS-ITEM-NO PIC X(15).
        05 IHK-PIS-STORE-NO PIC 999.
        05 IHK-PIS-YEAR PIC 9(4).
        05 IHK-PIS-MONTH PIC 9(4).
        05 IHK-PIS-TYPE-RCD PIC X.
 
give me more access to the code itself:

All you need is the complete file definitions to know how to read the file.

What platform were the files created on? PC? Something else? If it's PC, you're pretty good to go for the most part.

Anyway, these are typical record types. Put the data items together in the order listed and you have the whole record as it is read from the file.

Some data type descriptions (repeated because I can't find my last post on here where I did this):
Code:
05  ITEM-NO                       PIC X(15).
03  ITEM-STAT                   PIC XX.

The first one is a 15 byte table (array) of characters; whatever is in it is left-justified. Read it as such and you'll be fine. The second works the same way: you just write as many X's as the number of characters you want, so it is a 2 byte array of characters. Either can easily be used as, or converted to, any kind of string type you want.
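
As a rough sketch of reading such fields, the snippet below slices the first three items of ITEM-RCD out of one record by adding up the PIC X sizes. It is Python for illustration only and read_text_fields is a hypothetical helper. The sample record is invented, and the code assumes the record starts at offset 0 of the bytes handed to it. A real .DAT file from an indexed file system may carry extra header or control bytes that this does not account for.

```python
# (name, size in bytes) taken from the posted ITEMFL.fd layout.
FIELDS = [("ITEM-NO", 15), ("ITEM-MFG-NO", 15), ("ITEM-DESC", 30)]


def read_text_fields(record, fields):
    """Slice fixed-width PIC X fields out of one record, in layout order."""
    values = {}
    offset = 0
    for name, size in fields:
        chunk = record[offset:offset + size]
        if len(chunk) != size:
            raise ValueError("record too short for " + name)
        # PIC X data is left-justified and space-padded on the right.
        values[name] = chunk.decode("ascii").rstrip(" ")
        offset += size
    return values


# Invented sample record; real data would come from the .DAT file.
sample = b"000012".ljust(15) + b"PX-100".ljust(15) + b"GATEWAY PAINT".ljust(30)
for name, text in read_text_fields(sample, FIELDS).items():
    print(name, "=", repr(text))
```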

Code:
05  IHK-IS-STORE-NO         PIC 999.
07  ICC-RETAIL5               PIC 9(4)V99.

These are numeric display data types. The digits are stored as characters and follow the same rules as above, except that only numeric characters can exist. The first is a 3 byte array of characters. The second is a 6 byte array of characters.

Whenever a V occurs, that is where the decimal point occurs in the number. However, this is IMPLIED and not stored in the data, which means you'll need to know where the decimal point goes and place it yourself in your code when you work with this data. For example, 1234.56 is stored as "123456" in this data type. There are a few little variations in data representation based on stored sign (there would be an "S" somewhere in the definition if it existed), but you do not have to worry about it with what was posted.
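
Putting the decimal point back is a one-line scaling step. A minimal Python sketch follows, assuming unsigned fields like the ones posted. parse_display_number is a made-up name, and the number of decimal places has to be supplied by hand from the PIC clause (2 for PIC 9(4)V99).

```python
from decimal import Decimal


def parse_display_number(text, decimals):
    """Convert unsigned numeric DISPLAY digits (e.g. PIC 9(4)V99) to a Decimal."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError("expected digits only, got %r" % text)
    # The V in the PIC clause is implied, so shift the point left ourselves.
    return Decimal(int(text)).scaleb(-decimals)


print(parse_display_number("123456", 2))  # 1234.56
print(parse_display_number("000750", 2))  # 7.50
```

Decimal is used instead of a float so that prices keep their exact value after the shift.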

Code:
03  ITEM-DATE-CHANGED         PIC 9(9)       COMP-3.
07  ICC-CURR-COST             PIC 9(4)V999   COMP-3.

This is a packed decimal data type. The V is an implied decimal point as before. However, the storage is much different.

Code:
MYDATA              PIC 9(5) COMP-3.

If I put 123 in the value above, it will store $00 $12 $3F.

The F means "no sign is stored." If signage is stored, you will see a C for "positive" and D for "negative".

The real storage size of this type is:
1) Take the number of digits and add 1 for the sign. If the result is odd, add 1 more.
2) Divide the value from #1 by 2 to get the size in bytes.
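
The size rule and the $00 $12 $3F example can be checked with a small packing routine. This is a Python sketch under the assumptions already stated in this post (unsigned values, sign nibble F). comp3_size and pack_comp3 are hypothetical names. The size is computed as digits // 2 + 1, which counts the sign nibble and therefore also holds for an even number of digits.

```python
def comp3_size(digits):
    """Bytes needed for PIC 9(digits) COMP-3: the digits plus one sign nibble."""
    return digits // 2 + 1


def pack_comp3(value, digits, sign_nibble=0xF):
    """Pack a non-negative int the way PIC 9(digits) COMP-3 stores it."""
    text = str(value).zfill(digits)
    if value < 0 or len(text) > digits:
        raise ValueError("value does not fit in %d digits" % digits)
    nibbles = [int(ch) for ch in text] + [sign_nibble]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad on the left to fill whole bytes
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


print(pack_comp3(123, 5).hex(" "))                  # 00 12 3f
print(comp3_size(5), comp3_size(7), comp3_size(9))  # 3 4 5
```

Those three sizes correspond to the PIC 9(5), PIC 9(4)V999 and PIC 9(9) COMP-3 items in the posted layout.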

This is the one that will be rough if you try to read it in VB, since the type doesn't exist in most mainstream PC languages. If you're adventurous enough you can convert it (*), assuming this is a PC-generated file. If not, trying to work out something in COBOL will probably be best.

(*) - if I can think of it and there's the demand, I might post a DLL that does this sometime...

It is not possible for anyone to acknowledge truth when their salary depends on them not doing it.
 
That was beyond helpful. Thank you so much for making more sense of that.

I'll haveta take a deeper look into COBOL programming to see which route I would be more capable of. A DLL would be absolutely amazing and save me quite a bit of time. Although, I do understand that if there isn't a larger demand of such a file that the work put into it might not be worth your time.

COBOL is a completely foreign language to me, so regardless of its intensity, doing this in VB might be the option I'm looking at.
 
Although, I do understand that if there isn't a larger demand of such a file that the work put into it might not be worth your time.

We'll see. I already have PC code to go from COMP-3 to text lying around somewhere. If I were to release it, I'd just need to put it into a DLL and smoke-test it to make sure it's right in all cases and not just the few I tested it with. I'm sure there will be other considerations, too.

But I'm sure having such a thing will definitely be handy to someone.

It is not possible for anyone to acknowledge truth when their salary depends on them not doing it.
 
Code:
05  ITEM-NO                       PIC X(15).
03  ITEM-STAT                   PIC XX.

Neither of these is a table or an array. They are simple string variables. In C etc. they may be arrays, but in COBOL and other languages they are simple variables. Tables, aka arrays, are repeating variables, e.g.

Code:
05 ITEM-NO   PIC X(15) OCCURS 10 TIMES.
is an array.
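
From the reading side, an OCCURS item is simply the same field laid end to end, so 10 occurrences of PIC X(15) occupy 150 consecutive bytes. A small Python sketch of splitting such a block is shown here for illustration. split_occurs is a hypothetical helper and the table contents are invented. Nothing in the posted layouts actually uses OCCURS.

```python
def split_occurs(raw, item_size, count):
    """Split the bytes of a PIC X(item_size) OCCURS count TIMES item into strings."""
    if len(raw) != item_size * count:
        raise ValueError("expected %d bytes, got %d" % (item_size * count, len(raw)))
    return [raw[i * item_size:(i + 1) * item_size].decode("ascii").rstrip(" ")
            for i in range(count)]


# Invented data: ten 15-byte entries, each space-padded on the right.
table = b"".join(("ITEM%02d" % n).encode("ascii").ljust(15) for n in range(1, 11))
items = split_occurs(table, 15, 10)
print(len(items), items[0], items[-1])  # 10 ITEM01 ITEM10
```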


Nic
 