How exactly is a random file structured?

tedsmith · Apr 4, 2015

Searching for details of how a random file differs from a non-random one, I keep getting silly answers like "A random file is one that can be accessed randomly".
Can anyone suggest a reference that describes, byte for byte, how a random file is constructed and why?

strongm · Apr 4, 2015

>"A random file is one that can be accessed randomly"

That's pretty much it. A random access file is one where you can move the file pointer to pretty much any random point for reading and/or writing data there.

tedsmith · Apr 4, 2015

THAT'S EXACTLY WHAT I MEANT!
I knew what they are for and how to use them 25 years ago.

strongm · Apr 4, 2015

Then ... why the question?

tedsmith · Apr 5, 2015

My question was not how a random file was described or used or what it is used for.

My question was how it was constructed that made it different to a non random file. I had never thought about it before, just accepting them blindly.

I have yet to find any explanation, but I have concluded that the two are the same except that all records in a random file must be the same length, whereas in a non-random text file each record (if a record is taken to be a line ended by an unprintable delimiter character) can be any length as long as it ends in the delimiter.
A binary file appears to be just a continuous stream of bytes, each taking any of the 256 possible values - or is it?

Is the above correct or are there any other factors I am not aware of?
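
As a rough illustration of the distinction drawn in this post, here is a minimal Python sketch (not VB code). The 8-byte record length, the sample data and the helper names `read_fixed` and `read_delimited` are all assumptions made up for the example. With fixed-length records the position of record N can be computed; with delimited lines the reader has to scan from the start.

```python
import io

RECORD_LEN = 8  # assumed fixed record length in bytes (hypothetical value)

def read_fixed(f, record_no, record_len=RECORD_LEN):
    """Jump straight to a 1-based record number in a fixed-length file."""
    f.seek((record_no - 1) * record_len)
    return f.read(record_len)

def read_delimited(f, record_no):
    """Scan LF-terminated lines from the start; lengths are not known in advance."""
    f.seek(0)
    for current, line in enumerate(f, start=1):
        if current == record_no:
            return line.rstrip(b"\r\n")
    return None  # fewer lines than requested

# Same three names, stored two ways.
fixed = io.BytesIO(b"ALPHA   BRAVO   CHARLIE ")   # space-padded to 8 bytes each
text = io.BytesIO(b"ALPHA\nBRAVO\nCHARLIE\n")     # one name per line

print(read_fixed(fixed, 3))
print(read_delimited(text, 3))
```

Nothing in the bytes themselves marks either file as "random"; only the way the reader computes positions differs.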

fredericofonseca · Apr 5, 2015

There isn't a random file as such. What you have is random access to file positions.

In order to work with random access you need to define the record type for your file manager (which can be anything).

So imagine a file of 400 bytes.
If I define a record of 1 byte I can go "directly", from a logical point of view, to record 20. The file manager does the inner work of determining which point in the file corresponds to record 20, which for a 1-byte record is byte 20.

If for the same file I define a logical record of 20 bytes, I can again go "directly" to record 20, and now the file manager has to determine that record 20 starts at position 1 + (19 * 20) = 381.
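
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: `record_offset` and `read_record` are hypothetical helpers, and the sample file is 400 bytes, an assumption chosen so that record 20 exists for both record sizes. Python offsets are zero-based, so 1 is added to get the 1-based positions used in the post.

```python
import io

def record_offset(record_no, record_len):
    """Zero-based byte offset of a 1-based record number."""
    return (record_no - 1) * record_len

def read_record(f, record_no, record_len):
    """Seek to the computed offset and read exactly one record."""
    f.seek(record_offset(record_no, record_len))
    return f.read(record_len)

# 400 bytes of sample data standing in for the physical file.
data = io.BytesIO(bytes(range(200)) * 2)

# 1-byte records: record 20 is the 20th byte of the file.
print(record_offset(20, 1) + 1)
# 20-byte records: record 20 starts at 1-based position 1 + (19 * 20).
print(record_offset(20, 20) + 1)
# The same bytes, read under the 20-byte record definition.
print(len(read_record(data, 20, 20)))
```

The file contents never change between the two calls; only the record length supplied by the caller does.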

Let's look at another case, which will be more common.

Indexed file (COBOL for example, or any ISAM-type file)
These files are made up of data areas (which can be in the same physical file or in a separate one) and of index and metadata areas.

Such files on modern implementations will also normally have compression on (although not mandatory).

So when we try to access any record randomly, what the file manager needs to do is:
1. Read the index area and determine the offset of the record we are trying to access.
2. Read the data area at the offset found above.
3. Uncompress the data.
4. Return the record to the application.
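
The steps above can be sketched roughly as follows. This is a toy model in Python, not how any particular COBOL or ISAM file manager lays out its files: the in-memory dictionary standing in for the index area, the use of `zlib` for compression, and the names `build` and `read_record` are all assumptions for illustration.

```python
import io
import zlib

def build(records):
    """Write compressed records to a data area and return (data, index)."""
    data = io.BytesIO()
    index = {}  # key -> (offset, stored length); stands in for the index area
    for key, payload in records.items():
        blob = zlib.compress(payload)
        index[key] = (data.tell(), len(blob))
        data.write(blob)
    return data, index

def read_record(data, index, key):
    offset, length = index[key]    # 1. look up the offset in the index area
    data.seek(offset)              # 2. read the data area at that offset
    blob = data.read(length)
    return zlib.decompress(blob)   # 3. uncompress, 4. return the record

data, index = build({"A001": b"first customer", "B002": b"second customer"})
print(read_record(data, index, "B002"))
```

Because each record is compressed on its own, only the requested record has to be read and uncompressed.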

Note that the data can be uncompressed before the record is located, but this would normally mean preloading the whole data file into memory.

Note that, based on the first example I gave, what is normally called a line sequential file (e.g. one terminated by CR, CR+LF, or LF alone) can also be read with a single-byte fixed record size, thus making it a fixed-size file - the fact that it has a CR or an LF in it means nothing.

Even with fixed-size records, for many years text files (i.e. non-binary files) would use a TAB to replace runs of spaces. Looked at as a fixed-size file, this would also be misleading unless you take into consideration that the file manager could be configured to replace a TAB with a certain number of spaces before making the record contents available to the calling application, and similarly to replace spaces with a TAB as needed when writing to the physical file.

Regards

Frederico Fonseca
SysSoft Integrated Ltd

http://www.syssoft-int.com

FAQ219-2884
FAQ181-2886

tedsmith · Apr 5, 2015

Thanks. I am talking about simple files that can be read by Input, Line Input or Get etc., so it would appear that my conclusion was correct.
While you say there isn't a "Random File" as such, there are 250,000 references in Google to "Random files" and that's what I was referring to; those that I looked at assumed the reader already knew.
Unfortunately I couldn't find a real explanation so I put two and two together.

dilettante · Apr 5, 2015

The phrase is sort of contextual, and many confuse "random" with "indexed" just to make things a bit more confusing.

For example, in the context of many Microsoft Basics prior to VB.Net there is a "random file" type that treats UDTs as "records" and allows seeking by record number. The format of these persisted UDTs varies somewhat from their deserialized in-memory format, and most of the details of that are covered in the manuals. Indeed, if you dig into the OLE documentation you'll find that the underlying data structure is called a "record" and is not really a C-style struct at all! See IRecordInfo Interface.

However, Microsoft Basics treat most other files as streams of bytes; for text I/O statements the bytes are considered ANSI or DBCS characters. Usually that means 1 character = 1 byte (except when it means 2), so it is possible to seek, but by character/byte instead of by record.

Most of that goes back to the weak filesystems supported in Windows and MS-DOS, that hearken back to the weak filesystems in old minicomputer OSs (Microsoft was crippled from the start by overexposure to frail operating systems, in particular those of the old PDP-10). For most purposes all of these OSs only support one file organization: stream files. That's because they barely supported disk at all, treating it as a collection of simulated punched paper tapes.

Once you leave the impoverished world of Windows, *nix, etc. things are different. Stream file support was fairly rare in the past even though it was often added by the 1990s, primarily to foster data interchange. There the major influence was punched cards rather than paper tapes.

Instead, filesystems there tend to offer record-oriented file organizations and often make a sharp distinction between sequential-only files and random access files. There is almost always some filesystem-level provision for various ISAM-style keyed/indexed file organizations as well. All of those come in lower-overhead fixed-length record and higher-overhead variable-length record file kinds.

The reason mainframe OSs tended to have so many file organizations, often conceptually the same, is that the business was competitive. To get business from IBM customers, Burroughs, Univac, Honeywell, etc. typically implemented IBM (and other) formats in addition to their own. This made it easier to convert a customer to your own products.

The reason you see so much chatter on the Web is that the Web is dominated by the bottom of the pyramid: most people are at the lowest levels of skill and experience and have the least exposure to powerful filesystems. Since in the shallow end of the pool (PC OSs) any form of random access file must be an application-level abstraction laid over the dumb stream file... constant jabbering about it is almost inevitable.

tedsmith · Apr 6, 2015

Yes, I admit I'm near the shallow end of the pool and afraid to go in the deep end so I try to adhere to the KISS principle!

On the other hand, many say the world is being overtaken by jargon and reasoning that will be understandable only by a few and eventually only by a computer. Mankind's imagination and intelligence will die out because it won't be needed - back to the land of the apes ruled by an "artificial being" far greater than ourselves!

I think I'll take up sock knitting instead.

dilettante · Apr 6, 2015

It isn't that bad really. 99% of the time if your context is VB6 then a random access file means 1 of 2 things:

1. A file you open [tt]For Random[/tt] and address by record.

2. A file you open as a text or binary file and address by byte.

Probably more often (1) than (2). The addressing is done using [tt]Get[/tt], [tt]Put[/tt], and [tt]Seek[/tt] statements.

The format of the data in the file for the [tt]For Random[/tt] file organization is described under the [tt]Get[/tt] and [tt]Put[/tt] statements. These are not limited to use with UDTs, but that is probably more common than other data types.

tedsmith · Apr 6, 2015

Thanks for your efforts but your last post is exactly what I was complaining about.
As I previously said, I have known what you just said for probably the last 25 years but never considered how (in simple terms) the underlying files were constructed.

I originally simply asked the latter simple question. It's easy if you already know the answer.
Don't worry, I worked it out well enough for myself as I said.

Can I explain it by an Analogy?
Confusing roadway direction signs and turning arrows are often made by people who already know the intersection layout intimately but can't visualise how a stranger entering for the first time could misread the signs, even though that stranger knows how to drive perfectly.

This happens all the time in the software world too.

dilettante · Apr 6, 2015

As I said, the details are described in the documentation. For example see:

Get Statement and Put Statement

It goes into quite a bit of detail there. I'm not sure what you want besides that.

tedsmith · Apr 7, 2015

Grrr.
Thanks anyway.
I repeat, I never wanted to know how to USE Put and Get or HOW TO USE Random access files in any way.
I wanted to know what they were "made of". I already found out what I wanted.
Let's leave it at that.

dilettante · Apr 7, 2015

I don't understand the growling.

Those articles go into quite a bit of detail about what such files are "made of." For example one of the simpler cases:

>If the variable being written is a variable-length string, Put writes a 2-byte descriptor containing the string length and then the variable. The record length specified by the Len clause in the Open statement must be at least 2 bytes greater than the actual length of the string.

and:

>If the variable being read into is a variable-length string, Get reads a 2-byte descriptor containing the string length and then reads the data that goes into the variable. Therefore, the record length specified by the Len clause in the Open statement must be at least 2 bytes greater than the actual length of the string.
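
To make that layout concrete, here is a minimal Python sketch that mimics the described behaviour rather than calling VB itself. The names `put_string` and `get_string`, the 16-byte record length, the little-endian byte order of the descriptor, the single-byte `cp1252` encoding and the zero-byte padding are all assumptions for the example; the quoted documentation only states that a 2-byte length descriptor precedes the string data.

```python
import io
import struct

RECORD_LEN = 16  # plays the role of Len in the Open statement (assumed value)

def put_string(f, record_no, text, record_len=RECORD_LEN):
    """Write a 2-byte length descriptor, then the string, at a 1-based record."""
    data = text.encode("cp1252")  # assumption: single-byte ANSI characters
    if len(data) + 2 > record_len:
        raise ValueError("record length must be at least string length + 2")
    f.seek((record_no - 1) * record_len)
    f.write(struct.pack("<H", len(data)) + data)  # assumption: little-endian
    f.write(b"\x00" * (record_len - 2 - len(data)))  # assumption: zero padding

def get_string(f, record_no, record_len=RECORD_LEN):
    """Read the 2-byte descriptor, then exactly that many bytes of string data."""
    f.seek((record_no - 1) * record_len)
    (length,) = struct.unpack("<H", f.read(2))
    return f.read(length).decode("cp1252")

f = io.BytesIO()
put_string(f, 1, "hello")
put_string(f, 2, "random file")

print(f.getvalue()[:7])   # descriptor bytes followed by the string data
print(get_string(f, 2))
```

Every record starts on a multiple of the record length, which is what lets record 2 be found without reading record 1.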

I have no idea what you wanted beyond those detailed descriptions.
