Opening and reading from a binary file larger than 2GB

rahreg · Mar 2, 2025

Hi All,

Long time VFP programmer, first time poster on this forum. I have a program that compares 2 files by opening each with FOPEN(), reading a block of a certain size from each, and comparing the blocks in a loop. This has been working fine for me for decades. Unfortunately I am starting to run into the problem of VFP 9.0 not being able to deal with files larger than 2GB this way. I've looked to see if there was some file object I could use to get around it but so far no luck.

I did see something suggesting CreateObject('Scripting.FileSystemObject') but that was for a text file. I tried it anyway and used OpenTextFile and ReadLine() but it didn't work. At some point the comparison was false even though the 2 files are identical.

I'm guessing the answer is no, but is there any object that can be used to low level open a file larger than 2GB and read data from it as blocks and not lines?

Thanks
rah

GriffMG · Mar 3, 2025

I think you have two choices, either find a way to split the files using a utility (64 bit presumably) or use VFPa for this particular effort.

For splitting take a look here

How to split large text file in windows?

I have a log file with size of 2.5 GB. Is there any way to split this file into smaller files using windows command prompt?

stackoverflow.com

Joe Crescenzi · Mar 3, 2025

While there's no built-in support for low level functions to do this, the solution will depend on how you want to manage the comparison.

If it's a text based file, you can run the FC (File Compare) program built-in to Windows, then send the output to a text file, then you can write something that parses the output file, instead of parsing the whole file.

If the files were similar, chances are the output from FC will be relatively small.

mJindrova · Mar 3, 2025

1) use ADODB.Stream - https://www.tek-tips.com/threads/open-file-as-unicode-utf-16-and-save-as-utf-8.1818112/
2) use API - https://github.com/VFPX/Win32API/blob/master/samples/sample_346.md

DECLARE LONG CreateFile IN kernel32.dll STRING @, INTEGER, INTEGER, INTEGER, INTEGER, INTEGER, LONG
DECLARE INTEGER ReadFile IN kernel32.dll LONG hFile, STRING @ lpBuffer, INTEGER nNumberOfBytesToRead, INTEGER @ lpNumberOfBytesRead, INTEGER lpOverlapped
DECLARE INTEGER CloseHandle IN kernel32.dll LONG

rahreg · Mar 3, 2025

GriffMG said:
I think you have two choices, either find a way to split the files using a utility (64 bit presumably) or use VFPa for this particular effort.

For splitting take a look here

How to split large text file in windows?

I have a log file with size of 2.5 GB. Is there any way to split this file into smaller files using windows command prompt?

stackoverflow.com

Thanks for the reply. I considered splitting but there might be issues with that I would rather avoid.

What is VFPa? I searched for that but other things came up that are not related to this.

wOOdy-Soft · Mar 3, 2025

Hi Rah,

The Scripting.FileSystemObject is basically using the same Windows API functions which VFP also uses for its LLFFs. The only difference is that VFP can only work with a 2GB pointer. Therefore you can use it in the same way:

Code:

x = CreateObject("scripting.filesystemobject")  && the FSO Object
y = x.GetFile(GetFile())   && pick your binary file and get a FileObject
z = y.OpenAsTextStream()   && get your  filecontent  // VFP FOPEN()

do while not z.AtEndOfStream  && like VFP FEOF()
 * the calls below are alternatives; use the one that fits your case
 ? z.Read(100)    && get your content bytewise  // VFP FREAD()
 ? z.ReadLine()   && read full lines until CRLF // like VFP FGETS()
 ? z.ReadAll()    && read the whole content // like VFP FileToStr() which is limited to 2GB
 ? z.skip(50)     && moves the pointer // caveat: no way to reposition at a specific position like BOF or EOF or any distance from there
enddo
z.close()

For more help see: https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/file-object
and https://learn.microsoft.com/en-us/o...e/user-interface-help/openastextstream-method

rahreg · Mar 3, 2025

Joe Crescenzi said:
While there's no built-in support for low level functions to do this, the solution will depend on how you want to manage the comparison.

If it's a text based file, you can run the FC (File Compare) program built-in to Windows, then send the output to a text file, then you can write something that parses the output file, instead of parsing the whole file.

If the files were similar, chances are the output from FC will be relatively small.

Thanks for the reply. These are video files. But I see there is a switch for binary files so I will experiment with that. I didn't know windows had a file comparison program so this seems like a good possibility.
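
For illustration, the FC idea with the binary switch could be driven from VFP roughly like this. This is a minimal sketch, not something posted in the thread: the function name FilesIdenticalFC is made up, it assumes WScript.Shell is available, and it relies on FC's exit code (0 when no differences are found). Whether FC copes well with files over 2GB is something to test before trusting it.

```foxpro
* Sketch only: FilesIdenticalFC is a hypothetical name, not from the thread.
* Returns .T. when "fc /b" reports no differences between the two files.
Function FilesIdenticalFC(tcFile1, tcFile2)
   Local loShell, lcCmd, lnExitCode
   loShell = CreateObject("WScript.Shell")
   * /b = binary comparison; the output is discarded, only the exit code is used
   lcCmd = [cmd /c fc /b "] + tcFile1 + [" "] + tcFile2 + [" > nul]
   * Run(command, 0 = hidden window, .T. = wait and return the exit code)
   lnExitCode = loShell.Run(lcCmd, 0, .T.)
   * FC exit codes: 0 = no differences, 1 = differences, 2 = file not found
   Return (lnExitCode = 0)
EndFunc
```

Discarding the output keeps the run cheap when the files differ a lot, since only a yes/no answer is needed for a dedup check.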

rahreg · Mar 3, 2025

wOOdy-Soft said:
Hi Rah,

The Scripting.FileSystemObject is basically using the same Windows API functions which VFP also uses for its LLFFs. Therefore you can use it in the same way:

x = CreateObject("scripting.filesystemobject") && the FSO Object
y = x.GetFile(GetFile()) && pick your binary file and get a FileObject
z = y.OpenAsTextStream() && get your filecontent

do while not z.AtEndOfStream
? z.Read(100)
enddo

Hi Woody,

I actually saw that solution and tried it but it was with ReadLine() and for some reason the comparison didn't match. But Read() seems like it's more like what I'm currently doing and could work so I changed the 2 ReadLine()s to Read(), did some tests and it worked!!!!! I just let out a BIG sigh of relief.

THANKS!!
rah

tomk3 · Mar 3, 2025

wOOdy-Soft said:
Hi Rah,

The Scripting.FileSystemObject is basically using the same Windows API functions which VFP also uses for its LLFFs. The only difference is that VFP can only work with a 2GB pointer. Therefore you can use it in the same way:

Code:

x = CreateObject("scripting.filesystemobject")  && the FSO Object
y = x.GetFile(GetFile())   && pick your binary file and get a FileObject
z = y.OpenAsTextStream()   && get your filecontent // VFP FOPEN()

do while not z.AtEndOfStream  && like VFP FEOF()
 ? z.Read(100)    && get your content bytewise // VFP FREAD()
 ? z.ReadLine()   && read full lines until CRLF // like VFP FGET()
 ? z.ReadAll()    && read the whole content // like VFP FileToStr() which is limited to 2GB
 ? z.skip(50)     && moves the pointer // caveat: no way to reposition at a specific position like BOF or EOF or any distance from there
enddo
z.close()

For more help see: https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/file-object
and https://learn.microsoft.com/en-us/o...e/user-interface-help/openastextstream-method

Nice to see you wOOdy. Tom

Joe Crescenzi · Mar 3, 2025

rahreg said:
Thanks for the reply. These are video files. But I see there is a switch for binary files so I will experiment with that. I didn't know windows had a file comparison program so this seems like a good possibility.

Interesting. Now I'm super curious what you're doing with those comparisons in a database environment. Are you just checking to see IF they're different, and you don't necessarily need to know which bytes were changed? If that were the case, the file size alone would be all you would need to check.

I would think that once a video file is changed, knowing which bytes changed wouldn't be something you would track in a database system.

rahreg · Mar 3, 2025

Joe Crescenzi said:
Interesting. Now I'm super curious what you're doing with those comparisons in a database environment. Are you just checking to see IF they're different, and you don't necessarily need to know which bytes were changed? If that were the case, the file size alone would be all you would need to check.

I would think that once a video file is changed, knowing which bytes changed wouldn't be something you would track in a database system.

Hey Joe,

The program is basically to dedup files. I run it in a source directory structure against a target structure. It compares a file in the source against files of the same size in the target. If the contents are a match it deletes the file in the source structure and moves to the next file. This is done regardless of the target's name or location so as long as it finds a matching target it deletes the source.

The file list from each structure is loaded into cursors using ADIR (although that doesn't return the true file size if it's > 2GB, but I have a workaround for that). So there is a database component to it, but I use VFP for everything I program for myself. My catchphrase is, "Soooooo much easier in FoxPro."

Like I said I've used this for decades but the files have gotten bigger where the 2GB limit is now a problem. I'm currently using it in a situation where I'm consolidating files from small external hard drives to a larger one and organizing them better. I only want to delete the files on the smaller drives if there is confirmation they are on the larger one. This way if something gets deleted by accident it will still be on the smaller drive and I can take care of it. Since the file structure on the larger drive will be different, and the files could be renamed, my program won't care and will delete it on the smaller drive because it knows it's on the larger drive.

Speaking of ADIR, does anyone know if there is an API that could tell me if the file or directory is actually a junction or symbolic link? That would help for a different program to mirror drives. I was going to post a new thread to ask that but I figure I'll ask here as well.

Thanks
rah

Joe Crescenzi · Mar 3, 2025

"I use VFP for everything I program for myself. My catchphrase is, "Soooooo much easier in FoxPro."

I'm guilty of that too. I've coded in xBase since the original CP/M version of dBase II going back to around 1983, so it essentially became my primary language for all sorts of non-database things too. I'm sure reading huge files would be a lot quicker in other languages, but for what it's worth, some FoxPro features are surprisingly fast, especially string manipulation.

rahreg · Mar 3, 2025

Joe Crescenzi said:
"I use VFP for everything I program for myself. My catchphrase is, "Soooooo much easier in FoxPro."

I'm guilty of that too. I've coded in xBase since the original CP/M version of dBase II going back to around 1983, so it essentially became my primary language for all sorts of non-database things too. I'm sure reading huge files would be a lot quicker in other languages, but for what it's worth, some FoxPro features are surprisingly fast, especially string manipulation.

Yeah. The only thing more surprising to me than the fact that new languages are still being created is how complicated they make a lot of things instead of keeping them as simple and straightforward as xBase. And I've done a lot of string manipulation programs both personally and professionally and, well to use my catchphrase, it's soooooo much easier in FoxPro.

In the early 90s I wrote a program in Turbo Pascal. The first thing I did was build a library of functions to mimic FoxBase commands and functions so it was like programming FoxBase in Turbo Pascal. My favorite was making functions for @ SAY and @ GET. Ahhhh those were the days

Joe Crescenzi · Mar 3, 2025

It bugged me for a while that they discontinued the language when they shifted to .Net, especially since they didn't really introduce an alternative data-centric language for creating desktop apps, which to me left a huge gap in day-to-day operations in most businesses.

Back in the 80s, just about every business could be improved by building a database driven app and xBase was the go-to way to get things done.

I looked for alternatives but eventually just resigned myself to being comfortable knowing that, if nothing else, the fact that it HASN'T changed since 2009 is actually its strongest feature. I've been burned more times than I can count by all the constant changes to their .Net platform that require countless NuGet libraries that break your code when they're updated.

Code I wrote in xBase in the 80s still runs. Code I wrote 3 years ago in .Net will give me countless errors when I allow Visual Studio to update the libraries or make changes to use newer libraries. It can drive me nuts.

mmerlinn · Mar 3, 2025

Joe Crescenzi said:
I looked for alternatives but eventually just resigned myself to being comfortable knowing that, if nothing else, the fact that it HASN'T changed since 2009 is actually its strongest feature.

Code I wrote in xBase in the 80s still runs.

One of the main reasons I still use FPM 2.6 even though it is ancient. Just wish I could use it on modern equipment.

rahreg · Mar 3, 2025

Joe Crescenzi said:
It bugged me for a while that they discontinued the language when they shifted to .Net, especially since they didn't really introduce an alternative data-centric language for creating desktop apps, which to me left a huge gap in day-to-day operations in most businesses.

Back in the 80s, just about every business could be improved by building a database driven app and xBase was the go-to way to get things done.

I looked for alternatives but eventually just resigned myself to being comfortable knowing that, if nothing else, the fact that it HASN'T changed since 2009 is actually its strongest feature. I've been burned more times than I can count by all the constant changes to their .Net platform that require countless NuGet libraries that break your code when they're updated.

Code I wrote in xBase in the 80s still runs. Code I wrote 3 years ago in .Net will give me countless errors when I allow Visual Studio to update the libraries or make changes to use newer libraries. It can drive me nuts.

Yeah I always get nervous about upgrades. It's one of the reasons I still prefer to use Windows 7. Windows 10 does things I don't like, such as decide to reboot when I'm running a long process. But I think Fox upgrades were usually reliable.

Actually I still use VFP 6.0. I only opened the package and installed 9.0 about a year ago because of the additional parameter for ADIR to have it return the actual case of a file. Then I found out how to get around that in 6.0 so I still mostly use that.

Now if I can find a way to identify Junctions and Symbolic Links in VFP I can write a more versatile version of ADIR. I still have to start that other thread.
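
On the junction and symbolic link question, one possible starting point is the Windows API function GetFileAttributes: both junctions and symbolic links are reparse points and carry the FILE_ATTRIBUTE_REPARSE_POINT bit (0x400). The sketch below only answers "is this a reparse point?"; the helper name IsReparsePoint is made up for illustration. Telling a junction from a symbolic link would additionally need the reparse tag (for example from FindFirstFile), which is not shown here.

```foxpro
* Sketch only: IsReparsePoint is a hypothetical helper name.
#Define FILE_ATTRIBUTE_REPARSE_POINT 0x400
#Define INVALID_FILE_ATTRIBUTES      -1

Function IsReparsePoint(tcPath)
   Local lnAttr
   Declare Integer GetFileAttributes In kernel32 String lpFileName
   lnAttr = GetFileAttributes(tcPath)
   If lnAttr = INVALID_FILE_ATTRIBUTES
      Return .F.   && path not found or not accessible
   Endif
   * Junctions and symbolic links both have the reparse point bit set
   Return BitAnd(lnAttr, FILE_ATTRIBUTE_REPARSE_POINT) <> 0
EndFunc
```

A wrapper around ADIR could call this for each entry and skip or flag the ones that return .T.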

Chriss Miller · Mar 3, 2025

Just look into http://hexcentral.blogspot.com/2013/05/filesystemobject-performance-issues.html

I tested a file comparison with FSO and Windows API ReadFile and both are equally slow, you may test for yourself:

Code:

Function CompareFiles(tcFilename1, tcFilename2)
   #Define GENERIC_READ             0x80000000
   #Define OPEN_EXISTING                     3
   #Define FILE_ATTRIBUTE_NORMAL          0x80
   #Define BLOCK 8192

   Declares() && may also only do them once in main.prg or elsewhere you do API declares

   Local lcFilename1, lcFilename2, lnHandle1, lnHandle2, lcBuffer1, lcBuffer2,;
      lnFileSize1, lnFileSize2, lnBytesRead1, lnBytesRead2, llFilesEqual,;
      lnBytesToRead1, lnBytesToRead2

   If (Left(tcFilename1,2)=="\\" Or Substr(tcFilename1,2,1)=":")
      lcFilename1 = tcFilename1
   Else
      lcFilename1 = Sys(5)+Sys(2003)+"\"+tcFilename1
   Endif

   If (Left(tcFilename2,2)=='\\' Or Substr(tcFilename2,2,1)=':')
      lcFilename2 = tcFilename2
   Else
      lcFilename2 = Sys(5)+Sys(2003)+'\'+tcFilename2
   Endif

   lnHandle1 = CreateFile(lcFilename1, GENERIC_READ, 0, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0)
   lnHandle2 = CreateFile(lcFilename2, GENERIC_READ, 0, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0)

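   llFilesEqual = .T.
   * GetFileSize returns the low-order DWORD of the size and stores the
   * high-order DWORD in its second parameter, so both halves are compared
   Local lnSizeLow1, lnSizeLow2
   Store 0 To lnFileSize1, lnFileSize2
   lnSizeLow1 = GetFileSize(lnHandle1,@lnFileSize1)
   lnSizeLow2 = GetFileSize(lnHandle2,@lnFileSize2)

   llFilesEqual = (lnSizeLow1=lnSizeLow2 And lnFileSize1=lnFileSize2)
   If Not llFilesEqual
      CloseHandle(lnHandle1)   && do not leak the handles on early exit
      CloseHandle(lnHandle2)
      Return .F.
   Endif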

   Store 0 To lnBytesRead1,lnBytesRead2
   Store BLOCK TO lnBytesToRead1, lnBytesToRead2
   Store Space(BLOCK) To lcBuffer1, lcBuffer2

   Do While lnBytesToRead1+lnBytesToRead2>0
      ReadFile(lnHandle1,@lcBuffer1,lnBytesToRead1,@lnBytesToRead1,0)
      If lnBytesToRead1 < BLOCK
         lcBuffer1 = Left(lcBuffer1,lnBytesToRead1)
      EndIf
      lnBytesRead1 = lnBytesRead1 + lnBytesToRead1
   
      ReadFile(lnHandle2,@lcBuffer2,lnBytesToRead2,@lnBytesToRead2,0)
      If lnBytesToRead2 < BLOCK
         lcBuffer2 = Left(lcBuffer2,lnBytesToRead2)
      Endif
      lnBytesRead2 = lnBytesRead2 + lnBytesToRead2

      llFilesEqual = llFilesEqual And (lcBuffer1==lcBuffer2)
      If Not llFilesEqual
         Exit
      Endif
   EndDo
   CloseHandle(lnHandle1)
   CloseHandle(lnHandle2)

   llFilesEqual = llFilesEqual AND (lnBytesRead1=lnBytesRead2)
 
   Return llFilesEqual
EndFunc

Procedure Declares()
   Declare Integer CreateFile In kernel32;
      STRING  lpFileName,;
      INTEGER dwDesiredAccess,;
      INTEGER dwShareMode,;
      INTEGER lpSecurityAttributes,;
      INTEGER dwCreationDisposition,;
      INTEGER dwFlagsAndAttributes,;
      INTEGER hTemplateFile

   Declare Integer GetFileSize In kernel32;
      INTEGER   hFile,;
      INTEGER @ lpFileSizeHigh

   Declare Integer ReadFile In kernel32;
      INTEGER   hFile,;
      STRING  @ lpBuffer,;
      INTEGER   nNumberOfBytesToRead,;
      INTEGER @ lpNumberOfBytesRead,;
      INTEGER   lpOverlapped

   Declare Integer CloseHandle In kernel32;
      INTEGER hObject

Compare using FSO:

Code:

Function CompareFilesFSO(tcFilename1, tcFilename2)
   Local lcFilename1, lcFilename2, loFile1, loFile2, loStream1, loStream2, lcBuffer1, lcBuffer2,;
      lnFileSize1, lnFileSize2, lnBytesRead1, lnBytesRead2, llFilesEqual, loFSO

   If (Left(tcFilename1,2)=="\\" Or Substr(tcFilename1,2,1)=":")
      lcFilename1 = tcFilename1
   Else
      lcFilename1 = Sys(5)+Sys(2003)+"\"+tcFilename1
   Endif

   If (Left(tcFilename2,2)=='\\' Or Substr(tcFilename2,2,1)=':')
      lcFilename2 = tcFilename2
   Else
      lcFilename2 = Sys(5)+Sys(2003)+'\'+tcFilename2
   EndIf
 
   loFSO = CreateObject("scripting.filesystemobject")  && the FSO Object
   loFile1 = loFSO.GetFile(lcFilename1)
   loStream1 = loFile1.OpenAsTextStream()
   loFile2 = loFSO.GetFile(lcFilename2)
   loStream2 = loFile2.OpenAsTextStream()

   llFilesEqual = .T.
   Store 0 To lnBytesRead1,lnBytesRead2
   Store Space(BLOCK) To lcBuffer1, lcBuffer2

   Do While Not (loStream1.AtEndOfStream AND loStream2.AtEndOfStream)
      lcBuffer1 = loStream1.Read(BLOCK)
      lnBytesRead1 = lnBytesRead1 + Len(lcBuffer1)

      lcBuffer2 = loStream2.Read(BLOCK)
      lnBytesRead2 = lnBytesRead2 + Len(lcBuffer2)
   
      llFilesEqual = llFilesEqual And (lcBuffer1==lcBuffer2)
      If Not llFilesEqual
         Exit
      Endif
   EndDo
   loStream1.Close()
   loStream2.Close()

   llFilesEqual = llFilesEqual AND (lnBytesRead1=lnBytesRead2)
   Return llFilesEqual

The code is not complete in that it doesn't check whether the files to be compared exist, but that shouldn't be a problem when you process a list of files that actually exist, as found by ADIR. Just notice that a file name is extended by the current drive and directory if that's not part of the file name passed in. ADIR only returns file names, not paths, so this still has to be addressed to avoid problems. I merely concentrated on the code necessary for block-by-block comparisons.

But both of these functions also will work in VFP9 32bit, not only in VFPA 64bit. You still can only have 2GB RAM for your process, but you only need a blocksize like 8KB.

Chriss Miller · Mar 3, 2025

If you talk about file shares on file servers (and similar) also take a look at this:

Data Deduplication Overview (learn.microsoft.com)

I realize your scenario is about generations of external drives. At least that's how I'd describe it in short by what you describe with smaller (presumably older) and larger (presumably recent) drives storing things like backups.

Well, it's not totally off: The deduplication that's built into Windows Server also addresses backups. But it's clearly not about storing them on generations of external drives. And I surely understand why this would be done for many reasons, too, like backups detached from the running servers not being prone to malware attacks, etc.

I also wrote some routines about file deduplications and I worked with the checksums of files, precomputed at best, so no time taken to compute them per file, but also computing them just for the first block of a file and only going into detailed file comparisons where file size AND first block checksums match. That means much more sparse comparisons.
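
The first-block checksum idea could look something like the following. This is only a sketch under assumptions, not the routine described above: the name FirstBlockCRC and the 8192-byte block size are made up, the block is read through FSO because the thread reports that Read() works on large files, and the CRC32 flag of SYS(2007) requires VFP 9.

```foxpro
* Sketch only: FirstBlockCRC is a hypothetical name; 8192 is an assumed block size.
* Returns a checksum string for the first block of a file, to be used as a
* cheap pre-filter before a full block-by-block comparison.
Function FirstBlockCRC(tcFile)
   Local loFSO, loStream, lcBlock
   loFSO = CreateObject("Scripting.FileSystemObject")
   loStream = loFSO.OpenTextFile(tcFile, 1)   && 1 = ForReading
   lcBlock = ""
   If Not loStream.AtEndOfStream              && Read() on an empty file would raise an error
      lcBlock = loStream.Read(8192)
   Endif
   loStream.Close()
   Return Sys(2007, lcBlock, 0, 1)            && last parameter 1 = CRC32 (VFP 9)
EndFunc
```

Two files would then only go on to a full comparison when both their sizes and their first-block checksums match; a checksum match alone does not prove the files are identical.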

Crox · Mar 3, 2025

2GB is a common maximum size in 32-bit environments, because a signed 32-bit file offset tops out at 2^31 - 1 bytes. You can make a program in GNU COBOL, I expect it to run.

You can download the SPF365 editing utility for free. It is able to edit very large files. It can also compare files and directories.

Perhaps a chopfile program can help you: https://www.acapsoft.com/det.php?prog=Chop

Let us know what worked for you and what not.

rahreg · Mar 4, 2025

Chriss Miller said:
Just look into http://hexcentral.blogspot.com/2013/05/filesystemobject-performance-issues.html

I tested a file comparison with FSO and Windows API ReadFile and both are equally slow, you may test for yourself:

Hi Chriss,

I was debating using the old code that uses VFP functions if the file was smaller than 2GB. After reading the article you sent I decided to make it more intelligent, so it now uses the VFP functions for files < 2GB and FSO for files > 2GB. I tested it with a 747MB file: VFP took 20 seconds, FSO took 2 minutes and 6 seconds.

So thank you for pointing that out. I could make that change before I started running it; otherwise it would have taken much longer for the < 2GB files.

rah
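
A size-based switch like the one described here might be sketched as follows. The names CompareFilesAuto and CompareFilesLLFF are made up for illustration and this is not rah's actual code; CompareFilesFSO is the function Chriss posted earlier in the thread and is assumed to be available. The sketch also assumes the FSO Size property reports the true size for files over 2GB, which is worth verifying.

```foxpro
* Sketch only: CompareFilesAuto and CompareFilesLLFF are hypothetical names.
* CompareFilesFSO is the function posted earlier in the thread.
#Define TWO_GB 2147483648

Function CompareFilesAuto(tcFile1, tcFile2)
   Local loFSO, lnSize1, lnSize2
   loFSO = CreateObject("Scripting.FileSystemObject")
   lnSize1 = loFSO.GetFile(tcFile1).Size
   lnSize2 = loFSO.GetFile(tcFile2).Size
   Do Case
   Case lnSize1 <> lnSize2
      Return .F.                                  && different sizes can never match
   Case lnSize1 < TWO_GB
      Return CompareFilesLLFF(tcFile1, tcFile2)   && faster native low-level functions
   Otherwise
      Return CompareFilesFSO(tcFile1, tcFile2)    && slower, but works past 2GB
   EndCase
EndFunc

Function CompareFilesLLFF(tcFile1, tcFile2)
   Local lnH1, lnH2, llEqual
   lnH1 = Fopen(tcFile1, 0)                       && 0 = read-only, buffered
   lnH2 = Fopen(tcFile2, 0)
   llEqual = (lnH1 >= 0 And lnH2 >= 0)            && FOPEN() returns -1 on failure
   Do While llEqual And Not Feof(lnH1)
      llEqual = (Fread(lnH1, 8192) == Fread(lnH2, 8192))
   Enddo
   If llEqual
      llEqual = Feof(lnH2)                        && second file must be exhausted too
   Endif
   If lnH1 >= 0
      =Fclose(lnH1)
   Endif
   If lnH2 >= 0
      =Fclose(lnH2)
   Endif
   Return llEqual
EndFunc
```

Checking the sizes first means the slower FSO path is only taken when the files are both large and the same size.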
