
Loading Some Fields From a Tab Delimited Text File


FoxNovice (Programmer) - Apr 10, 2008
Hi! I have a very large tab-delimited text file, and in order to stay under the 2GB limit I would like to be able to selectively load - on the fly - only some of the fields into a VFP structure. For example, if the full text record contains 100 data elements (fields), I would like to load only elements 1, 3, 5, 10, 15 and 100. How can I do this without first creating the full .dbf record and then copying out only the fields I'd like to keep?
Thanks!
John
 
Since you want to import non-sequential field values, some of the 'normal' means will not work.

One approach I have used is to import LARGE text files into a SQL Server table and then use DTS to export only those fields that I need. I typically export back into a new text file and then either APPEND FROM <textfile> or use the Import Wizard (much slower) to get the new 'records' into a VFP table.

Good Luck,
JRB-Bldr

 
Hi John,

One way to do this would be to use VFP's low-level file functions to break the text file into two or three smaller portions. Check the Help for details of FOPEN(), FREAD(), FWRITE(), etc.

Then, import each of the smaller portions in turn into a temporary table, using APPEND FROM ... DELIMITED WITH TAB.

For each temporary table in turn, copy to another table, using COPY TO ... FIELDS .... This will allow you to select the fields to be copied.

You now have two or three new tables, each containing just the fields you want, and each containing a half or a third of the total records. You can now use APPEND FROM to combine them into a single file.

It sounds long-winded, but off-hand I can't think of anything easier.
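Something along these lines for the splitting step - the file name, line buffer and chunk size below are only placeholders, and the low-level functions are of course themselves subject to the 2 GB limit:

Code:
* Rough sketch: split a big tab-delimited file into smaller pieces
* ("bigfile.txt", the 8192-byte line buffer and the 500000-line chunk
*  size are only examples)
LOCAL lnIn, lnOut, lnLines, lnPart, lcLine
lnPart  = 1
lnLines = 0
lnIn  = FOPEN("bigfile.txt")
lnOut = FCREATE("part" + TRANSFORM(lnPart) + ".txt")
DO WHILE !FEOF(lnIn)
   lcLine = FGETS(lnIn, 8192)        && one record, CR/LF stripped by FGETS
   = FPUTS(lnOut, lcLine)            && FPUTS appends the CR/LF again
   lnLines = lnLines + 1
   IF lnLines >= 500000              && start the next output file
      = FCLOSE(lnOut)
      lnPart  = lnPart + 1
      lnLines = 0
      lnOut   = FCREATE("part" + TRANSFORM(lnPart) + ".txt")
   ENDIF
ENDDO
= FCLOSE(lnIn)
= FCLOSE(lnOut)

Each part file can then be run through the APPEND FROM ... DELIMITED WITH TAB and COPY TO ... FIELDS steps described above.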

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

My Visual FoxPro site: www.ml-consult.co.uk
 
I guess the big question is whether this will be a one-time or occasional file import, or something needed regularly on an on-going basis (such as daily). That may well influence which approach to take.

If this is a one-time or occasional effort where doing some of the work manually is acceptable then, in addition to the SQL Server approach I mentioned above, another tool you might look into is a file/text editor that can handle LARGE files. With it you can break the text file into more manageable portions and work with them in the normal manner. One tool that I discovered is the "Programmers File Editor" (to which I have no business ties), which has helped me out more than once.

If this is to be a frequent need, then Mike's suggestion above to create a fully automated method would be best.

Good Luck,
JRB-Bldr
 
Using FOPEN() and FGETS() is a good idea on such large files, but it will only work if the txt file is smaller than 2GB, as VFP's low-level file functions are limited too.

The Help does not explicitly mention a file size limit for the APPEND FROM command, but I think it would not work either, because the 2 GB limit seems to be a fairly global file size limit in VFP.

And if the source file is under 2 GB, the DBF should not grow that much, provided the field lengths are optimized for the data.

Well, if the source files are not that large, or if APPEND FROM nevertheless works with large input files, you can use a trick: define a C(1) field called cDummy and make use of the FIELDS clause.

Code:
APPEND FROM .... FIELDS cField1, cDummy, cField2, cDummy, ...

would, for example, import the first value of a txt file row into cField1, the second value into the dummy field, the third value into cField2, the fourth value into cDummy again, and so on - so you effectively throw away the second and fourth columns of the txt file.

You'd afterwards only need to remove that one cDummy field from the import cursor.
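Put together, the trick might look roughly like this - the cursor name, field names and widths are only examples, and it assumes APPEND FROM will accept the file in question:

Code:
* Target cursor holds only the wanted columns plus one throw-away field
* (all names and widths here are just examples)
CREATE CURSOR curImport (cField1 C(30), cDummy C(1), cField2 C(30))

* Every unwanted text column is routed into cDummy via the FIELDS list
APPEND FROM bigfile.txt ;
   FIELDS cField1, cDummy, cField2, cDummy ;
   DELIMITED WITH TAB

* Finally drop the dummy column
SELECT cField1, cField2 FROM curImport INTO CURSOR curClean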

Bye, Olaf.
 
All - thanks so much for your quick replies! A few notes: this will be a regular, on-going job that will be run at least once or twice a week. Unfortunately, we do not have SQL Server available to us. Each data record in the text file has its fields delimited by a tab, and a CR/LF end-of-record sequence.

There are approx. 180 data fields for the full record, with a record length around 3000 bytes. The resulting "small version" of the file would be used to generate counts reports. Some field manipulation and record reordering would be required for the report.

I was hoping to run this entirely within VFP using VFP commands. I envisioned possibly using string or array functions to read in text records one at a time (using the CR/LF location?), keeping track of the fields needed by counting the tab delimiters, and writing out only those fields I need for the report file.

Can this be done? I have done a similar thing with fixed length text files, but not with tab delimited files.

Thanks!
John
 
If you are doing this file importing as frequently as you say, then it would certainly be preferable if you can do it entirely within VFP. But, as Olaf says, you may indeed face the 2GB file size limit no matter what.

Since you say that the data 'fields' you want are not sequential from the first field onward, a plain APPEND FROM will not meet your needs.

Assuming that the file size will indeed work with VFP's FOPEN()/FGETS(), then that would be the preferable way to go. You can indeed parse out each 'record' separately, and the tab delimiting is no problem: you merely look for the tab character as you parse out each field's content.

The text file editor that I mentioned is also free.

Or, if you need SQL Server, you can download the free version. You might look into using VFP to automate SQL Server operations through calls to stored procedures, and then follow up with VFP operations on the result.

Good Luck,
JRB-Bldr
 
John,

I might be wrong about this, but I believe that the 2 GB limit won't be a problem if you use low-level file access to get one record at a time. If that's right, the approach you described sounds good. Obviously, once you've got a record into a memory variable, it should be reasonably easy to parse it into separate fields which you can then insert into a table.
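For what it's worth, a bare-bones sketch of that loop, assuming the file stays under the low-level 2 GB limit; the file name, the column numbers (1, 3 and 5) and the target table are only placeholders:

Code:
* Read one tab-delimited record at a time and keep only columns 1, 3 and 5
* (file name, column numbers and the target table "report" are examples;
*  the table is assumed to exist with fields field1, field3 and field5)
LOCAL lnHandle, lcLine, lcValue, lnPos, lnCol, lcF1, lcF3, lcF5
lnHandle = FOPEN("bigfile.txt")
DO WHILE !FEOF(lnHandle)
   lcLine = FGETS(lnHandle, 8192) + CHR(9)   && trailing tab so every column ends in one
   STORE "" TO lcF1, lcF3, lcF5
   lnCol = 0
   DO WHILE CHR(9) $ lcLine
      lnPos   = AT(CHR(9), lcLine)
      lcValue = LEFT(lcLine, lnPos - 1)
      lcLine  = SUBSTR(lcLine, lnPos + 1)
      lnCol   = lnCol + 1
      DO CASE
      CASE lnCol = 1
         lcF1 = lcValue
      CASE lnCol = 3
         lcF3 = lcValue
      CASE lnCol = 5
         lcF5 = lcValue
      ENDCASE
   ENDDO
   INSERT INTO report (field1, field3, field5) VALUES (m.lcF1, m.lcF3, m.lcF5)
ENDDO
= FCLOSE(lnHandle)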

I hesitate to disagree with JRB-Bldr (I don't usually), but I wouldn't go out of my way to use SQL Server. Nothing against that product (I use it a lot myself), but it seems like overkill to use it merely to solve this one problem.

Mike



__________________________________
Mike Lewis (Edinburgh, Scotland)

My Visual FoxPro site: www.ml-consult.co.uk
 
Mike, I am in full agreement with you.

It might be a case of using a sledgehammer when a tack hammer would do.

But until it is determined that the 2GB limit is not an issue, the SQL Server approach is merely offered as one possible avenue towards a solution - and one that could be 'driven' in an automated manner from VFP.

JRB-Bldr
 
I stand corrected about the 2 GB limit in this case. I just tried to FOPEN() a 3.99 GB text file. I got an "invalid seek offset" error straight away.

Another solution might be to use something other than VFP to write a little program to break up the text file into smaller chunks. I'm reluctant to suggest a non-FoxPro solution, but, if this was me, I would write a C# program to do that, which I would then call from the main VFP routine.

John, as an alternative, can you attack this from the other direction, that is, change the process or application that generates the text file so that it only gives you the fields you want?

Mike



__________________________________
Mike Lewis (Edinburgh, Scotland)

My Visual FoxPro site: www.ml-consult.co.uk
 
Use a low-level file read. If the file is more than 2 GB, see faq184-4732. Then use GETWORDNUM() to grab the fields you want and write them to a new file, or insert them directly into a DBF.

This also works well to split a file when it contains more than 254 columns.
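A rough sketch of the write-to-a-new-file variant; the file names and column numbers are only examples, and keep in mind that GETWORDNUM() treats consecutive tabs as a single delimiter, so completely empty columns would shift the numbering:

Code:
* Pull columns 1, 3 and 100 from each record into a smaller tab-delimited file
LOCAL lnIn, lnOut, lcLine, lcOut
lnIn  = FOPEN("bigfile.txt")
lnOut = FCREATE("smallfile.txt")
DO WHILE !FEOF(lnIn)
   lcLine = FGETS(lnIn, 8192)
   lcOut  = GETWORDNUM(lcLine, 1, CHR(9)) + CHR(9) + ;
            GETWORDNUM(lcLine, 3, CHR(9)) + CHR(9) + ;
            GETWORDNUM(lcLine, 100, CHR(9))
   = FPUTS(lnOut, lcOut)
ENDDO
= FCLOSE(lnIn)
= FCLOSE(lnOut)

The resulting smallfile.txt should then be small enough to bring in with a normal APPEND FROM ... DELIMITED WITH TAB.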

Brian
 
Brian,

Your idea of using the file scripting object is a good one. Better than my suggestion of using an external program, if only because it keeps everything in VFP.

John needs to keep in mind that users sometimes disable scripting. But presumably that won't be an issue on a system under his own control.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

My Visual FoxPro site: www.ml-consult.co.uk
 
Thanks for all your input - some great ideas! The largest of these tab-delimited text files we have received so far has been 2.4GB (which does truncate at 2GB using VFP commands), but they are typically under 2GB. We cannot change the format or record layout of the text files because they are supplied to us by the client, and they are not willing to change anything (except vendors!).

We have been "grunting" the recent jobs through the shop because they are fairly managable in size. We have been doing this by loading the text files into a temporary structure that is shortened as much as possible, and is filled with lots of 1-byte fields (the fields that are not needed), and then copying out only the fields required for the report.

But we would like to do this in a more sophisticated manner. I have been taken off this project to get involved with more pressing issues, and don't have much time to work on this one.

We have a monthly file coming in soon and it will be pretty big (close to the 2GB size).

Assuming the monthly file will be under 2GB, can anyone supply some "skeleton code" that might fit our needs (opening the text file, and writing out to a .dbf only those fields we need)? We have not done this with a tab-delimited file before!

Thanks!

John



 
The code below makes a three-field sample file and then puts fields 1 and 3 into a table. It should give you an idea. I'm not promising FSO is faster than your DBF solution, but it is more versatile.

Code:
TEXT TO m.cFileText TEXTMERGE NOSHOW PRETEXT 7
this is a char1 <<CHR(9)>> 222	<<CHR(9)>> 12/31/1999
this is a char2 <<CHR(9)>> 999	<<CHR(9)>> 12/31/2008
ENDTEXT 

STRTOFILE(m.cFileText, 'sample.tab')

CREATE TABLE sample2fields (charfield c(20), datefield d)

fso = CREATEOBJECT("Scripting.FileSystemObject")
ExistingFile = fso.OpenTextFile("sample.tab", 1, .F.)

DO WHILE ExistingFile.AtEndOfStream = .F.
   m.cString = ExistingFile.ReadLine()

   INSERT INTO sample2fields VALUES ;
      (GETWORDNUM(m.cString, 1, CHR(9)), ;
      CTOD(GETWORDNUM(m.cString, 3, CHR(9))))
ENDDO
ExistingFile.Close()

LOCATE 
BROWSE NOWAIT
 
As an aside, I often rewrite the large files I am provided to strip out characters that add no value - for example, the time portion of datetime exports where there is no time, or ",0.00," when either ",0," or ",," would mean the same thing to VFP. Or long descriptions that could be coded... All these things are fastest done with low-level file reads/writes before you load the data into VFP.

It does take some effort but it is worthwhile for large and repetitive data sets.
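A quick pre-pass of that kind might look like this; the input/output names and the two STRTRAN() substitutions are only illustrations of the sort of dead weight being stripped:

Code:
* Strip redundant text from the raw export before importing it
LOCAL lnIn, lnOut, lcLine
lnIn  = FOPEN("rawdata.txt")
lnOut = FCREATE("cleandata.txt")
DO WHILE !FEOF(lnIn)
   lcLine = FGETS(lnIn, 8192)
   lcLine = STRTRAN(lcLine, " 00:00:00", "")   && drop empty time portions
   lcLine = STRTRAN(lcLine, ",0.00,", ",0,")   && shorten zero amounts
   = FPUTS(lnOut, lcLine)
ENDDO
= FCLOSE(lnIn)
= FCLOSE(lnOut)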
 
One last thing... for super large data sets I think loading into SQL Server 2005/2008 is definitely the way to go. I have a program that replicates a DBF structure (built from a data sample) into a CREATE TABLE command via SQL pass-through, then writes a def file and uses bcp to load the original files. I was able to create the data layout from scratch using my custom import tools and then load over 75 million records with 100+ fields into SQL Server in under 15 minutes.

As long as you are careful about HOW you tell SQL Server to handle the data, it is pretty fast. But for some things it is AMAZINGLY slow.
 
Hello Foxnovice,

I made a program to import roughly 450,000 records in very little time. The text file is about 17 MB in size.

This is my code:

Code:
CLOSE DATABASES
SET ESCAPE ON
SET SAFETY ON

LOCAL gnFileHandle, nSize, cString, m.tel
m.tel = 0

SELECT 0
USE c:\develop\autoparts\filip\toybel EXCLUSIVE
SELECT toybel
ZAP

gnFileHandle = FOPEN("c:\develop\autoparts\filip\price")
nSize = FSEEK(gnFileHandle, 0, 2)
IF nSize <= 0
   WAIT WINDOW "This file is empty!" NOWAIT
   RETURN
ELSE
   = FSEEK(gnFileHandle, 0, 0)
   m.flag1 = 0
   m.tel = 0
   m.teller = 0
   DO WHILE m.flag1 < nSize
      cString = FREAD(gnFileHandle, 130)
      SELECT toybel
      APPEND BLANK
      REPLACE art WITH ALLTRIM(SUBSTR(cString, 1, 12))
      REPLACE price WITH VAL(SUBSTR(cString, 13, 8))
      REPLACE something WITH SUBSTR(cString, 21, 10)
      REPLACE descript WITH SUBSTR(cString, 31, 20)
      m.tel = m.tel + 1
      m.teller = m.teller + 50
      m.flag1 = m.flag1 + 50
      WAIT WINDOW 'Tel : ' + STR(m.tel, 6, 0) + ' ' + STR(m.teller, 9, 0) NOWAIT
      = FSEEK(gnFileHandle, m.flag1)
   ENDDO
ENDIF

= FCLOSE(gnFileHandle)

The total time for import into the dbf file = 10 minutes

wfg,
FILIP
 
Hi FILIP,

Ouch - APPEND BLANK and multiple REPLACEs plus a WAIT WINDOW on every record. No wonder it takes that long.

Code:
   ...

   INSERT INTO toybel ;
      (art, price, something, descript) ;
      VALUES ;
      (ALLTRIM(SUBSTR(cString, 1, 12)), ;
      VAL(SUBSTR(cString, 13, 8)), ;
      SUBSTR(cString, 21, 10), ;
      SUBSTR(cString, 31, 20))

   IF m.tel % 1000 = 0
      WAIT WINDOW 'Tel : ' + STR(m.tel, 6, 0) + '   ' + STR(m.teller, 9, 0) NOWAIT
   ENDIF

   ...

But all this is rather specialised to a non-delimited, fixed-column-width format.

Bye, Olaf.
 