How to split a text file

Status
Not open for further replies.

Mike Gagnon

Programmer
Apr 6, 2002
CA
I need to split a text file in 2.
The text file contains about 110,000 lines and I need to split it in about half.
I looked at MEMLINES, ALINES, FILETOSTR. I'm out of ideas.
Does anyone have a quick and dirty way to do this?


Mike Gagnon

If you want to get the best response to a question, please check out FAQ184-2483 first.
 
Low-level file functions are the only thing I can think of, Mike. You know the drill: FOpen(), FSeek() to get the size, FCreate(), FRead(), FWrite(), etc. Not quick and dirty, but I don't see why it wouldn't work.
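
A minimal sketch of that drill, assuming every line ends with CHR(10) (alone or after CHR(13)). SplitInTwo, CopyBytes and the parameter names are made up for illustration. It cuts at the first line feed at or after the midpoint, so no line is chopped, and it only ever holds one small buffer in memory.

```foxpro
* Sketch only: SplitInTwo, CopyBytes and all parameter names are illustrative.
* Assumes every line ends with CHR(10), alone or after CHR(13).
FUNCTION SplitInTwo(tcSource, tcOut1, tcOut2)
   LOCAL lnIn, lnSize, lnCut, lcChunk, lnPos
   lnIn = FOPEN(tcSource)                 && read-only
   IF lnIn < 0
      RETURN .F.
   ENDIF
   lnSize = FSEEK(lnIn, 0, 2)             && moving to the end returns the size
   lnCut  = INT(lnSize / 2)
   =FSEEK(lnIn, lnCut, 0)                 && jump to the midpoint
   * Scan forward for the first line feed at or after the midpoint
   DO WHILE lnCut < lnSize
      lcChunk = FREAD(lnIn, 8192)
      IF LEN(lcChunk) = 0
         EXIT
      ENDIF
      lnPos = AT(CHR(10), lcChunk)
      IF lnPos > 0
         lnCut = lnCut + lnPos            && cut just after that line feed
         EXIT
      ENDIF
      lnCut = lnCut + LEN(lcChunk)
   ENDDO
   =FSEEK(lnIn, 0, 0)                     && back to the start
   =CopyBytes(lnIn, tcOut1, lnCut)
   =CopyBytes(lnIn, tcOut2, lnSize - lnCut)
   =FCLOSE(lnIn)
   RETURN .T.
ENDFUNC

* Copy tnBytes from the open handle tnIn into a new file tcOut
FUNCTION CopyBytes(tnIn, tcOut, tnBytes)
   LOCAL lnOut, lnLeft, lcBuf
   lnOut = FCREATE(tcOut)
   IF lnOut < 0
      RETURN .F.
   ENDIF
   lnLeft = tnBytes
   DO WHILE lnLeft > 0
      lcBuf = FREAD(tnIn, MIN(lnLeft, 32768))
      IF LEN(lcBuf) = 0
         EXIT
      ENDIF
      =FWRITE(lnOut, lcBuf)
      lnLeft = lnLeft - LEN(lcBuf)
   ENDDO
   =FCLOSE(lnOut)
   RETURN .T.
ENDFUNC
```

If the second half contains no line feed at all, the first output file receives everything and the second is created empty.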

Craig

 
Code:
lcFile = FILETOSTR(...)
lcFile1 = LEFT(lcFile,INT(LEN(lcFile)/2))
lcFile2 = SUBSTR(lcFile,INT(LEN(lcFile)/2)+1)
STRTOFILE(lcFile1,...)
STRTOFILE(lcFile2,...)

If the file is larger, you need low-level functions like FOPEN, FSEEK, FREAD, FWRITE, FCLOSE. You can also limit memory usage with these; the solution above needs twice the file length both in (virtual) memory and on disk.

Bye, Olaf.
 
Is this something you need to do programmatically? If you just need to distribute the file as a one-off, there are a few 'file chopping' utilities out there.

Have a look at:


...for some utilities.

Neil

I like work. It fascinates me. I can sit and look at it for hours...
 
Craig

Thank you for the suggestions. I have a feeling this is the route I'll have to take.

Olaf

Thank you for the suggestion. After trying your suggestion, I end up with a partially chopped line in the first file and the rest of the line in the second file. Pretty close but I cannot lose that line nor can I accept a line that does not follow the structure of the rest of the file. And I don't think with FILETOSTR() you can actually control the last line content.




Mike Gagnon

 
OK, if you want the break at an end of line, you can count the occurrences of CHR(13)+CHR(10) and split at one of those.

Code:
lcEndofLine = Chr(13)+Chr(10)
lcFile = FILETOSTR(...)
lnLines = OCCURS(lcEndofLine,lcFile)
lcFile1 = Left(lcFile,at(lcEndofLine,lcFile,Int(lnLines/2)+2))
lcFile2 = Substr(lcFile,at(lcEndofLine,lcFile,Int(lnLines/2)+2)+1)
...

Depending on the file you may try CHR(13) or CHR(10) alone.

Bye, Olaf.
 
You may replace +2 with +Len(lcEndofLine) to make the code more general.

Bye, Olaf.
 
Actually, it must be:

Code:
lcFile1 = Left(lcFile,at(lcEndofLine,lcFile,Int(lnLines/2))+Len(lcEndofLine)-1)
lcFile2 = Substr(lcFile,at(lcEndofLine,lcFile,Int(lnLines/2))+Len(lcEndofLine))

Sorry. These are the typical parenthesis and "off by one" errors...
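
A tiny worked example may help check the arithmetic. The four-line sample string is made up for illustration; the expressions are the corrected ones.

```foxpro
* Four one-character lines, each ended by CHR(10):
* positions are a=1, LF=2, b=3, LF=4, c=5, LF=6, d=7, LF=8
lcEndofLine = CHR(10)
lcFile = "a" + lcEndofLine + "b" + lcEndofLine + "c" + lcEndofLine + "d" + lcEndofLine
lnLines = OCCURS(lcEndofLine, lcFile)              && 4 terminators
* The 2nd terminator sits at position 4; the cut includes the whole terminator
lnCut = AT(lcEndofLine, lcFile, INT(lnLines/2)) + LEN(lcEndofLine) - 1
lcFile1 = LEFT(lcFile, lnCut)                      && "a", LF, "b", LF
lcFile2 = SUBSTR(lcFile, lnCut + 1)                && "c", LF, "d", LF
? LEN(lcFile1), LEN(lcFile2)                       && both are 4
```

With lcEndofLine = CHR(13)+CHR(10) the same expressions cut after the line feed of the second line, because Len(lcEndofLine)-1 moves the cut to the last character of the terminator.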

Bye, Olaf.
 

Neil

Yes it is an automated process. Download a file via FTP, check the size, if too big, chop.


Mike Gagnon

 
Olaf

I'm actually giving you a star as well. In the end I used your suggestion, as it is faster. The problem with your suggestion is that the file I'm using is a Unix file, which does not contain a CHR(13) at the end of each line, only a CHR(10). When I first tested your suggestion I was getting no line count, but after removing the CHR(13) I did get the right line count. Thank you again.
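
For an automated process the terminator could be picked from the file content instead of being hard-coded. A small sketch, with a made-up sample string standing in for the downloaded file:

```foxpro
* Sketch only: the sample content is illustrative.
lcFile = "line 1" + CHR(10) + "line 2" + CHR(10)   && Unix-style sample
DO CASE
CASE (CHR(13) + CHR(10)) $ lcFile
   lcEndofLine = CHR(13) + CHR(10)                 && Windows line ends
CASE CHR(10) $ lcFile
   lcEndofLine = CHR(10)                           && Unix line ends
CASE CHR(13) $ lcFile
   lcEndofLine = CHR(13)                           && bare carriage returns
OTHERWISE
   lcEndofLine = ""                                && no line break at all
ENDCASE
? LEN(lcEndofLine)                                 && 1 for the sample above
```

The CR+LF test has to come first, since a Windows file also contains CHR(10).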

Mike Gagnon

 

Olaf,

Although your suggestion works when I need to split a file in two, I'm having a hard time figuring it out when I need to split it into n parts (an undetermined number).
So far I can determine how many times I need to chop it with the following, but the SUBSTR() function is difficult to implement. Any suggestions?
Code:
PUBLIC lcEndofLine

lcEndofLine = Chr(10)
CD d:\downloads\mtrl_gk\repro\
lcFile = FILETOSTR('cdr.20050102.dat')
LOCAL howmany
howmany = 0
lnLines = OCCURS(lcEndofLine,lcFile)
howmany = lnLines/30000
howmany = CEILING(howmany)




Mike Gagnon

 
Hi Mike,

Glad it helps. Yes, string functions are quite fast. And setting lcEndOfLine to CHR(10) is what I meant when I said it depends on the file...

Well, if you are chopping every 30000 lines, then I'd do it this way:

Code:
lcEndofLine = chr(10)
...
lcFile = FILETOSTR('cdr.20050102.dat')
...
lnLinesPerPart = 30000
n = 0
do while !EMPTY(lcFile)
   n = n+1
   if occurs(lcEndofLine,lcFile)>= lnLinesPerPArt
      lcFilePart = Left(lcFile,at(lcEndofLine,lcFile,lnLinesPerPart)+Len(lcEndofLine)-1)
      lcFile = Substr(lcFile,at(lcEndofLine,lcFile,Int(lnLines/2))+Len(lcEndofLine)) && bug: should use lnLinesPerPart here, not Int(lnLines/2)
   else
      lcFilePart = lcFile
      lcFile = ""
   endif
   strtofile(lcFilePart,"part"+transform(n)+".dat")
   Sys(1104) && maybe free some memory
enddo

 
Of course the substr statement must also be changed:
Code:
...
lcFile = Substr(lcFile,at(lcEndofLine,lcFile,lnLinesPerPart)+Len(lcEndofLine))
...
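
Put together, the loop with the corrected SUBSTR might look like the sketch below. SplitIntoParts, its parameter names and the output file naming are illustrative. It tests LEN() > 0 rather than EMPTY(), because EMPTY() also returns .T. for a string holding only spaces or line breaks, and it calls AT() once per part instead of OCCURS() plus two AT() calls.

```foxpro
* Sketch only: SplitIntoParts and its parameter names are illustrative.
* Writes tcPrefix1.dat, tcPrefix2.dat, ... and returns the number of parts.
FUNCTION SplitIntoParts(tcText, tcEndOfLine, tnLinesPerPart, tcPrefix)
   LOCAL lnPart, lnCut, lcRest, lcPart
   lnPart = 0
   lcRest = tcText
   DO WHILE LEN(lcRest) > 0
      lnPart = lnPart + 1
      * Position of the terminator that ends line number tnLinesPerPart,
      * or 0 when fewer than that many terminators remain
      lnCut = AT(tcEndOfLine, lcRest, tnLinesPerPart)
      IF lnCut > 0
         lnCut  = lnCut + LEN(tcEndOfLine) - 1     && include the terminator
         lcPart = LEFT(lcRest, lnCut)
         lcRest = SUBSTR(lcRest, lnCut + 1)
      ELSE
         lcPart = lcRest                           && last, shorter part
         lcRest = ""
      ENDIF
      =STRTOFILE(lcPart, tcPrefix + TRANSFORM(lnPart) + ".dat")
   ENDDO
   RETURN lnPart
ENDFUNC
```

A call could look like `? SplitIntoParts(FILETOSTR('cdr.20050102.dat'), CHR(10), 30000, "part")`.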

It's not really my day. I'll stop in half an hour and go drink some beer. Perhaps that'll help... :)

Bye, Olaf.
 
Thanks Olaf

I have created my own, which most likely works the same as yours. Here it is.
Code:
PUBLIC lcEndofLine
LOCAL nLines
nLines = 30000
SET STEP ON
lcEndofLine = CHR(10)
CD d:\downloads\mtrl_gk\REPRO\
lcFile = FILETOSTR('cdr.20050102.dat')
LOCAL howmany
howmany = 0
lnLines = OCCURS(lcEndofLine,lcFile)
howmany = lnLines/30000
howmany = CEILING(howmany)
nval = 0
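FOR i = 1 TO howmany
	nLines2 = nLines*i	&& number of the line that ends part i
	IF i = 1
		nBeg = 1
	ELSE
		nBeg = nBeg+nval+2	&& first character after the previous terminator
	ENDIF
	IF i < howmany
		* Note: with this length the part ends two positions before the terminator,
		* so the terminator and the character just before it are left out.
		* Check that dropping that character is intended for your file.
		lcFilen = SUBSTR(lcFile,nBeg,AT(lcEndofLine,lcFile,nLines2)-(nBeg+1))
	ELSE
		lcFilen = SUBSTR(lcFile,nBeg)	&& last part: everything that is left
	ENDIF
	nval = AT(lcEndofLine,lcFile,nLines2)-(nBeg+1)	&& length of the part just extracted
	lcfilename = "c:\cdr.11111"+TRANSFORM(i)+".dat"
	STRTOFILE(lcFilen,(lcfilename))
ENDFOR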




Mike Gagnon

 
I haven't tested whether that separates the file correctly at line ends, but I'm sure you did. Nice, as it extracts each part with SUBSTR() just once, while my solution would copy most of the file (the rest after the current piece) several times.

Bye, Olaf.
 
STRTOFILE() and FILETOSTR() do have relatively low file size limitations and should not be relied upon in an automated process where the file sizes may be large.

Low level file functions will work for up to 2 gig files.
Past that, you need to use Windows scripting (slower, but no size limit). I just used it the other day on a 5 gig file.

See "Bypassing Low Level File Functions 2GB Limits", faq184-4732.
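
A rough sketch of a line-by-line split through the Scripting.FileSystemObject COM object, on the assumption that this is the kind of Windows scripting meant here (the FAQ itself is not reproduced in this thread). The paths and lnLinesPerPart are illustrative. Note that WriteLine ends each line with CR+LF, so a Unix file comes out with Windows line ends, and how ReadLine treats LF-only files should be checked on a sample first.

```foxpro
* Sketch only: paths and lnLinesPerPart are illustrative.
* Full paths are used because the COM object may not share VFP's default directory.
LOCAL loFSO, loIn, loOut, lnPart, lnCount, lnLinesPerPart
lnLinesPerPart = 30000
loFSO   = CREATEOBJECT("Scripting.FileSystemObject")
loIn    = loFSO.OpenTextFile("c:\temp\big.dat", 1)   && 1 = ForReading
loOut   = .NULL.
lnPart  = 0
lnCount = lnLinesPerPart                  && forces a new part on the first line
DO WHILE !loIn.AtEndOfStream
   IF lnCount >= lnLinesPerPart
      IF !ISNULL(loOut)
         loOut.Close()
      ENDIF
      lnPart  = lnPart + 1
      * .T. = overwrite an existing part file
      loOut   = loFSO.CreateTextFile("c:\temp\part" + TRANSFORM(lnPart) + ".dat", .T.)
      lnCount = 0
   ENDIF
   loOut.WriteLine(loIn.ReadLine())
   lnCount = lnCount + 1
ENDDO
IF !ISNULL(loOut)
   loOut.Close()
ENDIF
loIn.Close()
```

Only one line is held in memory at a time, which is where the speed penalty mentioned above comes from.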

Brian
 
Mike, I notice you aren't testing in case someone sends an empty file or one with no line break. Murphy's law, you know, expect the unexpected. Here's a snippet to try.

Code:
LOCAL howmany
howmany = 0
lnLines = OCCURS(lcEndofLine,lcFile)
IF lnLines = 0 AND LEN(lcFile) > 0
   lnLines = 1
ENDIF
howmany = CEILING(lnLines/30000)
* VFP6 note: CEILING(1,199999)=1, CEILING(1,200000)=0

dbMark (Why aren't I reading a book on my lunch break?)
 
dbMark

Excellent idea. I'll add that to the routine.


Mike Gagnon

 
M,

Quick and dirty...

Create a table with one character field about 120 to 240 wide, and append from your text file with TYPE SDF.

then you can do string searches for splits or just

copy to (filename1) for recno() <= reccount()/2 type sdf
copy to (filename2) for recno() > reccount()/2 type sdf

If the lines in your text file are not terminated with a CRLF, and/or any line of data is longer than 240 characters, this will not work.
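
The same idea stretches to any number of parts by copying record ranges in a loop. A sketch, with illustrative cursor, field and output names, assuming no line is longer than 240 characters. Keep in mind that SDF output is fixed width, so every line written is padded with spaces to the field length.

```foxpro
* Sketch only: cursor, field and output file names are illustrative.
* Assumes no line is longer than 240 characters.
LOCAL lnParts, lnPerPart, lnI
lnParts = 4
CREATE CURSOR curLines (cLine C(240))
APPEND FROM ("cdr.20050102.dat") TYPE SDF
lnPerPart = CEILING(RECCOUNT() / lnParts)   && records per part, rounded up
FOR lnI = 1 TO lnParts
   * Part lnI gets records (lnI-1)*lnPerPart+1 through lnI*lnPerPart
   COPY TO ("part" + TRANSFORM(lnI) + ".txt") ;
      FOR BETWEEN(RECNO(), (lnI - 1) * lnPerPart + 1, lnI * lnPerPart) ;
      TYPE SDF
ENDFOR
USE IN curLines
```

To split every 30000 lines instead, set lnPerPart to 30000 and lnParts to CEILING(RECCOUNT() / lnPerPart).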

I use this method a lot and it is quick and dirty 8)

Fred
 
Fred

How does your technique work if I need to split the file into, say, four parts?

Mike Gagnon

 