
Create 50 files from one master file

Jul 7, 1999
I need to create 50 individual files (US states, e.g. CA, TX, NY) from one master file (with state codes) in the least number of passes. Thanks
 
Well, how exactly is the master file created, what format is it in, etc.?

Karl
kb244@kb244.com
Experienced in: C++ (both VC++ and Borland), VB1 (DOS) through VB6, Delphi 3 Pro, HTML, Visual InterDev 6 (ASP (web programming/VBScript))
 
Created from old COBOL. Also, each new file must have a different name.
 
OK, but how is the master file set up? Is it almost like a sequential text file?

Karl
kb244@kb244.com
Experienced in: C++ (both VC++ and Borland), VB1 (DOS) through VB6, Delphi 3 Pro, HTML, Visual InterDev 6 (ASP (web programming/VBScript))
 
Are there only 50 files, one for each state? Or is there some other meaning of new files?

If there are only 50 files, unique names are easily created using information in the master file.

Knowing that the master file is sorted makes it easy to develop an algorithm that uses no more than two open files. A slight modification of the algorithm allows an unsorted master file to be used, but it is still possible to complete the processing with the master file and no more than one other file open at a time.

Wil Mead
wmead@optonline.net
 
Hello Wil, yes, only 50 new files. And the new files can (if it's easier) be named XXX01, XXX02, XXX03.
Thanks
 
I was looking more at the state code in the master file for the file-name variant. If it's 01-50, cool; I was thinking more like AK, AL, AR, AZ... Using the controlling piece of data from the master file keeps the information around in some form. (It can then be used in the undo process.)

As far as the file split goes, that's a simple solution.

All you need to do is remember which file you wrote to last. If it's not the one you want to write to now, close the opened file and open the one you want. Using the master file data to help identify the file eliminates the need for a cross-reference search (file name to state code).

I recommend the old COBOL approach of treating everything that is not used as filler and copying it raw, but your data needs may dictate something different.

Wil Mead
wmead@optonline.net
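For the unsorted-master variant Wil mentions, a minimal sketch is to open each state file For Append instead of For Output, so a state that reappears later in the file doesn't truncate what was already written. The file name all.txt and the state code at positions 5-6 are assumptions for illustration (matching the sample later in the thread):
Code:
Option Explicit

' Sketch: split an UNSORTED master file, still holding only one
' output file open at a time. Assumes a 2-char state code at
' positions 5-6 and an input file named all.txt (both assumptions).
Private Sub SplitUnsorted()
  Dim sPath As String
  Dim sRec As String
  Dim sSt As String
  Dim sLastSt As String
  Dim iIn As Integer
  Dim iOut As Integer

  sPath = App.Path & "\"
  iIn = FreeFile(0)
  Open sPath & "all.txt" For Input As #iIn
  Do Until EOF(iIn)
    Line Input #iIn, sRec
    sSt = Mid$(sRec, 5, 2)
    If sSt <> sLastSt Then          'switch outputs only when the code changes
      If sLastSt <> "" Then Close #iOut
      iOut = FreeFile(0)
      'Append preserves earlier writes if a state shows up again;
      'delete leftover two-letter .txt files before a fresh run
      Open sPath & sSt & ".txt" For Append As #iOut
      sLastSt = sSt
    End If
    Print #iOut, sRec
  Loop
  If sLastSt <> "" Then Close #iOut
  Close #iIn
End Sub
On a sorted master this behaves exactly like the close-and-reopen pattern; the only extra cost of Append is having to clear stale output files before a rerun.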
 
Can I see the master file? Type it out on here or email it to me at mindchild_of_tristram@yahoo.com.
 
I would suggest a slightly different approach, not using files but arrays. Read the whole 'file' into memory as a single string (Get). Create an array of records by using Split with vbCr as the delimiter. Then, for each "record", use Split with the field-separator character to append its fields to an array covering the entire db.

When the entire file is converted to an array of fields, you can use a variety of methods to "sort" (and groom) the records into sub-arrays, just like an old heap sort, except there is no need to "merge" the heaps back into a single array. Your individual heaps would be the states.

At any (or several) points in the process, you may need to "groom" or validate the information to conform to your new db schema.

When done with the heaping, validating, and grooming, just write the array(s) to your files/tables.
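A minimal sketch of that in-memory flow, using a late-bound Scripting.Dictionary to play the role of the per-state heaps. The file name, the comma delimiter, and the state code being the second field are all assumptions for illustration:
Code:
Option Explicit

' Sketch: one Get pulls the whole file into memory, Split breaks it
' into records and fields, and a Dictionary accumulates one buffer
' ("heap") per state before a single write per state.
Private Sub SplitInMemory()
  Dim sAll As String
  Dim vRecs As Variant, vFlds As Variant, vKey As Variant
  Dim dicSt As Object
  Dim iIn As Integer, iOut As Integer
  Dim i As Long
  Dim sSt As String

  Set dicSt = CreateObject("Scripting.Dictionary")

  iIn = FreeFile(0)
  Open App.Path & "\all.txt" For Binary As #iIn
  sAll = Space$(LOF(iIn))           'buffer sized to the whole file
  Get #iIn, 1, sAll                 'single read: file -> memory
  Close #iIn

  vRecs = Split(sAll, vbCrLf)       'array of records (vbCrLf line ends)
  For i = 0 To UBound(vRecs)
    If Len(vRecs(i)) > 0 Then
      vFlds = Split(vRecs(i), ",")  'array of fields (comma assumed)
      sSt = vFlds(1)                'state code assumed in 2nd field
      'append the raw record to this state's heap
      dicSt(sSt) = dicSt(sSt) & vRecs(i) & vbCrLf
    End If
  Next i

  'one output file per heap
  For Each vKey In dicSt.Keys
    iOut = FreeFile(0)
    Open App.Path & "\" & vKey & ".txt" For Output As #iOut
    Print #iOut, dicSt(vKey);       'trailing ; avoids an extra blank line
    Close #iOut
  Next vKey
End Sub
Record-by-record string concatenation slows down on very large files; a production version would pre-size per-state buffers, but the single-read, one-write-per-state shape is the point here.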

On an entirely different tack, WHY would you separate the records into separate groups? Surely it is easier to select the subsets (individual STATES) with (simple) SQL criteria when accessing?

MichaelRed
m.red@att.net

There is never time to do it right but there is always time to do it over
 
My guess, MichaelRed, is that there might be a little too much data to fit into arrays ;-)

I have a similar application I run once a year, where I have to deal with 83 files' worth of data (counties instead of states). My case gets a little more complicated in that I have to split out the counties, write each file twice (once fixed-format, once comma-delimited), zip and encrypt them, and build a CD-ROM image for mastering and subsequent duplication.

Additional fields have been added for this year, so the result is that this won't even fit (zipped) on one CD any longer.

And it sounds like (as in my case) Tucsonpapa gets his source data sorted already.
 
Hmmmmmmmmmmmmmm,

The technique has been successfully applied to files of several hundred megabytes. It MAY be memory limited, but even if it is, memory is quite cheap these days, so just go get some more. Even modest systems can accommodate 1.5 GB, so a well-done routine should be able to get a 1 GB file into memory. In extremis, the whole array plus the subsets could be held within the same memory. My point here is simply that if the emphasis is on time of execution, one needs to be conscious of the simple fact that memory operations are an order of magnitude faster than any I/O. Even if, for whatever reason, you choose to do the op in "chunks", it's better to do the chunk in MEMORY than fiddle with files (I/O).

I do NOT know the structure of the source, so detailed suggestions are inappropriate, but IF the source were a standard CSV format, you could create an ADO recordset DIRECTLY from the file. From there, a parameter query could be used to extract the individual subsets and write them to "new" tables. I think this would be slower than the memory-only approach, but still quite manageable.
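A hedged sketch of that ADO idea using the Jet 4.0 text driver, which exposes each file in a folder as a table. The file name all.txt, the header row, and the State column are assumptions, not from the thread:
Code:
Option Explicit

' Sketch: pull one state's subset straight out of a CSV via ADO.
' Requires a header row (HDR=Yes) so fields can be named.
Private Sub ExtractState(ByVal sState As String)
  Dim cn As Object                  'late-bound ADODB.Connection
  Dim rs As Object                  'late-bound ADODB.Recordset
  Dim iOut As Integer

  Set cn = CreateObject("ADODB.Connection")
  cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
          "Data Source=" & App.Path & "\;" & _
          "Extended Properties=""text;HDR=Yes;FMT=Delimited"""

  'the text driver treats all.txt as a table named [all.txt]
  Set rs = cn.Execute("SELECT * FROM [all.txt] WHERE State = '" & sState & "'")

  iOut = FreeFile(0)
  Open App.Path & "\" & sState & ".txt" For Output As #iOut
  If Not rs.EOF Then Print #iOut, rs.GetString;   'tab-delimited dump of the subset
  Close #iOut

  rs.Close
  cn.Close
End Sub
Called once per state (e.g. ExtractState "AZ"), this makes 50 passes over the source, which is what the single-pass approaches above avoid; the trade is letting Jet do all the parsing.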

MichaelRed
m.red@att.net

There is never time to do it right but there is always time to do it over
 
O-kaayyyy.

Anyhow, here is a really stripped-down sample. To save space I have horribly abbreviated names and pulled out nearly all status feedback, counters, etc.

Here we assume your STATE field is in positions 5-6 of the record. Easy to change, of course:
Code:
Option Explicit

Private Sub cmStart_Click()
  Dim sPath As String
  Dim sRec As String
  Dim sCurrSt As String
  Dim iStFld As Integer
  Dim iIn As Integer
  Dim iOut As Integer

  cmStart.Enabled = False
  lbStat.Caption = "Running"
  sPath = App.Path & "\"
  iStFld = 5                        'char. pos. of State
  iIn = FreeFile(0)
  Open sPath & "all.txt" For Input As #iIn
  Do Until EOF(iIn)
    Line Input #iIn, sRec
    If Mid(sRec, iStFld, 2) <> sCurrSt Then
      'state changed: close the previous output file (if any)
      If sCurrSt <> "" Then
        Close #iOut
      End If
      sCurrSt = Mid(sRec, iStFld, 2)
      iOut = FreeFile(0)
      'For Output assumes the master is sorted by state; an unsorted
      'master would reopen (and truncate) a file already written
      Open sPath & sCurrSt & ".txt" For Output As #iOut
    End If
    Print #iOut, sRec
  Loop
  Close #iOut, #iIn
  lbStat.Caption = "Complete!"
End Sub
Sorry about the line-wraps.
 
LOL!

Look at the original posting date. Tucsonpapa could be long dead by now for all we know.

Just a little Sunday insanity.
 