Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Excel retrieve data from LARGE text file

Status
Not open for further replies.

MrStohler

Technical User
Feb 1, 2003
38
US
I am attempting to write a macro that would retrieve data from text file that could be large (500+ pages of text). The text file is the output of an engineering software package. Currently we manually gather the data to perform calculations that the software package does not handle. What I would like to do is have some way of using Excel to retrieve the data with just a few inputs by a user.

Example of the data:
[tt]

MEMBER END FORCES STRUCTURE TYPE = PLANE
-----------------
ALL UNITS ARE -- KIP FEET

MEMBER LOAD JT AXIAL SHEAR-Y SHEAR-Z TORSION MOM-Y MOM-Z



1 1 1 54.05 -2.00 0.00 0.00 0.00 -61.73
3 -52.26 2.00 0.00 0.00 0.00 21.71
3 1 40.71 18.99 0.00 0.00 0.00 247.88
3 -39.36 -9.99 0.00 0.00 0.00 41.96

2 1 3 33.81 -5.48 0.00 0.00 0.00 -21.71
7 -33.07 5.48 0.00 0.00 0.00 -60.43
3 3 28.90 -0.16 0.00 0.00 0.00 -41.96
7 -28.35 6.91 0.00 0.00 0.00 -11.07

3 1 2 58.79 0.00 -2.00 0.00 0.00 0.00
6 -56.99 0.00 2.00 0.00 40.02 0.00
3 2 55.17 0.00 -3.51 0.00 0.00 0.00
6 -53.82 0.00 3.51 0.00 70.15 0.00

[/tt]

Is there some way that if a user provided the values for "MEMBER", "LOAD", and "JT" that a macro could gather the remaining data from the corresponding row in the text file.

Factors that complicate this search:
1 This table might begin well into the text file
2 The data will almost always span several pages, and the headings shown in the sample data will appear at the top of each page. the information sought will most likely be far into the table.

Any thoughts as to the best approach would be greatly appreciated.

Thanks
 
first thing. DO NOT process the file directly. load it into a database table (better, and can be into an access .mdb file) or into a Excel spreadsheet(carefull if your data can have more than 65000 rows), filling all key fields with the correct information (e.g. member, load and jt).

If using a MDB you can then use straight ADO code to issue a select on the table to get the required records.
If loaded into Excel, you should order the data after the loading, and then loop through the cells to get ALL required cells (hint, if the key is greater than the one you are searching for then it does not exist or has no more entries.


As for processing the file the following is some code used to load two different type of files, both similar to what you have in that they have several different headers, and lines of data.
Note that I am giving the full code related to data and format that does not equal yourown, but that should be enough for you to adapt.

The data processed by process_file() is on the following format, which is similar to yours.
Code:
company name                ORDER ENTRY EDIT REPORT                    Page    2
ACCOUNTS RECEIVABLE                                                   10/06/2004
User: user_name                                                      1:45 PM EDT
Accession    Patient                         Service      Client    Estimated Price
------------ ------------------------------  -----------  --------  ------------

abc123ab1    Last_name,First_name            09/29/2004   5                97.12
               DIAGNOSIS IS REQUIRED FOR ALL COMPONENTS OR THE D.O.S. - MC


abc123ab2    Last_name,First_name            05/04/2004   10               86.00
               DIAGNOSIS IS REQUIRED FOR ALL COMPONENTS OR THE D.O.S. - MC
               RESPONSIBLE PARTY REQUIRED -MC
               RESPONSIBLE PARTY ADDRESS REQUIRED -MC
               MEDICARE HIC NO. REQUIRED - MC
               MEDICARE HIC FORMAT INVALID -MC

abc123dd1    Last_name,First_name            05/21/2004   10               33.00
               DIAGNOSIS IS REQUIRED FOR ALL COMPONENTS OR THE D.O.S. - MC
               RESPONSIBLE PARTY REQUIRED -MC
               RESPONSIBLE PARTY ADDRESS REQUIRED -MC
               MEDICARE HIC NO. REQUIRED - MC
               MEDICARE HIC FORMAT INVALID -MC


Code:
Option Explicit
Const const1 = "------------ -------"
Const const2 = "Main company Patient"
Const const3 = "ACCOUNTS RECEIVABLE "
Const const4 = "company  Consolidate"
Const const5 = "Estimated pricing:  "
Const const6 = "Exclusion criteria: "
Const const7 = "Selection criteria: "
Const const8 = "Sort criteria:      "
Const const9 = "Summary only:       "
Const const10 = "--------------------"
Const const11 = "                    "
Const const12 = "User:"
Const const13 = "* * * * *   R e p o r t    S u m m a r y"
Const constspaces = "                                                                                "

Sub process_file_ardata()
    Dim wWork_sheet As Worksheet
    Dim wWork_Book As Workbook
    Dim nfreefile As Long
    Dim iFileHandle As Integer
    Dim sFile As String
    Dim sWorkvar As String
    Dim sValue1 As String
    Dim sValue2 As String
    Dim sValue3 As String
    Dim sValue4 As String
    Dim sValue5 As String
    Dim sValue6 As String
    Dim sValue7 As String
    Dim sValue8 As String
    Dim sValue9 As String
    Dim sValue10 As String
    Dim sValue11 As String
    Dim sValue12 As String
    Dim sDataline As String
    Dim iRow As Long
    Dim bProc As Boolean
    
    iRow = 2
    
    Set wWork_Book = Excel.Application.Workbooks.Add
    Set wWork_sheet = wWork_Book.Worksheets(1)
    wWork_sheet.Cells(1, 1).Value = "ACCOUNT"
    wWork_sheet.Cells(1, 2).Value = "NAME / RESPONSIBLE PARTY"
    wWork_sheet.Cells(1, 3).Value = "CHG TO"
    wWork_sheet.Cells(1, 4).Value = "INVOICE NO."
    wWork_sheet.Cells(1, 5).Value = "PHONE"
    wWork_sheet.Cells(1, 6).Value = "BILLED DATE"
    wWork_sheet.Cells(1, 7).Value = "BALANCE"
    wWork_sheet.Cells(1, 8).Value = "NEXT PROC"
    wWork_sheet.Cells(1, 9).Value = "SERVICE DATE"
    wWork_sheet.Cells(1, 10).Value = "LAST DATE"
    wWork_sheet.Cells(1, 11).Value = "AMOUNT"
    wWork_sheet.Cells(1, 12).Value = "AGE"


    sFile = Excel.Application.GetOpenFilename
    iFileHandle = FreeFile()
    Open sFile For Input As iFileHandle
        Do While Not EOF(iFileHandle)
            Line Input #iFileHandle, sDataline
            If Len(sDataline) > 1 Then
                sWorkvar = Mid(sDataline, 74, 1)
                Select Case sWorkvar
                Case "/"
                     bProc = True
                Case Else
                     bProc = False
                End Select
                If bProc Then ' we have a valid line. Now we need to split it and put it on the correct cells.
                ' There are two options. 1- the most common is to have a invoice number within the field limits
                ' 2- the other less common is when the invoice number is greater than the field and passes to the next field
                ' This can be determined by a "-" on position 56. If true them we have option 2.
                    If Mid(sDataline, 56, 1) = "-" Then
                        sValue1 = Trim(Mid(sDataline, 1, 12))
                        sValue2 = Trim(Mid(sDataline, 13, 24))
                        sValue4 = Trim(Mid(sDataline, 44, 14))
                        sValue5 = Trim(Mid(sDataline, 58, 14))
                    Else
                        sValue1 = Trim(Mid(sDataline, 1, 11))
                        sValue2 = Trim(Mid(sDataline, 12, 25))
                        sValue4 = Trim(Mid(sDataline, 44, 13))
                        sValue5 = Trim(Mid(sDataline, 57, 15))
                    End If
                    sValue3 = Trim(Mid(sDataline, 37, 7))
                    sValue6 = Trim(Mid(sDataline, 72, 10))
                    sValue7 = Trim(Mid(sDataline, 83, 10))
                    sValue8 = Trim(Mid(sDataline, 94, 4))
                    sValue9 = Trim(Mid(sDataline, 99, 10))
                    sValue10 = Trim(Mid(sDataline, 110, 10))
                    sValue11 = Trim(Mid(sDataline, 121, 8))
                    sValue12 = Trim(Mid(sDataline, 130, 3))
                    wWork_sheet.Cells(iRow, 1).Value = sValue1
                    wWork_sheet.Cells(iRow, 2).Value = sValue2
                    wWork_sheet.Cells(iRow, 3).Value = sValue3
                    wWork_sheet.Cells(iRow, 4).Value = sValue4
                    wWork_sheet.Cells(iRow, 5).NumberFormat = "@"
                    wWork_sheet.Cells(iRow, 5).Value = sValue5
                    wWork_sheet.Cells(iRow, 6).Value = sValue6
                    wWork_sheet.Cells(iRow, 7).NumberFormat = "#.00"
                    wWork_sheet.Cells(iRow, 7).Value = sValue7
                    wWork_sheet.Cells(iRow, 8).Value = sValue8
                    wWork_sheet.Cells(iRow, 9).Value = sValue9
                    wWork_sheet.Cells(iRow, 10).Value = sValue10
                    wWork_sheet.Cells(iRow, 11).NumberFormat = "#.00"
                    wWork_sheet.Cells(iRow, 11).Value = sValue11
                    wWork_sheet.Cells(iRow, 12).Value = sValue12
                    iRow = iRow + 1
                End If
            End If
        Loop
    Close iFileHandle

End Sub


Sub process_file()
    Dim wWork_sheet As Worksheet
    Dim wWork_Book As Workbook
    Dim nfreefile As Long
    Dim iFileHandle As Integer
    Dim sFile As String
    Dim sValue1 As String
    Dim sValue2 As String
    Dim sValue3 As String
    Dim sValue4 As String
    Dim sValue5 As String
    Dim sDataline As String
    Dim iRow As Long
    Dim iRow_error As Long
    Dim sWorkvar As String
    Dim bProc As Boolean
    Dim bNew As Boolean
    
    iRow = 2
    
    Set wWork_Book = Excel.Application.Workbooks.Add
    Set wWork_sheet = wWork_Book.Worksheets(1)
    wWork_sheet.Cells(1, 1).Value = "ACCESSION"
    wWork_sheet.Cells(1, 2).Value = "PATIENT"
    wWork_sheet.Cells(1, 3).Value = "DATE"
    wWork_sheet.Cells(1, 4).Value = "CLIENT"
    wWork_sheet.Cells(1, 5).Value = "ERROR"
    wWork_sheet.Cells(1, 6).Value = "ESTIMATED PRICING"

    sFile = Excel.Application.GetOpenFilename
    iFileHandle = FreeFile()
    Open sFile For Input As iFileHandle
        Do While Not EOF(iFileHandle)
            Line Input #iFileHandle, sDataline
            If Len(sDataline) > 1 Then
                bProc = True
                sWorkvar = Mid(sDataline, 1, 20)
                If Mid(sDataline, 15, 40) = const13 Then
                    Exit Do ' Don't process more lines as we are at the end of valid data
                End If
                Select Case sWorkvar
                Case const1, _
                     const2, _
                     const3, _
                     const4, _
                     const5, _
                     const6, _
                     const7, _
                     const8, _
                     const9, _
                     const10, _
                     const11
                     bProc = False
                Case Else
                    If Mid(sWorkvar, 1, 5) = const12 Then
                        bProc = False
                    End If
                End Select
                If bProc Then ' we have a valid line. Now we need to split it and put it on the correct cells.
                    If Not Mid(sDataline, 1, 1) = " " Then
                        sValue1 = Mid(sDataline, 1, 12)
                        sValue2 = Mid(sDataline, 14, 30)
                        sValue3 = Mid(sDataline, 46, 11)
                        sValue4 = Mid(sDataline, 59, 8)
                        sValue5 = Mid(sDataline, 69, 12)
                        
                        wWork_sheet.Cells(iRow, 1).Value = sValue1
                        wWork_sheet.Cells(iRow, 2).Value = sValue2
                        wWork_sheet.Cells(iRow, 3).Value = sValue3
                        wWork_sheet.Cells(iRow, 4).Value = sValue4
                        wWork_sheet.Cells(iRow, 6).Value = sValue5
                        bNew = True
                    Else
                        If bNew Then
                            iRow = iRow - 1
                            bNew = False
                        End If
                        wWork_sheet.Cells(iRow, 1).Value = sValue1
                        wWork_sheet.Cells(iRow, 2).Value = sValue2
                        wWork_sheet.Cells(iRow, 3).Value = sValue3
                        wWork_sheet.Cells(iRow, 4).Value = sValue4
                        wWork_sheet.Cells(iRow, 6).Value = sValue5
                        
                        wWork_sheet.Cells(iRow, 5).Value = Mid(sDataline, 16, Len(sDataline) - 15)
                    End If
                    iRow = iRow + 1
                End If
            End If
        Loop
    Close iFileHandle

End Sub


And finally I have given a full code instead of just pointers, as this type of parsing is not easy for people not used to it.

Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
I agree with Frederico, sort of.

Your data seems to be more normalized than the example Frederico posted, and I suspect that for engineering purposes your data is always a 'one off' so there is no need to store the data for future use so Access may not be the right solution, but the ADO piece is.

To leverage the power of ADO, while still maintaing output to Excel, I would recomend using MS Query to return the data from your text file to MS Excel.

This will allow you to keep your data in the (flat) text file, allowing you to provide advanced filters to the data that can easily be changed by the end user, with a minimum of programming by you the developer.

My two cents.

Engineer by training, programmer by career,
CMP

P.S. Frederico, fantastic piece of data mining code!

Instant programmer, just add coffee.
 
Frederico

Thanks for your prompt reply. I am greatful for your help. It looks like I need to study some VBA to better understand your reply.

Also it appears that your reply is based on searching an Excel file (although you do note an Access file file be better).

My Access skills are very limited (my problem, if I need to learn I will, please advise). But I think the use of a text file or Access file is required since the length of my output frequently exceeds 65,536 rows of text.

I did import a small sample problem into an Access table which results in a few follow-up questions?

1. Are there limits to an Access Table length (maximum numberr or rows like Excell)?.
2. When importing the text file in to Access Table, Access attempted to break the text into columns. This is not applicable, since the same column spacing of the text file does not caryy through the entire output (I only show a portion in my original post). Can Access be set to import each row of text into one "field"?
3. How would your code need to be modified to search the Acceess dfatabase to find out that "SHEAR-Z" is 3.51 (see original post) if the user knows that "MEMBER" = 3, "LOAD" =3 and "JT" =6?

Thanks again for your help.
 
No limits on rows on Access (not significant anyway. a few milions).

No. my code is how to LOAD a text file into a format more adapted to your needs. this can be Excel (my sample) or it can be any database using a ADO connection or command object.

Once the file is loaded it is easier to search for your particular key fields.

As you have shown in your data
MEMBER LOAD JT AXIAL SHEAR-Y 

1 1 1 54.05 -2.00
3 -52.26 2.00
3 1 40.71 18.99
3 -39.36 -9.99

You key fields are not in all rows. as such when loading into Excel or a database these fields need to be populated. (using ADO directly on this file is therefore not possible)


If using a database (access or other) you can just use ADO and issue a select as

select my_required_fields_list from my_table where
MEMBER = Xyx and LOAD = abc and JT = def


I hope your data has some tipe of structure similar to the one you have shown. if it doesn't then you may be in big trouble.

Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
forgot
CMP

P.S. Frederico, fantastic piece of data mining code!

This particular piece of code was done for another poster here. similar problem but needed to load into Excel with specific formating.

very enjoyable piece of work.

Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
Frederico and CMP

Thanks for your input but, I think it will have limited ability for me since many of the text files I will need to reference exceed Excel's limit for rows (the last output I ran had over 290,000 rows of text).

The example data I showed in the original post begins somewhere in the body of the text file and can occupy about 1/3 to 2/2 of the rows. Too further complicate matters the data that appears before and after the data I am seeking now is still tabular (fixed width fonts) but its layout is different.

Currently we search the text file to find the "MEMBER" for which we seek data (not difficult but tedious), and then we manually enter this data into a spreadsheet that handles the calculations the software package does not.

What I want to do is automate this last step. If a user knows the "MEMBER", "LOAD", and "JT", can a macro be written to retrieve it from this HUGE file.
OR
Maybe this is easier, can a macro be written to import all of the data for one or two particular member(s) into Excel. Please not that the data might span a few pages and each page will have a header similar to that show in my original post.

Thanks for all or your input and patience.
 
load the file in to a Access db or better yet into MSDE (SQL Server)

After that getting the data into Excel is very easy.

either done with MS query, or you can create a small function where you enter your MEMBER, LOAD and JT values and that retrieves all related information into your Excel worksheet.

Whathever macro you write to loop through the file to get a particular member will run a lot slower than getting the data from a DB, and in terms of programming work involved you will not save much.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top