Annoying feature making a table from CSV file


tedsmith (Programmer)

I want to create a database table from a CSV file.
Using the following code works well EXCEPT that it interprets data in each column and sets the format of the table to what it thinks is the best (text, number or date).
This is normally OK, except that if one of the columns I really want to be TEXT (because it contains a mix of text and date info) contains something that looks like a date, it forces the whole column to be a date.
This might sound like a good idea, but any other row fields in this column that DON'T constitute a date are treated as errors and are not copied to the table.
Is there a way of "turning off" this feature so all table columns are TEXT only?
I can't change the CSV file as it is made by others.
EG.
Name,Dob
Tom,1/4/1936
Dick,25/12/1950
Harry, Deceased (this last line never gets copied)

Code:
Sub ConvertCSVToTable()
Dim con As ADODB.Connection
Dim strCn As String
Dim strSQL As String
Set con = New ADODB.Connection
strSQL = "SELECT * INTO Names IN 'C:\Datacollectionserver\Settings.mdb' FROM " & DownLoadFolder & "\" _
    & Format(DateAdd("d", 8 - Weekday(Date, vbMonday), Date), "yyyymmdd") & "_INB.CSV"
strCn = "Driver={Microsoft Text Driver (*.txt; *.csv)};" & _
    "DBQ=C:\;" & _
    "DefaultDir= C:\;" & _
    "Extended Properties='text;HDR=YES;FMT=Delimited'"
con.Open strCn
con.Execute strSQL
con.Close
Set con = Nothing
End Sub

I could do it by splitting and writing each line, but that takes forever, whereas the above method only takes a second or so for the 70,000-row file.

The unusual CSV filename above is because a new file is delivered each Monday with a varying filename to suit.
 
Ted,

It's been a long time since I used an .ini file (I think) to define a schema for a text file in order to be able to use it in a query. The schema defines delimited/fixed format and every column's type. Wish I remembered more, but sipping my French Roast here on the couch on the porch isn't jogging my brain.
 
Correct me if I am wrong, but are you saying you get the CSV files "made by others" and those files are (or could be) different every time you get them? So one time a file is:

[pre]
Name DOB SomeNumber
Bob 12/12/1937 123
Susie 1/05/2000 678[/pre]

And some other time the file is:
[pre]
DOB Country Name
12/12/1988 USA Bob[/pre]

And you make a separate table in your DB for every CSV file you receive?


Have fun.

---- Andy

A bus station is where a bus stops. A train station is where a train stops. On my desk, I have a work station.
 
Andy, he's saying that the structure does NOT change, but the DOB column is mixed with text data.
 
Specifying a [tt]MaxScanRows[/tt] value larger than the default (set in the registry, typically 25) might also work more often, but the schema.ini is the way to go.

No idea why anyone would use the old Text ODBC Driver though. That was deprecated some time back. Jet 4.0's Text IISAM is available and offers a lot more power with no additional effort. Probably faster too since there is no need to thunk through the ODBC Provider adapter layer.
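
Just as a rough illustration (the folder and file names here are made up), going through the Jet provider instead of the ODBC Text Driver looks something like this:

Code:
Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset

'Point the Jet Text IISAM at the folder that holds the CSV files.
Set cn = New ADODB.Connection
cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
        "Data Source=C:\SomeFolder\;" & _
        "Extended Properties='text;HDR=Yes;FMT=Delimited'"

'Each file in the folder is treated as a table.
Set rs = cn.Execute("SELECT * FROM [SomeFile.csv]")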
 
Ahh, I see the Jet ODBC thunk is being used here. Maybe even scarier.
 
Thanks for your prompt replies. I will try IISAM.

What is "SCARY" about it? It seems to work OK.

Incidentally, I want to create a NEW table each time rather than update an existing table, so that if they change the CSV layout it will still update a table without errors. I use a grid to view the data and modify it accordingly at run time, setting the column names from the first line of the CSV file that contains the headers.
 
>setting the column names to the first line of the CSV file

schema.ini will handle that quite happily
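
For example, something along these lines (the file name in brackets and the column definitions are just placeholders):

Code:
[SomeFile.csv]
Format=CSVDelimited
ColNameHeader=True
MaxScanRows=0
Col1="Name" Text
Col2="DOB" Text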
 
Forgive me for being a bit slow, but I am a bit confused regarding dilettante's "scary" comment.

Reading the various articles on ISAM and viewing examples, there does not seem to be any clear explanation of the difference between "ODBC" and "ISAM" or why one or the other should be used.

Some examples of what are claimed to be "using ISAM" seem almost the same as the example I posted.

I would much appreciate it if someone could enlighten me, or give me a link to something that explains it in plain English, plus what I should change to use "IISAM" instead and why.
 
Scary" in terms of "looks like somebody copy/pasted some ancient sample code" not "scary" in any other sense. Sort of the way the hair raises when you see DAO used in 2015, or data under the system drive root.

There is a good reason why the old Desktop ODBC Drivers have so many similarities to Jet IISAM SQL syntax, use of schema.ini files, and so on. They are based on code once shared with early versions of Jet.

The risks associated with their use are minor but can smack you hard if you don't pay attention. For example, they rely on a different set of defaults in the registry, might invoke Jet 3.5 instead of 4.0 when exporting into an MDB file, and do not get bug fixes and performance improvements. On a machine with Office installed you might have a different set installed than shipped with Windows. Any time you use ODBC with ADO you have to work through an adapter layer, slowing things further. You may run into SQL syntax differences such as wildcard pattern symbols and supported data types. Databases get opened and closed and opened again query by query. Lots of reasons not to use them.

If doing this works for you, fine. But I wouldn't use them in any production code.

But in general avoid those "here's a million connection strings" sites. Many of them have poor examples dating back to the 1990s.


"Air code" but probably close:

Code:
Sub ConvertCSVToTable(ByVal DataRoot As String)
    'DataRoot has trailing "\" character.  Preferably NOT a protected
    'location such as the system drive root!
    Dim CsvFile As String
    Dim File As Integer
    Dim Connection As ADODB.Connection

    CsvFile = Format$(DateAdd("d", 8 - Weekday(Date, vbMonday), Date), "yyyymmdd") _
            & "_INB.CSV"
    
    On Error Resume Next
    Kill DataRoot & "schema.ini"
    On Error GoTo 0
    File = FreeFile(0)
    Open "schema.ini" For Output As #File
    Print #File, "[" & CsvFile & "]"
    Print #File, "Format = CSVDelimited"
    Print #File, "TextDelimiter = """
    Print #File, "MaxScanRows = 0"
    Print #File, "ColNameHeader = True"
    Print #File, "CharacterSet = 1252"
    Print #File, "Col1 = ""Name"" Text"
    Print #File, "Col2 = ""DOB"" Text"
    Close #File
    
    Set Connection = New ADODB.Connection
    With Connection
        .Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" _
            & DataRoot & "Datacollectionserver\Settings.mdb'"
        .Execute "SELECT * INTO Names FROM " _
               & "[Text;Database=" & DataRoot & "].[" & CsvFile & "]", _
                 , _
                 adCmdText Or adExecuteNoRecords
        .Close
    End With
End Sub

If possible open the MDB for exclusive access and importing will be much quicker than otherwise since no locking is required.
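
For instance (a sketch only, not tested here, reusing the same DataRoot as above): set the connection's Mode before opening it.

Code:
Dim ExclusiveCn As ADODB.Connection

'Request exclusive access so Jet does not have to coordinate locking
'with other users of the MDB.
Set ExclusiveCn = New ADODB.Connection
ExclusiveCn.Mode = adModeShareExclusive
ExclusiveCn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" _
               & DataRoot & "Datacollectionserver\Settings.mdb'"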
 
Thanks.
The penny drops. It was the "provider" line that had me confused even though it appears to be so obvious (to some people).

One question that remains: can I make a schema.ini file that defines the format of all fields as text, without having a line for each field stating its original field name and format?
All examples I have found seem to suggest you have to include the field name.

Otherwise I guess I could pre-read just the first line of the CSV file and make a schema.ini from that.
 
>All examples I have found seem to suggest you have to include the field name

Did you not read the link I sent? The 'Specifying the Fields' section is pretty clear on this:

You can specify field names in a character-delimited text file in two ways:
• Include the field names in the first row of the table and set ColNameHeader to True.
• Specify each column by number and designate the column name and data type.

In terms of the format, yes, you need to specify it, because otherwise a number of rows are scanned (25 by default as far as I remember, controlled in schema.ini by the MaxScanRows setting) and a best guess is made. If you let it guess, you may continue to run into the problem you are trying to avoid.
 
Yes, I read it: "specify field NAMES in a character-delimited text file in two ways".

The first alternative above appears to mean use the name only; the second means use the column number AND the name.
I interpret this as: you still have to include the column name in both.

I don't want to specify the Field Name, just the format. I want to use the name from the CSV file only.


Regarding not using the root directory, I haven't found a way of sending the new CSV file to a Windows 7 computer that has an app and its data subfolders in Program Files. I usually try to keep all files for an app in the one folder and its subfolders, not scattered throughout the computer. That makes it much easier when updates have to be forced by remote control rather than the usual way of the user doing it from the receiving end.

That's why I have sometimes put apps that continuously need data updated from an external source in a folder in the root directory and not in Program Files. I know that is frowned on for security purposes, but luckily this isn't the type of app that would be worth hacking into, and it's a case where the opposite of security is required.

Is there a way of programmatically removing the Program Files security temporarily to do the update and then restoring it afterwards, so I could put everything in Program Files where it belongs?

Hope that doesn't 'scare' people too much?


 
For data located on the boot drive you do one of two things.

For per-user data you create a subdirectory within the user's profile (i.e. under ssfLOCALAPPDATA). This is normally a per-user first-run action since an installer has no idea how many users there will be or what their profile names are.

For per-machine data you create a subdirectory within the machine profile (under ssfCOMMONAPPDATA). This is normally an installer action.

In both cases the created subdirectory inherits "owner" security from its parent directory. For per-machine data your installer should commonly also alter the security on such subdirectories, and since it is running elevated this isn't much of a challenge. Typically some locations need full read/write/create/delete access for all authenticated users.

In both cases to reduce collisions with other applications you create a "company name" subdirectory and under that an "application name" subdirectory.
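
A rough sketch of the per-machine case, late-bound to the Shell Automation object ("MyCompany" and "MyApp" are placeholder names, and the access-control step mentioned above is omitted):

Code:
Private Function MachineDataDir() As String
    'ssfCOMMONAPPDATA = &H23&: the per-machine application data folder.
    Dim Base As String

    Base = CreateObject("Shell.Application").NameSpace(&H23&).Self.Path
    On Error Resume Next    'MkDir raises an error if the folder already exists
    MkDir Base & "\MyCompany"
    MkDir Base & "\MyCompany\MyApp"
    On Error GoTo 0
    MachineDataDir = Base & "\MyCompany\MyApp\"
End Function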

Nothing new here, these have been the rules of the road since at least Windows 95 Desktop Update when Shell Special Folders were introduced and supported by shell32.dll 4.71 in 1997.


For data drives all bets are off and you can do pretty much as you choose.
 
Thanks.
Yes I was aware you could do that but -
Unfortunately the users/installers who are likely to use the sort of programs I supply wouldn't have a hope in hell of understanding or doing manually what you say. The problem usually turns up a few years later, when the original machine lease runs out and the program has to be reinstalled on a new machine (sometimes with a new OS).

I would need an install program that did all this automatically (which I don't have at the moment).

Also any user has to be able to use the data.

I hate getting a "help" phone call from halfway across the world when I am on holidays!

Putting everything in one folder in the root directory and sharing this is the easy way, even if it is bad. The data involved is of no use to hackers anyway. It would be nice to be able to bypass the Program Files security (like PDW does) to reinstall by remote.
 
>I interpret this as you still have to include then column name in both

Upon reflection I think you are right. It has been a long time since I used schema.ini. But you can pull those names from the CSV (or the Names table) to build the schema.ini, along the lines sketched below.
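
Something like this, off the top of my head (untested; the paths follow dilettante's example, every column is simply forced to Text, and it assumes the header row contains no quoted commas):

Code:
Private Sub BuildSchemaIni(ByVal DataRoot As String, ByVal CsvFile As String)
    Dim File As Integer
    Dim Header As String
    Dim Cols() As String
    Dim I As Long

    'Read just the header row from the CSV.
    File = FreeFile(0)
    Open DataRoot & CsvFile For Input As #File
    Line Input #File, Header
    Close #File
    Cols = Split(Header, ",")

    'Write a schema.ini that declares every column as Text.
    File = FreeFile(0)
    Open DataRoot & "schema.ini" For Output As #File
    Print #File, "[" & CsvFile & "]"
    Print #File, "Format = CSVDelimited"
    Print #File, "ColNameHeader = True"
    For I = 0 To UBound(Cols)
        Print #File, "Col" & (I + 1) & " = """ & Trim$(Cols(I)) & """ Text"
    Next
    Close #File
End Sub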

 
Regarding dilettante's kind suggestion, I get an error on the Execute line:

Run-time error '-2147467259 (80004005)':
Method 'Execute' of object '_Connection' failed

This seems to be the notification for a number of error types (including many that can be fixed by sending money to someone!).

I'm sure I have the path/filenames correct, but I can't seem to find any example on the web that looks anything like this.
Are there extra references I need to add? (It is not giving the usual error advice like a missing reference does.)

I have made a schema construct routine from the original table that uses all field names from the header (of which there are 24).
 
There were two problems, one major and the other cosmetic.

The major problem is that "Names" seems to be a reserved word in Jet SQL, even though I don't find it documented anywhere. The cosmetic issue is using a trailing "\" in the [tt]Database[/tt] value in the SQL statement.

So add brackets around the table name, always a safe practice anyway. Here is a standalone tested example that does that but keeps the "\" just to show it was not the culprit:

Code:
Option Explicit

Private Sub SetTheStage(ByVal DataRoot As String)
    'Create fresh CSV file to import, create fresh MDB to import into.
    Dim CsvFile As String
    Dim File As Integer
    Dim Connection As ADODB.Connection

    CsvFile = Format$(DateAdd("d", 8 - Weekday(Date, vbMonday), Date), "yyyymmdd") _
            & "_INB.CSV"
    On Error Resume Next
    Kill DataRoot & CsvFile
    On Error GoTo 0
    File = FreeFile(0)
    Open DataRoot & CsvFile For Output As #File
    Print #File, "Name, DOB"
    Print #File, "Fred Flintstone, 1940-01-05"
    Print #File, "Barney Rubble, 1940-04-30"
    Close #File
    
    On Error Resume Next
    GetAttr DataRoot & "Datacollectionserver"
    If Err.Number = 0 Then
        GetAttr DataRoot & "Datacollectionserver\Settings.mdb"
        If Err.Number = 0 Then
            Kill DataRoot & "Datacollectionserver\Settings.mdb"
        End If
    Else
        Err.Clear
        MkDir DataRoot & "Datacollectionserver"
    End If
    On Error GoTo 0
    
    'Best to late-bind ADOX due to binary compatibility breaks Microsoft made.
    With CreateObject("ADOX.Catalog")
        .Create "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" _
              & DataRoot & "Datacollectionserver\Settings.mdb'"
    End With
End Sub

Private Sub ConvertCSVToTable(ByVal DataRoot As String)
    'DataRoot has trailing "\" character.  Preferably NOT a protected
    'location such as the system drive root!
    Dim CsvFile As String
    Dim File As Integer
    Dim Connection As ADODB.Connection

    CsvFile = Format$(DateAdd("d", 8 - Weekday(Date, vbMonday), Date), "yyyymmdd") _
            & "_INB.CSV"
    
    On Error Resume Next
    Kill DataRoot & "schema.ini"
    On Error GoTo 0
    File = FreeFile(0)
    Open DataRoot & "schema.ini" For Output As #File
    Print #File, "[" & CsvFile & "]"
    Print #File, "Format = CSVDelimited"
    Print #File, "TextDelimiter = """
    Print #File, "MaxScanRows = 0"
    Print #File, "ColNameHeader = True"
    Print #File, "CharacterSet = 1252"
    Print #File, "Col1 = ""Name"" Text"
    Print #File, "Col2 = ""DOB"" Text"
    Close #File
    
    Set Connection = New ADODB.Connection
    With Connection
        .Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" _
            & DataRoot & "Datacollectionserver\Settings.mdb'"
        .Execute "SELECT * INTO [Names] FROM " _
               & "[Text;Database=" & DataRoot & "].[" & CsvFile & "]", _
                 , _
                 adCmdText Or adExecuteNoRecords
        .Close
    End With
End Sub

Private Sub Main()
    SetTheStage App.Path & "\"
    ConvertCSVToTable App.Path & "\"
End Sub
 
It seems like you always have a reason to do things incorrectly.

You should not be trying to deploy any VB6 programs without creating and testing a proper installer first, preferably one using Windows Installer for safety and to be able to do things like set folder security during installation.


If you really want a "user visible" dumping ground for data there are obvious candidates, far safer than opening up security on the system drive root, which just opens the door to malware.

One of the more obvious candidates is the Public folder, but I assume you'd quickly counter with some BS like "sorry, my users are still running the unsupported and unsafe Windows XP..."

Then you have alternatives that work even on crusty old OSs such as ssfCOMMONDESKTOPDIR and ssfCOMMONDOCUMENTS (which is not predefined in any typelib but equal to CSIDL_COMMON_DOCUMENTS = &H2E&). Of course once again these default to "owner" access so your installer should create a subdirectory there and set the desired access on it.
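
For example, resolving that one at run time through the Shell Automation object looks something like this (a sketch only):

Code:
Dim CommonDocs As String

'CSIDL_COMMON_DOCUMENTS = &H2E&: the shared "Public Documents" folder.
CommonDocs = CreateObject("Shell.Application").NameSpace(&H2E&).Self.Path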
 