Tek-Tips is the largest IT community on the Internet today!


How to write a preliminary dbf table structure with many fields before importing a csv

Status
Not open for further replies.

german12

Programmer
Nov 12, 2001
563
DE
I have a big *.csv file.
The first two lines contain a lot of field names and look like this:

"Kontonummer""Bankleitzahl""IBAN""BIC""Betrag""Buchungstext""Betrag - Währung""Buchungstag""Begünstigter/Absender - Bankleitzahl""Begünstigter/Absender - Kontonummer""Begünstigter/Absender - Name""Internet""Kategorie""Kommentar""Kostenstelle""Laufende Nummer""Marker""Originalbetrag""Originalbetrag - Währung""Primanota""Saldo""Saldo - Währung""Storno""Storno - Originalbetrag""Splittbuchung - Auftraggeber / Name""Splittbuchung - Kategorie""Splittbuchung - Kostenstelle""Splittbuchung - Originalbetrag""Splittbuchung - Unterkategorie""Splittbuchung - Verwendungszweckzeile 1""Textschlüssel""Unterkategorie""Verwendungszweckzeile 1""Verwendungszweckzeile 2""Verwendungszweckzeile 3""Verwendungszweckzeile 4""Verwendungszweckzeile 5""Verwendungszweckzeile 6""Verwendungszweckzeile 7""Verwendungszweckzeile 8""Verwendungszweckzeile 9""Verwendungszweckzeile 10""Verwendungszweckzeile 11""Verwendungszweckzeile 12""Verwendungszweckzeile 13""Verwendungszweckzeile 14""Wertstellungstag""Steuersatz""SteuersatzWaehr""Steuerbetrag""SteuerbetragWaehr""Fibu-Nr.""Splittbuchung - Steuersatz""Splittbuchung - SteuersatzWaehr""Splittbuchung - Steuerbetrag""Splittbuchung - SteuerbetragWaehr""Splittbuchung - Fibu-Nr.""Abweichender Auftraggeber""Ende zu Ende Referenz""Kundenreferenz""Mandatsreferenz""Purpose Code""Rückgabegrund""Rückgabegrund (Code)""Entgelt (fremd)""Entgelt (fremd) Währung""Entgelt (eigen)""Entgelt (eigen) Währung""Zinskompensationsbetrag""Zinskompensationsbetrag Währung""Ursprungsbetrag""Ursprungsbetrag Währung""Gläubiger-Identifikation""Soll""Haben"

Of course it would be possible to create a *.dbf file and then import the *.csv file - however, that means defining each field with its type and field length.

My question:

As I am too lazy to write all of the above fields into the *.dbf structure manually - is there a way to do this by code (CREATE TABLE ...) and let the code create a preliminary table structure first?
The field lengths and possibly the types could be changed later. Since I know that the majority are character fields, the field lengths could simply be as long as their headers - I imagine it is easier to correct that later than to write the whole table structure manually.

Any idea for this sample?

Thanks
Klaus




Peace worldwide - it starts here...
 
Klaus,
I'm not a fan of the Wizards in VFP, but I have used the Import Wizard to create a dbf structure by importing a .csv file (renamed to .txt) and then tweaked it afterwards.
I'm not sure whether this is what you're trying to achieve.

Thank you

Steve Williams
VFP9, SP2, Windows 10
 
The headers contain things like spaces and hyphens, so they are not usable as field names.
How about creating generic field names?

All you need to determine is the number of fields - count the quote characters and divide by 2.
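A minimal sketch of that count (the header string here is a shortened stand-in for Klaus' real one, and the variable names are made up):
Code:
* Count quote characters in the header line and divide by 2
* to get the number of fields.
lcHeader = ["Kontonummer""Bankleitzahl""IBAN""BIC"]
lnFields = Occurs(["], lcHeader) / 2
? lnFields   && 4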

Well, I wonder why there are no commas in a CSV, is it really csv or something else?
Klaus said:
let the field-length be as long as their header

Well, data in further lines can be wider than the header, so that's not a good measure. If you sized fields that way, IBANs would all be truncated to 4 characters, although they can be up to 34 characters long - mostly digits, but also letters (the country prefix).

So I'd count, make every field char(254), and then mend size and type later with conversions. Or perhaps don't even count and create a universal 255-field dbf for import. And yes, of course you can let VFP create a table with autogenerated fields - you have macro substitution:
Code:
Local lnCol, lcColumns

lcColumns="Col1 C(254)"
For lnCol=2 to 255
   lcColumns = lcColumns+", Col"+Transform(lnCol)+" C(254)"
EndFor 

Create Table Import (&lcColumns)

Now it's up to you to keep the header line out of the import or simply import it into that table, too. You can count the fields, or let code look for the first all-empty column after the csv import.

You could also go the longer route and create field names matching the headers - for example by stripping off everything except letters - and then squeeze in " C(254)," between all field names and " C(254)" at the end.
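A sketch of that longer route, assuming the header is already in a variable (the characters stripped, the sample header, and the truncation rule are my choices; duplicate names after truncation would still need extra handling):
Code:
Local lcHeader, lnFields, lnI, lcName, lcColumns
lcHeader = ["Kontonummer""Bankleitzahl""Laufende Nummer"]  && shortened sample
lnFields = GetWordCount(lcHeader, ["])
lcColumns = ""
For lnI = 1 To lnFields
   lcName = GetWordNum(lcHeader, lnI, ["])
   * strip spaces and punctuation so the header becomes a legal name
   lcName = Chrtran(lcName, [ -/()."], "")
   * free tables only allow 10-character field names
   lcName = Left(lcName, 10)
   lcColumns = lcColumns + Iif(Empty(lcColumns), "", ", ") + lcName + " C(254)"
EndFor
Create Table ImportHdr (&lcColumns)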

Chriss
 
Klaus,

The CSVProcessor can read a CSV file and create a cursor for you on the fly.

It will try to interpret the CSV data and set an adequate type for each field it encounters.

The location of the repository is at
 
Klaus,

Here's some code I use to convert csv to dbf for your perusal:

Code:
numRows = ALINES(aRow,FILETOSTR('MyFile'),4)  && 4:Elim blank rows
numFields = GETWORDCOUNT(aRow(1),'""')

* Create cursor/table MyTable here with your own field names (per Chris's suggestion)

FOR ii = 2 TO m.numRows  && Skip field names (some illegal)
 cLine = aRow(m.ii)
 Field1 = GETWORDNUM(m.cLine,1,'""')
 Field2 = GETWORDNUM(m.cLine,2,'""')
 . . .
 FieldN = GETWORDNUM(m.cLine,N,'""') && Where N=numFields
 INSERT INTO MyTable FROM MEMVAR
ENDFOR

Steve
 
Why? After you have the table, you just need APPEND FROM file TYPE CSV, or something similar with the DELIMITED clause.

Chriss
 
Hi Chris,

Explanation: because I just don't trust the result, due to the non-standard things I often find in the csv (or Excel) files I receive from my clients. In fact, I always look at the file itself before deciding exactly how I'll proceed.

Sometimes I have to use a hex editor to see what some of these weird characters are, so I can remove them or account for them in code. Even something as small as an apostrophe in the data can mess things up. My data going to the IRS must exactly match the format required by IRS publications.

That said, the APPEND FROM approach you suggest is easier and often satisfactory.

Steve
 
Klaus,

Can you share sample data? With the rows and a couple of data lines?

Using the CSVProcessor:

Code:
LOCAL csvp AS CSVProcessor

m.csvp = CREATEOBJECT("CSVProcessor")

m.csvp.Import(GETFILE())

produces something like

[Screenshot: the resulting cursor in a Browse window]
 
I see - so you can catch lines that have more fields than expected, for example when they have a comma within a value.

But then a table with 255 fields would catch that, too.

And no characters fail to get into the data, no matter what codepage you use - so strange characters are a question of which codepage to use. If a CSV file contains any control characters aside from tabs, I'd call it a bad file, unless there is documentation stating what they mean.

Just to be clear on the APPEND: it's not the end of the process - as we already know, the types and names of the columns must be adjusted. The bad case for 255 char(254) fields is that the record size is quite large, so the csv file should not have too many lines, even when most fields stay empty.

This specific case makes counting fields possible, since Klaus says all of these are column headers. So simply count the quote characters, or let OCCURS() count them.

Chriss
 
Chris said:
If a CSV file would contain any control characters aside from tabs I'd call it a bad file

If the file I receive can be loaded into Excel and looks decent, I save it as a tab-delimited text file. That usually makes it easier to get the data into my custom-structured dbf, as you imply with the tab delimiter. The procedure is different for each of my clients (they seem to follow no standard). From there I create the text file in the required IRS format, using code I developed previously in my app.

Steve
 
If Excel loads it as CSV, literally the only things hindering it from loading in VFP are multiline values or too large a size.

Chriss
 
Well, the other reason a file has low-value bytes is that it's some Unicode variation.

Again, no byte gets lost when you APPEND a file. You can end up with too many records, some of which have only one field filled - hinting at multiline values - or with too many fields filled, because a comma within quotes is interpreted as a field separator. VFP gets these two things wrong. But using GETWORDNUM or ALINES with comma as separator, you make the same error manually.
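That manual-splitting pitfall is easy to demonstrate without a file - splitting on the comma cuts a quoted value apart (made-up sample line):
Code:
* A comma inside quotes is treated like any separator by the word functions:
lcLine = '"Miller, Inc.",100'
? GetWordCount(lcLine, ",")   && 3 instead of the intended 2
? GetWordNum(lcLine, 1, ",")  && "Miller   - the quoted value is cut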

Have you tried atlopes CSVProcessor? I remember he posted about it in an earlier thread already and I think it takes care of such things.

The header line Klaus posted is surely unusual, as it only has quote delimiters but no separators. I won't assume it continues that way in all lines of the file: not all fields are strings, so records will need some separator on top of the delimiters. Otherwise that's new to me - a file generated with quotes around each value, quote to quote, without a comma, tab or semicolon in between. It could also simply be the forum mangling what Klaus posted; it would have been better put in PRE or code tags.

Chriss
 
First of all, I would like to thank you for the many replies.
I looked again at the possibilities of transferring my program's data to a csv file - and found several options.
I have now filtered out this result as an example (see attachment), since atlopes had asked about it.
It is an extract of approx. 67 lines (the original has over 3,000 lines) with a header and subsequent data records.
This new export now also has semicolons. (Thanks for pointing out the missing commas in the first transfer, Chriss.)

With this sample it should be easier to get the right answer to my question.
The name of the sample file is configur.csv

Klaus

Peace worldwide - it starts here...
 
 https://files.engineering.com/getfile.aspx?folder=79126b56-7799-4818-9fed-aa2ac7346ca4&file=configur.csv
Thanks, Klaus,

now that's clearer. First I would like to point out that you should have a decent text editor in your toolset. A good editor shows that the headers are in one line, as I would have expected:
[Screenshot: the header displayed as a single line, with line numbers in the left margin]


Notice the line number in the left margin.

When I open it in Notepad, even with automatic line breaks turned off, this one line is displayed as 2 lines. Don't use notepad.exe.

Copying the header into the clipboard, you can easily find out that it has 74 semicolons and thus 75 fields:
Code:
? occurs(';',_cliptext)

So then do a 75 field import table or cursor:
Code:
Local lnCol, lcColumns

lcColumns="Col1 C(254)"
For lnCol=2 to 75
   lcColumns = lcColumns+", Col"+Transform(lnCol)+" C(254)"
EndFor 

Create Cursor Import (&lcColumns) 

Append From configur.csv Type Delimited With Character ';'
Note: The semicolon is the separator of the field values. The term "delimited" as a file type is badly chosen; CSV - comma-separated values - is the better term. Type Delimited is for any file type where fields are separated by some character: tab, semicolon, comma. You can change the default, which uses the comma as separator and quotes as delimiter of string fields - take a close look at the help. For example, it's a corner case that I needed to put the semicolon into quotes; an example from the help is [tt]APPEND FROM mytxt.txt DELIMITED WITH _ WITH CHARACTER *[/tt], where the separator character is not in quotes. But since the semicolon means continuation of a code line in VFP, you have to put it into quotes. It's all noted in the help.

Now record 1 can be deleted, and you may use it as field-name candidates and alter the table - or do that in advance. Anyway, you just need an APPEND in your case. Nothing fancy.
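A sketch of using record 1 as field-name candidates. This assumes the 75-column import went into a table named Import (the Create Table variant from earlier, since ALTER TABLE is for tables, not cursors); the name cleanup is the same rough guess as before, and umlauts or duplicate names would need extra care:
Code:
Select Import
Go Top
Local lnI, lcName, lcCmd
Local laNames[1]
Dimension laNames[Fcount()]
* collect the header values before altering the table
For lnI = 1 To Fcount()
   laNames[lnI] = Alltrim(Evaluate(Field(lnI)))
EndFor
Delete   && remove the header record
For lnI = 1 To Alen(laNames)
   * keep a cleaned name, respecting the 10-character free-table limit
   lcName = Left(Chrtran(laNames[lnI], [ -/()."], ""), 10)
   If Not Empty(lcName)
      lcCmd = "Alter Table Import Rename Column Col" + Transform(lnI) + " To " + lcName
      &lcCmd
   EndIf
EndFor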

Chriss
 
Klaus,

This is the setup of the CSVProcessor to handle your CSV file:

Code:
LOCAL csvp AS CSVProcessor
LOCAL Result AS Integer

m.csvp = CREATEOBJECT("CSVProcessor")

m.csvp.ValueSeparator = ";"
m.csvp.DecimalPoint = ","
m.csvp.DatePattern = "%D.%M.%Y"

m.Result = m.csvp.Import(GETFILE())

IF m.Result == 0
	BROWSE
ELSE
	ERROR (m.Result)
ENDIF

This produces the following cursor (only the first columns are shown):

[Screenshot: the resulting cursor]


The structure of the cursor that was built in the process could be defined by the statement:

Code:
CREATE CURSOR Configur (;
KONTONUMMER I NULL, ;
BANKLEITZAHL I NULL, ;
IBAN V(22) NULL, ;
BIC V(11) NULL, ;
BETRAG B NULL, ;
BUCHUNGSTEXT V(20) NULL, ;
BETRAG___WÄHRUNG V(3) NULL, ;
BUCHUNGSTAG D NULL, ;
BEGÜNSTIGTER_ABSENDER___BANKLEITZAHL V(11) NULL, ;
BEGÜNSTIGTER_ABSENDER___KONTONUMMER V(22) NULL, ;
BEGÜNSTIGTER_ABSENDER___NAME V(70) NULL, ;
INTERNET V(10) NULL, ;
KATEGORIE V(14) NULL, ;
KOMMENTAR V(10) NULL, ;
KOSTENSTELLE V(10) NULL, ;
LAUFENDE_NUMMER I NULL, ;
MARKER I NULL, ;
ORIGINALBETRAG B NULL, ;
ORIGINALBETRAG___WÄHRUNG V(3) NULL, ;
PRIMANOTA I NULL, ;
SALDO B NULL, ;
SALDO___WÄHRUNG V(3) NULL, ;
STORNO I NULL, ;
STORNO___ORIGINALBETRAG B NULL, ;
SPLITTBUCHUNG___AUFTRAGGEBER___NAME V(10) NULL, ;
SPLITTBUCHUNG___KATEGORIE V(10) NULL, ;
SPLITTBUCHUNG___KOSTENSTELLE V(10) NULL, ;
SPLITTBUCHUNG___ORIGINALBETRAG V(10) NULL, ;
SPLITTBUCHUNG___UNTERKATEGORIE V(10) NULL, ;
SPLITTBUCHUNG___VERWENDUNGSZWECKZEILE_1 V(10) NULL, ;
TEXTSCHLÜSSEL I NULL, ;
UNTERKATEGORIE V(32) NULL, ;
VERWENDUNGSZWECKZEILE_1 V(142) NULL, ;
VERWENDUNGSZWECKZEILE_2 V(140) NULL, ;
VERWENDUNGSZWECKZEILE_3 V(14) NULL, ;
VERWENDUNGSZWECKZEILE_4 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_5 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_6 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_7 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_8 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_9 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_10 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_11 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_12 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_13 V(10) NULL, ;
VERWENDUNGSZWECKZEILE_14 V(10) NULL, ;
WERTSTELLUNGSTAG D NULL, ;
STEUERSATZ V(10) NULL, ;
STEUERSATZWAEHR V(10) NULL, ;
STEUERBETRAG V(10) NULL, ;
STEUERBETRAGWAEHR V(10) NULL, ;
FIBU_NR_ V(10) NULL, ;
SPLITTBUCHUNG___STEUERSATZ V(10) NULL, ;
SPLITTBUCHUNG___STEUERSATZWAEHR V(10) NULL, ;
SPLITTBUCHUNG___STEUERBETRAG V(10) NULL, ;
SPLITTBUCHUNG___STEUERBETRAGWAEHR V(10) NULL, ;
SPLITTBUCHUNG___FIBU_NR_ V(10) NULL, ;
ABWEICHENDER_AUFTRAGGEBER V(68) NULL, ;
ENDE_ZU_ENDE_REFERENZ V(35) NULL, ;
KUNDENREFERENZ V(10) NULL, ;
MANDATSREFERENZ V(22) NULL, ;
PURPOSE_CODE V(4) NULL, ;
RÜCKGABEGRUND V(10) NULL, ;
RÜCKGABEGRUND__CODE_ V(10) NULL, ;
ENTGELT__FREMD_ V(10) NULL, ;
ENTGELT__FREMD__WÄHRUNG V(10) NULL, ;
ENTGELT__EIGEN_ V(10) NULL, ;
ENTGELT__EIGEN__WÄHRUNG V(10) NULL, ;
ZINSKOMPENSATIONSBETRAG V(10) NULL, ;
ZINSKOMPENSATIONSBETRAG_WÄHRUNG V(10) NULL, ;
URSPRUNGSBETRAG V(10) NULL, ;
URSPRUNGSBETRAG_WÄHRUNG V(10) NULL, ;
GLÄUBIGER_IDENTIFIKATION V(26) NULL, ;
SOLL B NULL, ;
HABEN B NULL)
 
Atlopes:

I tested your statement, which the CSVProcessor program created, and it worked very well - the statement can be used as a separate *.prg whenever such a csv file is available.

Unfortunately I was/am still too stupid to set up or run the CSVProcessor as an independent class.
This is because I have no experience in setting up the CSVProcessor class or any other classes not shipped with VFP.

In the link I found not only the program but also many other modules.

Maybe there are instructions somewhere on how the program can be used over and over again as a standalone class/library in the VFP environment.

It's a very powerful program, that's clear to me - and it will be used again and again, because a lot of data is only available as *.csv.

If this deviates too far from my question above, I would ask it again in a separate thread.
Please let me know.

There may be other members who are as inexperienced as I am.


Thanks very much
and greetings from Germany

Klaus

Peace worldwide - it starts here...
 
Klaus,

To use the CSVProcessor, download its repository (there is a green Code button on the repository's first page; "Download ZIP" is one of its options). You must also download the other repository in the same manner.

Once you have extracted the zips into appropriate folders of your choice on your computer, you can [tt]DO csv.prg[/tt] (or [tt]DO LOCFILE("csv.prg")[/tt]). Then, the CSVProcessor class will be in scope, and you will be able to execute the snippet I posted above.
 
Atlopes

Thank you for your quick reply.
Yes - your program is really good.
In less than 10 seconds (including choosing the CSV file to be used), the program delivered a VFP cursor with over 50 fields and 3,000 lines - and the annoying creation of a suitable file structure in the dbf was not necessary.

All field names were taken correctly from the CSV file, and the program also recognizes the values there correctly, so that evaluations can be carried out immediately with the cursor.
The column widths in the cursor are also optimal.

I then also tested how the program handles a CSV file with comma separation - no problem, I just have to change one command in the setup a bit. Through the setup you can also easily react to different national date formats.
Even if no headers are visible in the CSV file, the CSVProcessor takes the first line of data as the header (good solution).

Great!
The program saves a lot of work because, with few adjustments, it works correctly on many formats within a CSV file.

Thanks again for all the other attempted solutions here, which improved my knowledge considerably.

That's what makes this forum really fun!

Klaus




Peace worldwide - it starts here...
 
Klaus,

Great that you made it work.

Just a note regarding this:
Klaus said:
Even if no headers are visible in the CSV file, the CSV processor takes the first line of data as the header (good solution).

If you set
Code:
m.csvp.HeaderRow = .F.
before importing into a non-existing cursor, the fields will be named Col_1, Col_2, etc. The class will treat the first row as regular data.

You can find the documentation of the class's properties at
 
Atlopes
As I had already written, the CSVProcessor works well.
Still, a question related to Varchar has now arisen for me.
As you can see in your statement, this definition takes effect when transferring csv to dbf:
[Photo 1: the cursor definition with Varchar NULL fields]


Now I've noticed that the bank this data comes from sometimes spreads information over several columns, although one combined string would be more useful.
[Photo 2: rows where .NULL. values sit between filled columns]

It would be easy to combine this information into a new wide column with alltrim(...)+alltrim(...)... but when I tried this, I realized that it only works if .NULL. is not present in any of the columns. Otherwise the alltrim expression returns no result at all.

But maybe there is another instruction that could achieve that?
In this example, when I want to concatenate the third record (from field verwendungszeile3 to verwendungszeile7), it works with alltrim addition of the strings - but not in line 4, as there are NULLs in between (the result was nothing).
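To show the effect in isolation (made-up variable names instead of the real fields):
Code:
* As soon as one part of the string addition is .NULL.,
* the whole expression becomes .NULL.:
lcA = "Verwendungszweck part 1"
lcB = .Null.
? Alltrim(lcA) + Alltrim(lcB)   && .NULL. - the text is gone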

I believe I understand why the program uses NULL (nobody can know in advance which data type is in the source).
And of course it would not be good to change that only partially, as the program has to stay general.

I also tried both commands in the setup:
m.csvp.EmptyIsNull = .F.
and
m.csvp.EmptyIsNull = .T.
but that did not help.


What could be done in that case?

Regards
Klaus


Peace worldwide - it starts here...
 
