Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Converting 300 excel sheets into database 2

Status
Not open for further replies.

Paul77fox

Technical User
Feb 6, 2013
14
0
0
IT
Hello guys. Please help me with this and tell me the best method of creating a database of 300 clients everyone with its own table ((months (only ten of them) an services (30)). If I create a table where all the clients will have fields for months and services, will result 10x30=300 fileds for every client. And the maximum fileds per record in visual foxpro is 255.
Or I will create 10 tables, one for each month, everyone with 300 clients / 30 services. I cannot think anything else.
 
300 clients everyone with its own table

Let's start off by saying that the data architecture you are suggesting is NOT recommended.

Instead you should consider Normalizing your data.

So before we offer suggestions on methods to approach your Excel files, you need to decide on the data architecture of your final data tables (NOT a "Database").

Once you have a good data plan, then we will be glad to help you approach 'extracting' the data from your Excel files.

Good Luck,
JRB-Bldr

 
The sheets are a mess and is no need to extract anything. I will start filling the tables by scratch. I'm not used to planning databases but in my case I think it will be something like many to many realtionship. Like every client can have multiple services and every service can be done by multiple clients. And then I think of table month witch is connecting the clients table and the sevices table. Any other suggestions are welcome.
 
> ((months (only ten of them) an services (30)). If I create a table where all the clients will have fields for months and services, will result 10x30=300 fileds for every client.

Nope. That's designing array in tables, that's not how you design table structures. If you're not used to it, just take a read on the simple wikipedia article about normalisation, and you already know enough to prevent to worst...
So there are clients (1 table), there are services (another table, not fields inside the client table) and there is a time aspect, you need data for 10 months). What is a month? Is it a column of a table? really? It's data! only data! So you nly need a month field to store a month number, never ever make data structure, data is data and structure is structure.

If a client offers a certain service for a certain month that's store in one record with all this info in it: clientID, serviceID, month. That's the main table structure. You'll want one further ID field as primary key and perhaps a few meta data, eg userid of user entering or editing, insertdate, update date, etc. Now you have 300 records per client, because 10 months*30 services make 300 records, not 300 columns. Indeed, if a client does not offer 30 services, it's even less records. Normalisation is often misinterpreted in saving bytes. If you fill all 300 fields you will have only 300 fields of data, instead of 300x3=900 fields, for example the clientid is only stored once in this record. But that's not what makes it redundancy. It's totally ok to store any foreign key as many times as you need it. What is kept from redundancy this way is the client name and service name, for example, they are only store in one record with one id, which is used to refer to them in all other places of other tables.

That's already the most important truth about database normalisation. Taken this structure, you can easily have 12 months or make it a datetime field and store the info with even finer granularity of time resolution, you can make it two fields datetimestart and datetimeend to specify the period of servicing, for example. That's not possible by putting service and month into a field caption, is it? Also you don't need to keep fields empty, if a client doesn't offer the one or the other service in the one or the other month?

If this isn't making "click" in your brain now, you should consider doing some tutorials, getting a teacher, make contact with experts, not here in forums, but personally. This is something you may have a talent for understanding, once you're shown it, but it's a learning process, there are several things to consider and to get experience with and get a feeling about. This is very important, because a bad structure may be able to store your data, but will make development on it a PITA, and I'm not talking about mexican food here, although it's kind of a double meaning.

Bye, Olaf.
 
Once you have a good data architecture designed and the associated data tables created, you can write a VFP 'convert' program to 'walk its way through' the various Excel files (assuming that they have the same general layout - although your saying: The sheets are a mess makes this not likely) and 'harvest' the data so that it can then be put into the appropriate data table in an automated manner.

If you don't create a good data architecture "up front' you will have numerous self-imposed challenges as things progress and data maintenance could end up being a 'nightmare'.

Olaf has already given you some good suggestions to consider.
In addition to Olaf's suggestion to read about Normalization, you might want to do a Google Search for: normalization tutorial for beginners there are quite a few 'finds'

Good Luck,
JRB-Bldr


 
I didn't think that this could be so complicated. Maybe I don't know how to explane what I want. I know that I see this problem in a very limited way, due to the lack of experience but let's try another approach. Let's say that I will have a form with a textbox that says "Insert client's name". I insert a valid client name and then a beautiful grid or table populates with data. The table is always the same. It has 31 columns (one is the name of the month and another 30 are the services names) and only 10 rows (because I need only 10 months). The column "month" is of course, read only. All the other columns are fully editable and contain different values (float with 2 decimal or null)
I understood that using so many fields is wrong but I simply don't understand how to connect may table "clients" with table "services/month" in such way that for every client to have the same table but with different values in it. Guys, don't get me wrong, I truly understand if you don't have time to waste with a rookie like me. Thanks a lot anyway and have a nice day!
 
For designing your data architecture, let's try to get you to temporarily ignore what you want to appear on your User's GUI Form.

Instead think of the Types/Categories of Data each Customer may have.
1. Customer Demographics - Name, Address, Contact Info, etc.
2. Possible Service Months - Jan, Feb, Mar, etc.
3. Types of Service - <whatever> even these may have their own separate 'Categories'
4. and so on, etc.

Each of these Data Categories would typically go into their own data table which would have one or more 'linkage' fields added such as Cust_ID, Month_ID, Service_ID, etc.

These 'linkage' fields could be used to bring all of the desired data for any individual Customer back into a consolidated set of records for viewing/editing/etc.

This 'consolidation' of the data for any given Customer could be accomplished by either a SQL Query statement or by using Relationships between the various data tables - which you would use might depend on what you need to accomplish.

Now, that you have a Normalized Data Architecture and you understand how to bring the data together, you can then use that in your VFP User GUI Form to display/edit/manage the data as needed.

Good Luck,
JRB-Bldr

 
It's not, that we don't have the time, but a forum is not interactively enough. It would be best, if you could really meet someone having a database design experience and work together on a table design.

You're too focused on the layout aspect. The database isn't there to store data in it's representational form. That's also a reason, why Excel sheets are not database table, they just have the aspect of a tabular layout. Any cell can have any value, but even if your service column rows all have the same type, it's still a wrong design. I already told you the table design: clientid, serviceid, month. Instead of 30 service columns it will have 30 records for each client, so two clients will lead to 60 record etc. It's simply new records, its data, not structure.

And to display the data you store in that way you'll either fill in an excel sheet or a grid in the way you want, that's then doing denormalisation. But for storage you go the inverse way and put each cell value in one record combined with clientid, serviceid and month number. Cells with NULL are not stored in a record and when you later create the grid no record for a combination of month, client and service simply means a NULL in that cell. So it's kind of storing coordinates and values per record.

Additional to that table, of course you will need two further tables with lists of clients and services, each of them would have an ID column and a name, either clientname or servicename. The IDs in these tables is called primary keys, they must be unique per table, eg you create a numeric sequence and have IDs 1,2,3,4... for clients and services. and the clientid and serviceid fields of the table storing the amounts, are references to these IDs, they move there from the client and services tabes, so they are foreign there, that's why they are called foreign keys.

A short sample with less rows and columns would look like this:

Code:
         month service1  service2
client1  1     0.1       0.5
client1  2     0.2       0.6
client2  1     0.3       NULL
client2  2     0.4       0.8

clients:
id name
1  client1
2  client2

services:
id name
1  service1
2  service2

clientservices
clientid, serviceid, month, amount
1         1          1      0.1
1         1          2      0.2
1         2          1      0.5
1         2          2      0.6
2         1          1      0.3
2         1          2      0.4
2         2          2      0.8
This structure of data storage does never need to change, no matter how many clients there are, no matter how many services there are, no matter how many months you would observe. The NULL in one of the cells is not in the data, because there is no record for clientid 2, service id 2 and month 1. Or the other way around: No record is needed to denote a missing value, it's not there.

This data structure allows to easily do statistics on eg the average amount of all services all clients did or used by SELECT AVG(amount) FROM clientservices. If you had seperate fields to store amounts of service1 and service2, you'd need to compute SUM(service1amount+service2amount) and divide by 2*COUNT(*). And that's just one example on how this is easier to maintain. Additional to this structure you make up table and field rules, eg you don't allow a double record with same clientid, serviceid and month, each combination only is allowed once. It's easy to do with a candidate index.

Bye, Olaf.
 
Thanks guys very much. I finally understood how it works. I think that using excel for such a long time washes your brain and cannot think that other data structures can exist. But they exist and are so beautiful and elegant. Have a nice day!
 
If you come from the point of view of an Excel user, this database tablke structure looks awkward and complicated, yet it's just extensible and general. It's a lot more records, one for each amount cell, and it's even a lot oif redundancy in comparison, why some people, who get this idea as the first and most important one hammered into their brain then don't get how this prevents redundancy by making so many records out of one sheet. You have to understand what kind of redundancy is bad and to be prevented with database normalisation. In short I don't count any foreign key as data, as it's just a reference to an existing value and thereby preventing redundancy of the data by redundant use of that reference key. And that's the main point of it in regard to redundancy.

Bye, Olaf.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top