Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

database design suggestion 4

Status
Not open for further replies.

w11z

Programmer
Mar 26, 2001
40
CA
Hi everyone, What would be the best way of representing this relation between Countries, States and Cities.

1-

tblCountries
countID(PK), countName

tblCountries_States
countID, stateID

tblStates
stateID(PK), stateName

tblStates_Cities
stateID, cityID

tblCities
cityID(PK), cityName

OR is this way better

2-

tblCountries
countID(PK), countName

tblSates
stateID(PK), stateName, countID(FK)

tblCities
cityID(PK), cityName, stateID(FK), countID(FK)

Why would one design be preferred to the other?

Thanks
 
2 is better

1 isn't wrong, but it is over-designed, as it would allow a state to belong to more than one country, or a city to belong to more than one state

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
A city name can belong to more than one state. For instance, there is a Columbus in OH and WI (probably more). If Columbus, OH and Columbus, WI are supposed to be unique, 2 would do that. If you're only concern is that the city name be unique and not the city/state combination, you can go with 1.
 
If you're only concern is that the city name be unique and not the city/state combination, you can go with 1.
but surely 1. does not assign the same cityID to cities with same name? nothing in that design would suggest this is what is intended, so once again i say that 1. is overdesigned



;-)



r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
I'm not real skilled at this, so when I need to make these sort of layout decisions I go straight to the query end of it.

If somebody asked for all the cities in state a (id 23) in country b (id 76), what does the query look like in layout 1 vs layout 2?

In 2 I think it is

select cityid, cityname
from tblcities
where stateid=23 and countid=76

I would have a difficult time writing the same query in 1.

I think 2 is the easier query for this particular question. It may depend on what type of questions you think you will need to answer.
 
It may depend on what type of questions you think you will need to answer.
do you mean questions like "which states belong to more than one country?" and "which cities belong to more than one state?"


if not, then i would completely discard option 1.

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
Even 2 is overblown.

You have hierarchical data, because any one city can only be in one state, which can only be in one country.

Therefore the city table only needs a foreign key to state, thereby automatically making the country information available. This elegantly and correctly handles the two towns called Columbus, the two towns called Washington, one of which is in England, and the other in that strange land 3,500 miles off England's West coast, er, I think it's called America.

I would also recommend losing the "tbl" prefix from the table names, and the needless abbreviations. Everything in a database from which one selects data, can be reasonably assumed to be a table, unless specified otherwise, e.g. V_COUNTRY would be a view about countries.

You also need to constrain your data to be valid, e.g. no null country, state or town names. No leading or trailing white spaces, and initial capitals for all names (because a country is a proper noun).

As an example, try something like

Code:
CREATE TABLE COUNTRY
(
COUNTRY_ID   INTEGER,
COUNTRY_NAME VARCHAR2(100)
);

ALTER TABLE COUNTRY ADD CONSTRAINT PK_COUNTRY                PRIMARY KEY (COUNTRY_ID);
ALTER TABLE COUNTRY ADD CONSTRAINT NN_COUNTRY_NAME           CHECK (COUNTRY_NAME IS NOT NULL);
ALTER TABLE COUNTRY ADD CONSTRAINT UQ_COUNTRY_NAME           UNIQUE(COUNTRY_NAME);
ALTER TABLE COUNTRY ADD CONSTRAINT PROPER_NOUN_COUNTRY_NAMES CHECK (COUNTRY_NAME = INITCAP(COUNTRY_NAME));

CREATE TABLE STATE
(
STATE_ID   INTEGER,
STATE_NAME VARCHAR2(100),
COUNTRY_ID INTEGER
);

ALTER TABLE STATE ADD CONSTRAINT PK_STATE                PRIMARY KEY (STATE_ID);
ALTER TABLE STATE ADD CONSTRAINT NN_STATE_NAME           CHECK (STATE_NAME IS NOT NULL);
ALTER TABLE STATE ADD CONSTRAINT PROPER_NOUN_STATE_NAMES CHECK (STATE_NAME = INITCAP(STATE_NAME));
ALTER TABLE STATE ADD CONSTRAINT STATE_MUST_BE_IN_A_COUNTRY CHECK (COUNTRY_ID IS NOT NULL);
ALTER TABLE STATE ADD CONSTRAINT FK_STATE_COUNTRY FOREIGN KEY (COUNTRY_ID)
REFERENCES COUNTRY(COUNTRY_ID);

Resist the temptation to use the field name of "NAME" instead of COUNTRY_NAME or STATE_NAME as the word NAME on its own is a key or reserved word in Oracle (where I did the above example). For this reason I suggest you use a colour context sensitive editor to do your create table statements, so that you automatically are made aware of any such words.

I suspect that you are using Access, in which case this may well be irrelevant, as I believe it provides a designer form for the purpose.

Regards

T
 
Thank you very much. You all have contributed to make me (and hopefully others) more knowledgeable. I'm developing a complex DB using MySQL Workbench (for the first time). I have much to learn and you have helped me quite a lot. If their are others that would like to add anything, you are very welcome.
 
@Thargy

I'm developing with PHP and I'll be validating what the user will be entering either client side or server side. I won't let MySQL do the validation.
 
I won't let MySQL do the validation.
you should do it in both places, but if you wish to get away with doing it in only one place, then that place should be in the database


after all, your app is not the only way of getting data into the database, and you want to be sure the other routes don't introduce inconsistencies

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
The illustrious r937 has the right of it,

if you don't constrain the data to be valid in the database, I hereby guarantee you that you will get corrupt and/or invalid data into your system.

On what basis of reason and logic can you guarantee that the application will the the only thing ever to access the data? Nobody will ever start a query editor, link to the db and run SQL? No technicians will ever go in directly to perform routine maintenance? No changes will ever take place requiring new fields, and data synthesis?

You MUST constrain the data. If the application does it too, that's fine.

Regards

T
 
r937, Thargy, Thank you for the advice but how come almost all the tutorials I've read on the web (PHP and MySQL) have never mentioned this. I do think it's a wise decision even if the database validation is very limited
 
how come almost all the tutorials I've read on the web (PHP and MySQL) have never mentioned this
i'm betting the mysql tutorials do mention it, but the php tutorials don't -- i mean, when application code is what you do, then all problems have an application code solution


:)

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
Because the tutorials were to do with writing applications, and probably did not address the hoary issue of data validation.

The usual twaddle spouted is that the application should be "database independent" and therefore it must do all its own checking, so that it will work on oracle, sql server, mysql or any other RDBMS.

Also, the user experience of manually entering 20 fields on a form, only to be told at the end that there's something wrong, is very poor. It also consumes network resources to do round trips to the db, every time a field is populated. In this case, the front end form should do validation as-you-go, and then the db checking provides belt and braces. This only applies where there is manual data entry. For everything else, there is no user-editable form to worry about, so the db does the validation.

If the db does checking, and the form writer goofs, no problem, but the reverse is not true. The db must always be the final arbiter of truth and enforcer ofdata integrity.

Also, constrained fields can (in oracle at least) be detected by the optimiser, and where appropriate optimised out of queries, thereby improving performance.

This is trivial however, the main thing is to always do integrity and validation in the db, also do it in forms that users use to make edits, and nowhere else in the front end.

Regards

T
 
Like Thargy already explained, even model 2 is overdone because of the cities.countid

But it's not only an unneccessary field, it's even dangerous, as it can introduce inconsistencies like a city belonging to a state, but also to a country, to which that state does not belong to.

The query BigRed wrote on the model2 would make it easy to query data without joining other tables, but the data integrity is worth more than ease or performance of queries.

You would rather do

Code:
select cityid, cityname
from tblcities 
left join tblstates on tblstates.stateid = tblcities.stateid
where tblcities.stateid=23 and tblstates.countid=76

And even though filtering for country 76 would be unneccessary, if state 23 is part of that country, it would double check that and only yield results, if that is the case.

Another thought I'd add is, that you should not abbreviate country with count, the field name suggests it's some kind of count, even though the suffix id suggests it's some key field, it could be a key to some count table.

Like with any code or variables, rather be verbose. It's intellisense, that should shorten your writing, not abbreviations. you want to be able to get back to code written years ago by yourself or - even worse - by others and be able to understand it.

Bye, Olaf.
 
Thank you guys. But lets push this a little further. We have countries, states, cities. We know the countries and the cities but the states won't always be available. For example:
City -> Columbus
State -> Ohio
Country -> USA

City -> Columbus
State -> Wisconsin
Country -> USA

City -> Ottawa
State -> Ontario
Country -> Canada

City -> Ottawa
State -> Illinois
Country -> USA

City -> Madrid
State -> N/A
Country -> Spain

City -> Tokyo
State -> N/A
Country -> China

etc...

The state is not always required. It is when it makes sense to North American clients (states and provinces). A client could choose to go to England. What will be important is the country and the city. Another client could choose to go to Canada and then Ontario and then choose a city in Ontario.

This project is about choosing a college institution by country, by states (if available) or by city. Then the table institutions would be something like this :

tblInstitutions (I've added the tbl to make a distinction with a field)
institutionName
cityID (FK)
stateID(FK) (could be a N/A field)
countryID(FK)

And the other tables (countries, states, cities)

tblCountries
countID(PK), countName

tblSates
stateID(PK), stateName, countID(FK)

tblCities
cityID(PK), cityName, stateID(FK), countID(FK)

Considering this extra inforamtion, is this the best way of representing the relations between the tables?
 
Change the state table to be called 'GEOGRAPHIC_REGION' and have it reference a reference table called REGION_TYPE. This should contain the types of region e.g.
State (for USA),
province for Canada (I believe),
territory (for antipodeans)
County (for the UK),
Canton for Switzerland,
Departement for France
etc.
You should also have a 'not applicable' entry, e.g. for Liechtenstein and Luxemboug.

Your design needs a country_region table as each region belongs to 1 and only one country. Each city may belong to zero or 1 geographic regions, but by FK to the region, one can obtain the country. Luxembourg for example would have a region type of "Not applicable" but there would still be an entry in the geographic region table for luxembourg, which would just be a place holder for the foreign key to country.



Regards

T
 
City, Region, Postal_Code, Country is in fact how the Customers table in the Northwind database is set up. They have NULL's for places without a region.
 
Well, the northwind database is not a very good model. Besides that the customer table is having a country field, because there is no countries table. The same goes for other fields, therefor the customer's adress is simply stored with the customer althought the entities of country, state, city are in a hierarchical relation to each other. That's simply not reflected in the northwind database.

In regard to the problem of countries with no state I'd also recommend a placeholder like "not applicable". In a strict hierarchical structure of countries->states->cities you can't have a NULL for the stateID in the cities table, as there is no countryID in the cities table, therefore you'll have that single replacement for the missing states to be able to connect to the country.

It's not a good idea to let cities have a countryID for the countries in which there are no states just to spare one record. This will introduce the possibility of redundancy and contradictionary relations, which a database model should prevent by it's design alone, not with additional rules like you could set and apply: Only one of the ID fields state or country of a cities table having both foreign keys should be set to a valid ID and the other remain NULL.

That's a way to save one record for countries with no states or similar hierarchical entities. For the cost of rather complex SQL you'd later need to query data.

Bye, Olaf.
 
how come almost all the tutorials I've read on the web (PHP and MySQL) have never mentioned this

Beacause they were written by application programmers and not database specialists.

But trust us, any database specialist will tell you of the problems caused (that are VERY hard to fix) by this poor practice of enforcing things only in the application. The database will probably outlive the application by many years, this stuff must be enforced in the database.

"NOTHING is more important in a database than integrity." ESquared
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top