Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Normalizing the following data

Status
Not open for further replies.

jtrembla

ISP
Jul 6, 2006
104
0
0
US
How would one go about normalizing the following data, cause obviously there are many different ways to do so.

I clearly have to create a CountryTable a StateProv Table, do I create a PostalCode table for all the unique postal codes?


CountryName PostalCode CityName StateProvinceName StateProvinceAbbr TimeZone UTC Latitude Longitude
USA 10121 Manhattan New York NY Eastern -5.0 40.750095 -73.998867
USA 10121 New York New York NY Eastern -5.0 40.750095 -73.998867
USA 10121 Nyc New York NY Eastern -5.0 40.750095 -73.998867
USA 33155 Miami Florida FL Eastern -5.0 25.737315 -80.309137
USA 33155 South Miami Florida FL Eastern -5.0 25.737315 -80.309137
USA 33155 West Miami Florida FL Eastern -5.0 25.737315 -80.309137
USA 48208 Detroit Michigan MI Eastern -5.0 42.350968 -83.089896
USA 60619 Chicago Illinois IL Central -6.0 41.745767 -87.607643
USA 60619 Grand Crossing Illinois IL Central -6.0 41.745767 -87.607643
USA 77059 Houston Texas TX Central -6.0 29.608785 -95.114041
USA 84121 Brighton Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Cottonwd Hgts Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Cottonwood Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Cottonwood Heights Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Cottonwood Heights City Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Holladay Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Holladay Cottonwood Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Murray Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Salt Lake City Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Slc Utah UT Mountain -7.0 40.622101 -111.82424
USA 84121 Solitude Utah UT Mountain -7.0 40.622101 -111.82424
USA 90064 Los Angeles California CA Pacific -8.0 34.035989 -118.42262
USA 90064 Rancho Park California CA Pacific -8.0 34.035989 -118.42262
USA 94102 San Francisco California CA Pacific -8.0 37.780793 -122.417576
CAN E3A 1A4 Fredericton New Brunswick NB Atlantic -4.0 45.97771 -66.67843
CAN H1E 6M9 Montreal Quebec QC Eastern -5.0 45.64171 -73.57724
CAN K1K 2K3 Ottawa Ontario ON Eastern -5.0 45.43388 -75.63524
CAN M4M 1X2 Toronto Ontario ON Eastern -5.0 43.66572 -79.34655
CAN V5N 1X1 Vancouver British Columbia BC Pacific -8.0 49.26227 -123.05605
CAN V6Y 4B2 Richmond British Columbia BC Pacific -8.0 49.16766 -123.13376
CAN V7Y 1J6 Vancouver British Columbia BC Pacific -8.0 49.28316 -123.11891
 
Looking at your data, this isn't a list of contacts. It looks like this data was obtained from a source for geography reference. The data is probably already cleansed, and I wouldn't worry about normalizing it unless this is a table in which you have a system which modifies the data.
 
Hi,

Please find below the best normalized form possible for your table.

Country Table:
country_id, countryname

City Table:
city_id, cityname

Country-City Mapping Table
mapping_id, country_id, city_id

Province Table:
Province_Id, StateProvince, StateProvinceAbbr, PostalCode, Latitude, Longitude

Timezone Table:
Timezone_id, Timezone, UTC

Mapping Table:
mapping_id, province_id,timezone_id

I believe u will be able to retrieve everything with the above structure.

Thanks,
AP
 
While AP has fully broken everything out, 99% of the time breaking the data out to this degree will degrade system performance. Unless you are going to store additional data about a value there isn't really any point to breaking the data apart into its own table.

I would break out the State and TimeZone data into separate tables.

Code:
CREATE TABLE State
(StateCode CHAR(2),
StateName VARCHAR(50))
CREATE TABLE TimeZone
(TimeZoneId INT IDENTITY(1,1),
TimeZoneName VARCHAR(50),
UtcOffset INT)

I also recommend storing the UtcOffset in minutes as there are timezones which are done at the half hour.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog
 
Country-City Mapping Table
mapping_id, country_id, city_id
in a word, no


:)

first of all, a many-to-many table does not require its own (presumably identity) key

secondly, when was the last time you heard of a city belonging to two different countries?

sometimes people carry normalization to silly extremees


r937.com | rudy.ca
 
TimeZone data is the only thing I would break out. It's the only piece of data in the sample shown that could possibly change. City, State, Zip, Long, Lat, and Country are all static. A city may change its name, but you're still looking at only 1 record. It's not going to change country(unless you're going to deal with some of those African and Eastern Europe regions that seem to change weekly) and no matter what, it's Long and Lat will never change. A Time Zone may change (if you're calculating daylight savings) and will need to update all affected locations which could be done with one change in a TZ table. I would only break apart those items that can change and shift what they belong to. Otherwise K.I.S.S. is the standing rule.

"If I were to wake up with my head sewn to the carpet, I wouldn't be more surprised than I am right now.
 
Latitude and Longitude could change. In this case, it looks as though they depend on the Postal Code. I used to get monthly feeds of current US postal code/metro/county/city/state information like this. Many regions in the country which are growing are splitting up postal codes into new ones which could affect the latitude and longitude.

Normalizing the time zone might be fairly difficult because you have cases where some parts of Indiana seem to change time zone or observed time fairly frequently.

You could normalize state, simply so you don't waste space storing the name and the abbreviation in every record, similar to Denny's table suggestion.
 
It is amazing how many different aproaches there are.

I did some test and this seems to be the best way

TABLE Country
COLUMN CountryID PK
COLUMN CountryName

TABLE StateProvince
COLUMN StateProvinceID PK
COLUMN StateProvinceName
COLUMN StateProvinceAbbr

TABLE TimeZone
COLUMN TimeZoneID PK
COLUMN TimeZoneName
COLUMN TimeZoneUTC

TABLE PostalCode
COLUMN PostalCodeID PK
COLUMN PostalCode
COLUMN Latitude
COLUMN Longitude
COLUMN CountryID FK
COLUMN StateProvinceID FK
COLUMN CityName
COLUMN TimeZoneID FK

i will have to work on updating the data as it comes in monthly. Unless someone knows of a good web service.

Also, I will be doing radius distance calculations, input of zipcode and distance from.

I was thinking the fastest way to handle this (instead of on the fly) pre-generate and index all valid combinations. For example:

TABLE PostalCodeDistance
COLUMN StartPostalCodeID FK
COLUMN DestinationPostalCodeID FK
COLUMN Distance

And create the following clustered index...

StartPostalCodeID ASC
Distance ASC

Thoughts?
 
I agree on the pregenerate the distance, it will save alot of time for the users. You will have to regenerate it once a month when you receive new data though as zipcodes will change.

If regenerating the distance data once a month is very time-consuming (as I suspect it will be) you can use a sneaky method we use so that the users don't notice the difference. We set up two tables with the information and a view that references one. Then we leave the liveone (the one the view uses) alone and regenerate onthe second table. When this is done, we alter the view to reference the second table. All queries will need to reference the view. This way the users aren't blocked out while the data is regenerated.

"NOTHING is more important in a database than integrity." ESquared
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top