Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

ENERGY database design and non-uniqueness 1

Status
Not open for further replies.

1alexsasha1

Technical User
Nov 13, 2006
3
GB
Dear All,

I am developing a world energy database. All elements in the structure seem logical but one. I had to introduce two artificial columns to distinguish between sources, from which data come (I try to keep all the data in the raw format). This is how my table looks like (total of 8 columns, * stands for key column):

[Year]* – [Country]* – [Product, eg oil]* – [Value (eg how much of oil)] – [Units (eg barrels)] – [Source (data can come from different sources)] – [Source number]*- [Yes/no (put yes to use data from a particular source)].

Take for example US oil consumption. If the US government reports that in 2003 consumption of oil in the US was 20033 thousands of barrels per day. In my table I make a record:
[2003] – [US] – [oil consumption] – [20033] – [000 barrels per day] – [US government] – [(for example) 1] – [Yes]

I introduced “source number” column to be able to distinguish between the same data points that come from different sources, e.g. US oil consumption could be reported by the US government and by the International Energy Agency differently. The “Yes/no” column is there to be able to easily recover only those data which I trust to be correct, so I put yes for oil consumption reported by the US government and no for IEA number.
The intuition tells me that these two columns could be a source of potential problems. Would be great to know if someone else had to overcome non-uniqueness problem in their database design and what are the best practices.

Many thanks
Alex
 
having both Source and Source Number is not normalized (assuming they are related)

move Source into its own table, with an autonumber PK, and use that number in the Source Number column as a related field (foreign key)

otherwise, what you have is fine

you might have some trickiness when trying to put together stats over several years if individual years are available from only one source or another, when there are also years in the range which have multiple sources...

... if that made sense ;-)

r937.com | rudy.ca
 
Thanks for your reply!
Indeed I am going to put together data over the few years and if I understood correctly you comment it should not cause non-uniqueness but might cause duplicate values? For this I thought of using this Yes/No column, to be able to retrieve datapoints from single, the most trusted source for a particular year.
 
uniqueness is provided by the compound primary key -- year, country, product, sourcenumber

what this means is that you don't want more than one set of data for the same year, country, product, sourcenumber

however, you can have multiple sets for the same year, country, product

the fact that one of them is Yes and the rest No is ancillary

note that ensuring that only one is Yes cannot be done by the table definition alone, it would have to be ensured by your app

r937.com | rudy.ca
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top