1alexsasha1
Technical User
Dear All,
I am developing a world energy database. All elements in the structure seem logical but one. I had to introduce two artificial columns to distinguish between sources, from which data come (I try to keep all the data in the raw format). This is how my table looks like (total of 8 columns, * stands for key column):
[Year]* – [Country]* – [Product, eg oil]* – [Value (eg how much of oil)] – [Units (eg barrels)] – [Source (data can come from different sources)] – [Source number]*- [Yes/no (put yes to use data from a particular source)].
Take for example US oil consumption. If the US government reports that in 2003 consumption of oil in the US was 20033 thousands of barrels per day. In my table I make a record:
[2003] – [US] – [oil consumption] – [20033] – [000 barrels per day] – [US government] – [(for example) 1] – [Yes]
I introduced “source number” column to be able to distinguish between the same data points that come from different sources, e.g. US oil consumption could be reported by the US government and by the International Energy Agency differently. The “Yes/no” column is there to be able to easily recover only those data which I trust to be correct, so I put yes for oil consumption reported by the US government and no for IEA number.
The intuition tells me that these two columns could be a source of potential problems. Would be great to know if someone else had to overcome non-uniqueness problem in their database design and what are the best practices.
Many thanks
Alex
I am developing a world energy database. All elements in the structure seem logical but one. I had to introduce two artificial columns to distinguish between sources, from which data come (I try to keep all the data in the raw format). This is how my table looks like (total of 8 columns, * stands for key column):
[Year]* – [Country]* – [Product, eg oil]* – [Value (eg how much of oil)] – [Units (eg barrels)] – [Source (data can come from different sources)] – [Source number]*- [Yes/no (put yes to use data from a particular source)].
Take for example US oil consumption. If the US government reports that in 2003 consumption of oil in the US was 20033 thousands of barrels per day. In my table I make a record:
[2003] – [US] – [oil consumption] – [20033] – [000 barrels per day] – [US government] – [(for example) 1] – [Yes]
I introduced “source number” column to be able to distinguish between the same data points that come from different sources, e.g. US oil consumption could be reported by the US government and by the International Energy Agency differently. The “Yes/no” column is there to be able to easily recover only those data which I trust to be correct, so I put yes for oil consumption reported by the US government and no for IEA number.
The intuition tells me that these two columns could be a source of potential problems. Would be great to know if someone else had to overcome non-uniqueness problem in their database design and what are the best practices.
Many thanks
Alex