Hello, I am building a data warehouse that will house information on communities around the globe. The database warehouse will initially start with U.S. data and then expand to global markets. It will contain a central dimension table with a list of cities and the corresponding states, metros (cbsas), counties, countries that they are within 2) individual dimension tables listing the elements of each geography type (for example - one with a list of cities, one with a list of states, one with a list of counties, etc) to be used for queries that are not city-based but for the other geography levels 3) various data fact tables housing demographics of various sorts (population, income levels, education levels, industry and occupation data), tax data, companies that are located in each city, schools in each city, anecdotal facts on each city.
The outputs will be tables comparing markets based on different combinations of factors, reports on individual markets (again showing different combinations of factors depending on the requirement).
The thing I am stuck on is table format (wide vs long). Do I structure these tables long for example? In a demographic table, there could be a hundred demographic variables for each city, each metro, each state, each county. Is that the most efficient way to do it. I picture tables with millions of records (and one data point) if we use long format. I have built a database in the past like this using wide format tables, but it was slow (ms access). The newer version will be in SQL, MySQL, or Snowflake (tbd) and will hopefully be faster.
I am not an expert in these technologies. What things do I need to take into consideration? Any advice would be appreciated.
The outputs will be tables comparing markets based on different combinations of factors, reports on individual markets (again showing different combinations of factors depending on the requirement).
The thing I am stuck on is table format (wide vs long). Do I structure these tables long for example? In a demographic table, there could be a hundred demographic variables for each city, each metro, each state, each county. Is that the most efficient way to do it. I picture tables with millions of records (and one data point) if we use long format. I have built a database in the past like this using wide format tables, but it was slow (ms access). The newer version will be in SQL, MySQL, or Snowflake (tbd) and will hopefully be faster.
I am not an expert in these technologies. What things do I need to take into consideration? Any advice would be appreciated.