Another reason why Teradata does not work well in normalized environment....... Maybe Star Schemas work better ????
When executing a query containing equijoins, Teradata will redistribute spool files based upon join conditions. If you are joining a very large fact table to a smaller attribute table, this redistribution is based upon the number of distinct values in the join clause. For example, if I have customer transactions that are being joined to customer, then the spool file that has been created off the customer transaction file result spool of the previous step is redistributed by customer. If you have a lot of customers in the result spool, then it is likely redistribution will be very good. If you only have a few customers in the result spool, redistribution could be a problem. To further cause pain, if your customers have a customer type, or customer region, the redistribution could further squeeze the result spool into even less amps. This methodology caused Teradata to have spool problems much too frequently. If the customer type or region was denormalized into the customer table (bring region description into every customer record) then the redistribution would always occur based up customers, not the lesser distinct region or type. Spool is less likely to occur.
Developers writing SQL for Teradata are constantly fighting this spool issue.
A possible solution? Teradata needs to recognize when its redistribution of the spool file is too narrow(heavily skewed), causing out of spool conditions. By leaving the larger spool file distributed the way it is, and redistributing the smaller table (possibly forcing redundant data across amps), the spool files distribution remains very even. Can this be done based upon statistics collected? I am not sure.
When executing a query containing equijoins, Teradata will redistribute spool files based upon join conditions. If you are joining a very large fact table to a smaller attribute table, this redistribution is based upon the number of distinct values in the join clause. For example, if I have customer transactions that are being joined to customer, then the spool file that has been created off the customer transaction file result spool of the previous step is redistributed by customer. If you have a lot of customers in the result spool, then it is likely redistribution will be very good. If you only have a few customers in the result spool, redistribution could be a problem. To further cause pain, if your customers have a customer type, or customer region, the redistribution could further squeeze the result spool into even less amps. This methodology caused Teradata to have spool problems much too frequently. If the customer type or region was denormalized into the customer table (bring region description into every customer record) then the redistribution would always occur based up customers, not the lesser distinct region or type. Spool is less likely to occur.
Developers writing SQL for Teradata are constantly fighting this spool issue.
A possible solution? Teradata needs to recognize when its redistribution of the spool file is too narrow(heavily skewed), causing out of spool conditions. By leaving the larger spool file distributed the way it is, and redistributing the smaller table (possibly forcing redundant data across amps), the spool files distribution remains very even. Can this be done based upon statistics collected? I am not sure.