Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Another reason why Teradata does no

Status
Not open for further replies.

bigscott

MIS
Aug 27, 2003
5
US
Another reason why Teradata does not work well in normalized environment....... Maybe Star Schemas work better ????

When executing a query containing equijoins, Teradata will redistribute spool files based upon join conditions. If you are joining a very large fact table to a smaller attribute table, this redistribution is based upon the number of distinct values in the join clause. For example, if I have customer transactions that are being joined to customer, then the spool file that has been created off the customer transaction file result spool of the previous step is redistributed by customer. If you have a lot of customers in the result spool, then it is likely redistribution will be very good. If you only have a few customers in the result spool, redistribution could be a problem. To further cause pain, if your customers have a customer type, or customer region, the redistribution could further squeeze the result spool into even less amps. This methodology caused Teradata to have spool problems much too frequently. If the customer type or region was denormalized into the customer table (bring region description into every customer record) then the redistribution would always occur based up customers, not the lesser distinct region or type. Spool is less likely to occur.
Developers writing SQL for Teradata are constantly fighting this spool issue.

A possible solution? Teradata needs to recognize when its redistribution of the spool file is too narrow(heavily skewed), causing out of spool conditions. By leaving the larger spool file distributed the way it is, and redistributing the smaller table (possibly forcing redundant data across amps), the spool files distribution remains very even. Can this be done based upon statistics collected? I am not sure.
 
If the optimizer knows about a possibly skewed spool, it's [sometimes] producing a different plan avoiding it.

This can only happen when there are statistics.

If there are statistics and the optimizer is still producing a skewed plan, it's hard to convince the optimizer to do something else, because there are no hints.
Using Derived Tables was a possibility, because they used to be materialized, but in V2R5 this no longer works.

So if you're really shure taht the plan is bad and could be much better, you're supposed to open an incident...

Dieter
 
I agree that this can be a problem, but many times it is user initiated. When returning rows a user will put the column they feel is most prominent to their results in the first column (the default PI) instead of the most widely distributed. For example even if I am returning millions of rows, if then first column has fewer than about twice the number of AMPs, there will be skewing.

Teach your users to put a widely distriuted feild in the first column, they can rearrnge them before results are produced.

How many AMPs do you have? If this is consistantly a problem , consider reducing them.
 
Thanks for the feedback. We just moved from V2R4 to V2R5. In using a BI tool in front of Teradata, sometimes there is not as much control on the SQL side, you go with the generated SQL. But some of my point is why should an SQL person have to know about redistribution within a query? SQL was originally made to generate information from a common language, but we have made these users have to understand what is happening underneath. I say that Teradata needs to resolve this so it doesn't force a user to know if they have to create a derived table when logically one is not necessary.
Sure, I can go to the DBA's to open an incidence, and they will say just rewrite the query. And who knows if Teradata will resolve the issue. I consider this to be a real weakness in Teradata.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top