Nasty job on Ref. Integrity and Foreign keys

blom0344 · Jan 29, 2007

I have been asked to review a fairly normalized datawarehouse where a portion of the FK's are set to be nullable.
This of course has an impact on performance and it induces outer joins for correct reporting.
The issue is how to find a remedy and repair the missing FK values (either by adjusting the ETL process , setting proper defaults or building post-job procedures that fix things)
With starschema's this would be quite simple, cause there are 1-2 levels of dependencies, but a relational model is more complex.
I've played around with some scripts to seperate the tables in groups like:

No FK, only PK:

Code:

SELECT T.TABLE_NAME,S.CONSTRAINT_NAME,
MAX(CASE WHEN S.POSITION = 1 THEN S.COLUMN_NAME ELSE NULL END) AS COL1,
MAX(CASE WHEN S.POSITION = 2 THEN S.COLUMN_NAME ELSE NULL END) AS COL2,
MAX(CASE WHEN S.POSITION = 3 THEN S.COLUMN_NAME ELSE NULL END) AS COL3,
MAX(CASE WHEN S.POSITION = 4 THEN S.COLUMN_NAME ELSE NULL END) AS COL4,
MAX(CASE WHEN S.POSITION = 5 THEN S.COLUMN_NAME ELSE NULL END) AS COL5,
MAX(CASE WHEN S.POSITION = 6 THEN S.COLUMN_NAME ELSE NULL END) AS COL6,
MAX(CASE WHEN S.POSITION = 7 THEN S.COLUMN_NAME ELSE NULL END) AS COL7,
MAX(CASE WHEN S.POSITION = 8 THEN S.COLUMN_NAME ELSE NULL END) AS COL8,
MAX(CASE WHEN S.POSITION = 9 THEN S.COLUMN_NAME ELSE NULL END) AS COL9,
MAX(CASE WHEN S.POSITION = 10 THEN S.COLUMN_NAME ELSE NULL END) AS COL10
FROM SYS.ALL_TABLES T,SYS.all_CONS_COLUMNS S,SYS.ALL_CONSTRAINTS X 
WHERE T.OWNER = S.OWNER AND T.TABLE_NAME = S.TABLE_NAME AND
T.OWNER = X.OWNER AND T.TABLE_NAME = X.TABLE_NAME AND
T.OWNER = 'KIMO' AND T.TABLE_NAME LIKE 'TB%' AND X.CONSTRAINT_TYPE = 'P' AND
S.CONSTRAINT_NAME NOT LIKE 'SYS%' AND X.STATUS = 'ENABLED' AND
T.TABLE_NAME NOT IN 
(SELECT V.TABLE_NAME FROM SYS.ALL_CONSTRAINTS V WHERE V.OWNER = 'KIMO' AND V.CONSTRAINT_TYPE = 'R')
GROUP BY T.TABLE_NAME,S.CONSTRAINT_NAME

A 2nd group that directly references the first group:

Code:

SELECT DISTINCT D.TABLE_NAME FROM SYS.ALL_CONSTRAINTS D WHERE D.CONSTRAINT_TYPE = 'R'
AND D.OWNER = 'KIMO' AND
D.R_CONSTRAINT_NAME IN
(SELECT S.CONSTRAINT_NAME
FROM SYS.ALL_TABLES T,SYS.all_CONS_COLUMNS S,SYS.ALL_CONSTRAINTS X 
WHERE T.OWNER = S.OWNER AND T.TABLE_NAME = S.TABLE_NAME AND
T.OWNER = X.OWNER AND T.TABLE_NAME = X.TABLE_NAME AND
T.OWNER = 'KIMO' AND T.TABLE_NAME LIKE 'TB%' AND X.CONSTRAINT_TYPE = 'P' AND
S.CONSTRAINT_NAME NOT LIKE 'SYS%' AND
T.TABLE_NAME NOT IN 
(SELECT V.TABLE_NAME FROM SYS.ALL_CONSTRAINTS V WHERE V.OWNER = 'KIMO' AND V.CONSTRAINT_TYPE = 'R'))
INTERSECT
SELECT Y.TABLE_NAME FROM SYS.ALL_TAB_COLUMNS Y,SYS.ALL_CONSTRAINTS Z
WHERE Y.TABLE_NAME = Z.TABLE_NAME AND Z.CONSTRAINT_TYPE = 'R' AND
Y.TABLE_NAME LIKE 'TB%' AND Y.NULLABLE = 'Y'

and a 3rd group that is the rest.

Does anyone experience in bringing this sort of an exercise into manageable steps?

Ties Blom

carp · Jan 29, 2007

I think the following will give you all of the parent columns that are nullable:

Code:

SELECT t.owner, t.table_name, t.column_name
  FROM dba_constraints c, dba_cons_columns t 
 WHERE c.constraint_type = 'R' 
  AND c.r_constraint_name = t.constraint_name
  AND c.r_owner = t.owner
MINUS
SELECT owner, table_name, column_name
FROM dba_tab_columns 
WHERE nullable = 'N'
;

From here, you can find all FK constraints that reference the owner/table_name/column_name combinations returned by the above query.

blom0344 · Jan 29, 2007

Well, it is not that simple.
Your script will invariably not return any output since every table will have at least one nullable column.

I think the real challenge is to find a strategy that will give the proper sequence in fixing the FK's.

Given the datamodel on paper it would be solvable by hand (brute-force method), but I was hoping I could engineer it from the data-dictionary.

Ties Blom

carp · Jan 30, 2007

An interesting assertion, given that I DO get output from the query. Perhaps we should look at the script a little more closely and see what's really going on.

As I understand it, you are looking for foreign key constraints whose parent columns allow NULL values. Accordingly, the first query in my code finds all of the parent columns. The second query finds all columns that disallow NULL values.
The first query's results MINUS the second query's results yields all parents in a FK relationship that allow NULL values.
A more intuitive (and probably faster executing) alternative would be to change the MINUS to INTERSECT and 'N' to 'Y'. This would give you the intersect of all parent columns and all NULLable columns.

If I have correctly stated your requirements, it IS that simple. On the other hand, if I have misunderstood your requirements, then please correct me where I have misinterpreted your needs and we can try again.

carp · Jan 30, 2007

Rather than rely on assertion/counterassertion, let's use SantaMufasa's excellent observation:"An ounce of experiment is worth a pound of expert opinion".

Code:

SQL> create table x(y varchar2(5) unique);
Table created.

SQL> create table a(b varchar2(5) references x(y));
Table created.

Note that a foreign key can only reference a column with a primary key (which allows no NULLs) or unique constraint (which DOES allow NULLs) - this is why I specified a unique constraint.
Now, run the query:

Code:

SQL> SELECT t.owner, t.table_name, t.column_name
       FROM dba_constraints c, dba_cons_columns t
      WHERE c.constraint_type = 'R'
        AND c.r_constraint_name = t.constraint_name
        AND c.r_owner = t.owner
MINUS
     SELECT owner, table_name, column_name
       FROM dba_tab_columns
      WHERE nullable = 'N'
;

OWNER     TABLE_NAME          COLUMN_NAME
--------  ----------------    --------------
DMPPADMN  X                   Y

Note that this correctly shows a user/table/column combination for a parent column in a FK relationship that allows NULL values.

Please let us know if this helps or if we need to pursue this farther.

carp · Jan 30, 2007

OK, in rereading your original post, it looks like I DID misinterpret your requirements. I am not sure how you would determine what the value in an FK's child field would be, since (theoretically) the value would be the only thing that relates the child row to the parent row. However, the following approach might help you at least find the tables/columns that need your attention.

First, find all of the table/columns that are both nullable and function as children in a FK relationship. Then compare the number of rows in the table to the number of populated fields in the child column. Tables/columns that show some rows that need fixing are the ones that require your attention.

The following will generate a SQL query for each nullable column that is a child in a FK constraint:

Code:

SELECT 'SELECT '''||v.table_name||'.'||v.column_name||''' table_column, count(*) - count('||v.column_name||') rows_to_fix FROM '||v.owner||'.'||v.table_name||';'
FROM 
(SELECT cc.owner, cc.table_name, cc.column_name 
  FROM all_cons_columns cc, all_constraints c, all_tab_columns t
 WHERE c.owner = cc.owner
   AND c.constraint_name = cc.constraint_name
   AND c.constraint_type = 'R'
   AND t.owner = cc.owner
   AND t.table_name = cc.table_name
   AND t.column_name = cc.column_name
   AND t.nullable = 'Y') v;

Now, if you spool the results of this query into a file, you can then run the file to get the assessment results. When I run this on one of our databases, I get the following results:

Code:

TABLE_COLUMN            ROWS_TO_FIX
----------------------- -----------
CR_MEMO_PROB.EDI_SEQ_ID      220876


TABLE_COLUMN                       ROWS_TO_FIX
---------------------------------- -----------
CR_MEMO_PROB.PROBLEM_LOOKUP_SEQ_ID           0


TABLE_COLUMN              ROWS_TO_FIX
------------------------- -----------
CR_MEMO_REQ.Q_BILL_TO_NUM        3680

From this, I see that two columns have foreign key values that are set to NULL.
Again, I have no idea as to how you will determine the correct value to replace the NULLs with (unless you create a dummy parent and set them all to that).
Are we getting any closer to what you need?

blom0344 · Jan 30, 2007

Let me elaborate on the requirements.

We have basically 2 issues with Referential Integrity:

1. ETL-jobs running into FK errors cause the ETL process tries to insert a row with a FK that does not exist in the referenced table.

This we can tackle by adding a procedure that scans the data to be inserted and flags rows that will cause such an error (instigating repair-actions in the source and restart the loading process)

This problem I am not concerned with.

2. Consider a facttable that references a ref-table through a foreign key. However the FK is set to nullable, so we have a situation that allows for rows to be missed if I would run a query on these 2 tables using an inner join.
Consequently, we now use outer-joins and pay a performance price.
The idea is to not allow null values for the FK. So, instead of allowing null we substitute a dummy. consequently we need that dummy value in the referenced table to allow working with inner joins.

Referential integrity forces us to work in one direction.
We need to create the dummy in the referenced table first.

Based on that scenario I created the first datadictionary query that nets all the periferral tables (those that contain no FK's, but that may be referenced by other tables) These would be candidates to be fixed first (in their respective ETL procedures) (=SET1)

The next batch would be the set of tables that directly reference the SET1. The 2nd query should net that batch.

I have now labelled a third set (SET3) that contains all those tables that are not referenced by other tables:

Code:

SELECT DISTINCT X.TABLE_NAME,
X.CONSTRAINT_NAME
FROM SYS.ALL_CONSTRAINTS X WHERE X.CONSTRAINT_NAME
LIKE '%PK%' AND X.TABLE_NAME LIKE 'TB%'
AND X.OWNER = <USER> AND X.CONSTRAINT_NAME NOT IN (
SELECT V.R_CONSTRAINT_NAME FROM SYS.ALL_CONSTRAINTS V WHERE V.CONSTRAINT_TYPE = 'R' AND V.TABLE_NAME LIKE 'TB%'
AND V.OWNER = <USER>)

In most cases these will be the 'pure' facttables and they need no dummy entry. All they need is a default value for the nullable FK's.

Now I have 3 sets of tables identified and probably all tables covered (except for structures beyond snowflake)

My main concern is whether this makes sense as a strategy for a step by step solution.

Ties Blom

carp · Jan 31, 2007

OK, thank-you for the clarification.
Based on your stated requirements, this seems like a reasonable enough approach to me. To summarize:
1. Find all tables/columns who are parents to FKs and whose children can have NULL values.
2. Create a dummy record that will be the parent for the NULL child values.
3. Update the NULL children to reference the appropriate dummy value.

Once this is done, you might also try to get NOT NULL constraints on the child columns so you can avoid this situation in the future.

blom0344 · Jan 31, 2007

Carp,

Thanks for putting in the time. This is a pretty nasty looking job (that's why it is still lying around to be fixed) and I was wondering if I was on the right track.

Ties Blom

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Nasty job on Ref. Integrity and Foreign keys

blom0344

Technical User

carp

MIS

blom0344

Technical User

carp

MIS

carp

MIS

carp

MIS

blom0344

Technical User

carp

MIS

blom0344

Technical User

Similar threads

Part and Inventory Search

Sponsor