Database normalisation and structure

jpadie · Feb 28, 2005

I need some advice on table structure. I’m designing for a mysql back-end but there is no requirement that this be so. Thus if I have posted in the wrong group please let me know!
I am designing a contract management system for in-house legal departments. Requirements include that
* contracts are categorised with a type (i.e. NDA’s, software licences etc)
* each contract type has a different set of meta-data but there will be certain #
* meta-data-points that will be common to all contract types
* contracts need to be searchable across contract types (by entry of keywords rather than queries on specific fields).
This leads me to a design decision: should I
1. input ALL contract types to the same table and just make sure that there are a great number of fields that may be redundant for some contract types but not others (i.e. reuse varchars/dates etc and dynamically plot their labels for each contract type)? In this model there would be an innate limitation on how many fields could actually be used for each template (unless I build table modifiers into each template).
2.. Have one table that stores the common information and then a further table for each contract type, and use some javascript or multiple page interaction on the front-end to display the right field types to the user
3. Concatenate all the field and values and insert them into a text field in a separate table and conduct full text searches against this field?

Even as I type this I’m leaning more heavily towards the second option. However, I’m certain that this structural problem has been met and overcome countless times before and I’d be grateful for any views or insights that you may have.
Thanks
Justin

azzazzello · Feb 28, 2005

I think, actually, option 1 is better. Either way, you will have to have intelligence regarding what fields (or tables) to reference built into code. Might as well have less tables to maintain. Basically, do something like

record_type = get_record_type(id);

IF (record_type = type_1)
{
call function pull_fields_for_type_1(id);
process records
}
ELSIF (recird_type = type_2)
{
....
}

I have seen it implemented this way several times. Can't say it's very clean, and you are right - there will be a lot of fields. But if you keep the code logic clean, you should be fine

r937 · Feb 28, 2005

the first or second options would be on the right track, except that you want to make sure that you do not generalize (or "reuse") any column for more than one purpose

so there should be a common table

now the only decision is whether to have separate tables (one per contract type) to hold the separate data fields that are unique to each type, or whether to combine them all into the common table (note: all unique columns would be separately represented)

my own preference is to combine everything into one table, unless there is a good reason to split something out into a separate table (and "because it seems to make more sense" is not a good enough reason)

this means that on any given row in the table, you will see a lot of nulls (i.e. for all the unique columns that do not pertain to the contract type that this particular row represents)

if mysql supported views, i would talk about how you could use a different view for each different contract type

for more information on this design structure, do a google for "subtype/supertype" in the context of data modelling

rudy | r937.com | Ask the Expert | Premium SQL Articles
SQL for Database-Driven Web Sites (next course starts March 6 2005)

jpadie · Mar 1, 2005

Thanks very much to both of you. useful insights.

i'm going with the one table model which represents your joint recommendation.

as a vague hybrid (and because i need to get a demo up quickly) i'm going to bung all the "bespoke" information (i.e. the non-overlapping fields) into a single text field (ini or xml style) and parse as and when needed. it will also be full text indexed. i suspect that this will get clunky quickly and i will end up moving to a large number of columns but I am interested to see whether the first solution works in terms of user experience and at what point the overhead gets too much (i guess there will be a max of 100,000 records).

thank you again.
Justin

azzazzello · Mar 1, 2005

It will get cluncky very, very quickly

I recommend adding columns right away. I do not think having more columns will add any overhead to your query time, regardless of how many records you have. I mean, the number of rows will certainly influence query speed, but the number of columns does not affect it. Do a quick test to prove this to yourself. Create a table with 3 columns and "for 1 to 100,000 { enter record }" do query. Then add 50 more columns. Do same query, or even a query containing more of those columns as conditions. I don't think it will come back any slower

r937 · Mar 1, 2005

i recommend never "embedding" data into a text field

fulltext indexing isn't going to retrieve it efficiently, and of course indexes on it will be ignored, and you will forever be doing table scans

besides, columns with NULLS in them take no space (assuming they're not defined CHAR), so all your different type of rows will take no more space than if they were in separate tables anyway, but now you have the benefit of being able to index them

rudy | r937.com | Ask the Expert | Premium SQL Articles
SQL for Database-Driven Web Sites (next course starts March 6 2005)

jpadie · Mar 2, 2005

well ... i'm a consensus kind of chap and given that my idea has been so roundly slammed (!!) i'll follow your collective advice to the letter!

many thanks for your joint assistance!

Justin

jpadie · Mar 2, 2005

i've just tried an "Alter table" command on a db with some dummy data - (about 1gig). although i don't assume that this will have to be run very often, it won't be a dbadmin task - more of a client power user.

mysql appears to want to write the whole table into a temporary table, delete the original, recreate with a new schema and then write back.

the alter table just took too long to be viable. although i can see that select queries can be run against the table whilst it is in alter mode (although it kills the processor) no updates or anything can be done.

This is going to be a difficult sell!

unless i am mistaken that alter table is the right command for this, i guess i'm stuck back with option 2!

r937 · Mar 2, 2005

what's there to alter? you said (more than once) that you are "designing for"

this sort of implies that you don't have those tables yet, so i don't understand what you're altering

rudy | r937.com | Ask the Expert | Premium SQL Articles
SQL for Database-Driven Web Sites (next course starts March 6 2005)

jpadie · Mar 2, 2005

sorry, i perhaps wasn't clear.

on day 1 it is not certain what new contract types will be needed. some can be guessed at.

thus on any day, the app must be able to add a new contract type with a new set of meta data.

also, on any day, the current set of meta data for any contract type must be able to be added to (but unlikely ever to be deleted).

r937 · Mar 2, 2005

ah, i get it

in a given year, how often do you come across a new contact type?

yes, you would have to alter the table when it happens, but only to add new columns that none of the other previously identified types have ever had

rudy | r937.com | Ask the Expert | Premium SQL Articles
SQL for Database-Driven Web Sites (next course starts March 6 2005)

jpadie · Mar 2, 2005

in a typical mid-size company i'd expect one new type per month for the first two years then one new type every six months.

i would expect more changes to the meta-data though.

the main size issue comes from storing the text of each document in the contracts table. i guess there is no reason why i can't boot this column off to a separate table.

r937 · Mar 2, 2005

there's no reason why "the text of the document" wouldn't be one of the common columns, either, so you should never have to alter the table for that

if it's one of the TEXT or BLOB datatypes, then it will already be off "somewhere else" (mysql stores only 4 bytes for each of these fields in the actual table)

rudy | r937.com | Ask the Expert | Premium SQL Articles
SQL for Database-Driven Web Sites (next course starts March 6 2005)

jpadie · Mar 2, 2005

thanks. curious why the alter table was taking so long then. maybe just running out of memory.

next step is to slot some more ram in and see whether it is just a hardware issue.

many thanks for your advice.

justin

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Database normalisation and structure

jpadie

Technical User

azzazzello

Technical User

r937

Technical User

jpadie

Technical User

azzazzello

Technical User

r937

Technical User

jpadie

Technical User

jpadie

Technical User

r937

Technical User

jpadie

Technical User

r937

Technical User

jpadie

Technical User

r937

Technical User

jpadie

Technical User

Similar threads

Part and Inventory Search

Sponsor