Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

DB Guidelines

Status
Not open for further replies.

Andrzejek

Programmer
Jan 10, 2006
8,509
US
I posted this thread1662-1772738 in Oracle forum, but that could be applied for any relational data base. So maybe this forum would be a good place to (cross) post it, too.

I would appreciate any input on the issue.

If I get some ideas here, I will mention this thread in Oracle thread, too.

Have fun.

---- Andy

There is a great need for a sarcasm font.
 
Andy said:
Every table has to follow 1st, 2nd, and 3rd Normal Form
That's fine. Full stop.

But, conditionally I design alternative relations, eg in a recipe an ingredient might be a simple ingredient or another recipe. You can't design this in one ingredientID sometimes relating the recipeingredient to a table of base ingredients and sometimes relating the recipeingredient to the recipes table, I designed two foreign keys baseingredientID and recipeID, for which you define an additional constraint to have only one of these IDs set, the other NULL.

Andy said:
Every table has to have a [highlight #FCE94F]numeric[/highlight] Primary Key
You can do very fine with a uniqueidentifier or however such a globally unique alphanumeric (hex or binary) value is called in Oracle, too. Very good for adding replication later and needing no master database node to create a unique number sequence, when going for a distributed database approach later. What's surely a must have rule is a primary key, and also a single field primary key. When it came to the foreign key rule Bill misunderstood the concept, the important part is having a single field primary key, so any foreign key also is just that one field in the foreign table, which might have many foreign keys to make as many relations to other data as necessary. To only allow unique names in tables like producttypes just listing a list of product type names you can use a normal unique constraint, no natural primary keys, surrogate keys is the short term for this rule.

One more rule to apply is avoidance of circular references. I can give you an example I had to deal with, again in the domain of recipes. We have recipes, which will be cooked, then samples are taken of this and assigned to tests, the tests are assigned to the recipes and the samples are assigned to the tests, so they are an end point of a 1:n:m three level hierarchy. So far, so easy, but then the samples are a portion of a recipe and have to have a unique sample number for their identification, also you can do many tests with the same recipe and that makes a sample-recipe relation necessary to check the sample number seqence fo uniqueness per recipeID. The numbers had to remain small, as they are part of a EAN13 barcode and can't rise higher than 99, so there is a sample number sequence resetting to 1 per recipeID.

In this case I used a two field foreign key to intially assign samples to a recipe with one foreign key and then to a test via the contraint the (sample.testID,sample.recipeID) sample has to match the (test.ID, test.recipeID) tuple to avoid a situation the sample is assigned to a different recipe than the test. These two key contraints can happen. Looking precisely the sample:test relation is on the test primary key and test recipe foreign key the sample also has to match, so this constraint makes sense and I see no way to move all the foreign key triplets to a table defining just the relation between the three tables. I'd be glad to hear another idea.

And another thing I miss is talking about a naming convention. CamelCase or not? Underscores (this_is_a_field_name) or not? Prefixes? Suffixes? English? Avoiding key words, function names, etc?
For example, with a certain set of rules you can make names unique for the whole database, so a query with * will not cause trouble because of double field names. Such rules have their pros and cons, though, and on the other hanbd it's a good practice to avoid "lazy" query writing querying all fields via *. It has a pro of being future proof in getting all columns, all entity attributes, but it can cause data to leak out where it should not be visible (eg a grid showing all columns, including some new and confidential ones) or cause naming conflicts in joins, where there weren't any before.

That also means to define rules for querying the data and application design. Maybe a separate topic, but obviously not independent from the database rules.

Bye, Olaf.
 
I appreciate your input, Olaf.
A couple of points:
I hope not to have 'circular references' in the DB. All what I have to deal with is pretty much straight forward stuff. (so far....)
"naming convention. CamelCase or not? Underscores (this_is_a_field_name) or not? ..."
I know Access allows:
- CamelCase - in Oracle all names are UPPER_CASE
- Spaces in the table/field names - Oracle does not allow spaces, I use underscore instead

I actually like the 'Oracle' approach to these issues. :)

Have fun.

---- Andy

There is a great need for a sarcasm font.
 
Well,

in T-SQL and Access, too (I think) case doesn't matter at all, sql is case insensitive, and that's good, as three columns fieldname and FIELDNAME and FieldName are impossible, whereas some languages (eg C#) would make a difference in three variables named this way. You still can make a choice about doing so in all your source code, including DDL. Space vs underscores are just one aspect of naming conventions, any database will allow prefixes and suffixes, as it's a non technical matter, just what you decide to be part of the name, eg a good practice is to end all primary and foreign key columns in ID, some would give a USERS table a primary key USERID, others would name the primary key column ID only, as the full qualified name USERS.ID already says what ID this is and USERS.USERID is redundant. It also allows generic code to address any tables ID the same way, for example.

So this is not just about technical limitations and what the sql dialect itself allows or not.

By the way, you can have spaces in T-SQL enclosed in square brackets [FIELD NAME] or in double quotes "FIELD NAME", and the latter also qualifies for Oracle.

Bye, Olaf.
 
Andy,

I've taken the liberty of pasting your guidelines here, to avoid having to jump back and forward between the two threads.

[ul]
[li] Every table has to follow 1st, 2nd, and 3rd Normal Form
--- First Normal Form (1NF): A table is 1NF if every cell contains a single value, not a list of values. This properties is known as atomic. 1NF also prohibits repeating group of columns such as item1, item2,.., itemN. Instead, you should create another table using one-to-many relationship.
--- Second Normal Form (2NF): A table is 2NF, if it is 1NF and every non-key column is fully dependent on the primary key.
--- Third Normal Form (3NF): A table is 3NF, if it is 2NF and the non-key columns are independent of each others. In other words, the non-key columns are dependent on primary key, only on the primary key and nothing else.[/li]
[li] Every table has to have a numeric Primary Key[/li]
[li] Number of fields in a table should not exceed a ‘reasonable’ number (up to 20? 25?)[/li]
[li] Fields in a table have to have NOT NULL setting with default values (unless there is a very good reason not to. Example: comments field)[/li]
[li] A table has to relate to any other table only by one field (Foreign Key to Primary Key)[/li]
[li] Any piece of data has to be in one place only. No piece of data can be copied to two or more places / tables.[/li]
[li] There should be a standard established for naming any object: table, field (specific for PK, FK), index, trigger, sequence, check, etc.[/li]
[li] Names of objects should be as short as possible, but long enough to be readable.[/li]
[li] No calculated, concatenated (etc.) fields in a table.[/li]
[li] If a data element contains two or more components, they should be divided as much as possible. The data element can be rebuilt by concatenating the individual components.[/li]
[li] Identify the columns having an index – those having foreign keys, those used frequently in search conditions or joins, those with a large number of distinct values and those updated infrequently.[/li]
[li] Each table represents one object (entity) in the real-world systems it represents.[/li]
[/ul]


I'll add my comments in a separate post.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I think most of us would agree with most of your points - provided we remember to treat them as guidelines, not hard and fast rules. For example, I agree about primary keys being numeric, but I wouldn't necessarily condemn a database where the PKs are, say, character strings.

A couple of other points:

- Number of fields to be kept "reasonable". I would tend to be suspicious of a table that has 60, 80, 100 or more fields; it would suggest a need to normalise. But that's not necessarily the case. If there's a good reason for a high number of fields, then that's how it should be.

- "A table has to relate to any other table only by one field (Foreign Key to Primary Key)". If you mean that you can't have multiple fields making up a foreign key, then I'm not sure I see the reason for that.

- Naming conventions. Personally, I don't use a naming convention to distinguish different objects (fields, tables, triggers,etc). Maybe I should. But I do think it's important to follow certain conventions in the actual names themselves.

In particular, I try to choose table names that exactly describe the entity, always in the singular, and without abbreviating. So, customer, not cust, not cust_table, not customers. The reason is simply to reduce the effort of remembering the names. You never have to ask yourself if it is, say, the invoice or the invoices table; you always know.

The same with fields. So, in the invoice table, the FK pointing to the customer would be named customer, not cust_id, and certainly not FK_cust.

(I do stress that these are my personal preferences, not rules that I wish to impose on the world. That said, it is not often that I am in a situation where I have sole responsibility for choosing the names of tables and fields.)

Regarding PKs, my personal rule is to make them auto-incrementing integers. And I always name all of them simply ID. Again, this is just to make the names easier to remember. Since the PKs are probably the fields you will refer to most often in your code, this simple rule can save a lot of thinking and a lot of opportunity for error.

Apart from the above points, there's lot to like in your guidelines. Thanks for posting them.

Mike



__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I appreciate the definitions and discussion. It's like auditing a class taught by guys in the industry: most interesting!

I came into all this by the back door some 30[sup]+[/sup] years ago, talking to the Oracle/DB2 DBA, doing things wrong and getting chewed out, bits and pieces along the way.

Skip,
[sub]
[glasses]Just traded in my OLD subtlety...
for a NUance![tongue][/sub]
 
Hi Mike,

in regard to singular vs plural I work with databases having entities named with the plural. In english this is mainly the difference of appending s or not. Resharper (a very helpful tool for C# developers) has a configuration in regard of the entity framework to name entity classes/types with or without s suffix. It suggests both things are used, but I also think it contributes to databases not always having english table and column names and the simple pluralization rule doesn't hold for other languages, so turning off automatic plualization you avoid ridiculous entity/attribute names. At this point it may be an idea to use your first language - if that's not english. I have worked this way, too. But if a project has the potential to become international this is a good enough reason to use english names despite of their frequent conflicts with keywords of the programming languages involved. English is the IT language.

The singular may have a better readability in regard of addressing attributes of a single entity, eg the customer.name, on the other side you could also take the s as genitive s. You address the customer's name. You can give a double meaning to the s this way, pluralizing the entitiy for the table name and talking about attributes of an entity in the genitive.

In the end it's important you choose the same for all tables, so you don't have to think which has s and which doesn't.

Bye, Olaf.

 
As several also have addressed the rule about the reasonable number of columns. Technically important often is the size of a single record. Not sure about Oracle, but in T-SQL you have a reasoning of row-overflow - see
Taking this aside, it's true that entities with many attributes need many columns. You can think of how many of them you really will enter and also maintain for the whole database. This often becomes a "political" topic of who becomes repsonsible for data maintanance. This is a hard problem not only in smaller companies. Data you could enter in all detail, but which isn't maintained means outdated data, no technical problem, but a manpower problem. Where there is no real stakeholder to keep data with current information (eg think about security rules for chemicals) this could be a reson to reduce to entity attributes.

It can also be a reson to split the table in 1:0-1 related table sets with attributes you only seldom know outsourced into separate tables not necessarily having a row per row of the main table. That also avoids storing NULL, which is another topic on it's own.

Bye, Olaf.
 
Olaf, good points about pluralisation of entity names - especially with regard to non-English names. I stressed that my choice of singular rather than plural is a personal one - not one that I would seek to impose on others. The important thing is to make a decision and then to apply it consistently.

That said, any decisions about database design can never really be personal, as the design is likely to be used by many people - programmers, administrators, etc. - rather than the person who created the design.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Mike, thank you for copy/paste the guidelines here.
As you have noticed, these are just the guidelines and not the “hard and fast rules”. But in my opinion they should apply unless there is a very good reason not to. And by ‘reason’ I don’t mean an ‘excuse’.

"A table has to relate to any other table only by one field (Foreign Key to Primary Key)".
What I meant by this was not to have this situation:

[pre]
Select SomeFields
From TableA, TableB
Where TableA.FieldX = TableB.FieldX
And TableA.FieldY = TableB.FieldY
And TableA.FieldZ = TableB.FieldZ
And TableA.FieldW = TableB.FieldW
...[/pre]

As far as number of field in the table(s), Olaf has a good point with:
“It can also be a reason to split the table in 1:0-1 related table sets with attributes you only seldom know outsourced into separate tables not necessarily having a row per row of the main table. That also avoids storing NULL, which is another topic on its own.”

Just as a side note – those guidelines are driven by the DB I have to deal with right now. Pretty much none the guidelines listed are followed. It is really hard to program against it, it’s slow, with a lot of orphan records, and a lot of other problems. Those of you who deal with properly normalized relational data bases – imagine my work.


Have fun.

---- Andy

There is a great need for a sarcasm font.
 
Every table has to follow 1st, 2nd, and 3rd Normal Form
--- First Normal Form (1NF): A table is 1NF if every cell contains a single value, not a list of values. This properties is known as atomic. 1NF also prohibits repeating group of columns such as item1, item2,.., itemN. Instead, you should create another table using one-to-many relationship.
--- Second Normal Form (2NF): A table is 2NF, if it is 1NF and every non-key column is fully dependent on the primary key.
--- Third Normal Form (3NF): A table is 3NF, if it is 2NF and the non-key columns are independent of each others. In other words, the non-key columns are dependent on primary key, only on the primary key and nothing else
No argument with this one, though it can occasionally make sense to create a repeating group in a table.

Every table has to have a numeric Primary Key
It usually makes sense to have a single auto-generated PK column for a table, but there are exceptions. I don't usually bother to add the overhead of such a key with tables used only to express a many-to-many relationship between two other tables: i.e. it contains the PKs of the other two tables and nothing else. Just set the join table's two FKs in the join table to be its composite PK as well.

Number of fields in a table should not exceed a ‘reasonable’ number (up to 20? 25?)
Fields in a table have to have NOT NULL setting with default values (unless there is a very good reason not to. Example: comments field)
The number and nullability of columns in a table should reflect the entity represented. If an entity has 200 optional attributes, the table should probably have 200 nullable columns. I'd take a closer look at tables with a lot of columns to see if that design choice really is justified, but it certainly can be. I'm also wary of ruling that all columns must be "NOT NULL with default values", lest I end up with a lot of columns whose default value is "NONE" or "UNKNOWN" or some other inferior equivalent of NULL.

A table has to relate to any other table only by one field (Foreign Key to Primary Key)
If {almost} every table's PK is a single column, FKs will be by definition be single columns also.

Any piece of data has to be in one place only. No piece of data can be copied to two or more places / tables
Good as a general principle, but denormalisation is sometimes needed for performance reasons - particularly in data warehousing applications.

There should be a standard established for naming any object: table, field (specific for PK, FK), index, trigger, sequence, check, etc.
Names of objects should be as short as possible, but long enough to be readable.
Naming conventions should be designed to reduce unnecessary wear and tear on developers' brains - by which I mean that they should be, above all, consistent. So I don't think it matters (for example) whether tables are named singular or plural, provided the choice you make applies to all tables. Don't give people the overheadd of having to remember that the [tt]employee[/tt] table is singular but the [tt]customers[/tt] table is plural. Also, have a consistent rule for PK/FK column names. I've seen systems where every table's PK column is called ID, and FKs pointing to that table are called <table name>_ID. That's fine. I've also seen systems where a given key has the same name wherever it's used: person_id is the PK of the person(s) table, and also the name of any FKs pointing at it. That's fine too (and my personal preference). But don't have a set-up where it's sometimes Id and sometimes person_id and sometimes person_key, etc. My current workplace breaks both these rules, and it's really annoying to have to remember all that extra cruft.

Conventions should also be amenable to automation, for which reason I would avoid camel case. If you say that a table should be called purchaseOrder, the only way I can keep it that way is by manually remembering always to put that upper case O in there, and not using any of the tools that could automatically format my code. If it's called purchase_order, that ceases to be a problem.

No calculated, concatenated (etc.) fields in a table
Again, there are data warerhousing applications where this rule would routinely be broken, and virtual columns can be really useful to codify business logic.

If a data element contains two or more components, they should be divided as much as possible. The data element can be rebuilt by concatenating the individual components.
This is essentially restating your first rule - 1NF.

Identify the columns having an index – those having foreign keys, those used frequently in search conditions or joins, those with a large number of distinct values and those updated infrequently.
I'd limit that to "those used frequently in search conditions or joins." You don't want every table to be festooned with unnecessary indexes.

Each table represents one object (entity) in the real-world systems it represents.
Some will represent many-to-many relationships between two entities.


-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
Chris,

You made a couple of good points about data warehouses.

Rightly or wrongly, I've always thought of a data warehouse - almost by definition - to be a denormalised version of a database, and is therefore not meant to obey many of these rules and guidelines. Does that make sense, or am I off-beam?

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
There's more to datawarehouses, there are mainly star vs snowflake schema, where snowflake dimension tables are not denormalized, eg see There's more to that and it's a bit off topic, as the rules we discuss here surely are about a normal OLTP database.

Bye, Olaf.
 
In relation to the plural naming of tables - I can't understand the logic.
A table name should reflect an object that this table 'describes'.
E.g. a table that defines vehicle attributes should (logically) be named singular: 'tblVehicle' or 'vehicle'
A table contains the definition (attributes) of an object called a 'vehicle'.
The fact that it CAN hold many records is regardless - it CAN hold no records, it CAN hold 1 single record.
The same applies to query / view names - they CAN retrieve zero, 1 or many records - absolutely no reason to pluralise their names.
Some say that queries must always be plural - to differentiate them from tables - why?
And, why does a plural name magically imply a query anyway? It doesn't!

Why does one need to know that a table is a table or a query is a query in code?
The context will always make it obvious, and, the originator has no need to differentiate - he / she is envisaging the 'solution'.
I've yet to see one example where differentiating queries and tables is ever useful.

This will seem pedantic to many, but I like to apply logic to naming conventions.

I have seen naming conventions where prefix lettering is used, where some contain 1 letter, some contain 3.
e.g. integer i, string s, long lng

This, in my opinion adds complexity where it can be eradicated. It adds a decision process, where none should be.
I have now to 'think' and 'remember' exactly which object types require 3 characters and which require 1 or 2.
This convention has to be 'taught' and 'memorised' by all newcomers.
I have in essence to have a 'lookup table' in my mind.
If ALL were 3 characters - that 'memory' will never be required - all I have to 'learn' is: "ALL PREFIXES ARE £ CHARACTERS ALWAYS", and all I have to 'decide' is: which 3 characters logically abbreviate this particular object - it's easy! e.g. integer int, string str, long lng.

The saving of 2 characters to type is pretty much pointless.
If the maximum object prefix length is 3 - make them ALL 3, if it's 4 - make them ALL 4.

Something that (I agree - illogically) bugs me: 'ID' for a primary key.
A primary key is a primary key. If, on defining relational theory, the 'primary key' would have been better described as 'identifier' - 'identifier' WOULD have been used.
It wasn't; 'primary key' was chosen.
And, the ONLY logical, simple (without memory) acronym that can be used to denote a primary key is 'PK'.
And, remarkably: even 'foreign key' magically has the same trait - 'FK' denotes it very well (and I've no idea what is used to denote a foreign key when ID is used for primary key).

(And yes, I always prefix _FK with the table name that this key relates-to).

;-)

D


 
Queries? Did you mean views?

Well, I disagree your table name should be singular, because you address a singular object with its attributes. You can name your customer object singular, that is a singular thing, but data is not, rarely tables only have 1 or even no records. That's my point of view. The most important aspect is to not mix singular and plural.

Well, there is a fine distinction for me about ID and primary key, as the field itself doesn't define the PK, that's not a OK without the constraint and or unique index (different databases have different ways to define and check and guarantee the uniqueness). And you answered your rhetorical question yourself, why FKs are not only named FKs or ID, you want to know from which table they come from, even if a table only has one FK and it's clear as mud from which table it has to come.

This rule also is about generic code capable to generate queries, verify queries and overall naming conventions make code better verifiable, you can compute a code metric and quantify code quality in that aspect. Obviously you can't judge the elegance and effectiveness and such qualities, but code metrics just starts at this topic.

I tend to not denote exact types in names, not just because it's only a problem with dynamically typed languages, there are lots of intelligently parsing editors and IDEs knowing what to refactor how, even if you have similarly named things like variables and fields, they not just make a string pattern replacement. And on that topic you can also easily let an editor format CamelCase names by the feature of naming everything the same as the first occurrence, there's no magic and thinking about such stuff.

Bye, Olaf.
 
It doesn't matter one jot whether that table name is singular or plural. What matters is consistency. Decide whether it should be singular or plural, then stick to that throughout the database.

This isn't simply the hobgoblin of little minds. It is a sensible technique for reducing the strain on the developer's memory and the therefore likelihood of errors.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Well, as I said twice:

myself said:
In the end it's important you choose the same for all tables, so you don't have to think which has s and which doesn't.
myself said:
The most important aspect is to not mix singular and plural.

In the end I still have a preference for plural, as you can also interpret it as genitive in context of queries, eg [tt]WHERE customers.name = 'Lewis'[/tt], though it's clearly not a natural language sentence reads more natural to me than [tt]WHERE customer.name = 'Lewis'[/tt], if you read it as Lewis being my customer's name - or customers' name - or is it really just customers name?

Anyway, as I also already said:
myself said:
pluralizing the entitiy for the table name and talking about attributes of an entity in the genitive

I think the argument of singular for addressing a singular row is less valid than for ORM class/object names, where you really load the data of one entity into its type or class including its behaviour/methods/code.

That said, if I come into a project already having a database established, I wouldn't change from singular to plural names, I would even not consider changing naming, if there is a mix, if that's too much effort and too error-prone to change in all the projects sources. Especially if the database is already established and installed many times and you have riscs of database updates failing just for a little convenience. It rather makes me wonder why everybody seems to be accepting getting intellisense for your application frontend and middle tier code, but not for your database objects, only if you have a model/mapping of database to your oop classes/objects world.

Bye, Olaf.



 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top