Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Delimited fields or unused fields? Design issue 3

Status
Not open for further replies.

Leozack

MIS
Oct 25, 2002
867
GB
Hi - I have a table which has, for instance, 5 fields and maybe people only need the first one to be filled in often. So 4 might well be empty. Is it better to join them into 1 delimited field or leave them as their own fields?
And - if I have some fields that are only relevant to some types of rows, is it better to have 1 field for them delimited, or have them as separate fields? (the same issue there really)
The other side of it is - is it better to split such delimited fields in the mysql call, or split them afterwards in php?
I might know which is easier though I can do it either way now I've researched, but I don't know which is better. I also don't know how big this db will get or how often these fields won't be used so it comes down more to the overall design issue of which is better. I'm thinking efficiency, speed, db load, and so on.
Thanks for all thoughts

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Also I could put the fields only used for a certain rowtype into a seperate table and if a row is of that type grab the info from there aswell. But only if that was the best idea, else I guess it would just make it a more complicated query to execute.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Delimited fields are a bad idea unless you really need them. They can't be indexed usefully, they're very hard to process, and most of all they're an unnatural way to store data in a relational database. Keep it simple and natural, even if it results in many more fields in your tables. That way normally leads to better performance, because you're doing things the way the system expects, and there are more possibilities for tuning.

Your idea for using a separate table for data relating only to certain rows is a good one. If only a few rows have that data, then you could save a lot of space, and improve speed, that way. However, if most rows have the data, performance might be worsened instead. The decision is a matter of balancing speed, disc space, and design simplicity.
 
Thanks - hmm yeah I figured if few people use optional fields (eg 2nd phone number 2nd email or whatever) then they should be combined wiht the first and delimited which I've foudn how to easily split them up and return them as part of the main sql call without too much code at all - eg for just 2 :
Code:
RIGHT(ColumnName, LOCATE('DelimiterString',ColumnName)+1) AS RightSide
LEFT(ColumnName, LOCATE('DelimiterString',ColumnName)-1) AS LeftSide
But if you think the database would rather I used seperate columns overall, even if many are empty in the db, then at least that makes the coding far easier. I just didnt' want to take the easy option if it wasn't the ebst option.
I don't know whether the fields will be hardly used or mostly used, as I'm not sure whether it will be a commonly used datatype of row or not. I guess I won't know til the site is running, annoyingly >< I guess I can just launch wiht the easy code way, and if I find few rows use that type, I can split them off into their own table. He says, sounding like a pro.
That sound like a plan?

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
if you want to record zero or one or several phone numbers, have a separate table for phone numbers

that way each person can have zero or one or several rows in this table

i don't believe tony was suggesting multiple columns for this

i've seen it both ways (a separate phone table, versus phone1, phone2, phone3 ... columns in the same table)

however, the multiple columns in same table design is almost always clumsier and slower

the coding for the separate table is sleeker, and the performance is the best

r937.com | rudy.ca
 
Really? Now that's thrown what I was about to do today o_O Do you have an example of sleek/fast code versus slow/clumsy code by any chance?
Because from what I can see, the least code and easiest work is done by just having the fields in the same db, without having to go off to another table for every row that means a condition in the middle of a query

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
okay, i can show you the difference easily enough, but you will have to do the work (that way the pain will be more obvious)

let's talk about phone numbers, and assume that they are standardized 10-character fields (let's not get involved in formatting issues)

okay, let's consider a table with three columns -- phone1, phone2, phone3

please write the sql to find which person has phone number '4165551212'

now compare this to a separate table with a foreign key for the personid, and a single column phone

write the sql to find which person has phone number '4165551212'

after this, we'll get you to look at the sql to delete a phone number :)

r937.com | rudy.ca
 
Ok before I jump those hoops, let me make my example more real, as personally it has nothing to do with phone numbers. My table contains lists of 'adverts' if you wil, submitted by people. Some of those adverts are a certain type. For those types only, I have 2 or more fields that I want to either include, or put in a seperate table and fetch mid-query if a row is of that sort.
Along with that I have 5 fields for 'name', as this isn't their name but contact names. Now - until I launch, I don't know how many people will have adverts of the type with more data, and I don't know how many will have 1 name as oppose to all 5. But I don't want to work on code to lookup phone numbers from a seperate table unless that was actually what I was going to do - as it would probably involve joins and things I've not used.
But from what you've just asked me to write, I'd already say the pain lies in working out the code to use the seperate table by a long shot, as I've not had to do that yet, and don't intend to unless it is indeed the best option for me.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
If the attribute of an object can have a variable number of values, such as the number of phone numbers a person has or the number of available sizes for a particular model of t-shirt, then those values are best stored in a separate, related table. And "best stored" relates to the practical problems with storing, retrieving, searching, and deleting those values as well as the efficiencies of not storing redundant data or wasting unused storage space for data that's not always there.

There's also the fact that the "R" in "RDBMS" stands for "Relational". If you're not using joins, you're either doing trivial things with the datbase or are not doing things as efficiently as possible.

MySQL publishes an article on database normalization (which is what we're talking about here):

Want the best answers? Ask the best questions!

TANSTAAFL!!
 
But I don't want to work on code to lookup phone numbers from a seperate table unless that was actually what I was going to do - as it would probably involve joins and things I've not used.
i figured that phone numbers would at least be more familiar than foo and bar, and i can assure you that the structurs would be the same


and yes, the queries involve joins :)


r937.com | rudy.ca
 
I've just found one of the most useful articles (and sites) I've come across in my research, at
ttp://www.keithjbrown.co.uk/vworks/mysql/mysql_p5.shtml
I shall perform experimental joining code and such later or perhaps tomorrow. I shall also check your link on normalisation. Then we'll see where it leads.
Thanks as always people

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Having read guide on db normalisation, it seems the general gist of normalising is to take duplicate information out of a table and store it somehere else just once and reference it from the main table/other tables.

However, what I'm doing with a list of 'adverts' for instance doesn't match up with any such example dbs. Each entry (row) could have all the user defined fields unique and sharing no info with any other row. The few fields they use a seectbox for woudl obviously have lots that were the same (but making a seperate table for the few entries [and a seperate table for those in each select box] and doing joins and stuff is a lot of work for what can otherwise just store the selectbox content in the field as it does now and be fine, if "wasteful" of space).

So anyway ignoring the pulldowns, every other row could be different. The 5 name fields can't really go in their own table because there would be just as many rows and fields as there would be in the main table, even if half of the 2-5 one weren't used (doesn't matter which table they're in then it's wasted space).

As for the fields that only relate to a certain type of row, which may be more often than not or may be less often than not, I could make another table to contain some fields relating just to those rows - but I think my queries would end up doing the work to return all rows info and pulling the extra info (table joining) for those rows would end up no better than having those fields in every row on the main table even if more are empty than not (as I say I can't predict really I'm just guessing they might)

Hmmm, headache ><; But I'm getting definate feelings that splitting off data to seperate tables isn't the way to go, else, as with most of the normalisation examples, everything wants it's own table and a linking table and good grief, what a lot more code and a lot more tables - and in my case every main row could be unique anyway.

Sure it would mean the fields that are specifically only for certain rows (depending on the string contents of one of the fields in the row) were in a seperate table so rows not of that type didn't have empty fields (though as I've said I'm nto sure if this will be the majority or minority) but it seems like lots more work to impliment it. And it still doesn't apply to the fields that are relevant to all rows, but might just be empty more often than not though once again I'm just guessing (names 2-5).

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Also, in the interests of db design, the information in these tables once entered will probably not be udpated, or removed, so if that is a factor, there is the probable influence.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
leozack, you have my blessing -- go ahead and put the repeating fields side by side in the same table

in fact, don't be afraid to use comma-delimited lists of values in a single column, either

i have faith that you will figure out a way to manage your data with code

good luck to you in your endeavours

r937.com | rudy.ca
 
Leozack:
Why have you asked for advice and then argued with the advice? It seems to me that if you knew all along what you wanted to do, why didn't you just go ahead and do it?

My last piece of advice to you on this subject is: Don't let the number of tables of the complexity of their relationships freak you out.


Want the best answers? Ask the best questions!

TANSTAAFL!!
 
Leozack,

Before you go off and do something stupid that you're going to regret in six weeks when you finally understand the implications of your design shortcuts and are jumping through hoops to add features that should be simple, take a look at the names of the three people that responded to you, and then look at the top three MVPs for this forum in the box on the right hand side of this page. Then just for fun, look at the list of MVPs in the other SQL and database forums.

If you think you know better from reading a couple of articles than the 20 or so years of combined experience in designing and maintaining databases and developing database applications that these experts do, then you go right ahead and do what you think is best.

But, if you already knew best, why did you bother asking here in the first place?

Just my two cents.
 
I don't need to look around MvP lists I already know people on here well enough thanks. I also haven't argued AT their advice, nor known what I wanted to do. I posted on here because I wanted to know what the best way of doing it was in MY situation. Not in a situation where the db had a lot of fixed information, where the db's information was to be udpated or changed, or whatever. I also have fully working code running it as it is.

I psoted because I had 2 issues - 1 - that some of my fields might not be used often, and I wondered if it would be a good idea to combine them into 1 field - 2 - that I have 2 or more columns that are only used on 1 of 4 types of row that could be entered and wonderd how that was best handled too.

From what I have learnt SINCE posting, it seems my empty columns of varchar20's could be turned into tinyint's that point to foreign keys in seperate tables containing the data, or that this could be done for the columns where I'm using a pulldown to have all it's options in a seperate table.

But this woudl mean taking my code that works and straining to make big queries that basically just join it all back together again. This SEEMED to me a abd idea. So I posted here hoping for people to either say no see it's not so hard because you do it like this, or it's not the best idea as if you're not writing a 1 million row table the space difference is only a few k perhaps.

So why I get up this afternoon to find 3 tongue in cheek posts 1 of which from someone I'm already familiar with seeing holding answers on other forums I've posted on far away from lands of databases, I don't know. I haven't stated at any point I KNOW what I'm going to do. I've just been expanding more each time on what I HAVE done, and the factors on each side of the arguemnt which caused me to post here in the first place.

If noone wants to hear all that, fine, don't ask. As for not getting freaked out by splitting data up around tables - what do you expect when I've not split data up at all that's needed in the same query yet, and now I'm being told oh you want like 5 tables worth of stuff for the ultimate in normalisation and the queries are easy. Uhm ...

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
My post wasn't tongue-in-cheek; that implies I was trying to be funny, and I wasn't. So the following response is meant to be helpful, not to attack or berate. I don't give a rat's ass how you create your application, and I don't need to prove anything by being right. We all participate in these forums to help others and to give back a little to the community by sharing our expertise.

The MySQL RDBMS is a powerful tool that many programmers have spent many thousands of hours optimizing to efficiently join tables. This was done so that database designers could model their data correctly and only have to denormalize their design in the most extreme of circumstances. If the best design was to simply put all your data in one table since it's just going to "join it all back together again", then why would anyone bother doing it any other way? And why would anyone care about a product that allowed you to model your data into normalized fashion.

What I'm hearing from the last paragraph of your last post is that because you've never done it right before, you're afraid of doing it right now. Joins are the soul of a SQL database design, using them will allow you flexibility, power and ease of development beyond your wildest expectations. Embrace them, a modest investment in time learning joins will repay itself a thousand times over.

With that said, if you have trouble down the line with implementing a normalized design, you can be your bottom dollar that we'll be here and happy to help you through whatever problems you may have.

That's about all I have to say on the matter, so I'm out of this thread.
 
Shame you left, I would've pointed out that I didn't choose mysql because it was refined to the tune of an 8th octet b# for normalised data, or that it was powerfully relational -I'd say I chose it because it's the only db most people ahve access to for free in web design and works well with PHP. It makes a change to using flatfiles which I did in my last project. I therefore, at no point, set out with the idea of lots of different tables all holding a bit of data, or with the idea of joining tables. That's all stuff I've only picked up recently when contemplating my 2 issues. Until then my seperate tables are just for seperate queries.

Here I am staring once again at the dummy data in my table, and seeing what is duplicate information. Really the only duplicate information comes from the selectboxes I have as it can only be 1 of a few things. It is one of these columns that dictates whether I need a further few columns of data or not.

So - I think to myself, despite what's been said - I could make a seperate table for those pulldowns, and try to join them into every query, for the sake of having a bolumn of tinyints for each one instead of varchars. Yes - say you helpful people - that is the plan.

I then look at my name 2-5 columns and say once again - I fail to see how it can help to stick them in another table, which would require the same number of rows, and still be as empty/full as those columns would be in the main table. None of you has actually said if it is still helpful somehow to move them out - from what I've just said, surely it serves no purpose?

And as for the extra columns that only certain rows need - It took me enough fiddling to work out a query say contains
Code:
IF(column='special','',DoSomeStuffToTheExtracoumnAndReturnIt)
So what would I need to do to get the extra columns returned for those rows only if they're in a seperate table? I haven't seen such an example.

So once again I say - should I be doing this - or, CONSIDERING my circumstances and not just lumping me with "normalisation is what mysql is for", should I not? I am not for/against or I would've have ever posted here. I dno't mind splitting off tha data IF it is beneficial. And I don't mean "power/flexibility/etc" down the line because I'm probably not changing this once it's done nor is it a corporate massive db or anything intensive. Once again I say this for readers information not to argue against readers say -_-

I am fully open to help on my points saying why I should/shouldn't do whatever it is and why it helps/doesn't help. I am learning as I go here not preaching nor following a textbook example.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top