Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Delimited fields or unused fields? Design issue 3

Status
Not open for further replies.

Leozack

MIS
Oct 25, 2002
867
GB
Hi - I have a table which has, for instance, 5 fields and maybe people only need the first one to be filled in often. So 4 might well be empty. Is it better to join them into 1 delimited field or leave them as their own fields?
And - if I have some fields that are only relevant to some types of rows, is it better to have 1 field for them delimited, or have them as separate fields? (the same issue there really)
The other side of it is - is it better to split such delimited fields in the mysql call, or split them afterwards in php?
I might know which is easier though I can do it either way now I've researched, but I don't know which is better. I also don't know how big this db will get or how often these fields won't be used so it comes down more to the overall design issue of which is better. I'm thinking efficiency, speed, db load, and so on.
Thanks for all thoughts

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
you don't mind splitting the data off if it's beneficial?

good, then do so, because it is

as you will see when you write the sql

:)

r937.com | rudy.ca
 
*blinks* Ok that's the blanket statement I've got so far which is what I don't like. I've read the normalising argument, I know why removing duplicate data for instance is a good thing (filesize wise) when you split it off and join it in.

But in 2 of my 3 possible split circumstances it was down to empty space not duplicate data. That empty space on 1 of those circumstances would still exist in the other table too, as each row would have to exist there for the first name even if the other 4 names were empty. Does that make sense?

And to be honest I have sql already written - as we speak I am trying (I am trying, not just complaining incase anyone thought I was sat about not trying what people say) to split off the pulldown lists to their own tables and reference them with id's for the "duplicate data" optimisation, and I haven't even got to writing the sql for that yet but it is going to be MORE than what I already have - so why does everyone keep saying writing MORE sql to join tables toegether and such is easier? Am I missing something really obvious here? Surely.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
if you have already written the sql, then you must already have created the tables

do you mind sharing the design of those tables as they stand now?

r937.com | rudy.ca
 
No the sql I'm using is based around the single table design (I have more than 1 table but not for the same query). All 'adverts' are in the same table. HAving said that, I've alreayd backed it all up and spent some times splitting off 2 columns of repeatable data (selectbox options) into seperate tables, and am about to try joining and such. Except I have to goto work tonight -_-
But just to give you an idea here is my initial search finding query :

Code:
$sqlcount = "SELECT id";
$sqlcount .= ",created,deal,hours,";
$sqlcount .= " IF (strcmp(deal,'Auctioning'),'',UNIX_TIMESTAMP(created)+(hours*3600)) AS 'tend'";
$sqlcount .= " FROM ";
$sql = "adverts WHERE (";

// <snip>  above is sqlcount to count the rows, snipped out is sql which is the bulk of the query, below is stuff from sqlend regarding other conditions/limits etc

if ($timespan != "all") { $sqlend .= " AND (TO_DAYS(NOW()+0)-TO_DAYS(updated))<".($timespan*31); }
if ($includeclosed != "yes") { $sqlend .= " AND closed = ''"; } else { $sqlend .= " AND ((right(closed,5)!='Admin') OR (closed=''))"; }

adverts is the main table, created is a timestamp of the row creation, deal is the column that effects whether we need the extra columns I was asking about (which, if set to auctioning, uses the hours column to find out and return when it is set to end).

This is ignoring the parts in <snip> (main sql bit) which is a couple of pages of query building up, eg :

Code:
if (VarSet($scomments)) {
	if ($scommentsc == "phrase") { $sc++; $sql .= AndOr($sc,$search); $sql.= "comments LIKE '%".mysql_escape_string($scomments)."%'"; }
	else {
		$starting = true;
		$sc++; $sql .= AndOr($sc,$search)."(";
		$temp = split(" ",$scomments);
		foreach($temp as $value) {
			if (substr($value,0,1)==",") { $value = substr($value,1); }
			if (substr($value,-1)==",") { $value = substr($value,0,-1); }
			if (!$starting) { if ($scommentsc == "all") { $sql .= " AND "; } else { $sql .= " OR "; } }
			$sql .= "comments LIKE '%".mysql_escape_string($value)."%'";
			$starting = false;
		}
		$sql .= ")";
	}
}

//and stuff like

if (VarSet($globalservice)) { $sc++; $sql .= AndOr($sc,$search); $sql.= "(globalservice LIKE '%".$globalservice."%')"; }

And then in when results are found they are returned for real with sqlstart code of :

Code:
$sqlstart = "SELECT *,
DATE_FORMAT(created,'%a %D %b %Y %H:%i') AS dcreated,
DATE_FORMAT(updated,'%a %D %b %Y %H:%i') AS dupdated,
DATE_FORMAT(LEFT(closed,14),'%a %D %b %Y %H:%i') AS dclosed
FROM ";
$sqlend .= " ORDER BY '".$sortby."' ".$ascdesc." LIMIT ".(($viewpage-1)*$perpage).", ".$perpage;

I don't think you care about any of that at all. but only now am I putting things like 'deal' and some other categories in their own tables with foriegn keys and am now going to try andmodify all my sql to use joins to call it. Baring in mind this is all in the name of optimisation right? It will save space, just not sql code (I'm going to be writing MORE to go wiht what I've got) and not speed - or is mysql faster at joining lots of tables than it is reading just 1?

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
I know, but if you look at that php you'll see it's just the mysql queries being constructed, so it's more mysql really. The only php in there is $variables being inserted into the query. That and the 1 loop demonstrating more bits of query being added.

I would show you my tables but as at that point it was just 1 table there's no point. When I've finished breaking it all up I'll show you. Though by then I might aswell use it broken up even if it isn't helping as I would've already done the work.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Ok you wanted to see my table structures? They are as such :
My main table : (fields marked with _id are to match foriegn keys in my new seperate table - these come from select boxes so can only be a limited number of things, ok for a seperate table)
Code:
+------------------+----------------------+------+-----+----------------+----------------+
| Field            | Type                 | Null | Key | Default        | Extra          |
+------------------+----------------------+------+-----+----------------+----------------+
| id               | smallint(5) unsigned |      | PRI | NULL           | auto_increment |
| updated          | timestamp(14)        | YES  |     | NULL           |                |
| created          | timestamp(14)        | YES  |     | 00000000000000 |                |
| globalservice_id | tinyint(2) unsigned  |      |     | 0              |                |
| servername       | varchar(20)          |      |     |                |                |
| adminpw          | varchar(34)          |      |     |                |                |
| contactgame1     | varchar(30)          |      |     |                |                |
| contactgame2     | varchar(30)          |      |     |                |                |
| contactgame3     | varchar(30)          |      |     |                |                |
| contactgame4     | varchar(30)          |      |     |                |                |
| contactgame5     | varchar(30)          |      |     |                |                |
| contactemail     | varchar(50)          |      |     |                |                |
| deal_id          | tinyint(1) unsigned  |      |     | 0              |                |
| category_id      | tinyint(2) unsigned  |      |     | 0              |                |
| subcategory      | varchar(100)         |      |     |                |                |
| itemname         | varchar(50)          |      |     |                |                |
| price            | int(10) unsigned     |      |     | 0              |                |
| quantity         | smallint(5) unsigned |      |     | 0              |                |
| rjob             | varchar(50)          |      |     |                |                |
| rlevel           | tinyint(3) unsigned  |      |     | 0              |                |
| slots            | tinyint(3) unsigned  |      |     | 0              |                |
| cards            | varchar(100)         |      |     |                |                |
| effect           | varchar(250)         |      |     |                |                |
| comments         | varchar(250)         |      |     |                |                |
| hours            | smallint(5) unsigned |      |     | 0              |                |
| closed           | varchar(30)          |      |     |                |                |
+------------------+----------------------+------+-----+----------------+----------------+
Those seperate tables are :
Code:
+-------------+---------------------+------+-----+---------+-------+
| Field       | Type                | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| category_id | tinyint(2) unsigned |      | PRI | 0       |       |
| category    | varchar(20)         |      |     |         |       |
+-------------+---------------------+------+-----+---------+-------+
Code:
+---------+---------------------+------+-----+---------+-------+
| Field   | Type                | Null | Key | Default | Extra |
+---------+---------------------+------+-----+---------+-------+
| deal_id | tinyint(1) unsigned |      | PRI | 0       |       |
| deal    | varchar(20)         |      |     |         |       |
+---------+---------------------+------+-----+---------+-------+
Code:
+------------------+---------------------+------+-----+---------+-------+
| Field            | Type                | Null | Key | Default | Extra |
+------------------+---------------------+------+-----+---------+-------+
| globalservice_id | tinyint(2) unsigned |      | PRI | 0       |       |
| globalservice    | varchar(30)         |      |     |         |       |
+------------------+---------------------+------+-----+---------+-------+
Is there anyway when you say to select your rows of data and join the tables, to not get the useless id columns of the seperate tables returned? (other than specifying ALL the columns you want returned individually? like a way to say "* but not these")

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Is there anyway when you say to select your rows of data and join the tables, to not get the useless id columns of the seperate tables returned? (other than specifying ALL the columns you want returned individually? like a way to say "* but not these")
No you have to list your fields individually. That's just good coding practice, anyway, since most API's return the fields positionally, so inserting, moving or deleting a column will alter that ordering and break code. Most experienced DB developers would recommend against using "select *" in production code.
 
no, there is no way to say "*, but not these"

just list the columns you want

if i go back and look at your first few posts in this thread, and compare to your latest table design, i don't see where you're using "eg 2nd phone number 2nd email or whatever"

the only thing that sticks out are the 5 contactgameN columns

other than those, your tables look fine

r937.com | rudy.ca
 
Yes - I could lsit all the fields indiviidually, and maybe I will then (though all that saves is the id columns of the seperate tables, though as you say it might be useful for developers should this be production code). Though why you'd want your code to break if something changes I don't know - in theory it helps you debug but in practice often it just ives you something to have to debug. When you say "moving" columns - is there a way to move columns around the table? I've always made a new one, moved everything in, deleted the old one, renamed the new one. I also saw thatunless I specified in mysqlcc to order by id, when I open the tables by default they order the fields in the order I recated them. Which was backwards for my new seperate tables >>

As for the gamenameN columns - those are the column I was talking about when I referred to phone numbers. I didn't say they were phone numbers infact I said they weren't, it was an example. So my point remains - delimit them into 1 field? Keep tehm seperate? Stick them in another table which would need just as many cells as they'd take up in this table (so why?)?

Plus the extra columns that are only used for 1/4 types of 'deal' column are not listed here, as I haven't made them. I'm guessing I will make a seperate table for them.

In the end I'll have substancially more code to get working in all my pages, a table you can't flick open and look at in a viewer wihtout setting up long joining calls - but less space wastage (negligable really but hey) and allegedly better design/speed/optimisation/whatever. In theory?

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Code:
alter table mytable modify field2 char(1) after field3

I believe it's non-standard MySQL extension to the language.
 
So my point remains - delimit them into 1 field? Keep tehm seperate? Stick them in another table which would need just as many cells as they'd take up in this table (so why?)?
none of the above ;-)


place them in another table like this:

create table main_contactgames
( main_id smallint unsigned not null
, contactgame varchar(30) not null
, foreign key ( main_id ) references maintable (id )
, primary key ( main_id, contactgame )
)

there will be fewer "cells" because you do not create rows for values that don't exist

so, in your main table, id=9 might have 4 rows in the main_contactgames table, and id=37 might have only 1

this allows you to expand effortlessly (and with no change to the code) beyond 5 of these things, should the need arise



r937.com | rudy.ca
 
I see so phone 2-4 would just go into a seperate table which used the main table's id and phone1 to reference it. Or something. But I can't help but feel it's not the best idea for this situation.

Let me explain more, see if you agree or if I'm wrong. The rows of the main table have the gamename1-5 columns. Every row could have 1-5 of those filled in (1 is mandatory - but until I launch I won't know if amybe everyone uses all 5 all the time). All 5 of them could be used in other rows aswell as this row, or could not, and some might be but some not, so every row containing the same #1 might not have the same #2-5, and so on. I won't need more than 5 either I don't think.

I've never made a table using the commands as you've given them so not familiar with your key references and 'references maintable' stuff. But from what I can see you're suggesting rather than have a seperate table of 4/5 names per row I split those rows into 1 column per row and just have multiple rows for every row in the main table. That would become a rather large table pretty quick full of just single cells with no obviuos attachments and lots of duplicate data (though not necessarily). Is that better than leaving them in the main table?

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
you're suggesting rather than have a seperate table of 4/5 names per row I split those rows into 1 column per row and just have multiple rows for every row in the main table.
that is correct, and that is what we've been trying to tell you :)


this is called normalization, and splitting a repeating group into a separate table, with a single column and multiple rows, i.e. a one-to-many relationship, will put your design into first normal form

this is almost always much better than leaving the repeating group in the main table

r937.com | rudy.ca
 
Repeating? This is the bit I don't get - unlike my other seperate tables, there isn't necessarily anything repeating at all about these, just that some might be empty?

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
they "repeat" because there are 5 of them in the table, not 1

do a search on normalization for more information

scenario 1 -- main table contains 5 fields
Code:
main table
id  name  fld1 fld2 fld3 fld4 fld5
101 fred   2    -    -    -    -
102 bill   4    2    1    -    -
103 doug   5    3    -    -    -
scenario 2 -- main table plus one-to-many table
Code:
main table
id  name
101 fred
102 bill
103 doug

one-to-many table
id  fld
101  2 
102  4
102  2
102  1
103  5 
103  3

r937.com | rudy.ca
 
I see what you mean. Not only does that not make sense from a human-readable point of view, but it also seems like the db will hve a lot more work to do, and the queries more complicated. Is this really better than the more human readable simpler queried approach that wastes space?

Else you could say every cell of every table could go in another table incase it's empty etc, and sometimes it feels like that.

I'm guessing you'll say yes, but hey :p

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
no, i'm not saying "every cell of every table could go in another table incase it's empty"

however, i do support first normal form, which you should really research a bit more

and i also have about 18 years of experience with the SQL language and ALL KINDS of table designs

the queries are not more complicated, they are simpler

:)

but hey, if you want to do it your way, do it your way

we've been round this now a few times, and i think the only thing that will convince you is for you to discover the pros and cons of the different designs yourself

as i said before, good luck to you

r937.com | rudy.ca
 
Thanks. I still haven't seen anyone say why it is simpler for me to do all this extra work splitting up a humanreadable table and simple queries into non humanreadable, more effecient table designs with longer more complicated queries.

I mean - you've said why I should do it, I've read up on normalising and everything - but I just can't accept someone saying it's simpler, or easier, when it's not. Yes, I agree, for a computer, it's better. But I can't accept this better design comes without a complication cost, when I've just paid that cost myself in having to do it, as oppose to having the 1 table and a select *.

But thankyou to those who helped. I'd dish out stars but I'm not sure who should have got them over who :) Maybe I'll just dish some out anyways.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
dude, like i said at the outset, you can't see why it's simpler because you haven't tried writing the SQL for it, which is what i was going to help you with

i can appreciate your frustration at what looks like something "less human-understandable" but it's not humans that have to manage the data, it's SQL

and if you don't want to do it with simple SQL, you will have to do it with more complex code

call me when you decide that you need something faster than a table scan

:)

r937.com | rudy.ca
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top