Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Delimited fields or unused fields? Design issue 3

Status
Not open for further replies.

Leozack

MIS
Oct 25, 2002
867
GB
Hi - I have a table which has, for instance, 5 fields and maybe people only need the first one to be filled in often. So 4 might well be empty. Is it better to join them into 1 delimited field or leave them as their own fields?
And - if I have some fields that are only relevant to some types of rows, is it better to have 1 field for them delimited, or have them as separate fields? (the same issue there really)
The other side of it is - is it better to split such delimited fields in the mysql call, or split them afterwards in php?
I might know which is easier though I can do it either way now I've researched, but I don't know which is better. I also don't know how big this db will get or how often these fields won't be used so it comes down more to the overall design issue of which is better. I'm thinking efficiency, speed, db load, and so on.
Thanks for all thoughts

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
I've already written the code to grab all the tables together joined. What was once
Code:
SELECT * FROM adverts
is now
Code:
SELECT * FROM adverts
LEFT JOIN id_globalservice ON adverts.globalservice_id = id_globalservice.globalservice_id
LEFT JOIN id_deal ON adverts.deal_id = id_deal.deal_id
LEFT JOIN id_category ON adverts.category_id = id_category.category_id
And if I added al the fields instead of * then it would be even longer. Actually conditioning which rows are returned will also be longer as it will require tablename.columnname rather tahn just columnname, and so on. But if that's what you do, then that's how I'm now doing it. It's just not simpler than 1 table and a select *. Even though those are primative and bloated, they're simpler. That's all I mean ;)

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
let's see your SQL for finding which advert has one contactgame of 'foo' and another of 'bar'

okay, maybe you don't have those exact values, so pretend

i'm trying hard to exaplin the concept to you in terms that you can relate to -- your own table

find the rows where 2 of your 5 fields have those given values

i think you will discover that it can't be done without a table scan and a lot of ugly, ugly code

with a normalized design, i can do it with efficient (indexed) SQL and zero code




r937.com | rudy.ca
 
Off the top of my head I think the basic code would have been
Code:
select * from adverts where ((name1 like %bar%) and (name2 like %bar%))
but would now be
Code:
select * from adverts
LEFT JOIN id_globalservice ON adverts.globalservice_id = id_globalservice.globalservice_id
LEFT JOIN id_deal ON adverts.deal_id = id_deal.deal_id
LEFT JOIN id_category ON adverts.category_id = id_category.category_id
LEFT JOIN id_nametable ON adverts.ametable_id = id_nametable.nametable_id
where ((id_nametable.name1 like %bar%) and (id_nametable.name2 like %bar%))

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Oh - if you didn't know which field might be which you'd have :
Code:
select * from adverts where (((name1 like %bar%) or (name2 like %bar%) or (name3 like %bar%) or (name4 like %bar%)or (name5 like %bar%))
and ((name1 like %foo%) or (name2 like %foo%) or (name3 like %foo%) or (name4 like %foo%)or (name5 like %foo%)))
and
Code:
select * from adverts
LEFT JOIN id_globalservice ON adverts.globalservice_id = id_globalservice.globalservice_id
LEFT JOIN id_deal ON adverts.deal_id = id_deal.deal_id
LEFT JOIN id_category ON adverts.category_id = id_category.category_id
LEFT JOIN id_nametable ON adverts.ametable_id = id_nametable.nametable_id
where (((name1 like %bar%) or (name2 like %bar%) or (name3 like %bar%) or (name4 like %bar%)or (name5 like %bar%))
and ((name1 like %foo%) or (name2 like %foo%) or (name3 like %foo%) or (name4 like %foo%)or (name5 like %foo%)))
But now I look at that I figure if the id_nametable wasn't rows of 4 but lots of rows of 1 then I'd have to spend some more time working out what to do with it as I've not got it like that yet.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Wow thinking about the name possibility (I'm glad the code builds the query not me) I have 5 boxes to put in things to search for with an and/or matchall/matchany search. So it could get a little complicated and infact would end up checking all 5 possibilities for all 5 fields. But then, unless they're all in 1 field that can be checked and/or for all 5 thigns input, then it's going to be that way, 5*5 checks.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Well, if you want to do a full dump of all the data in some sort of flat format, then sure it'll be more complicated to join in all your data, but how often do you really need all that data in a single query?

Let's look at a simple schema like this:

Person
------
person_id
name

Phone
------
phone_id
person_id
number
type

What are some real world scenarios of queries you may need? How about: What's Bob's cell phone number? With the normalized form you can associate more information with the phone number than simply the number, like a phone type.
Code:
select name, number
from   person
natural join phone
where  person.name = 'Bob'
and    phone.type = 'Cell'

Or if you needed a list of everyone's office number:
Code:
select name, number
from   person
natural join phone
where  phone.type = 'Office'

Or if you want a list of everyone that has more than one phone number (I don't know why, but let's just say):
Code:
select name
from   person
natural join phone
group by name
having count(number) > 1

How about a list of how many phone numbers each person has:
Code:
select name, count(0)
from   person
natural join phone
group by name

How about finding everyone that lives together, you would know this because they have the same home phone. (I'm going to have to revert to old notation, I'm still getting used to the "left join" syntax myself):
Code:
select p1.name, p2.name
from   person p1, person p2,
       phone  h1, phone h2
where  p1.person_id = h1.person_id
and    p2.person_id = h2.person_id
and    h1.type = 'Home'
and    h2.type = 'Home'
and    h1.number = h2.number
and    p1.person_id != p2.person_id

The list goes on and on.
 
No - but from what I have found just now, that means it returns all the rows (to itself) rather than individual rows. But then, how the db works is hardly going to be in my domain currently, so I wouldn't know. I saw at least 1 place saying just having a LIKE %this will cause a table scan due to the widcard 1st char. Inwihchcase it can't be helped.
Thanks for the real examples Eric, that's what makes the world go round. I also looked up your natural join, and that saves an extra line of code for every seperated table to join it in with the forign key.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Actually it seems if oyu natural join more than 1 table, the 2nd and upwards natural joins return a match for each subtable row, aka a row for each entry in the subtable, aka, lots and lots of incorrect rows. Doh, so only 1 natural join a time, then it needs the left joins instead. Oh well. So - what about table scans?

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
what about table scans?

answer: you only want to do them if the number of rows in your tables is fewer than a few hundred

any more, and the query starts to slow down

thousands of rows, might as well go for lunch

hundreds of thousands of rows, launch the query, go home, come back tomorrow...

:)

r937.com | rudy.ca
 
Well the master table will get more than a few hundred, in theory, as will the names table as that will have upto 4 per row in the main table (or maybe none on more/less rows I can't know beforehand). The rest are all fixed tables of just a few rows.

So - does that mean I will want to avoid table scanning once things pick up? And if so - how are you suggesting I do it with this new multiple table design with a bulking pile of code for every query which I still haven't even worked out for half of my stuff (seems natural joins and using(columnnames) works only for joining the first table for example then it's back to left joins).

I can't not have a % wildcard before a text search, else it would only pickup text at the beginning of a field. Any other optimisations I don't even know. Stuff with keys and indexes I dont' understand at all, as far as I'm aware you have your primary auto incrementing id key on a table and that's about it.

As a sidenote I noticed things like "group by name having count(number) > 1" and figure group by merges all rows with the same name (in this case) column together into 1 row, or adds them, or something.

Wow - I really wish I'd just made this thing with a flat table and got it going ebfore I started explodinge verything and multiplying the code and getting into the depths. Plus when you come abck to it you've no idea what it's doing adn haev to refresh it all in your head. But then, if it's explainable and learnable, I don't mind doing it the "right" way as you'd say, and ignoring the 'easy' route ;)

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
my advice: don't ever use NATURAL or USING in your joins, code the ON conditions explicitly

also, in this example, you don't want LEFT OUTER JOINs from your main data table to the "lookup" tables (categories, deals, globalservices) because there's no way the main table should have a linking id that doesn't exist in the lookup table -- so use INNER JOIN

for indexing, you would want to learn the basics before going much further, so check out these tutorials --

* Understanding SQL Joins
* Optimizing your MySQL Application


r937.com | rudy.ca
 
Just for the record, as I re-read everyone's comments agian, my 'real world' queries will infact need ALL the data for an object, nearly every single time. Infact the only data they won't need to show from initial search results is kept in a seperate table already, and is 'relpies' as such. So yes, all the data is needed for every result basically. Not "what is someones phone number" but "what is someone's full vcard details" etc.

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Ok I've read through the indexing stuff and as usual, it's a for/against arguement for indexes of diskspace and readtime/writetime etc.
An obvious column to index will be the parent_id columns of my 'extra' information tables so that the rows with parent_id's matching the main table can quickly be pulled up.

For searching, the price will be a common one so that can easily be indexed as it's numerical too. The onyl other one I could consider indexing, for now anyway, would be itemname. But as they currently look for "LIKE %this%" it will do a table scan anyway. Plus with all the otehr text fields and numerical fields, it is likely someone searches for something that isn't indexed alongside something that is and thus it table scans anyway, right?

Oh dear -_-

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Infact even with price indexed runnign this search
Code:
explain SELECT adverts.*, globalservice, deal, category FROM adverts
LEFT JOIN id_globalservice using(globalservice_id)
LEFT JOIN id_deal ON adverts.deal_id = id_deal.deal_id
LEFT JOIN id_category ON adverts.category_id = id_category.category_id
where price > 1000;
get this result
Code:
+------------------+--------+---------------+---------+---------+----------------------------------+------+-------------+
| table            | type   | possible_keys | key     | key_len | ref                              | rows | Extra       |
+------------------+--------+---------------+---------+---------+----------------------------------+------+-------------+
| adverts          | ALL    | price         | NULL    |    NULL | NULL                             |   41 | Using where |
| id_globalservice | eq_ref | PRIMARY       | PRIMARY |       1 | romerch.adverts.globalservice_id |    1 |             |
| id_deal          | eq_ref | PRIMARY       | PRIMARY |       1 | adverts.deal_id                  |    1 |             |
| id_category      | eq_ref | PRIMARY       | PRIMARY |       1 | adverts.category_id              |    1 |             |
+------------------+--------+---------------+---------+---------+----------------------------------+------+-------------+

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Another sidenote - I see if I use LEFT joins the results are ordered by id of the main table but if I use INNER joins, they end up listed by the 2nd of 3 joined lookup table's returned column. Odd. Though I order them manually anyway but still, odd?

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
to order the results, the query requires an ORDER BY clause

and as i said before, in this instance you do not want LEFT OUTER joins

r937.com | rudy.ca
 
Like I said I have an order by by default, I just noticed that behaviour of left and inner joins without an order by.
I am using inner joins.
I have indexed price in the main table, and used it as a where clause, yet still it does a table scan instead. Though my table is only 50 rows so that might be why?
I'm really finding it hard to figure out what to index when they can search for any of the fields at once.
I guess I should index the names column of the names table too. Plus it's parent_id column as the names for each row will be wanted more than the name list searched I guess. Infact maybe there's no need to index the name column anyway?
There is the fulltext index which perhaps I should be using for the text fields so I can do fulltext searches instead of like %this% which creates a table scan?

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
It does? But what fields should I put fulltext indexes on? All text fields are on the search form for searching. People could search for any combination of them, along with any numeric fields - they're not normal indexed either.

As for the lookup tables, I've not indexed them (besides the primary key).

For my extra information tables, the same applies to them as the main table - except they are also being accessed 1 row at a time to link upto the parent row of the main table so I indexed the parent_id field. But the same searching things apply as the info held in the main table.

Argh. Assuming I know what fields to index, I think the fulltext search is simple enuogh - replace WHERE this LIKE %that% with WHERE match(this) against('that')

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top