Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

5 years mysql experience but can't figure out good architecture...

Status
Not open for further replies.

Thanasus

Programmer
Nov 25, 2003
13
US
I am wracking my brain trying to figure out the best way to design the database. The database will store real estate data. My biggest concern is speed of searching.

Lets say I have a home which has 150 different attributes to it. Example attributes would be price, year built, square footage, number of stories, number of bedrooms, high school which home is zoned for, roofing description, or landscape description. I am making a separate table for all 150 attributes and one table to contain property numbers.

CREATE TABLE `property` (
`id` MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT ,
`timeadded` TIMESTAMP NOT NULL ,
`timeupdated` TIMESTAMP NOT NULL ,
PRIMARY KEY ( `id` )
);

CREATE TABLE `bedrooms` (
`fkey_prop` MEDIUMINT UNSIGNED NOT NULL ,
`bedrooms` TINYINT UNSIGNED NOT NULL ,
INDEX ( `fkey_prop` )
);

CREATE TABLE `bathrooms` (
`fkey_prop` MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT ,
`bathrooms` FLOAT UNSIGNED NOT NULL ,
INDEX ( `fkey_prop` )
);

CREATE TABLE `landscaping` (
`fkey_prop` MEDIUMINT UNSIGNED NOT NULL ,
`code` TINYINT UNSIGNED NOT NULL ,
INDEX ( `fkey_prop` )
);

So the 3 attribute tables given as examples above (bedrooms, bathrooms, and landscaping) have one-to-one and many-to-one relationships with the property table. Bedrooms and bathrooms are a simple 1-1, a home will always have one set value for number of bedrooms - it might be 4 or 2 or 3 or whatever.

However, a home could have multiple landscaping entries. Now, you'll note that the landscaping table has a field called code. Code is a numeric code for a sting which is stored in another table called 'landscape_codes'. Perhaps landscape_codes contains the values "sprinklers, grass, desert landscaping, fruit trees, chain fence, brick wall, wood fence". A home might have sprinklers and be grassy with a wooden fence. Therefore it would have 3 landscaping codes.

I want the user to be able to search homes by criteria they specify and sort by criteria they specify. They may search for "all homes containing at least 3 bedrooms that have a brick wall or chain fence".

So, I am gong to have > comparisons, < comparisons, between comparisons, and &quot;in&quot; statements possibly.

Also every property search will also be able to specify a timestamp range for the home. &quot;Show me homes meeting my criteria which were added to the system between Nov 4 2003 and Nov 5 2003 AND that had a last update of Dec 28 2003&quot;

I am trying to not do filesorts when executing the SQL query but cannot figure out a way around it. Any ideas?
 
Ok, I have done some stress testing... I made two different setups:

Setup A:
Multiple tables. Each attriute datatype has a different table. I have some attributes that are in an integer table, some that are in a dimensions table, some that are in a float table.


CREATE TABLE prop (
id mediumint(8) unsigned NOT NULL auto_increment,
PRIMARY KEY (id),
UNIQUE KEY id (id),
);

CREATE TABLE prop_int (
id int(10) unsigned NOT NULL auto_increment,
fkey_prop mediumint(8) unsigned NOT NULL default '0',
fkey_attrib smallint(5) unsigned NOT NULL default '0',
value bigint(20) unsigned NOT NULL default '0',
PRIMARY KEY (id),
UNIQUE KEY id (id),
KEY pav(fkey_prop,fkey_attrib,value),
KEY v(value)
);

I populated the tables with real data. Table &quot;prop&quot; has 135,327 rows of data and &quot;prop_int&quot; has 1,402,426 rows of data.

I simulated a search for a home between 750,000 and 780,000 in price which is located in one of several (89149,89129,89108,89128,89134,89144,89145,89107,89101,89104,89109,89121,89119,89120,89122,89142,89110,89156,89115,89030,89106,89032,89102,89103,89146) zip codes.

The query comes out:
SELECT p.id FROM prop AS p
LEFT JOIN prop_int AS t1 ON (t1.fkey_prop=p.id AND t1.fkey_woprattrib=1)
LEFT JOIN prop_int AS t56 ON (t56.fkey_prop=p.id AND t56.fkey_woprattrib=56)
WHERE (t1.value BETWEEN 750000 AND 780000)
AND t56.value IN (89149,89129,89108,89128,89134,89144,89145,89107,89101,89104,89109,89121,89119,89120,89122,89142,89110,89156,89115,89030,89106,89032,89102,89103,89146)
GROUP BY p.id
ORDER BY p.id ASC;



Setup B:

Put all the data in one massive varchar table...

CREATE TABLE prop (
id int(10) unsigned NOT NULL auto_increment,
fkey_prop mediumint(8) unsigned NOT NULL default '0',
fkey_attribute smallint(5) unsigned NOT NULL default '0',
value varchar(40) NOT NULL default '',
PRIMARY KEY (id),
KEY (fkey_attribute,value(6))
);

When populated, the table contains 6,950,111 rows. The reason there are many more rows in B than A, is because B also has all float data, dimension data, and string data.

Conducting the same query
select p.id
from prop_test AS p
where
(fkey_attribute = 1
AND (value BETWEEN 750000 AND 780000)
)
OR
(
fkey_attribute=56 AND
value IN (89149,89129,89108,89128,89134,89144,89145,89107,89101,89104,89109,89121,89119,89120,89122,89142,89110,89156,89115,89030,89106,89032,89102,89103,89146)
)
group by fkey_prop
having count(*) = 2
ORDER BY fkey_prop ASC;




The first query, multiple tables is consistenly faster. Query A takes 0.02 sec whereas query B consistently takes about 1.5 seconds.

Now I know that Query B involves a table that contains a lot more data, however, that is the entire reason why I wanted to break the data into multiple tables and involve joins.

An explain for Query A reveals:
+-------+--------+---------------+---------+---------+--------------+------+----------------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+---------------+---------+---------+--------------+------+----------------------------------------------+
| t1 | range | v,pav | v | 8 | NULL | 218 | Using where; Using temporary; Using filesort |
| p | eq_ref | PRIMARY,id | PRIMARY | 3 | t1.fkey_prop | 1 | Using index |
| t56 | ref | pav | pav | 5 | p.id,const | 1 | Using where; Using index |
+-------+--------+---------------+---------+---------+--------------+------+----------------------------------------------+


An explain for Query B reveals:
+-------+-------+-----------------+-----------------+---------+------+--------+----------------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+-------+-----------------+-----------------+---------+------+--------+----------------------------------------------+
| p | range | fkey_attribute | fkey_attribute | 2 | NULL | 258495 | Using where; Using temporary; Using filesort |
+-------+-------+-----------------+-----------------+---------+------+--------+----------------------------------------------+


Conclusion, multiple tables - NOT 150 but a handful to separate the data, with joins, seem to be much faster. Any ideas or feedback?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top