Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

select a single row based on conditions in multiple rows 1

Status
Not open for further replies.

SashaBuilder3

Programmer
Jan 13, 2002
131
CA
Hello dear SQL experts,

I am working on a project related to arthropod species identification.


Here is a simplified table where we list various arthropod species with their properties:
i_spIDt_propertyt_speciesi_property_val
1nmb_legsBlatta germanica6
1nmb_wings
Blatta germanica
4
2nmb_legsMusca domestica6
2nmb_wings
Musca domestica
2
3nmb_legsLatrodectus hesperus8

i_spID and t_property comprise the primary key.


Each species may have multiple properties (unlimited number) in the table. The ultimate goal is giving a set of properties to identify (ideally) a single species.
I have come up with a query having multiple subqueries, each for each pair of property/value:

SQL:
SELECT t1.i_spID,t1.t_species
FROM (SELECT * FROM DBO.t_species WHERE t_property='nmb_legs'  AND i_property_val=6) AS t1,
     (SELECT * FROM DBO.t_species WHERE t_property='nmb_wings' AND i_property_val=4) AS t2
WHERE t1.i_spID=t2.i_spID
(this one identifies the cocroach 'Blatta germanica')

The problem is that as the number of properties grows the query becomes too combersome especially the "where" part in which we need to connect each subquery with all the others through i_spID.

So is there a better way to write the query?


-- create and fill in table SQL:
SQL:
CREATE table DBO.t_species
 (
  i_spID                     int NOT NULL,
  t_property                 nvarchar(20) NOT NULL,
  t_species                  nvarchar(50),
  i_property_val             int
 );

ALTER TABLE DBO.t_species
ADD CONSTRAINT pkID_sp01 PRIMARY KEY (i_spID,t_property);



INSERT INTO DBO.t_species (i_spID, t_property, t_species, i_property_val)
VALUES (1, 'nmb_legs', 'Blatta germanica', 6),
       (2, 'nmb_legs', 'Musca domestica', 6),
       (3, 'nmb_legs', 'Latrodectus hesperus', 8),
       (1, 'nmb_wings', 'Blatta germanica', 4),
       (2, 'nmb_wings', 'Musca domestica', 2)
;
 
I see the pattern, it's simiar to searching a recipe/formula by ingredients and their amount.

The step you can make easier than specifing the subqueries is first defining a data structure for a search, that always consists of pairs of properties and their values, for a generalisation you could even think of a range of values for an "unsharp" search.

That's a first step in simplifying the search to a data specification. Then comes a part of maybe generating a query for such search data, or processing multiple searches each with one property only, shrinking down the candidates with each further property/value pair.
 
What you have here is considered an Entity-Attribute-Value table. https://en.wikipedia.org/wiki/Entity–attribute–value_model

This type of table structure is fairly decent when storing "sparse" data, but doesn't do so well otherwise. The problem with this type of data structure is that it is painful to query, especially as the complexity grows. If you are stuck using this table structure, I would encourage you to research the EAV modal, and the various ways to query it.

You'll probably notice that the EAV modal realy only has 3 columns, while yours has 4. This is because you should not be storing Species Name in this table. Instead, you should have another table for it. Something like... DBO.t_Species_Name (i_spID int, t_species navchar(50)). Then, if you need to get the species name, simply join to this table. The benefit of using a table like this is that it will save space in the DB and actually make your queries faster (because you will fit more rows into page).

So, each species can only have distinct properties. For example, a species can only have one row for number of legs, number of wings, etc.... As such, you could use something like this.

Code:
select    i_spID,
        max(case When t_property = 'nmb_legs' Then i_property_val End) As nmb_legs,
        max(case When t_property = 'nmb_wings' Then i_property_val End) As nmb_wings
From    t_species
group by i_spID

The code shown above is a good method for converting rows into columns. There are various ways to do this. This is my preferred method.

If you want an exact match for the query you provided, you could use this:

Code:
select    i_spID, 
        t_species,
From    t_species
group by i_spID, t_species
Having max(case When t_property = 'nmb_legs' Then i_property_val End) = 6
        And max(case When t_property = 'nmb_wings' Then i_property_val End) = 4

Note that this is a grouping query, also known as an aggregate query (google those terms to learn more about them). With an aggregate query, where clauses cause problems because they typically eliminate too much data. Instead, you can use the having clause to effectively filter the data instead.
 
What you have here is considered an Entity-Attribute-Value table. https://en.wikipedia.org/wiki/Entity–attribute–value_model

This type of table structure is fairly decent when storing "sparse" data, but doesn't do so well otherwise. The problem with this type of data structure is that it is painful to query, especially as the complexity grows. If you are stuck using this table structure, I would encourage you to research the EAV modal, and the various ways to query it.

You'll probably notice that the EAV modal realy only has 3 columns, while yours has 4. This is because you should not be storing Species Name in this table. Instead, you should have another table for it. Something like... DBO.t_Species_Name (i_spID int, t_species navchar(50)). Then, if you need to get the species name, simply join to this table. The benefit of using a table like this is that it will save space in the DB and actually make your queries faster (because you will fit more rows into page).

So, each species can only have distinct properties. For example, a species can only have one row for number of legs, number of wings, etc.... As such, you could use something like this.

Code:
select    i_spID,
        max(case When t_property = 'nmb_legs' Then i_property_val End) As nmb_legs,
        max(case When t_property = 'nmb_wings' Then i_property_val End) As nmb_wings
From    t_species
group by i_spID

The code shown above is a good method for converting rows into columns. There are various ways to do this. This is my preferred method.

If you want an exact match for the query you provided, you could use this:

Code:
select    i_spID,
        t_species,
From    t_species
group by i_spID, t_species
Having max(case When t_property = 'nmb_legs' Then i_property_val End) = 6
        And max(case When t_property = 'nmb_wings' Then i_property_val End) = 4

Note that this is a grouping query, also known as an aggregate query (google those terms to learn more about them). With an aggregate query, where clauses cause problems because they typically eliminate too much data. Instead, you can use the having clause to effectively filter the data instead.
Hi George,

Yes, they both work. The second is a straightforward and the first I modified so it became a base query from which I can extract a desired result:
Code:
select i_spID FROM
(select    i_spID,
        max(case When t_property = 'nmb_legs' Then i_property_val End) As nmb_legs,
        max(case When t_property = 'nmb_wings' Then i_property_val End) As nmb_wings
From    t_species
group by i_spID) AS q1
WHERE nmb_legs=6
AND nmb_wings=4


Thank you!
Alex
 
Last edited:
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top