Help with a special type of comparison

GenDis · Dec 10, 2007

Hi,

I'm new here and I've got a problem. Hopefully some of you who love a challenge can provide some useful suggestions. Let me try to explain the scenario.

It has to do with comparing some values in a recordset to some previously defined reference values and in the process obtain an integer value that represents the count of mismatches found per row. Sounds straightfoward but it isn't a simple column per column comparison but four columns taken as a whole. In other words, we start with four variables and for each one of them I have to check if the variable is found in any of the four columns in my table. If it isn't found it counts as 1, but if it is found the column where the match ocurred should be ommited for the other variables.

Take a look at this little table. The original has many other fields but I've tried to simplify my example for clarity.

ID,m1,m2,m3,m4
R1,15,15,17,18
R2,13,13,15,18
R3,14,15,18,18
R4,15,16,17,18

So let's say the values from Record #1 (R1) are set up as our reference values. At first glance, if we take R2 and compare it to R1 we may see 3 differences, but there are really only 2. To get our mismatch count the reference values that match in each records do not count, even if they are not aligned in the same column.

The two 13's in R2 do not have a match in R1, so these count as 2 mismatches, but the 15 in R2 matches a 15 in R1 as well as the 18 in R2 also matches a 18 in R1. Total mismatches = 2.

In R3, m1 = mismatch, m2 = match, m3 = match and m4 = mismatch. Note that R3.m4 (18) has a match with R1.m4 (18), but R3.m3 has already been paired with R1.m4 and each variable can not be matched more than once.

I hope my explanation is clear enough but if not feel free to ask. Finally, all this needs to be done exclusively with MySQL functions in a single query expression.

As an example, I'm trying the following approach, but is not is not satisfying the requirements and I need to convert the count column to an integer value.

Code:

SET @m1=15,@m2=15,@m3=17,@m4=18;

SELECT CONCAT_WS(',',ABS(STRCMP(`m1`,@m1))&ABS(STRCMP(`m1`,@m2))&ABS(STRCMP(`m1`,@m3))&ABS(STRCMP(`m1`,@m4)),
ABS(STRCMP(`m2`,@m1))&ABS(STRCMP(`m2`,@m2))&ABS(STRCMP(`m2`,@m3))&ABS(STRCMP(`m2`,@m4)),
ABS(STRCMP(`m3`,@m1))&ABS(STRCMP(`m3`,@m2))&ABS(STRCMP(`m3`,@m3))&ABS(STRCMP(`m3`,@m4)),
ABS(STRCMP(`m4`,@m1))&ABS(STRCMP(`m4`,@m2))&ABS(STRCMP(`m4`,@m3))&ABS(STRCMP(`m4`,@m4))) 
AS `count`, `ID`, `m1`, `m2`,`m3`, `m4` FROM tbl_ht ORDER BY ID;

I realize that it may be necessary to create a user defined function to accomplish my objective. What are your thoughts? Thanks.

p.s.

Here's the table and data to run the above expression:

CREATE TABLE `tbl_ht` (
`ID` varchar(5) NOT NULL,
`m1` int(3) default NULL,
`m2` int(3) default NULL,
`m3` int(3) default NULL,
`m4` int(3) default NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO tbl_ht VALUES ('R1',15,15,17,18);
INSERT INTO tbl_ht VALUES ('R2',13,13,15,18);
INSERT INTO tbl_ht VALUES ('R3',14,15,18,18);
INSERT INTO tbl_ht VALUES ('R4',15,16,17,18);

r937 · Dec 10, 2007

Code:

select ID
     , case when @m1 in ( m1,m2,m3,m4 )
            then 0 else 1 end +
       case when @m2 in ( m1,m2,m3,m4 )
            then 0 else 1 end +
       case when @m3 in ( m1,m2,m3,m4 )
            then 0 else 1 end +
       case when @m4 in ( m1,m2,m3,m4 )
            then 0 else 1 end            as mismatches
  from tbl_ht

results:

ID mismatches
R1 0
R2 1
R3 1
R4 0

r937.com | rudy.ca

GenDis · Dec 10, 2007

Thanks. That's a simple and elegant alternative. I'll try to adapt it and see if I get the expected results.

GenDis · Dec 11, 2007

OK. I switched the variables like this:

Code:

select ID, m1, m2, m3, m4,
  case when m1 in ( @m1,@m2,@m3,@m4 )then 0 else 1 end +
  case when m2 in ( @m1,@m2,@m3,@m4 )then 0 else 1 end +
  case when m3 in ( @m1,@m2,@m3,@m4 )then 0 else 1 end +
  case when m4 in ( @m1,@m2,@m3,@m4 )then 0 else 1 end 
as mismatches from tbl_ht

And came with these results:

ID m1 m2 m3 m4 mismatches
------ ------ ------ ------ ------ ----------
R1 15 15 17 18 0
R2 13 13 15 18 2
R3 14 15 18 18 1
R4 15 16 17 18 1

The only unexpected result is at R3, where the mismatches count should also be 2, as explained in my first post.

I need to add a condition in each iteration that omits from the IN clause the values that have previously been matched or paired. Any ideas?

I was thinking of concatenating the @m variables with a separator and then if a match is found use the replace function so that in the next case the matched value doesn't exist in the list. The problem with this is that the replace function replaces all occurrences in the string of the value passed and I need to exclude just one. So maybe I could use the FIELD function to find the position in the string but then it gets too complex to make the string replacement, etc..

It looks like there's no practical solution to this problem unless I create a user defined function to make the comparison, which brings me to another problem: When I use the DELIMITER $$ I get an error from the server. Could this be a version problem? (I'm using MySQL 4.1.22)

Thanks in advance!

r937 · Dec 11, 2007

yes, the DELIMITER error is because you aren't using version 5

regarding your matching logic, i guess i don't understand it, so i can only wish you good luck with your stored proc

r937.com | rudy.ca

GenDis · Dec 11, 2007

Thanks again. I will keep at it.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Help with a special type of comparison

GenDis

Programmer

r937

Technical User

GenDis

Programmer

GenDis

Programmer

r937

Technical User

GenDis

Programmer

Similar threads

Part and Inventory Search

Sponsor