Trying to find the 'best match' for numbers inc. Zeros 2

Welshbird · Nov 22, 2005

Hi guys - I'm needing some help with a query, and I've reached the 'no ideas anymore' stage!

I'm performing a kind of 'fuzzy' match in order to find the most similar record in one table to a record in another table.

There are several fields to be compared, and some of them are deemed more important to match than others, so I have another wee small table containing weighting scores.

Some of the matching fields are text, and these have to match exactly or not at all, so these are easy (if are = then 1 * weighting score, else 0), and the rest are numeric fields and I'm just trying to find the closest.

Because I need to allow for the weighting score, I was aiming to divide the smallest number by the biggest, then if they match I'll get 1, and if not I'll get a fraction which is smaller than 1; this can then be multiplied by the weighting score again.

So far so good! I've built a 'model' of this is Excel so I can see if working and in general I'm really happy.

It has thrown up an issue that I hadn't thought of though...

Some of the numbers to be matched may well be zero.

This is perfectly valid, but doesn't help if I'm trying to get close to a 1 by dividing one by the other.

So, any ideas about how I calculate how close a number is, ending up with a maximum of 1, if one of them can be zero??

Thanks in advance for any suggestions at all.

Fi.

"The question should be, is it worth trying to do, not can it be done"

alexhu · Nov 22, 2005

Can't you check for '0' first before applying your other tests?

Alex

Welshbird · Nov 22, 2005

Oh yes! Sorry, I didn't explain my issue very well.

Checking if one of hte numbers is zero is ok. But I'm not sure what to do then. I know that I expect to end up with a number less than or equal to 1, based on how near the number is, and I had aimed to calculate this by dividing the smaller number by the bigger number.

I can't do this division if one of the numbers is zero though, so what calculation could I do which would give an answer less than one, calculated from how close they are to each other.

The only solution I can come up with is to add 0.0001 to any zero's, then the calculation would work as is (and that small amendment wouldn't affect the accurace of a fuzzy match too much)

But surely there must be a cleaner and more elegant solution??

Fi.

"The question should be, is it worth trying to do, not can it be done"

SantaMufasa · Nov 22, 2005

Bon jour, Fi.

Fi said:
if are = then 1 * weighting score, else 0

Your "Zero compare" situation should not matter if you are calculating using the above calculation: "If 0 = 0, then 1 * weighting score," Right?

[santa]

Mufasa
(aka Dave of Sandy, Utah, USA)
[

http://www.dasages.com:

Providing low-cost remote Database Admin services]
Click here to join Utah Oracle Users Group on Tek-Tips if you use Oracle in Utah USA.

Welshbird · Nov 22, 2005

Bon jour Dave!

Yes, thats right. My problem occurs when one number is zero, and the one I'm trying to match to it isn't.

If there is a choice between 1.6 and 11 then clearly 1.6 is 'nearer' to zero.

I just really don't think its the 'right' thing to just change the zero to 0.0001 even though it would work for my purpose, but there is surely a better solution.

You see, when I have no zero, then if my master record is 10, and my choices to match are 6 or 8, then I will get 0.6*score, or 0.8*score, which is great.

Any ideas?

Fi.

"The question should be, is it worth trying to do, not can it be done"

Murgle · Nov 22, 2005

How about you take the average of the two numbers, then divide this by the largest number e.g. both numbers = 32 then average is 32 so calculation is 32/32 = 1

If one number is 32 and the other is 0 then average =16 so calculation is 16/32 = 0.5 - the closer the second number is to the first will still give you a total closer to 1. Or is this pants?

How can you tell if a Systems Engineer is an extrovert? - He looks at Your shoes when he talks to you.

Stella740pl · Nov 22, 2005

I just really don't think its the 'right' thing to just change the zero to 0.0001 even though it would work for my purpose, but there is surely a better solution.

Even if a better solution exists, you shouldn't feel guilty for using such ‘inelegant’ workaround. This solution is just as good as rounding .9999 to 1, and I am sure you never feel guilty for doing the latter.

SantaMufasa · Nov 22, 2005

Murgle...what a great idea. Well done. Have a Purple Star!

[santa]

Mufasa
(aka Dave of Sandy, Utah, USA)
[

http://www.dasages.com:

Providing low-cost remote Database Admin services]
Click here to join Utah Oracle Users Group on Tek-Tips if you use Oracle in Utah USA.

Murgle · Nov 22, 2005

Gosh, thanks! [blush]

How can you tell if a Systems Engineer is an extrovert? - He looks at Your shoes when he talks to you.

Welshbird · Nov 22, 2005

Murgle you are a genius!!!!

(If you happen to be going to the london TTUK beery meet I'll certainly stand you several pints!!!!)

Have a star from me too!

Fi.

"The question should be, is it worth trying to do, not can it be done"

Murgle · Nov 22, 2005

Hmmmmm, London, beery meet. Hang on, I'll just ask the wife....

.....damn, she say no. But thanks for the offer [dazed]

- best have some for me then! [cheers]

How can you tell if a Systems Engineer is an extrovert? - He looks at Your shoes when he talks to you.

Zelandakh · Nov 22, 2005

Have you also tested mod(a-b)/a rather than average. Number approaches zero rather than 1.

I'm going to PCBM so I'll stand in for Murgle

Welshbird · Nov 22, 2005

Fabby - I'll be at the bar at some point with my money in my fat little fingers!

I may be able to use (1 - mod(a-b)/a) either then!

Fi.

"The question should be, is it worth trying to do, not can it be done"

Welshbird · Nov 23, 2005

Um.... that doesn't work actually.

If I have 29 or 0.3 in one table, and I'm comparing them both to 0, both will get 0.5 as a 'similarity' score, when I really need to know that 0.3 is significantly nearer to 0 than is 29.

Thinking aloud (or at least through my typing fingers) then I must be doing some number minus numer, as that is the difference? can't see the final solution yet though.

Fi.

"The question should be, is it worth trying to do, not can it be done"

Welshbird · Nov 23, 2005

I think I might have it! - So will post the solution here incase anyone else ever has the same problem.

So, I need to calcuate how similar a number is to another number, and I need the answer to be 1 if they are the same, or a decimal if they are not the same. the decimal gets smaller if they are less similar (does this make sense so far??)

My numbers are in a field called 'Miles'

So, the formular is:

1 - ((abs(miles1 - miles2)/max(miles)

And this works if there are zero's in either mile solution

Hurrah!

Fee.

"The question should be, is it worth trying to do, not can it be done"

Murgle · Nov 23, 2005

Aha, I was just about to post something - I might as well. If you added an IF statement to the calculation so that a value of 0.5 triggers a second calculation to divide 0.5 by the largest of the two numbers, then the smaller the result the further away from zero the larger number is.

But your's is far more elegant!

How can you tell if a Systems Engineer is an extrovert? - He looks at Your shoes when he talks to you.

Stella740pl · Nov 23, 2005

Sorry if I am being thick.

I am using a similar formula (not the same, as the needs are different) on a regular basis when calculating %% of differences of this year's data from the same month of the last year.

It doesn't create division by zero situations, if only at least one number is not zero. But I fail to see how it will calculate "closeness" of one number to another if one of them is zero. Wouldn't it always return 1, no matter what the second number is?

Or I just misunderstood the formula?
(By the way, as posted, it has more open brackets than closed ones.)

Welshbird · Nov 24, 2005

Maybe my formula makes more sense if I write it out in words (it does appear to work in testing)

OK, I take the absolute difference between the two numbers, and divide this by the maximum value these numbers could be. I then take that answer away from one.

If the numbers are the same, I get one. If not, I get a decimal smaller than one, and the further apart they are (i.e. the larger the top number of the fraction is) the smaller the result when you take that away from one.

Does it make more sense that way Stella740pl?

Fee.

"The question should be, is it worth trying to do, not can it be done"

Stella740pl · Nov 24, 2005

...absolute difference between the two numbers, and divide this by the maximum value these numbers could be...

OK, thanks, this makes much more sense. For some reason, I thought you meant to divide by the greater of the two numbers, not by the maximum of them all. You are right, of course this should work.

One more thing you should think about is that your formula is valid on a static table, or if you have an absolute maximum no matter what happens to your table.

If your table is not static, and the maximum value can change while calculations are not complete (say, a record that holds current maximum is deleted, or a new one added, with a value higher than the current maximum), you will get incorrect results.

Welshbird · Nov 25, 2005

I did think of that actually! (Rare for me) so the maximum is a function which calculates

Code:

select max(field) from table

so its sort of a constant, but is really dynamic, and that makes it much easier to read in the code.

Fee.

"The question should be, is it worth trying to do, not can it be done"

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Trying to find the 'best match' for numbers inc. Zeros 2

IS-IT--Management

MIS

IS-IT--Management

Technical User

IS-IT--Management

Technical User

Programmer

Technical User

Technical User

IS-IT--Management

Technical User

MIS

IS-IT--Management

IS-IT--Management

IS-IT--Management

Technical User

Programmer

IS-IT--Management

Programmer

IS-IT--Management

Similar threads

Log in

Part and Inventory Search

Sponsor