Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Trying to find the 'best match' for numbers inc. Zeros 2

Status
Not open for further replies.

Welshbird

IS-IT--Management
Jul 14, 2000
7,378
0
0
DE
Hi guys - I'm needing some help with a query, and I've reached the 'no ideas anymore' stage!

I'm performing a kind of 'fuzzy' match in order to find the most similar record in one table to a record in another table.

There are several fields to be compared, and some of them are deemed more important to match than others, so I have another wee small table containing weighting scores.

Some of the matching fields are text, and these have to match exactly or not at all, so these are easy (if are = then 1 * weighting score, else 0), and the rest are numeric fields and I'm just trying to find the closest.

Because I need to allow for the weighting score, I was aiming to divide the smallest number by the biggest, then if they match I'll get 1, and if not I'll get a fraction which is smaller than 1; this can then be multiplied by the weighting score again.

So far so good! I've built a 'model' of this is Excel so I can see if working and in general I'm really happy.

It has thrown up an issue that I hadn't thought of though...

Some of the numbers to be matched may well be zero.

This is perfectly valid, but doesn't help if I'm trying to get close to a 1 by dividing one by the other.

So, any ideas about how I calculate how close a number is, ending up with a maximum of 1, if one of them can be zero??

Thanks in advance for any suggestions at all.

Fi.

"The question should be, is it worth trying to do, not can it be done"


 
Can't you check for '0' first before applying your other tests?

Alex
 
Oh yes! Sorry, I didn't explain my issue very well.

Checking if one of hte numbers is zero is ok. But I'm not sure what to do then. I know that I expect to end up with a number less than or equal to 1, based on how near the number is, and I had aimed to calculate this by dividing the smaller number by the bigger number.

I can't do this division if one of the numbers is zero though, so what calculation could I do which would give an answer less than one, calculated from how close they are to each other.

The only solution I can come up with is to add 0.0001 to any zero's, then the calculation would work as is (and that small amendment wouldn't affect the accurace of a fuzzy match too much)

But surely there must be a cleaner and more elegant solution??



Fi.

"The question should be, is it worth trying to do, not can it be done"


 
Bon jour, Fi.

Fi said:
if are = then 1 * weighting score, else 0
Your "Zero compare" situation should not matter if you are calculating using the above calculation: "If 0 = 0, then 1 * weighting score," Right?

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[ Providing low-cost remote Database Admin services]
Click here to join Utah Oracle Users Group on Tek-Tips if you use Oracle in Utah USA.
 
Bon jour Dave!

Yes, thats right. My problem occurs when one number is zero, and the one I'm trying to match to it isn't.

If there is a choice between 1.6 and 11 then clearly 1.6 is 'nearer' to zero.

I just really don't think its the 'right' thing to just change the zero to 0.0001 even though it would work for my purpose, but there is surely a better solution.

You see, when I have no zero, then if my master record is 10, and my choices to match are 6 or 8, then I will get 0.6*score, or 0.8*score, which is great.

Any ideas?

Fi.

"The question should be, is it worth trying to do, not can it be done"


 
How about you take the average of the two numbers, then divide this by the largest number e.g. both numbers = 32 then average is 32 so calculation is 32/32 = 1

If one number is 32 and the other is 0 then average =16 so calculation is 16/32 = 0.5 - the closer the second number is to the first will still give you a total closer to 1. Or is this pants?

How can you tell if a Systems Engineer is an extrovert? - He looks at Your shoes when he talks to you.
 

I just really don't think its the 'right' thing to just change the zero to 0.0001 even though it would work for my purpose, but there is surely a better solution.

Even if a better solution exists, you shouldn't feel guilty for using such ‘inelegant’ workaround. This solution is just as good as rounding .9999 to 1, and I am sure you never feel guilty for doing the latter.
 
Gosh, thanks! [blush]

How can you tell if a Systems Engineer is an extrovert? - He looks at Your shoes when he talks to you.
 
Murgle you are a genius!!!!

(If you happen to be going to the london TTUK beery meet I'll certainly stand you several pints!!!!)

Have a star from me too!

Fi.

"The question should be, is it worth trying to do, not can it be done"


 
Hmmmmm, London, beery meet. Hang on, I'll just ask the wife....





.....damn, she say no. But thanks for the offer [dazed] - best have some for me then! [cheers]

How can you tell if a Systems Engineer is an extrovert? - He looks at Your shoes when he talks to you.
 
Have you also tested mod(a-b)/a rather than average. Number approaches zero rather than 1.

I'm going to PCBM so I'll stand in for Murgle :)
 
Fabby - I'll be at the bar at some point with my money in my fat little fingers!

I may be able to use (1 - mod(a-b)/a) either then!

Fi.

"The question should be, is it worth trying to do, not can it be done"


 
Um.... that doesn't work actually.

If I have 29 or 0.3 in one table, and I'm comparing them both to 0, both will get 0.5 as a 'similarity' score, when I really need to know that 0.3 is significantly nearer to 0 than is 29.

Thinking aloud (or at least through my typing fingers) then I must be doing some number minus numer, as that is the difference? can't see the final solution yet though.

Fi.

"The question should be, is it worth trying to do, not can it be done"


 
I think I might have it! - So will post the solution here incase anyone else ever has the same problem.

So, I need to calcuate how similar a number is to another number, and I need the answer to be 1 if they are the same, or a decimal if they are not the same. the decimal gets smaller if they are less similar (does this make sense so far??)

My numbers are in a field called 'Miles'

So, the formular is:

1 - ((abs(miles1 - miles2)/max(miles)

And this works if there are zero's in either mile solution

Hurrah!


Fee.

"The question should be, is it worth trying to do, not can it be done"


 
Aha, I was just about to post something - I might as well. If you added an IF statement to the calculation so that a value of 0.5 triggers a second calculation to divide 0.5 by the largest of the two numbers, then the smaller the result the further away from zero the larger number is.

But your's is far more elegant!

How can you tell if a Systems Engineer is an extrovert? - He looks at Your shoes when he talks to you.
 

Sorry if I am being thick.

I am using a similar formula (not the same, as the needs are different) on a regular basis when calculating %% of differences of this year's data from the same month of the last year.

It doesn't create division by zero situations, if only at least one number is not zero. But I fail to see how it will calculate "closeness" of one number to another if one of them is zero. Wouldn't it always return 1, no matter what the second number is?

Or I just misunderstood the formula?
(By the way, as posted, it has more open brackets than closed ones.)
 
Maybe my formula makes more sense if I write it out in words (it does appear to work in testing)

OK, I take the absolute difference between the two numbers, and divide this by the maximum value these numbers could be. I then take that answer away from one.

If the numbers are the same, I get one. If not, I get a decimal smaller than one, and the further apart they are (i.e. the larger the top number of the fraction is) the smaller the result when you take that away from one.

Does it make more sense that way Stella740pl?

Fee.

"The question should be, is it worth trying to do, not can it be done"


 

...absolute difference between the two numbers, and divide this by the maximum value these numbers could be...

OK, thanks, this makes much more sense. For some reason, I thought you meant to divide by the greater of the two numbers, not by the maximum of them all. You are right, of course this should work.

One more thing you should think about is that your formula is valid on a static table, or if you have an absolute maximum no matter what happens to your table.

If your table is not static, and the maximum value can change while calculations are not complete (say, a record that holds current maximum is deleted, or a new one added, with a value higher than the current maximum), you will get incorrect results.

 
I did think of that actually! (Rare for me) so the maximum is a function which calculates
Code:
select max(field) from table
so its sort of a constant, but is really dynamic, and that makes it much easier to read in the code.



Fee.

"The question should be, is it worth trying to do, not can it be done"


 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top