Compare list to another list and return containing rows 2

CassidyHunt · Nov 10, 2008

I have two lists. I need to compare them and return any rows contain the string, and rank on percentage of string match.

Anyone ever do this? Let me know if I need to put an example or if the above is enough.

Cassidy

SantaMufasa · Nov 10, 2008

Cassidy,

Can you please post a few examples of:[ul][li]rows from the two lists[/li][li]How you want rows compared (e.g., is case a factor or not)[/li][li]Output from comparing the results (e.g., do you want every row in list 1 compared to every row in list 2, et cetera)[/li][/ul]

[santa]

Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”

CassidyHunt · Nov 10, 2008

Sure. Here is the closest I have been able to match so far:

Code:

select *
  from (select id,
               arg_1 as user_2,
               arg_2 as alt_id,
               utl_match.edit_distance_similarity(arg_1, arg_2) rnkfact
          from (select a.id, a.user_2 arg_1, b.alt_id arg_2
                  from part a, xms_mining_inv b where a.user_2 is not null))
 where rnkfact > 0
 order by rnkfact

So in that list I am comparing every row in a table against every row in another table. Minus the rows in the first table that are null. That function seems to return a number that says how similiar the the data is between the two arguments. Not sure 100% if that is what I am lookign for.

Let me know if I need to provide the output example of if you can get it from there.

Cassidy

SantaMufasa · Nov 10, 2008

Cassidy,

What is there about your solution, above, that does not meet your needs?

[santa]

Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”

CassidyHunt · Nov 10, 2008

Mostly speed and I am not sure if what it is doing is really what I am after or not. Still reading on the what function was designed for. From what I understand is it was really intended for spell check and determining how close a word is to another word to aid in selecting the best words to suggest. Now I don't know how that works on part numbers OEM numbers like I am using but it seems to return strange results. I.E.

Arg 1: 1234F56
Arg 2: 1209G42

Result: 29

Now what does that mean?

Code:

select utl_match.edit_distance_similarity('1234F56', '1209G42') rnkfact
  from dual

That is what I am trying to figure out.

Cassidy

SantaMufasa · Nov 10, 2008

Cassidy,

The algorithm for utl_match.edit_distance_similarity is based not only on character matches, but also positional matches:

Code:

select utl_match.edit_distance_similarity('1234F56', '1209G42') rnkfact from dual;

   RNKFACT
----------
        29

select utl_match.edit_distance_similarity('1234F56', '1204G92') rnkfact from dual;

   RNKFACT
----------
        43

Notice that all I did in the second example was positionally swap the "4" with the "9" so that the "4"s from the two different arguments match positionally. Once that occurs, the "matching value" increases from 29 to 43.

So, the algorithm uses both ASCII value (case does make a difference to the algorithm) and position.

Let me know if this resolves your question(s).

[santa]

Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”

CassidyHunt · Nov 12, 2008

That is the way I understand it too. However in a full text search is this going to help me find a string inside of it that is comparable to the one I am looking for?

SantaMufasa · Nov 12, 2008

Cassidy said:
...is this going to help me find a string inside of it that is comparable to the one I am looking for?

If I was looking for "a string inside of it that is comparable...", I would use the instr() function. If the inner string was not an exact match, then I'd use the substr() function in combination with the utl_match.edit_distance_similarity() function.

But I also might be missing the point of what you are trying to do. <grin>

(If I'm missing the point, Cassidy, feel free to give me a phone call to set me straight.)

[santa]

Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”

CassidyHunt · Nov 12, 2008

Its not a big hurry and I don't have an office anymore so I will post it here.

What I have is a varchar2 field with 80 characters that contains part numbers and other information. We recently acquired a shop in Phoenix that some of their part numbers would end up in that field for us to reference when they were a vendor. We might have 20 part numbers in that field all formatted differently. I.E. 10J180BC might be 10-J-180-BC or any other combo you can think of. I have their list of numbers and I have our varchar2 field that I want to build all the possibilities that their numbers could be in our system to avoid duplication.

So I thought maybe doing the above could give me a range to check but that is where I think I am going wrong.

Cassidy

SantaMufasa · Nov 12, 2008

Wouldn't stripping out all punctuation give you what you want? (Stripping out punctuation is a very easy.)

[santa]

Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”

CassidyHunt · Nov 12, 2008

No because it doesn't account for typo's or not having a common delimiter between all of the part numbers or words. Really I am looking to find the matches that are 80% plus of the list for matching. That is what makes it hard.

SantaMufasa · Nov 12, 2008

Then I suggest a function that strips out all characters that are not relevant to the actual ID (e.g. delimiters), then use the utl_match.edit_distance_similarity() function to yield a "comparison value".

Let us know if that sounds reasonable.

[santa]

Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”

CassidyHunt · Nov 12, 2008

I think the thought of speed is not a possibility in a query like this. Thanks for the bouncing board on this.

Cassidy

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Compare list to another list and return containing rows 2

CassidyHunt

IS-IT--Management

SantaMufasa

Technical User

CassidyHunt

IS-IT--Management

SantaMufasa

Technical User

CassidyHunt

IS-IT--Management

SantaMufasa

Technical User

CassidyHunt

IS-IT--Management

SantaMufasa

Technical User

CassidyHunt

IS-IT--Management

SantaMufasa

Technical User

CassidyHunt

IS-IT--Management

SantaMufasa

Technical User

CassidyHunt

IS-IT--Management

Similar threads

Part and Inventory Search

Sponsor