Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

longest match in Teradata 2

Status
Not open for further replies.

bunwong

IS-IT--Management
Mar 15, 2004
13
HK
Dear all,

We are seeking for a most effective way to perform longest match in Teradata. Following is our situation:

The source data is called "call_records" (2.8 million records).
The pattern table is called "dialed_digits" (15,000 records).

We are using the following SQL to perform longest match. We are not sure if we are using the most efficient method to handle this in Teradata.
Any way better than this?

Thanks!

--------------------------------------------------------------------------------------------------------------------------------
insert into ADM1.longest_match_dialed_digits
(
dialed_digits
,mapped_dialed_digits
,destination
,country_code
,brand_model_id
,cdr_set
)
select
dt.dialed_digits_a
,dt.dialed_digits_b
,dt.destination
,dt.country_code
,dt.brand_model_id
,dt.cdr_set
from
(
select
a.dialed_digits as dialed_digits_a
,b.dialed_digits as dialed_digits_b
,b.destination
,b.country_code
,b.brand_model_id

,RANK() OVER (PARTITION BY a.dialed_digits, a.cdr_set
ORDER BY b.dialed_digits DESC, b.brand_model_id asc ) rnk
,a.cdr_set
from
adm1.call_records a
,adm1.dialed_digits b
where a.dialed_digits like b.dialed_digits || '%'
) dt
where rnk = 1;
 
Hi bunwong,

do you have a field on "dialed_digits" holding the number of digits?

I'm assuming that this table holds records like 1, 12, 123, etc?

Then if you have a column (e.g. highest_match) to hold 1 for 1, 2 for 12, 3 for 123, etc you should be able to use somthing like this

select a.dialed_digits as dialed_digits_a
,b.dialed_digits as dialed_digits_b
,b.destination
,b.country_code
,b.brand_model_id
,a.cdr_set
,max(b.highest_match)
from adm1.call_records a
cross join adm1.dialed_digits b
where a.dialed_digits like b.dialed_digits || '%'
group by 1,2,3,4,5,6


I haven't tested the above (don't know Primary Index, etc), so you'll need to check the performance first!

Roger...
 
Hi bunwong,

try to add missing dialed_digits to increase the minimum length, e.g. up to 4 digits:

if
41 = Switzerland
411 = Switzerland, Zurich
4122 = Switzerland, Geneva
4174 = Switzerland, Mobile

then every phone call starting with 4110..4119 is Zurich
and 4100..4199 (minus those values for Zurich, Geneva and Mobile) is Switzerland.

Use this SQL to create the missing dialed_digits (this is based on the SQL you've already found):
Code:
create table prefixes2 as   ( 

select 
  firstchars, 
  pattern, 
  4 as pattern_len, 
  destination_tag
from 
  (select substring(called_no from 1 for 4) as firstchars
   from CDRs group by 1) c
join 
  (select * from prefixes p where pattern_len <= 4) p
on c.firstchars like p.pattern || '%'
QUALIFY
  RANK() OVER (PARTITION BY firstchars 
               ORDER BY p.pattern DESC) = 1

union 

select substring(pattern from 1 for 4), p.*
from prefixes p 
where pattern_len > 4

) with data
primary index (firstchars);

collect statistics on prefixes2 column pattern; 
collect statistics on prefixes2 column firstchars;
Then you can add an extra join condition:
Code:
sel 
  c.called_no
  ,p.pattern
  ,p.Destination_Tag
FROM
  CDRs c
    LEFT JOIN Prefixes2 p
      ON c.called_no LIKE p.pattern||'%'
     AND SUBSTR(c.called_no,1,4) = p.firstchars
QUALIFY
  RANK() OVER (PARTITION BY c.called_no 
               ORDER BY p.pattern DESC) = 1
This will greatly reduce the comparisons in the product join and the size of the intermediate spool.

To create the prefixes2:
Instead of using the called_no to create the missing data you can use a table with integers from 1000 to 9999:
Code:
select 
...
from 
  (select substring(called_no from 1 for 4) as firstchars
   from CDRs group by 1) c
join 
  (select * from prefixes p where pattern_len <= 4) p
...
Then you only have to create it once and can reuse it (until the dialed_digits table changes).

Dieter
 
Hi dnoeth,

After applying your suggest, we got a dramatic improvement (> 10 times) on the matching.

We have go ahead with your approach. Thanks.

Bun
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top