
Many entries in an "IF value IN" statement


BJCooperIT

Programmer
May 30, 2002
My client has a procedure in a package that has an IF statement that is repeated in four separate places. It reads something like this:
Code:
IF  v_code IN
    (   'CA','CF','CN','CT','JO','ND','NT','P1','P3',
        'PA','PB','PC','PD','PE','PF','PH','PJ','PK',
        'PL','PM','PN','PO','PP','PR','PS','PT','PU',
        'PW','SF','SG','SO','SS','ST','SW','UO','Y3',
        'Y5','Y8','YA','YB','YD','YG','YK','YM','YN',
        'YO','YR','YT','YT','YU','YV','YW','YX','YY',
        'YZ','Z1','Z2','Z3','Z5','ZE','ZF','ZG','ZH',
        'ZL','ZM','ZN','ZS','ZT','ZU','ZV','ZW','ZX'
    ) THEN
I am considering recommending that this be made into a function. Besides the obvious advantages of centralizing the code and easing maintenance, would this have an effect on performance? This code processes almost 10 million rows when it runs.

[sup]Beware of false knowledge; it is more dangerous than ignorance. ~ George Bernard Shaw[/sup]
Consultant Developer/Analyst Oracle, Forms, Reports & PL/SQL (Windows)
 
Barb,

Sorry to be so long responding...I was off-line yesterday.

Here are my findings regarding the performance of in-line code versus a function (running against an 8M-row table):
Code:
 declare
    cnt_true  number := 0;
    cnt_total number := 0;
    v_code    varchar2(2 char);
begin
    for x in (select substr(addr_postal_code,1,2) y
                from transaction_participant) loop
        v_code := x.y;
        IF  v_code IN
            ('CA','CF','CN','CT','JO','ND','NT','P1','P3',
             'PA','PB','PC','PD','PE','PF','PH','PJ','PK',
             'PL','PM','PN','PO','PP','PR','PS','PT','PU',
             'PW','SF','SG','SO','SS','ST','SW','UO','Y3',
             'Y5','Y8','YA','YB','YD','YG','YK','YM','YN',
             'YO','YR','YT','YT','YU','YV','YW','YX','YY',
             'YZ','Z1','Z2','Z3','Z5','ZE','ZF','ZG','ZH',
             'ZL','ZM','ZN','ZS','ZT','ZU','ZV','ZW','ZX'
            ) THEN
            cnt_true := cnt_true+1;
        end if;
        cnt_total := cnt_total+1;
    end loop;
    dbms_output.put_line('Total rows = '||cnt_total||chr(10)
                       ||'Total True = '||cnt_true);
end;
/

Total rows = 8073072
Total True = 799

Elapsed: 00:03:00.81

create or replace function check_true (code_in varchar2) return boolean is
begin
    if code_in IN
            ('CA','CF','CN','CT','JO','ND','NT','P1','P3',
             'PA','PB','PC','PD','PE','PF','PH','PJ','PK',
             'PL','PM','PN','PO','PP','PR','PS','PT','PU',
             'PW','SF','SG','SO','SS','ST','SW','UO','Y3',
             'Y5','Y8','YA','YB','YD','YG','YK','YM','YN',
             'YO','YR','YT','YT','YU','YV','YW','YX','YY',
             'YZ','Z1','Z2','Z3','Z5','ZE','ZF','ZG','ZH',
             'ZL','ZM','ZN','ZS','ZT','ZU','ZV','ZW','ZX'
            )
    then return true;
    else return false;
    end if;
end;
/

Function created.

declare
    cnt_true  number := 0;
    cnt_total number := 0;
    v_code    varchar2(2 char);
begin
    for x in (select substr(addr_postal_code,1,2) y
                from transaction_participant) loop
        v_code := x.y;
        IF check_true(v_code) THEN
            cnt_true := cnt_true+1;
        end if;
        cnt_total := cnt_total+1;
    end loop;
    dbms_output.put_line('Total rows = '||cnt_total||chr(10)
                       ||'Total True = '||cnt_true);
end;
/

Total rows = 8073072
Total True = 799

Elapsed: 00:03:44.48
So, interestingly, in this case, calling the function took about 25% (44 seconds) longer than the in-line code.

Now, if you restructure the PL/SQL to:[ul][li]place the values in an Oracle table, and[/li][li]do the filtering inside the SQL itself rather than row by row in PL/SQL,[/li][/ul]...then there is a 20% performance improvement (3 minutes reduced to 2 mins. 24 seconds):
Code:
declare
    cnt_true  number := 0;
begin
    select count(*) into cnt_true
      from transaction_participant z
     where exists (select 'x'
                     from codes
                    where substr(z.addr_postal_code,1,2) = code_val);
    dbms_output.put_line('Total True = '||cnt_true);
end;
/

Total True = 799

PL/SQL procedure successfully completed.

Elapsed: 00:02:24.17
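(The CODES table referenced above is simply a one-column list of the valid values. The exact DDL wasn't shown, so treat this as a sketch reconstructed from the table and column names in the query:)
Code:
create table codes (code_val varchar2(2 char) primary key);

insert into codes values ('CA');
insert into codes values ('CF');
-- ...one INSERT per remaining code in the original IN-list...
insert into codes values ('ZX');
commit;
With a primary key on CODE_VAL, the duplicate 'YT' from the original literal list must be dropped.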
There are probably several more performance-tuning options, but at least these three give a basis for comparison.


Let us know if any of this is helpful.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”
 
I can't help but chip in here.
I wholeheartedly commend Santa's approach of putting the values in a table of their own: it's much easier to maintain and use, and it fits the maintainability goal BJ alluded to. Nice one.

You might also have tried a DETERMINISTIC function, Santa, but since I'd argue against that route in the first place (I prefer your table approach) I won't say anything further about it.

How about your latest approach, but with a function-based index to handle the first two characters instead of doing the SUBSTR, which must be forcing a full table scan?

Responses, scurrilous or otherwise......

Regards

T
 
Thargy said:
How about your latest approach, but with a function-based index to handle the first two characters instead of doing the SUBSTR, which must be forcing a full table scan?
Absolutely...If, in real life, I was actually interested in the first two characters of ADDR_POSTAL_CODE, I would create a function-based index. But since I was simply contriving the use of one of my Production ~10M-row tables (to simulate Barb's situation), I was satisfied not to build a function-based index just for this illustration.
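For the record, such an index would look something like this (the index name is just illustrative):
Code:
create index tp_postal_prefix_idx
    on transaction_participant (substr(addr_postal_code,1,2));
With that in place, a predicate written against substr(addr_postal_code,1,2) can use the index instead of forcing a full table scan.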


But excellent catch, Thargy, nonetheless.


[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”
 
In addition to a function-based index on z.addr_postal_code, I would also take a look at getting rid of that subquery and seeing if that helped:

Code:
.
.
.
-- an inner join reproduces the EXISTS test, since codes.code_val is unique
select count(z.addr_postal_code) into cnt_true
      from transaction_participant z
           JOIN codes c
              ON c.code_val = substr(z.addr_postal_code,1,2);
.
.
.
Depending on the content of transaction_participant, an index on codes.code_val might also be useful, but with only 72 two-character codes, this would probably not be a huge pickup and quite possibly, the index would be ignored.
 
Thanks for your responses. Actually, the codes are in a table with a unique index on the code. Problem is, this agency of the government is not interested in adding a column to that table so that I could distinguish between those particular codes and the others stored there.

My task is to look at the package code and see if, with some simple tweaking, the batch processing could run in less than the two and a half hours it has been running. I have already proven that I can shave about 12 minutes off using an array instead of repeatedly selecting from a 99-row validation table. Budget cuts, ya know? (-:

I did some benchmark testing and also found the function to be slightly slower. That is why I was wondering if anyone had another advantage I could cite to bolster my recommendation, since I couldn't claim it would run faster. You know, something like storage savings, since the literals wouldn't be declared over and over.

I guess on Monday I will try loading those codes into an array and verifying whether v_code is one of those values. That method goes beyond a "tweak", but it is worth investigating.
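Something along these lines is what I have in mind (just a sketch, borrowing Dave's benchmark table for illustration since I cannot post the real code):
Code:
declare
    type code_tab is table of boolean index by varchar2(2);
    valid_codes code_tab;
    cnt_true    number := 0;
    cnt_total   number := 0;

    procedure add_code (p_code in varchar2) is
    begin
        valid_codes(p_code) := true;
    end;
begin
    -- declare the valid values exactly once, up front
    add_code('CA'); add_code('CF'); add_code('CN'); add_code('CT');
    -- ...one call per remaining code in the IN-list...
    add_code('ZW'); add_code('ZX');

    -- the membership test is then an in-memory lookup per row
    for x in (select substr(addr_postal_code,1,2) y
                from transaction_participant) loop
        if x.y is not null and valid_codes.exists(x.y) then
            cnt_true := cnt_true + 1;
        end if;
        cnt_total := cnt_total + 1;
    end loop;
    dbms_output.put_line('Total rows = '||cnt_total||chr(10)
                       ||'Total True = '||cnt_true);
end;
/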

Nice to "see" you all again.

[sup]Beware of false knowledge; it is more dangerous than ignorance. ~ George Bernard Shaw[/sup]
Consultant Developer/Analyst Oracle, Forms, Reports & PL/SQL (Windows)
 
If they are worried about "budget cuts", then you have your justification already in place:
Justification said:
[ul][li]The single most expensive component of most applications is the cost of programmers and other personnel (versus hardware, network, et cetera).[/li][li]Business rules are very prone to change.[/li][li]Whenever a business-rule change requires a software change, it involves programmers, QA, and other personnel.[/li][li]The existing 99 codes reflect business rules.[/li][li]If business rules can be defined in database tables (rather than in software), and the performance cost of accessing those tables does not exceed the programming cost of defining and maintaining the codes in software, then overall project/product cost goes down because the costlier programming is avoided.[/li][/ul]
Does that make sense?

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm or risk. The cost will be your freedoms and your liberty.”
 
Very interesting. Thanks for the insight. I'll give Mufasa a star.

Here is my 2 cents. You can replace the substr() with LIKE. My past tests have shown that "LIKE" is slightly slower than "=".

Code:
where z.addr_postal_code LIKE code_val||'%'
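Dropped into the earlier EXISTS query, that would read roughly as follows (just a sketch, reusing Dave's table names):
Code:
select count(*) into cnt_true
  from transaction_participant z
 where exists (select 'x'
                 from codes
                where z.addr_postal_code like code_val||'%');
One possible benefit is that addr_postal_code is no longer wrapped in a function, so an ordinary index on that column could at least be considered by the optimizer.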
 
BJ,

I've faced similar situations before.
May I suggest that you copy the tables in question to a separate schema (or, preferably, a dedicated server) and add the requisite field to your local copy? Then do the timing tests and show the before-and-after results.

As I think we all suspect, there would be a startling reduction in processing time given that one "magic" field. I have found that once demonstrated, evidence of significant improvements is hard to ignore (even for managers with a tenuous grasp of reality).

Regards

T
 
We have gotten a little off-track here with SUBSTRs and LIKEs, since neither is part of my problem.

Adding a column to the table does not improve performance, since the IF statement would have to be replaced with a SELECT. It only makes sense that adding 10,000,000 SELECTs against that table is going to slow processing down. Plus, there is the added cost of changing the front end to maintain the new column.

However, when I load those values into an array and use it in place of the IF, the benchmarks indicate about a 3.5-minute savings. At least those values are declared only once, which will make the code easier to maintain.


[sup]Beware of false knowledge; it is more dangerous than ignorance. ~ George Bernard Shaw[/sup]
Consultant Developer/Analyst Oracle, Forms, Reports & PL/SQL (Windows)
 
BJ,

er, I think I'll have to take issue with some of that. If a field with the requisite characters existed, it could be indexed. It would then become feasible to select using an inner join between the two tables. Since this would be pure SQL, not PL/SQL, I'd bet a pound to a penny that it would be significantly faster, and having an index would avoid the need for the full table scan. As for maintaining the column, no front-end change is necessary: you could create the field and maintain it with either a trigger (geck, bleh!) or a stored procedure. Since you're writing one of those anyway, what's the incremental work?
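(Purely as a sketch of the trigger route; the column, trigger, and index names are made up, and this assumes a local copy of the table where a column can be added:)
Code:
alter table transaction_participant add (postal_prefix varchar2(2 char));

create or replace trigger trg_tp_postal_prefix
    before insert or update of addr_postal_code on transaction_participant
    for each row
begin
    :new.postal_prefix := substr(:new.addr_postal_code, 1, 2);
end;
/

-- back-fill existing rows, then index the derived column
update transaction_participant
   set postal_prefix = substr(addr_postal_code, 1, 2);

create index tp_prefix_idx on transaction_participant (postal_prefix);

-- the membership test then becomes a plain join on an indexed column
select count(*)
  from transaction_participant z
       join codes c on c.code_val = z.postal_prefix;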

Do you happen to have, say, 50 or so sample lines of data that you could post? I'm sufficiently curious to be willing to brew up some timing tests of my own.

Regards

T
 
I created a table with the code indexed. I then wrote a script to load an array and use it on 1,000,000 rows, and then used a SELECT against the table on the same 1,000,000 rows. The array took 9 seconds, compared to 38 seconds for the SELECT. Not a huge savings on 10,000,000 rows, but a savings nonetheless.

This is the federal government and there are restrictions in place about maintaining the data. The data is confidential, so I cannot post it. I appreciate your willingness to help, though.

[sup]Beware of false knowledge; it is more dangerous than ignorance. ~ George Bernard Shaw[/sup]
Consultant Developer/Analyst Oracle, Forms, Reports & PL/SQL (Windows)
 
Fair enough, BJ.
I always enjoy "nutting out" performance issues.

Regards

T
 