Case insensitive searching 8

ifx · Mar 22, 2003

Hi,

I'm developing a website in PHP using an Oracle 8i database, and need to carry out a search on a database containing academic papers. I need to search the title field, but the results are only coming back if I get the case right (ie the search is case sensitive). How do I construct the query so it's not case sensitive. I've tried various methods such as LIKE and the % operator. Eg:
SELECT * FROM PAPERS WHERE TITLE LIKE '%$Query%';

Any ideas on how to stop the search being case sensitve so I could put in a word like 'Academic' or 'academic' and they'd both match 'AcAdEmIc' for example!?

Thanks a lot!

BJCooperIT · Mar 22, 2003

Try converting the fields to all uppercase like:

SELECT * FROM PAPERS
WHERE upper(TITLE) LIKE upper('%$Query%');

Lowercase will work too. The disadvantage of this method is that you will wind up doing a full table scan of PAPER.

Good luck!

(select * from life where brain is not null)
Consultant/Custom Forms & PL/SQL - Oracle 8.1.7 - Windows 2000

AOLEURO · Mar 23, 2003

Hej Ifx

You need to tell us how often you are using this statement and how many rows the table got.

The solution from BJCooperIT generates a FULL TABLE scan on the table, because you are using a function.

If you are using this statement often and the table is big, then I would create a FUNCTION INDEX on the table, like this:

create index Func_Upper_Idx on PAPERS( UPPER(title) );

And generates statistic on the table with execute dbms_stats.gather_table_stats('USERNAME','PAPERS');[/color] Replace USERNAME with correct Schema and execute the statement as SYSDBA.

And use BJCooperIT solution in the where clause.

Function Index works from Oracle 8i. Regards
Allan
Icq: 346225948
! If you can't prove it - don't bother try to teach me !

ifx · Mar 23, 2003

Hi,

Thanks for your answers, both look like it's what I'm after. The database is only likely to hold up to about 400 max, but it's probabaly going to be more like 150 to 200. The first one I've already tried and seems to work well (no performance difference over a standard search).

Thanks again!

BJCooperIT · Mar 23, 2003

A question for you Allan:

I thought about the Function Index but decided it probably would not help since the query was non-positional (i.e. Oracle has to search all positions of the column since it was:
LIKE upper('%SMITH%')

My reasoning was that if it had it been a positional query:
LIKE UPPER('SMITH%')
starting in the first position, the index would have been helpful.

I only learned about Function Indexes by word of mouth and the first time we tried one, our queries refused to use it according to our explain plans.

Could you explain how the index still could be helpful if "SMITH" is the last word in the title?

Thanks!
(select * from life where brain is not null)
Consultant/Custom Forms & PL/SQL - Oracle 8.1.7 - Windows 2000

AOLEURO · Mar 24, 2003

Hi BJCooperIT

The screen is best if you view my reply in 1400x1050 and not just 1024*768.

With Ifx reply, I would not create an index just for this type of statement and with so few rows.

Back to you question – always full tablescan with LIKE ‘%....’:

Full table scan is not necessary performed then LIKE is used in where clause. However the Oracle Optimizer generally behaves as you describe because it can’t predict best goal for the statement.

Function indexes are “hidden” calculated function values stored as index – the primary idea is that you calculate values on insert or update and not on queries – so to speak insert ones and read several times you get more speed.

You have to have enabled query rewrite, because Oracle translate the statement and rewrite the query to use the already calculated values in the index.

The Function index has only keys in a datablock (plus RowId to row in table) and wherefore Oracle reads these very fast and read several keys per datablock.

Then Oracle read a datablock from the table – Oracle read fewer keys than index reads, because Oracle have to read all columns per row in the datablock (simple description – not entirely true).

But there or not Oracle use full table scan or index depends on the optimizer.

First I create a table:

Code:

create table papers ( isdn_no  varchar2(10) not null primary key,
                      column2 number default 0,
                      column3 number default 0,
                      column4 varchar2(60) default rpad('a',60,'a'),
                      title varchar2(30) not null
                     ) tablespace table_data;

And insert 100,000 rows with this code – just to get some higher values in statistics:

Code:

begin
    for j in 1..10
    LOOP
        for i in 1..9999
        LOOP   
            insert into papers(isdn_no,title) 
                 values( to_char( i + (J - 1) * 10000 ), to_char( i + (J - 1) * 10000 )||'dsfmjlkngfwoiehfnwonfef' );
        END LOOP;
        insert into papers(isdn_no,title)
             values( to_char( J * 10000 ), to_char( J * 10000 )||'dsfmjlkngfwoieh

Code:

smith

Code:

' );
    END LOOP;
END;
/

Note that I create 10 smith records per 9,999 rows – so smith is placed around the table.

Now I make a statement with trace:

select /*+ full(papers) */ upper(title) from papers where upper(title) like upper('%smith%');

The trace output of 1.test is:

Code:

 10 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=571 Card=5000 Bytes=145000)
   1    0

Code:

TABLE ACCESS (FULL)

Code:

 OF 'PAPERS' (Cost=571 Card=5000 Bytes=145000)

Statistics
----------------------------------------------------------
       1646  recursive calls
          0  db block gets

Code:

  consistent gets

Code:

  physical reads
          0  redo size
        720  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         32  sorts (memory)
          0  sorts (disk)
         10  rows processed


select /*+ full(papers) */ upper(title) from papers where upper(title) like '%SMITH%'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.03          0         21          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.31       0.66       1478       1483          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.32

Code:

0.69

Code:

          0          10

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 5  (SYSTEM)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  TABLE ACCESS FULL PAPERS (cr=1483 r=1478 w=0 time=665319 us)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
     10   TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'PAPERS'

Well it appears to be very fast, but now I try to create a function index:

create index Func_Upper on papers(upper(title)) tablespace index_data;

And generate statistic:

execute dbms_stats.gather_table_stats('USERNAME',’PAPERS’);

And let’s see a full index scan:

select upper(title) from papers where upper(title) LIKE '%SMITH%';

The trace output of 2.test is:

Code:

 10 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=5000 Bytes=145000)
   1    0

Code:

INDEX (FAST FULL SCAN)

Code:

 OF 'FUNC_UPPER' (NON-UNIQUE) (Cost=11 Card=5000 Bytes=145000)

Statistics
----------------------------------------------------------
       1624  recursive calls
          0  db block gets

Code:

  consistent gets

Code:

  physical reads
          0  redo size
        720  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         32  sorts (memory)
          0  sorts (disk)
         10  rows processed


select upper(title) from papers where upper(title) LIKE '%SMITH%'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.06       0.07          0         21          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.10       0.24        558        562          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.17

Code:

0.32

Code:

          0          10

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 5  (SYSTEM)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  INDEX FAST FULL SCAN FUNC_UPPER (cr=562 r=558 w=0 time=244108 us)(object id 47471)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
     10   INDEX (FAST FULL SCAN) OF 'FUNC_UPPER' (NON-UNIQUE)

It’s faster than a full table scan – but we also only get the title in uppercase directly from the function index.

Now let’s try a statement where I use the index but want all columns as in the full table scan:

select /*+ index(papers func_upper) */ * from papers where upper(title) LIKE '%SMITH%';

The trace output of 3.test is:

Code:

 10 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=43 Card=5000 Bytes=500000)
   1    0

Code:

TABLE ACCESS (BY INDEX ROWID)

Code:

 OF 'PAPERS' (Cost=43 Card=5000 Bytes=500000)
   2    1

Code:

INDEX (RANGE SCAN)

Code:

 OF 'FUNC_UPPER' (NON-UNIQUE) (Cost=3 Card=5000)

Statistics
----------------------------------------------------------
       1708  recursive calls
          0  db block gets

Code:

  consistent gets

Code:

  physical reads
          0  redo size
       1059  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         32  sorts (memory)
          0  sorts (disk)
         10  rows processed


select /*+ index(papers func_upper) */ * from papers where upper(title) LIKE '%SMITH%'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.03          0         21          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.79       1.50       2004      20726          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.81

Code:

1.54

Code:

          0          10

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 5  (SYSTEM)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  TABLE ACCESS BY INDEX ROWID PAPERS (cr=20726 r=2004 w=0 time=1508957 us)
 100000   INDEX RANGE SCAN FUNC_UPPER (cr=558 r=557 w=0 time=427092 us)(object id 47471)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
     10   TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF 'PAPERS'
 100000    INDEX (RANGE SCAN) OF 'FUNC_UPPER' (NON-UNIQUE)

We see that this statement is too expensive, because Oracle has to walk through the index to find %smith. It can’t use a range scan because “smith” not starts at first position – just as you pointed out in your reply.

But try to se this statement:

select rowid from allan.papers where upper(title) LIKE '%SMITH%';

The trace output of 4.test is:

Code:

10 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=5000 Bytes=180000)
   1    0

Code:

INDEX (FAST FULL SCAN)

Code:

 OF 'FUNC_UPPER' (NON-UNIQUE) (Cost=11 Card=5000 Bytes=180000)

Statistics
----------------------------------------------------------
       1666  recursive calls
          0  db block gets

Code:

  consistent gets

Code:

  physical reads
          0  redo size
        622  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         32  sorts (memory)
          0  sorts (disk)
         10  rows processed


select rowid from papers where upper(title) LIKE '%SMITH%'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.07       0.07          0         21          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.15       0.27        558        562          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.23

Code:

0.34

Code:

          0          10

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 5  (SYSTEM)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  INDEX FAST FULL SCAN FUNC_UPPER (cr=562 r=558 w=0 time=271451 us)(object id 47471)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
     10   INDEX (FAST FULL SCAN) OF 'FUNC_UPPER' (NON-UNIQUE)

I knew that Oracle would get RowId from function index because it is fast.

So now I try to use RowID directly on the table:

select * from allan.papers
where rowid in ('AAALltAAUAAAAecABB','AAALltAAUAAAADpAAq','AAALltAAUAAAAH2AAk',
'AAALltAAUAAAALDAAf','AAALltAAUAAAANRAAa','AAALltAAUAAAAPeAAV',
'AAALltAAUAAAASrAAQ','AAALltAAUAAAAW4AAL','AAALltAAUAAAAaFAAG','AAALltAAUAAAAbTAAB');

The trace output of 5.test is:

Code:

10 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=107)
   1    0

Code:

INLIST ITERATOR

Code:

   2    1

Code:

TABLE ACCESS (BY USER ROWID)

Code:

 OF 'PAPERS' (Cost=1 Card=1 Bytes=107)

Statistics
----------------------------------------------------------
       1707  recursive calls
          0  db block gets

Code:

  consistent gets

Code:

  physical reads
          0  redo size
       1059  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         32  sorts (memory)
          0  sorts (disk)
         10  rows processed


select * from papers
where rowid in ('AAALltAAUAAAAecABB','AAALltAAUAAAADpAAq','AAALltAAUAAAAH2AAk',
                'AAALltAAUAAAALDAAf','AAALltAAUAAAANRAAa','AAALltAAUAAAAPeAAV',
                'AAALltAAUAAAASrAAQ','AAALltAAUAAAAW4AAL','AAALltAAUAAAAaFAAG','AAALltAAUAAAAbTAAB')

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.03          0         21          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00         10         10          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03

Code:

0.03

Code:

          0          10

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 5  (SYSTEM)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  INLIST ITERATOR  (cr=10 r=10 w=0 time=4761 us)
     10   TABLE ACCESS BY USER ROWID PAPERS (cr=10 r=10 w=0 time=4692 us)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
     10   INLIST ITERATOR
     10    TABLE ACCESS   GOAL: ANALYZED (BY USER ROWID) OF 'PAPERS'

So I see that Oracle is very fast to give me the rows I want.

So now I want to combine fast index scan and RowID lookup with final test – 6.test:

select /*+ FIRST_ROWS */ * from papers pap1
where exists (select 1 from papers pap2
where upper(pap2.title) LIKE '%SMITH%'
and pap1.rowid = pap2.rowid );

The trace output of 5.test is:

Code:

 10 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=

Code:

HINT: FIRST_ROWS

Code:

 (Cost=5126 Card=5000 Bytes=715000)
   1    0

Code:

NESTED LOOPS

Code:

 (Cost=5126 Card=5000 Bytes=715000)
   2    1

Code:

SORT (UNIQUE)

Code:

   3    2

Code:

INDEX (FULL SCAN)

Code:

 OF 'FUNC_UPPER' (NON-UNIQUE) (Cost=26 Card=5000 Bytes=180000)
   4    1

Code:

TABLE ACCESS (BY USER ROWID)

Code:

 OF 'PAPERS' (Cost=1 Card=1 Bytes=107)

Statistics
----------------------------------------------------------
       1750  recursive calls
          0  db block gets

Code:

  consistent gets

Code:

  physical reads
          0  redo size
       1059  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
         33  sorts (memory)
          0  sorts (disk)
         10  rows processed



select /*+ FIRST_ROWS */ * from papers pap1
where exists (select 1 from papers pap2
               where upper(pap2.title) LIKE '%SMITH%'
               and   pap1.rowid = pap2.rowid )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.03          0         21          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.12       0.34        567        567          0          10
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.15

Code:

0.37

Code:

          0          10

Misses in library cache during parse: 1
Optimizer goal: FIRST_ROWS
Parsing user id: 5  (SYSTEM)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  NESTED LOOPS  (cr=567 r=567 w=0 time=344947 us)
     10   SORT UNIQUE (cr=557 r=557 w=0 time=340508 us)
     10    INDEX FULL SCAN FUNC_UPPER (cr=557 r=557 w=0 time=340322 us)(object id 47471)
     10   TABLE ACCESS BY USER ROWID PAPERS (cr=10 r=10 w=0 time=4355 us)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: HINT: FIRST_ROWS
     10   NESTED LOOPS
     10    SORT (UNIQUE)
     10     INDEX (FULL SCAN) OF 'FUNC_UPPER' (NON-UNIQUE)
     10    TABLE ACCESS   GOAL: ANALYZED (BY USER ROWID) OF 'PAPERS'

So if I guide Oracle with a hint to the optimizer, then the statement is cheaper in resource than full table scan.

However if I would search for a common world like AND (‘%AND%’) then full table is faster, because we probably get a lot of rows.

Conclusion:
So you see depending of how many columns the table got and many rows we query and how often we use the statement and how many users at the same time use the statement, I would consider use full table scan or full index scan with RowID lookup.
If your system is heavy loaded and you need to make best optimal statements to avoid performance degree in the database, then you got the responsibility to write optimal code and help Oracle Optimizer with hints.
Regards
Allan
Icq: 346225948
! If you can't prove it - don't bother try to teach me !

AndyYoung · Mar 24, 2003

Here are a couple of additional notes on doing a caseless search.
You will also likely want to do a "no accents" search for words such as communiqué and resumé (or is it résumé?)

A reasonable solution for a "no accents" function is:

Code:

CONVERT(char, 'US7ASCII', 'WE8ISO8859P1')

Now that you have two functions, you should have a user defined function to standardize the calls to CONVERT and UPPER.

BJCooperIT · Mar 24, 2003

Allan - Thanks so much for the time and energy you spent on my question. Your answer has helped my understanding Function Indexes. Another star for you, well deserved.
(select * from life where brain is not null)
Consultant/Custom Forms & PL/SQL - Oracle 8.1.7 - Windows 2000

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Case insensitive searching 8

ifx

Technical User

BJCooperIT

Programmer

AOLEURO

Programmer

ifx

Technical User

BJCooperIT

Programmer

AOLEURO

Programmer

AndyYoung

Technical User

BJCooperIT

Programmer

Similar threads

Part and Inventory Search

Sponsor