REGEXP_SUBSTR - detect EOL

Thargy · Aug 28, 2008

I have a problem with regexp, and am looking for suggestions.

The requirement is to examine a string, which is of unknown length, and contains new line characters, which are interspersed with desired data.

Wherever a line starts with [ABODS] it is required to detect the data thereafter, which is characters of text followed by a space and a comma.

For example

Code:

:                                                 
[PANEL] NBS                                                     
[ABODS]            C                                            
[ABCOM]            comment                                      
[CELLS]            1 2 3 4 5 6 7 8 9 10                         
[RESLT] IAT        - - + - - - - - - -                          
[RESLT] ENZYME     - - - - - + - - - +
[ABODS]            Cw, S, k, e

Is a single string, but because it contains new line charactes it displays as above.

From the above, I want the single C which comes after the first [ABODS], and the characters after the second.
Desired output is

Code:

C
Cw, S, k, e

I can detect the lines which start with ABODS, using regular expressions in multi-line mode, but can't figure out how to detect the end of line.

Thus far I have

Code:

SELECT REGEXP_SUBSTR('[ABODS]            C ','^\[ABODS\]')
  FROM DUAL;

as an ABODS detection mechanism, but am struggling to take it further.

Regards

T

SantaMufasa · Aug 28, 2008

John,

Perhaps I'm missing some element of complexity here, but can you not use the function, INSTR(<exp>,chr(10)), to isolate the end-of-line character?

[santa]

Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm. The cost will be your freedoms and your liberty.”

Thargy · Aug 28, 2008

Dave,

yes, that would indeed work, but it wouldn't do what I need.

The first posting (beginning with a colon, and ending with '[ABODS] Cw, S, k, e' is a single field from a table and is stored as a VARCHAR2(4000). In order to make it display nicely on the front end form, new line characters have been included in the text itself (don't even mention data integrity here.....). I want to select each entry from the string which is after an ABODS but before the new line character. Since this is just one long string, instr will find the first chr(10) right enough, but what about the others. The users may enter as many lines containing ABODS as they want, in any order they want, with any number of letters and commas thereafter.

If I can get regexp_substr working in multi-line mode, according to the documentation, this will automatically handle each new line character as if it really did start a separate row, and in one select, do all the donkey work for me.

I am considering doing something clunky like regexp_count to determine the number of ABODS, and then do repeated single regexp_substr's on the line, but this is a poor solution compared with correct use of regexp in all its glory.

Regards

T

Thargy · Sep 2, 2008

I have now completed this task, and it turned out that even in multi-line mode, regexp_substr does not do what I wanted.
I have achieved the desired result by creating a view using conventional INSTR and SUBSTR techniques, and then using the view as the source of a cursor.

The view uses combined selection from dual to number the various multi-line entries

Code:

CREATE OR REPLACE VIEW V_RQS_ANTIBODY
AS
WITH DATA AS
(
SELECT LEVEL L
  FROM DUAL
CONNECT BY LEVEL <= (SELECT MAX((LENGTH(RESULT_TEXT) - LENGTH(REPLACE(RESULT_TEXT,'[ABODS]','')))/LENGTH('[ABODS]'))FROM RESULT)
)
SELECT REQUEST_ID,SAMPLE_DATE_TIME, RESULT_ID,(LENGTH(RESULT_TEXT) - LENGTH(REPLACE(RESULT_TEXT,'[ABODS]','')))/LENGTH('[ABODS]') NUM_OF,
       ((LENGTH(RESULT_TEXT) - LENGTH(REPLACE(RESULT_TEXT,'[ABODS]','')))/LENGTH('[ABODS]') - L) +1 TALLY,
       FLAGS, FUNCTION_CODE,RESULT_TEXT
  FROM REQUEST R INNER JOIN RESULT            USING (REQUEST_ID)
                 INNER JOIN TEST_FORMAT    TF USING (TEST_FORMAT_ID,DEPARTMENT_ID), DATA
 WHERE INSTR (result_text, '[ABODS]') != 0                              --To do with antibodies
   AND (test_format_status_flag = 'r' OR test_format_status_flag = 'p') --Result is authorised
   AND INSTR (function_code, CHR (156) || '5') != 0                     --Result is relevant
   AND (LENGTH(RESULT_TEXT) - LENGTH(REPLACE(RESULT_TEXT,'[ABODS]','')))/LENGTH('[ABODS]') - L >= 0;

This thread should be considered closed.

My thanks to those who tried to help.

Regards

T

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

REGEXP_SUBSTR - detect EOL

Thargy

Technical User

SantaMufasa

Technical User

Thargy

Technical User

Thargy

Technical User

Similar threads

Part and Inventory Search

Sponsor