Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Oracle PL/SQL

Status
Not open for further replies.

mpls51

Programmer
Nov 4, 2007
6
US
I have the following table in oracle 9i:

arc_book
----------------------------------------------------------------------------------------------
arc_id, arc_title, arc_publisher, arc_release_date

And a tab delimited file called smc_book. Here is how the smc_book tab delimited file looks like:

smc_id smc_title smc_publisher smc_release_date
---------------------------------------------------------------------------------------------------------------------------------------------
1234 "Beautiful Wonder" "Wrox Books" 1/1/1999
2356 "Master PL/SQL" "OReilly Media" 6/5/2004
5432 "Harry Potter and Goblet of Fire" "Simon & Shuster" 2/4/2001


What I want to do is create a lookup table where I matchup the records that are the same book in the smc_book file and arc_book table.
So the lookup table would look like:

lookup
-------------------------------------------------
smc_id, arc_id, arc_title, arc_publisher


When I read in the smc_book file I don't want to re-add books that have already been added to the lookup table.
Also when I am matching up the books in the smc_book file and arc table I want to make sure it gets all similar book titles that have the same publisher and release date because the book title, publisher are not exactly the same in both table. For example with data:

smc_book file:
smc_id smc_title smc_publisher smc_release_date
---------------------------------------------------------------------------------------------------------------------------------------------
1234 "Beautiful Wonder" "Wrox Books" 1/1/1999
2356 "Master PL/SQL" "OReilly Media" 6/5/2004
5432 "Harry Potter and Goblet of Fire" "Simon & Shuster" 2/4/2001


arc_book table:
arc_id arc_title arc_publisher arc_release_date
---------------------------------------------------------------------------------------------------------------------------------------------
1245 "Wonder, Beautiful" "Wrox" 1/1/1999
1244 "The PL-SQL, Master" "Media, OReilly" 6/5/2004
4352 "Golbet of Fire, Harry Potter" "Simon and Shuster" 2/4/2001

So I want to match up "Beautiful Wonder" in the smc_book file with "Wonder, Beautiful" in the arc_book table even though the title and publisher data is not exact.

Could someone please give an example of reading a tab delimted file in oracle and also how to match up "Beautiful Wonder" in the smc_book file with "Wonder, Beautiful" in the arc_book table even though the title and publisher data is not exact.

Thanks
 
mpls51,

This is really two or three questions for the price of one, so let's start by dividing (so we may then conquer).

First of all, to use either SQL or PL/SQL to run queries, the data must be in tables accessible by Oracle. That means you must first load the data from the file into an Oracle table. Queries may only be used once this is completed.

To load your data I would suggest either a sqlplus text file full of insert statements, and processed by sqlplus, or the use of sql loader, or (my preference) oracle external tables.
Choice of loading tool is influenced by data volume. If you only have a few thousand rows, sqlplus will do fine. If you have millions of rows then go for loader or external tables.

Once you've cracked this, let us know, and we can take on your comparison queries. Please note that there is a recent thread thread759-1420823 which goes into details about how to use external tables.

I must say I appreciate one of the more clearly stated questions I've seen in a while. Adopting such an approach as yours, i.e. for such and such data, I expect such and such an output is the foundation of unit testing, and sql needs it, just as much as PL/SQL. For what it's worth, well done for taking the time to formulate your question clearly - many don't.

When time permits, I'll feign knowledge, and concoct some of your queries, but I expect one SantaMufasa to give us both a master class, when he wakes up in a few hours from now.

Regards

Tharg

Grinding away at things Oracular
 
Thanks for the informative reply. I think i'll go with external tables. After reading the link you provided and cutting and pasting heres what I came up with.

CREATE TABLE EXT_SMC_BOOK
(
arc_id number(5),
arc_title varchar2(80),
arc_publisher varchar2(80),
arc_release_date date
)
ORGANIZATION EXTERNAL
(
TYPE oracle_loader
DEFAULT DIRECTORY EXTERNAL_DATA
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '\t'
REJECT ROWS WITH ALL NULL FIELDS
)
LOCATION (smc_book.txt')
)
PARALLEL
REJECT LIMIT UNLIMITED;

thanks again
 
Thanks for the informative reply. I think i'll go with external tables. After reading the link you provided and cutting and pasting heres what I came up with.

CREATE TABLE EXT_SMC_BOOK
(
smc_id number(5),
smc_title varchar2(80),
smc_publisher varchar2(80),
smc_release_date date
)
ORGANIZATION EXTERNAL
(
TYPE oracle_loader
DEFAULT DIRECTORY EXTERNAL_DATA
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '\t'
REJECT ROWS WITH ALL NULL FIELDS
)
LOCATION (smc_book.txt')
)
PARALLEL
REJECT LIMIT UNLIMITED;

thanks again
 
mpls,

don't forget to run the setup script to declare the external folder to Oracle, otherwise, your script (which appears to be correct) will fail.

T

Grinding away at things Oracular
 
Thanks for your help. Now I just need to match up "Beautiful Wonder" in the smc_book file with "Wonder, Beautiful" in the arc_book table even though the title and publisher data is not exact.
 
mpl,

Forward troops, let phase 2 begin!

Ok, so you've got your external table working, right?
To enable us to help, create and post the following code.

Create table statements for the original table, followed by explicit insert statements for the sample data you originally posted.

Then a create table statement (as an ordinary internal) and insert statements from your external table. This will save us having to create our own external table. We can cheat and use an exact internal copy of your external.

Then, (if your original post doesn't say it precisely) an exact statement of which rows should match and what data should be returned, by querying against these tables.

This may sound like make-work, but by doing this, you generate your own unit test script, which enables testing in a trice, ok?

Have a bash at doing this, and post again if you need assistance. I'm deliberately not helping much here, as you seem knowledgable enough to DIY.

Regards

T

P.S. I'm off to bed soon, so if you're in the USA don't think I've abandoned you, just wait a few hours :) SantaMufasa normally does the U.K. graveyard shift, so if insomnia strikes, and you work into the wee hours, plead for his help.

Grinding away at things Oracular
 
Here is the sql statements
smc_book table:

create table smc_book
(smc_id number(5),
smc_title varchar2(80),
smc_publisher varchar2(80),
smc_release_date date);

insert into smc_book
values (1234, 'Beautiful Wonder', 'Wrox Books', 'Jan-01-1999');
insert into smc_book
values (2356, 'Master PL/SQL', 'OReilly Media' 'Jun-06-2004');
insert into smc_book
values (5432, 'Harry Potter and The Goblet of Fire', 'Simon & Shuster', 'Feb-04-2001');

arc_book table:

create table arc_book
(arc_id number(5),
arc_title varchar2(80),
arc_publisher varchar2(80),
arc_release_date date);

insert into arc_book
values (1245, 'Wonder, Beautiful', 'Wrox' 'Jan-01-1999');
insert into arc_book
values (1244, 'The PL-SQL, Master', 'Media, OReilly' 'Jun-06-2004');
insert into arc_book
values (4352, 'Golbet of Fire, Harry Potter', 'Simon and Shuster' 'Feb-04-2001');

lookup table:
create table lookup
(smc_id number(5),
arc_id number(5),
arc_title varchar2(80),
arc_publisher varchar2(80));

Here is what should match up
smc_id arc_id
---------------
1234 1245
2356 1244
5432 4352
 
MPLS,

I'm sorry that I'm coming into this thread rather late.

First of all, the code you supplied (to seed the tables) had several syntax errors in the INSERT statement, but here are the corrections:
Code:
set define off
insert into smc_book 
values (1234, 'Beautiful Wonder', 'Wrox Books', to_date('Jan-01-1999','Mon-dd-yyyy'));
insert into smc_book
values (2356, 'Master PL/SQL', 'OReilly Media',to_date('Jun-06-2004','Mon-dd-yyyy'));
insert into smc_book
values (5432, 'Harry Potter and The Goblet of Fire','Simon & Shuster',to_date('Feb-04-2001','Mon-dd-yyyy'));

insert into arc_book 
values (1245, 'Wonder, Beautiful', 'Wrox',to_date('Jan-01-1999','Mon-dd-yyyy'));
insert into arc_book
values (1244, 'The PL-SQL, Master', 'Media, OReilly',to_date('Jun-06-2004','Mon-dd-yyyy'));
insert into arc_book
values (4352, 'Golbet of Fire, Harry Potter', 'Simon and Shuster',to_date('Feb-04-2001','Mon-dd-yyyy'));
I don't have the time presently to build code for your matching, but I can describe what, I believe, are the pieces of the algorithm that you will need:[ul]From one table or the other:

[li]Remove all punctuation[/li][li]Remove all less-significant words (e.g., " the ", " a ", " an ")[/li][li]Separate out/Parse all words[/li]...Then compare the resulting individual words to the row contents of the other table.[/ul]This coding activity should take one with PL/SQL coding confidence/experience perhaps .5 - 1 hour of coding/testing.


Let us know if you feel confident undertaking this activity. If not, perhaps one of us could do it for you when we have that time.

Advise us.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
...and, your data shows "Golbet of Fire" and "Goblet of Fire"...is "Golbet an intentional misspelling that you expect to match to "Goblet? If so, then the algorithm just got tons more complicated.

"Wrox" and "Wrox Books"...That, I expect is a match, so you then must add to the algorithm a list of not-necessary-to-match words.

Also, I forgot to mention that your algorithm should replace "&" to "and".

Perhaps it would be appropriate to mention at this juncture, Mufasa's First Rule of Quality Data:
Mufasa said:
Do not allow "bad data" to live a long life.
In your case, if you consider two expressions with physically non-matching values to match logically (i.e., a "fuzzy match"), that, IMHO, is bad/low-quality data. Resultingly, rather than require special programming to cause non-matching data to match indefinitely, you should have a process that causes one value or the other to change to match its counterpart.


I recobnise that, in this case, you may have no control over the sources of the two sets of data, but you do (i.e., should) have control of what the data looks like once it becomes part of your system.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
Sorry about the syntax errors. I'm a novice when it comes to PL/SQL but I'll try to take a crack at writing the algorithm but could you provide some resources on how to remove punctuation and separating out/parsing all words. As far as "Golbet" I unintentionally misspelled it should be "Goblet".

I'm also having problems with creating my external table.
When I run the following code:

CLEAR SCREEN

PROMPT For the server in use, enter the absolute path name of the folder
PROMPT which contains the data files, e.g. F:\ORACLE\DATA_FILES
ACCEPT data_folder_name


CREATE OR REPLACE DIRECTORY EXTERNAL_DATA AS '&data_folder_name';

GRANT READ ON DIRECTORY EXTERNAL_DATA TO <your schema name>;
GRANT WRITE ON DIRECTORY EXTERNAL_DATA TO <your schema name>;
quit

In the create directory section of the above code it says that the directory was created successfully but it does not appear so I created it and put the data files in it. Since I'm the one running the control file I can't GRANT/REVOKE privileges to/from myself. When I run the control file I get the following errors.

ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04063: unable to open log file EXT_SMC_BOOK_16394.log
OS error Permission denied
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1
 
mpls,

I had envisaged that you have DBA privileges. These are needed to create directorys 'visible' to Oracle. You would then grant read/write on that directory to the user which you intend using to do the job.

Note that you should normally never operate with DBA privileges.

So, after you've logged on as sys (or some other suitably authorised account) and run the create directory script, log out, and then log in as the ordinary user, to whom you just granted read/write privileges.

Please post the result of running
Code:
select * from dba_directories

You should get something like:-
Code:
OWNER DIRECTORY_NAME DIRECTORY_PATH
SYS   MEDIA_DIR      E:\oracle\ora92\demo\SCHEMA\product_media\
SYS   LOG_FILE_DIR   E:\oracle\ora92\demo\SCHEMA\LOG\
SYS   DATA_FILE_DIR  E:\oracle\ora92\demo\SCHEMA\sales_history\ 
SYS   DEPT_SITE_LOAD E:\oracle\BULK_LOAD\DEPT_SITE\EXTERNAL_DATA

Note that last line, it's my EXTERNAL_DATA directory. The other three are examples created by the DBCA, if you chose (as I did) to include sample schemas when you made the db).

The troops are waiting the order to advance....

Regards

Tharg

Grinding away at things Oracular
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top