
Help importing a .csv file to ORACLE 2


MikeTruscott

Technical User
Jun 26, 2003
35
GB
Hi everyone.....

I basically have a csv file with millions of rows in it, which I want to stick into an Oracle database. I'm hoping that, if I'm not fussy (see below), it's a one- or two-line job....but I'm not an ORACLE expert!

The csv file:

1: Has no headings
2: Is comma delimited
3: Has a couple of hundred columns
4: Has, on occasion, fields with commas in them, like "truscott,mike". So I would want to add a text qualifier of "" if possible.

I believe I can either use SQL*Loader or an external table but don't really have the time to figure these out. So if anyone could just tell me what I need to do to get the file into ORACLE I'd really appreciate it.

I'm not fussy as to what the columns are called.....I just want the data into a database. Ideally I'd like ORACLE to create the column headings and would like each column to be a text field, but I don't want to type every field detail in for 200+ fields!

Again, any help much appreciated.

Mike
 
I'd love to be able to offer you a "no work" option where you wave a wand, Harry Potter-style, and the data magically appears in the database. Sadly, Hogwarts doesn't presently have an Oracle section.

In the meantime, SQL*Loader is your best bet. It is not difficult to use. You would need a control file along the lines of:

LOAD DATA
REPLACE
INTO TABLE <name of your table>
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(..list of columns...)

Then just run a command along the lines of:

sqlldr userid=username/password@db control=yourcontrolfile.ctl data=yourdatafile.csv

 
If you're not bothered about what the columns are called, you could quickly generate something like this:

select 'column_name_'||to_char(level)||','
from dual
connect by level <= 200;

This will give you a list of meaningless column_names, which you can quickly paste into your control file.
 
Dagon....thanks for the quick response!

The table that I import the data to....does it have to exist in the database or will the control file create it for me?

If I've got to generate some sort of set of column names then I'll do it through Excel and paste the text into the control file. They only need to be "Field1", "Field2" etc so should be straightforward.

Thanks again.

Mike
 
No, you will have to create the table. But it's simple enough to generate a create table statement from the column list you generated in Excel.
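
For example, a variation on the query above will churn out the column definitions as well (varchar2(4000) is just an arbitrary, generous size):

select 'column_name_'||to_char(level)||' varchar2(4000),'
from dual
connect by level <= 200;

Paste the output between "create table <name of your table> (" and ");", trimming the comma off the last line.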
 
Mike,

I, personally, have moved away entirely from using SQL*Loader as a separate step for .csv files...I'm using Oracle's EXTERNAL tables (as you alluded to in your original post). EXTERNAL tables use the SQL*Loader machinery underneath, but that is transparent to you or anyone else once you have established the EXTERNAL table definition. It means that if you change or replace the underlying .csv file, you do not need to re-run SQL*Loader to access the new data...it is instantly available via your original EXTERNAL table definition.

I'm not going to take the time to construct and post the solution here unless I know you are interested in using it.

Let me know.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
Mike,

like Santa, I use external tables to load millions of records - it's just so easy.

I have samples at work, which I can fetch tomorrow, but also like Santa, don't want to trot out the answer unless you're serious.

For starters, get your file exported as pipe delimited instead of comma. Commas pop up so often in text that they're a liability as a delimiter.

Get an electronic list of your field names (suitably delimited, of course) so that we can toss it into a template external table definition.

So, how say you Mike?

Regards

Tharg

Grinding away at things Oracular
 
Mike said:
I'd like ORACLE to create the column headings...for 200+ fields!
Mike, we can use Dagon's method for generating generic column headings, but the fact that there are 200+ columns indicates that your data are (most probably) badly abusing First Normal Form, a rule that says you should not have multiple iterations of the same data items on one row.


Mufasa's First Rule of Well-behaved Data is:
Mufasa's 1st Data Rule said:
Allow "bad data" to live as short a life as possible.
Therefore, to avoid "institutionalising" this structural flaw in your application, as soon as your .csv file is accessible as a table (EXTERNAL or otherwise), re-organise the data into tables and columns that obey First Normal Form. (We can help with that task. If you need help doing "Database Re-Design", please open a separate thread and disclose descriptive column names in the .csv file.)
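
To give a flavour of what that re-organisation looks like (the table and column names below are purely hypothetical), a wide row holding repeated data items becomes one narrow row per item:
Code:
-- hypothetical example: wide_staging(id, reading1, reading2, reading3)
-- abuses First Normal Form; re-organise it into one row per reading
create table readings
(id         number
,reading_no number
,reading    varchar2(100)
);

insert into readings (id, reading_no, reading)
select id, 1, reading1 from wide_staging
union all
select id, 2, reading2 from wide_staging
union all
select id, 3, reading3 from wide_staging;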

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
Hello.....

Just a quick update....I created the control file by typing the SQL into a .txt file and renaming it. Not sure if this is how you're meant to do it!

Went into DOS and typed in the line DAGON mentioned above and it didn't work....maybe due to spaces in the directory paths. So I moved both the csv and ctl files to the top level of the directory to eliminate this.

Then, in the ctl file I replaced REPLACE with INFILE <csv path>, so that the DOS command became:

sqlldr userid=username/password@db data=<pathtodatafile>

I tried this, and after a second or two it had finished....but

No records went into the database.
A .BAD file was created with about 10 records in. The csv file has over a million.

Any ideas? I'm going to pick it up again this afternoon, or have a go at external tables. External tables only look like one set of commands....the only thing I'm thinking is that with so many columns I might not be able to create the table in one go......when I created the (destination) table in the database (using SQL*Plus) I had to create the first half and then append the second half using an ALTER statement.

Cheers

Mike
 
There should be a log file with the same name as your control file but with the suffix ".log".
 
...and another thing (!)

In the .bad file it's inserted little rectangles (carriage returns?) in random places. Also, new lines are starting in random places so what should be the first piece of data on a line appears somewhere else.

Not sure if the .bad file is meant to have any structure to it but I thought it was strange.

Mike
 
Mike,

since I've finished a delectable lunch of bacon and egg mayo, I am at peace with the world. Herewith, some sample files which you might care to use.

Run the script below in sqlplus, and make sure that you have sufficient privileges to grant read/write and create a directory.

Code:
CLEAR SCREEN

PROMPT For the server in use, enter the absolute path name of the folder
PROMPT which contains the data files, e.g. F:\ORACLE\DATA_FILES
ACCEPT data_folder_name


CREATE OR REPLACE DIRECTORY EXTERNAL_DATA AS '&data_folder_name';

GRANT READ ON DIRECTORY EXTERNAL_DATA TO <your schema name>;
GRANT WRITE ON DIRECTORY EXTERNAL_DATA TO <your schema name>;
quit

Your database can now "see" the folder called EXTERNAL_DATA, and can use files in that folder as external tables.

Now that you have a working directory, put a data file in it. In order to establish correct working, create a file with just one or two records. That way you can test/debug without waiting ages for millions of records to load.

Then declare the external table, whose access parameters do the job of the control file you appear to have already written. Here's a sample, anonymised to protect the guilty.
Code:
CREATE TABLE EXT_MIKETRUSCOTT
(
 MIKETRUSCOTT1        VARCHAR2(100),
 MIKETRUSCOTT2        VARCHAR2(100)
)
ORGANIZATION EXTERNAL 
(
  TYPE oracle_loader
  DEFAULT DIRECTORY EXTERNAL_DATA
    ACCESS PARAMETERS 
   (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'
    REJECT ROWS WITH ALL NULL FIELDS
   )
    LOCATION ('MIKES_DATA.TXT')
)
PARALLEL
REJECT LIMIT UNLIMITED;

Put a data file called MIKES_DATA.TXT in the external_data folder, pipe delimited (or, if you must, use commas and say FIELDS TERMINATED BY ',' instead of FIELDS TERMINATED BY '|' in the access parameters). Add entries from MIKETRUSCOTT1 to MIKETRUSCOTT200 to cater for your massive number of fields. Use Dagon's suggestion to generate the entries; don't do it by hand.
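
For a sanity check against the two-column sample above, a minimal pipe-delimited MIKES_DATA.TXT (values made up) need contain no more than:
Code:
truscott|mike
bloggs|joe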

Use the bad file to detect dud data, and start with a small text file and work your way up. You may also have to do some data cleansing as part of this exercise.

Regards

Tharg



Grinding away at things Oracular
 
Tharg,

Thanks for all your help so far!

I finally got the external structure in place (got confused with the PROMPT and ACCEPT bits) but it's not importing anything. All of the file is going straight into the BAD file, but at least it's producing a BAD and LOG file so I know the process is there.

So now I'm trying to figure out what's wrong with the data, as I've specified everything as VARCHAR2 and so thought it would accept anything. I've changed the entries to text and it still doesn't like it. So, my questions are:

If I've specified 100 columns in the external table does the input file have to have 100 columns in? At present it varies between 10 and 100 per line. This is correct....different lines represent different things and we'll use queries to filter them out.

If one row in the file is bad then does it ALL go into the BAD file, or just the one rogue record?

Can I get any info as to why it's rejected a record?

Cheers

Mike
 
Yes, the file and the table definition must align, otherwise how can the system interpret the data correctly?

If you have varying numbers of fields, then the data is a real mess, and will be a real pain to sort.

If one record is bad, it doesn't cause the entire thing to bail out. I suspect that because your data is so bad, the system can't make head or tail of it.

Start with one line of text with the correct number of fields. Get it to load, and then move on. You must establish correct working with correct data. Thereafter, if you get an error, you'll know for definite that it's the data and not the system causing the problem. At present you don't know whether it's the data, the system or both.

If you've declared a table with 100 fields, and your data only contains 10, you must add the other 90 delimiters and in the control file say that empty fields are null. That way it should at least load.
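
The clause in question is MISSING FIELD VALUES ARE NULL. A sketch of the amended access parameters (everything else as per the earlier sample):
Code:
ACCESS PARAMETERS
(
 RECORDS DELIMITED BY NEWLINE
 FIELDS TERMINATED BY '|'
 MISSING FIELD VALUES ARE NULL
 REJECT ROWS WITH ALL NULL FIELDS
)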

Regards

Tharg

Grinding away at things Oracular
 
Balls..... :)

I hoped it would cope with it as all of the fields are of the same data-type. I thought it might populate the first x columns and then stick nulls in the remaining ones.

It wouldn't be a problem if the csv file was small. I could write some VBA in a few minutes that would add the extra delimiters. Unfortunately, with such a big csv file that won't work. Any other ideas?

If I were to create a second external table, could I then use basic SQL to append its contents to the first? It might be that I split the csv file up, correct it, and then stick it back together in the database.
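
Something like this is what I'm picturing (table names made up):

create table all_csv_data as
select * from ext_csv_part1
union all
select * from ext_csv_part2;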

eugh....how depressing. :) At least I have an answer and a way forward now though.

Thanks again for your help. I really appreciate it!

Mike
 
Tharg said:
If you've declared a table with 100 fields, and your data only contains 10, you must add the other 90 delimiters
Mike said:
Any other ideas?
When I have the problem of a varying number of values in the .csv, my solution is to open the file with MS Excel and create a column to the right of the last column. I then drag (copy) some character (such as an "x") down the entire "height" of the table so that each row has a terminating value. This forces each row to have the same number of columns and, as mentioned earlier, I ensure that the EXTERNAL table definition includes the clause "missing field values are null". As it did in my case, this should absolutely, positively resolve your problem with "variable-length rows".



Let us know.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
Mufasa that's a great idea!

Not sure if I can use it though, as I was told that if I opened the csv file in Excel I'd lose some decimal places in my numbers. Some of them are lats and longs so accuracy is quite important.

Of course, the other problem here is that Excel can only handle around 65,000 rows. I'll have a play though, thanks for the suggestion!

Mike
 
Then the other method could be a two-step proposition:

1) Read the .csv flat file using Oracle's "utl_file" flat-file processing infrastructure. Count the delimiters (I/we can help with that function), then concatenate the missing number of delimiters at the end of each row. Write the result back out to a flat file using the write mode of "utl_file".

2) Use EXTERNAL table definition to process the altered .csv file.

Let us know your thoughts on this.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
So would it go something like this? I've cut and pasted from what examples I've managed to find!

-------------------------------------------------

DECLARE

n_file UTL_FILE.FILE_TYPE;
v_file UTL_FILE.FILE_TYPE;

v_NewLine VARCHAR2(51000); <----POSSIBLE?

BEGIN
v_file := UTL_FILE.FOPEN(location => 'C:\',
filename => 'MY_CSV_FILE.txt',
open_mode => 'r',
max_linesize => 32767);

n_file := UTL_FILE.FOPEN(location => 'C:\',
filename => 'NEW_CSV_FILE.txt',
open_mode => 'w',
max_linesize => 32767);

LOOP

-- GET_LINE raises NO_DATA_FOUND at end of file
UTL_FILE.GET_LINE(v_file, v_NewLine);

<------------------------------>
<insert delimiter function here>
<------------------------------>

UTL_FILE.PUT_LINE(n_file, v_NewLine);

END LOOP;

EXCEPTION
WHEN NO_DATA_FOUND THEN
UTL_FILE.FCLOSE(v_file);
UTL_FILE.FCLOSE(n_file);
END;

-----------------------------------------------

I know what I want the delimiter function to do but need to look up the SQL. Could be quite complicated if it's got to look for commas within quotes, but once I find the LENGTH, INSTR etc commands it hopefully won't be too bad.

Don't know what time it is there but it's nearly 1900 here so I'm heading off! I might pop in at the weekend as I'd like to get this done...thanks for helping me make progress, and have a good weekend!

Mike
 
Yes, Mike, you are right on track with your code. The only adjustment that I can see is that Oracle has a 32767-character maximum for PL/SQL varchar2 variables, so "v_NewLine VARCHAR2(51000)" must become "v_NewLine VARCHAR2(32767)".

I'm certain that we could write a user-defined function to cope with the commas that appear inside quotes so that we properly compute the number of fields needing dummy commas.
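
As a starting point, something along these lines should do it (a sketch only...the function name pad_delims and the default field count of 100 are placeholders):
Code:
-- sketch: count the "real" delimiters in a line (ignoring commas that
-- sit inside double quotes) and append however many are missing
CREATE OR REPLACE FUNCTION pad_delims (p_line   IN VARCHAR2
                                      ,p_fields IN PLS_INTEGER DEFAULT 100)
  RETURN VARCHAR2
IS
  v_count    PLS_INTEGER := 0;
  v_in_quote BOOLEAN     := FALSE;
  v_char     VARCHAR2(1);
BEGIN
  FOR i IN 1 .. NVL(LENGTH(p_line), 0) LOOP
    v_char := SUBSTR(p_line, i, 1);
    IF v_char = '"' THEN
      v_in_quote := NOT v_in_quote;   -- toggle at each quote character
    ELSIF v_char = ',' AND NOT v_in_quote THEN
      v_count := v_count + 1;         -- a genuine field delimiter
    END IF;
  END LOOP;
  -- n fields need (n - 1) delimiters; RPAD returns NULL when none are missing
  RETURN p_line || RPAD(',', (p_fields - 1) - v_count, ',');
END pad_delims;
/

Then, in your loop, "v_NewLine := pad_delims(v_NewLine);" would go where your delimiter-function placeholder sits.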

Let us know when you are ready.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 