
Import from file 1

Status
Not open for further replies.

djj55

Programmer
Feb 6, 2006
1,761
US
Hello, SQL 2008 R2
I have a file with no delimiters. Think of it as a single row of data six billion characters wide. Yes, billion. There is no CR or LF anywhere in the file.

What needs to happen is that every 129 characters a new record should start.

Can SQL do this or do I just use a .NET program? Which I am currently doing.

Thank you,

djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!
 
Are there multiple columns with the 129 characters, or is it essentially 1 column of data?

I'm reasonably certain you can do this with bulk insert and a format file.


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
In the text file there is one row with more than six billion characters. I want to divide it so that every 129 characters a new row starts in my SQL table, which will have a single 129-character column [char(129)] and 500K-plus rows.


djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!
 
Well.... here you go.

I tested this on a smaller scale (of course). To duplicate this test, create this data file:

Code:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

Save the data above into a file named data.txt.

Next, create a format file, like this:

Code:
9.0
1
1	SQLCHAR	0	4	""	1	Blah	SQL_LATIN1_GENERAL_CP1_CI_AS

Save this to a file named data.fmt

Then, to continue with the test, create a table to store this data.

Code:
create table djjTest(Data Char(4))

Finally, import the data like this:

Code:
Bulk Insert djjtest
From  'C:\Data\TekTips\Data.txt'
With (FormatFile='C:\Data\TekTips\data.fmt')

Of course, you'll need to change the paths of the data and format file.

Now you can see that the data is actually in the table.

Code:
Select * From djjtest

[tt][blue]
Data
----
abcd
efgh
ijkl
mnop
qrst
uvwx
yzAB
CDEF
GHIJ
KLMN
OPQR
STUV
WXYZ
[/blue][/tt]

To clean up...

Code:
Drop table djjtest

So... all you should have to do is change the [!]4[/!] in the format file to [!]129[/!] and modify the code to use your table, data file, and format file.
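For the 129-character layout, the modified format file would look like this (the column name "Data" is just a placeholder; adjust the collation to match your server):

Code:
9.0
1
1	SQLCHAR	0	129	""	1	Data	SQL_LATIN1_GENERAL_CP1_CI_AS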

While we're at it, you may want to do a little research on optimizing bulk load operations. Some funky things happen when you start loading gigabytes of data: transactions and locking can become your bottleneck. So do a quick Google search on [google]bulk load performance tablock holdlock[/google]
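As a starting point, a TABLOCK hint can go directly on the bulk insert above. This is just a sketch using the example paths from earlier; adjust them to your environment:

Code:
Bulk Insert djjtest
From  'C:\Data\TekTips\Data.txt'
With (FormatFile='C:\Data\TekTips\data.fmt',
      TABLOCK,           -- table-level lock; required for minimal logging
      BATCHSIZE=100000)  -- commit in batches to keep each transaction small

With TABLOCK on an empty heap, SQL Server can minimally log the load, which is usually the biggest win.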

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks!!! I agonize over the bulk insert/bcp format file from time to time and thought it would look for a second line after the specified number of characters.

The reason for wanting to know if T-SQL could import this is that the .NET program takes a long time, so the suggested Google search will help. (Yes, I know importing that much will take a while, but I would like to compare the speed.)

Thanks again, I will try and let you know how things go.

djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!
 
Took around 16 seconds to import 505,503 records! Much faster than the several minutes that .NET took.

Have a star and thanks again,

djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!
 
Is 16 seconds acceptable?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
djj55,

Out of curiosity, can you post your .NET code here?

Thanks.

MCP SQL Server 2000, MCTS SQL Server 2005, MCTS SQL Server 2008 (DBD, DBA)
 
As I am not a programmer, I used the read -> insert method instead of reading all the data in and then bulk loading it to SQL. That said, here is the core of the code:

Code:
Dim fs As FileStream = File.OpenRead(strReadFile)

Dim b(128) As Byte                      ' 129-byte buffer, one record per read
Dim temp As UTF8Encoding = New UTF8Encoding(True)
Dim intRows As Integer = 0
Dim bytesRead As Integer = fs.Read(b, 0, b.Length)

Do While bytesRead > 0
    ' Decode only the bytes actually read, so a short final record is safe
    strSQL = "INSERT INTO MyTable Values('" _
        & temp.GetString(b, 0, bytesRead).Replace("'", "''") & "')"
    cmd.CommandText = strSQL
    cmd.ExecuteNonQuery()
    If (intRows Mod 100) = 0 Then
        txt_RecordCount.Text = intRows.ToString()
        Application.DoEvents()
    End If
    intRows += 1
    bytesRead = fs.Read(b, 0, b.Length)
Loop
Notice the code that writes to a text box to keep track of how far along the process is, which slows the code down.

djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!
 
I see. Unfortunately there is no built-in Split function that splits by a number of characters (at least up to Framework 3.5), so you don't have a choice but to loop at least once. But you (or whoever wrote the code) can eliminate the loop that reads the file, like this:
Code:
'1- Read the whole file in one call
Dim lines As String
Using sr As New StreamReader(fileName)
      lines = sr.ReadToEnd()
End Using
'2- Write code here to loop through "lines", splitting it by
'   129 characters and adding the pieces to a DataTable or a List
'3- Then execute the insert to the DB only once, to eliminate round trips to the server. This could be LINQ's InsertAllOnSubmit or a TableAdapter.Update (if my memory serves me well).
'4- As far as showing progress in the text box, the best way is a BackgroundWorker component that reports progress to a progress bar.

Just a suggestion (in the wrong forum).

Good luck.

MCP SQL Server 2000, MCTS SQL Server 2005, MCTS SQL Server 2008 (DBD, DBA)
 
Yep, I have the developers laughing at me, but they do help.[smile]

I have been working on .net programs for about two weeks and have learned a lot. A couple programs actually work well and are being used.

It started with trying to import an Excel file (with SSIS) into SQL, as Microsoft is sooo helpful at knowing that I must be mistaken in having zip code as char(5) and that the datatype should really be float (among other things).


djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!
 
A good friend of mine blogged about importing data into SQL Server.


Notice that he did not compare the speed of importing data using another language, because that option is likely to be many times slower.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
You can get better performance if you can minimally log your bulk inserts. Several things need to be configured to do this, but when you can, it makes things MUCH faster, especially with large files.
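For example (the database name is hypothetical, and this assumes you can afford to give up point-in-time recovery during the load window):

Code:
-- Switch to minimal logging for the duration of the load
Alter Database MyDatabase Set Recovery BULK_LOGGED

Bulk Insert djjtest
From  'C:\Data\TekTips\Data.txt'
With (FormatFile='C:\Data\TekTips\data.fmt', TABLOCK)

-- Switch back and take a log backup afterwards
Alter Database MyDatabase Set Recovery FULL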




-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George, I now have two more bookmarks.

Normally importing data is not bad, but the origin of this post was a stumper. The data comes from the USPS, and their converter program takes some time and does not work on a 64-bit system, so I have been using a 32-bit virtual machine.

The exception in my import process, for me, is Excel. Things that SQL 2000 would bring in now throw errors, so I developed a personal .NET program that imports the data. It uses bulk copy. It would help if the clients would always send the same layout.

djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!
 