
Importing Large Flat File into SQL Database - File has bad data.


katekis (MIS) · Feb 12, 2009
I am fairly new to writing SSIS Packages in Microsoft Visual Studio. I am a DBA using SQL Server 2005.

We receive a flat file each month that has over a billion records in it. It is a dump of all our transaction history for the month from one of our vendors.

The file is ugly; it has over 500 columns in addition to the billion records.

Is there a program or something I can do in Visual Studio to break the file into smaller files (batches)?

Is there a program or something I can do in visual studio to cleanup the file before I try to import the file into our database?

The flat file is delimited by | (a pipe).

Throughout the file there are records that say "Transaction Failed", immediately followed by a carriage return and then two rows of invalid data whose pipes do not line up with the column headers. I would like to somehow delete this junk before I try to import the file into our database. I have tried importing to a temp table and then deleting the bad records, but that does not seem to work either. The bad records contain invalid characters like boxes and upside-down U's.

If I manually break the file into smaller files and delete the bad records, I can usually get it to import just fine, but that takes hours of manual labor.

Any suggestions?
 
I have parsed some large text files in the past (but not that big) using VB.

You open the file and parse it line by line, copying each line out to a new file once it passes whatever tests you set up. You open a new output file every so many records.
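The approach above can be sketched in a few lines. This is a minimal illustration, not SSIS-specific; the function name, file paths, and batch size are all hypothetical, and the only assumption about the data is that good rows have the same pipe count as the header:

```python
# Sketch: stream a large pipe-delimited file line by line, keep only rows
# whose field count matches the header, and start a new output file every
# `batch_size` kept rows. Names and paths here are hypothetical.

def split_and_filter(in_path, out_prefix, batch_size=500_000):
    written = []            # paths of the batch files created
    out = None
    kept = 0
    batch = 0
    with open(in_path, "r", encoding="utf-8", errors="replace") as src:
        header = src.readline().rstrip("\n")
        expected = header.count("|")
        for line in src:
            row = line.rstrip("\n")
            # Drop junk: wrong column count or the failure marker itself.
            if row.count("|") != expected or "Transaction Failed" in row:
                continue
            if kept % batch_size == 0:
                if out:
                    out.close()
                batch += 1
                path = f"{out_prefix}_{batch:04d}.txt"
                out = open(path, "w", encoding="utf-8")
                out.write(header + "\n")  # repeat the header in every batch
                written.append(path)
            out.write(row + "\n")
            kept += 1
    if out:
        out.close()
    return written, kept
```

Because the file is read one line at a time, memory use stays flat no matter how large the input is; only the disk cost doubles, as noted below.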

It is fairly easy to do but will take some time and a pretty good understanding of the problems the file might encounter.

It will also require twice the disk space.

Hope that helps.

Simi

 
Just out of curiosity, how big is that file?

500 columns along with the billion records delimited by | (a pipe) would be roughly 500 GB in delimiters alone, with no data.

Simi
 
The file is normally about 1,000,000 KB (roughly 1 GB).

The actual number of columns is 150 (sorry, I exaggerated a little).

The file is so big I cannot open the whole thing in Notepad or Excel, and it takes a very long time to open it in WordPad.

Can you point me to where I can get more information on parsing a file?
 
If you have visual studio, particularly a newer .NET version, you could easily read the file line by line and write a new file excluding the ones you don't want.
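The cleanup rule katekis described (drop each "Transaction Failed" marker plus the two garbled rows that follow it) is a few lines in any language. A Python sketch of just that skip logic, assuming exactly two bad rows follow each marker as described; the same loop translates directly to VB.NET or C#:

```python
# Drop every "Transaction Failed" marker line and the two garbled
# rows that immediately follow it (assumption taken from the thread:
# each marker is followed by exactly two rows of invalid data).

def clean_lines(lines):
    out = []
    skip = 0
    for line in lines:
        if skip:
            skip -= 1          # discard a garbled row after a marker
            continue
        if "Transaction Failed" in line:
            skip = 2           # drop the marker plus the next 2 rows
            continue
        out.append(line)
    return out
```

Combined with writing the survivors back out in batches, this removes the manual splitting and deleting described earlier.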
 