1.5GB CSV

Status
Not open for further replies.

SQLScholar

Programmer
Aug 21, 2002
2,127
GB
HELP!!!

I am having a small problem here.

I have been given data in a 1.5 GB CSV file.

I need the data in it, but the first row seems to be random characters.

What software can I use to deal with this? I have tried importing it into SQL, but I get an error that basically says there is not enough space (I think it may be breaking the per-row limit). Notepad won't open it, and it crashes WordPad.

Any ideas?

Dan

----------------------------------------
There are 2 types of computer, the prototype and the obsolete!!
----------------------------------------
No D, just plank - and its not my fault
 
How would I go about doing that??

Many thanks

Dan

 
It seems as the first row is random characters.

Have you managed to open the file and see what's in the first line?

If Notepad is crashing you might try Edlin, the old DOS line editor. It's pretty crude, but it can deal with very large files because it doesn't load the entire file at once; it only loads part of the file at a time.

Try opening a Command Prompt and typing [TT]EDLIN myfile[/TT]. Edlin will respond with an asterisk. Type [TT]1,5L[/TT] and Edlin will display the first 5 lines, followed by an asterisk on the next line. Type [TT]Q[/TT] to quit without saving.

That should show you whether the file is corrupt.
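
If Edlin isn't to hand, the same check can be sketched in a few lines of Python. This is only a rough equivalent, not anything from Edlin itself: the function name peek_lines and its parameters are made up for illustration. It reads a bounded number of bytes per line, so even a file with everything on one line should not fill memory.

```python
def peek_lines(path, count=5, max_bytes=200):
    """Return the start of the first `count` lines of a file as bytes.

    At most `max_bytes` of each line are kept, and the rest of an
    over-long line is skipped in bounded pieces.
    """
    lines = []
    with open(path, "rb") as handle:
        while len(lines) < count:
            chunk = handle.readline(max_bytes)
            if not chunk:
                break  # end of file
            lines.append(chunk.rstrip(b"\r\n"))
            # Skip whatever is left of this line before reading the next one.
            while not chunk.endswith(b"\n"):
                chunk = handle.readline(65536)
                if not chunk:
                    break
    return lines
```

Printing repr() of each returned line shows unprintable bytes as \x escapes, which makes a corrupt first row easy to spot.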

Geoff Franklin
 
If you have any programming language that offers low-level file access, you could write a simple program to read the data line by line and byte by byte, throwing away whatever you don't want and keeping the rest in one or more .txt, .csv, or ASCII files that you can open from any program.
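
A minimal sketch of that idea in Python, under some loud assumptions: filter_rows and its arguments are hypothetical names, and a "good" row is simply one with the expected number of commas. It does not understand quoted commas or line breaks inside fields, so treat it as a starting point only.

```python
def filter_rows(source, good_path, rejected_path, expected_fields):
    """Copy lines with `expected_fields` comma-separated fields to good_path.

    Every other line goes to rejected_path. Returns (good, rejected) counts.
    Assumption: fields never contain quoted commas or embedded newlines.
    """
    good = rejected = 0
    with open(source, "rb") as src, \
         open(good_path, "wb") as good_out, \
         open(rejected_path, "wb") as rejected_out:
        for line in src:  # one line in memory at a time
            if line.count(b",") + 1 == expected_fields:
                good_out.write(line)
                good += 1
            else:
                rejected_out.write(line)
                rejected += 1
    return good, rejected
```

Working in bytes means random characters in the first row cannot trigger a decoding error, and the rejected file keeps them around for inspection.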


mmerlinn

"Political correctness is the BADGE of a COWARD!"

 
Personally I would ask whoever provided it to give me a new file. And if possible I would rather have a delimited text file, specifying the type of delimiter you need to best do the import. I would want to agree on the exact structure of what they were sending beforehand. It is possible that there are too many fields, or that the file has everything on one line, or some other nonsense. We receive lots of files from clients here and we have a written agreement about the format and timing of the file before we get the first one. It saves no end of import problems, because if a file doesn't meet the agreement, we bounce it back.

Questions about posting. See faq183-874
 
Plank said:
How would I go about doing that??
1. Get a hex editor (Google: "hex editor" = 900,000 hits)
2. Update the data in the first row to look like the rest of the data in terms of number of fields, with a unique identifier in at least one of them.
3. Import the patched file to your DB
4. Delete the invalid record (using the unique identifier from step 2.)

 
You can use Visual FoxPro (or the older FoxPro):

From the help file: CSV - Include CSV to import data from a comma separated value file. A CSV file has field names as the first line in the file; the field names are ignored when the file is imported.

Use the following commands:
CREATE TABLE NewTable (field1 type, field2 type,...)
APPEND FROM CsvFile TYPE CSV

NewTable is then a FoxPro table. To append it to your database you can:
- use the support for DBF files in your programming language or database
- in Visual FoxPro, use the following commands:
CREATE DATABASE tmp
CREATE CONNECTION tmp
nHnd = SQLCONNECT( 'tmp' )
SELECT NewTable
SCAN
=SQLEXEC(nHnd,[INSERT INTO FinalTable (fields) VALUES (values)])
ENDSCAN
=SQLDISCONNECT(nHnd)

(CREATE CONNECTION creates an ODBC connection (it displays a dialog to set the connection up), SELECT makes NewTable the current table, and SCAN cycles through all of its records; the command in SQLEXEC is any command in the syntax of your database. SQLDISCONNECT closes the connection once the loop is done.)
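
For anyone without FoxPro, roughly the same import loop can be sketched in Python. This is an assumption-heavy stand-in rather than a translation: SQLite (via the standard sqlite3 module) plays the part of the target database, and import_csv, its parameters, and the table and column names are all hypothetical. Like APPEND FROM ... TYPE CSV, it skips the first line; it also drops rows whose field count does not match.

```python
import csv
import sqlite3
from itertools import islice


def import_csv(csv_path, db_path, table, columns, batch_size=10_000, skip_lines=1):
    """Stream a CSV file into an SQLite table in batches.

    `table` and `columns` are pasted into the SQL text, so they must come
    from trusted code, never from user input. Returns the rows inserted.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    inserted = 0
    connection = sqlite3.connect(db_path)
    try:
        connection.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_list})")
        # latin-1 maps every byte to a character, so decoding cannot fail.
        with open(csv_path, newline="", encoding="latin-1") as handle:
            # Skip the header (or a damaged first row) as plain text lines.
            for _ in islice(handle, skip_lines):
                pass
            reader = csv.reader(handle)
            while True:
                batch = list(islice(reader, batch_size))
                if not batch:
                    break  # end of file
                rows = [row for row in batch if len(row) == len(columns)]
                connection.executemany(insert_sql, rows)
                connection.commit()
                inserted += len(rows)
    finally:
        connection.close()
    return inserted
```

Committing once per batch keeps memory use flat however large the file is, at the cost of leaving a partial import behind if something fails halfway.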
 
Thanks all,

Just to go through everyone's answers:

1) Hex editor - my boss is trying that at the moment.
2) Edlin - currently testing; it seems to work.
3) Low-level language - not really got any.
4) Get another file - not possible, unfortunately. I would normally favour this too.
5) FoxPro - not got it, but might be an idea.

Stars all round for the help, as I reckon a combination of some of the above will be useful.

Dan

 
If you have access to a Unix/Linux box, the split, dd, tail, or awk commands can be used to remove the first record and optionally put it in a separate file from the rest of the records.
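
On a Unix box, head -n 1 and tail -n +2 do this split directly. Without one, a rough Python sketch of the same operation might look like the following; split_first_line and its arguments are hypothetical names, and it assumes the file really does have a line break after the first record.

```python
import shutil


def split_first_line(source, first_path, rest_path):
    """Write the first line of `source` to first_path and the rest to rest_path.

    Returns the length of the first line in bytes, line ending included.
    """
    with open(source, "rb") as src:
        first = src.readline()
        with open(first_path, "wb") as first_out:
            first_out.write(first)
        with open(rest_path, "wb") as rest_out:
            # copyfileobj copies in fixed-size chunks from the current position.
            shutil.copyfileobj(src, rest_out)
    return len(first)
```

If the returned length is close to the size of the whole file, the data is probably all on one line and a line-based split will not help.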

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Try a program called TextPad. According to the help file, file size is limited to the amount of virtual memory.

It's a pretty good editor. I use it all the time to edit large directory listings to create lists for an import function.
 
Plank:

Most database programs, like FoxPro, and most Basics, plus many, many other programming languages, provide low-level file access commands, most of which are very easy to use, usually much easier than the high-level commands in the same language.


mmerlinn

 
Import it into a database.

Access can handle larger files than something like Excel if the data is imported into something like DBF. Some databases can handle about 4 GB, but sometimes you have to define the database file as dynamic so that it can grow. Sometimes a database file can also be resized.

It might be that the whole file is encrypted and the first part is some kind of key, or a header explaining how to handle the file. It is quite possible that it is some kind of license code or version information for whatever kind of file it is.
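
One cheap way to test the header theory is to compare the first few bytes against some well-known file signatures. The sketch below is a guess-maker, not a proper file identifier: sniff_format is a hypothetical name, and the list covers only a handful of signatures (gzip, ZIP, and the Unicode byte order marks), so a result of None proves nothing either way.

```python
# (signature bytes, description) pairs; an illustrative and far from complete list.
SIGNATURES = [
    (b"\x1f\x8b", "gzip-compressed data"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\xef\xbb\xbf", "UTF-8 text with a byte order mark"),
    (b"\xff\xfe", "UTF-16 little-endian text"),
    (b"\xfe\xff", "UTF-16 big-endian text"),
]


def sniff_format(path):
    """Return a guess at the file type from its first bytes, or None."""
    with open(path, "rb") as handle:
        head = handle.read(8)  # only the first few bytes are ever read
    for magic, description in SIGNATURES:
        if head.startswith(magic):
            return description
    return None
```

A UTF-16 result, for instance, would suggest the "random characters" are ordinary text that the importing tool is reading with the wrong encoding.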

If you do not like my post feel free to point out your opinion or my errors.
 
Another 'low level language' that would do the job is any of the Visual Basic family, including VBA.
I like TextPad, too.

And there are any number of free database programs, from MySQL to OpenOffice's database module.

cheers...
 