Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Andrzejek on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Processing .csv files

Status
Not open for further replies.

PhilDowns

Programmer
Feb 17, 2011
7
GB
thread277-1601818

Further to the previous thread, I needed to parse several .csv files with a variable number of columns, where the columns may be quoted with embedded commas and or imbedded quotes, typical of user files from Excel.
I ended up with this, I’m sure it could be improved but it does work!

It might help, written for Regina Rexx,

Phil
Code:
/* ************** REXX ************** */
call rxfuncadd 'sysloadfuncs', 'rexxutil', 'sysloadfuncs'
call sysloadfuncs; rc=sysloadfuncs(); rcm=rxfuncerrmsg()
if rc<>0 then do rcc=RxMessageBox('Unable to load REXXUTIL.', 'Rexx Fatal Error:', 'Cancel'); exit 1; end
InFile='c:/DATA/odds.csv'
rc=RegStemRead(InFile,inrec)
if rc <> 0 then do rcc=RxMessageBox('Read Error on' InFile, 'IO Error:', 'Cancel'); exit 1; end
ocnt=0; orec.=''
do r = 1 to inrec.0
	call CSVSplit(inrec.r) 
	if result=0 then do
		say 'Record' r	
		do i=1 to CSVcol.0
			say 'Col('||right(i,3,'0')||')='||CSVcol.i
		end	
	end
end	
exit 0

CSVSplit: Procedure Expose CSVcol.
parse arg CSVRec
inquotes=0 ; CSVcol.='' ;CSVcol.0=0 ; CSVfield=1
if CSVRec='' then do ; return 1 ; end
do CSVchr=1 to length(CSVRec)
	if inquotes=0 then do
		if substr(CSVRec,CSVchr,1)='"' then do
			inquotes=1 ; iterate ; end
		if substr(CSVRec,CSVchr,1)=',' then do
			CSVfield=CSVfield+1 ; iterate ;	end	
		CSVcol.CSVfield=CSVcol.CSVfield||substr(CSVRec,CSVchr,1)	
		iterate ; end
	if substr(CSVRec,CSVchr,2)='""' then do
		CSVcol.CSVfield=CSVcol.CSVfield||'"'
		CSVchr=CSVchr+1 ; iterate ; end
	if substr(CSVRec,CSVchr,2)='",' then do	
		inquotes=0 ; iterate ; end
	CSVcol.CSVfield=CSVcol.CSVfield||substr(CSVRec,CSVchr,1)	
end 
CSVcol.0=CSVfield
return 0
 
It would be helpful to have a sample of the .CSV input and the problem you need to overcome. Having a variable number of fields is should not be a problem

Stan
 
Hi Stan,
This wasn't exactly a request for help, the code I posted actually solved my problem. I thought it might help others trying to process .csv files with "unhelpfull" content.
If you put a heap of quotes,commas, and things like "","",',' into cells in Excel, then save it as a .csv, then try to process it, all the other code I've found just falls in a heap. The code I came up with is able to accuratly unpick Excel's csv in every case I needed.
I expect it's performance or capabilities can be improved, but it's a good start (IMHO)
Phil
 
Phil

The big problem with CSV files is that there is no definitive standard for how they should be coded, more of a general guideline. So what you have may be bulletproof as far as Excel is concerned, but cannot be completely generic for all CSVs.

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
Good point Steve, not all CSV's are created equal!

So far, I've only had to process CSV's that came from various versions of Excel, so i've been insulated from those that don't use the psudo standard that exists in Excel.

When I do get my hands on another 'Non-Excel' setup I'll see if it can be incorporated or even identified by my procedure!

Phil
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top