Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need Help with formatting .csv 1

Status
Not open for further replies.

zmann

Technical User
Jul 15, 2003
60
US
I am having an issue formatting a .csv file. I wrote a script to create User accounts, the input used for the script is a .csv file exported from Remedy, when the file is exported it is not in the proper format, so I have written the scripting neccesary to take care of that. I am able to accomplish all the formatting to the file that I want to do, but there are blank lines at the end of the file that are throwing things off a bit. I am basically looking for a way to make sure that there are no Carriage returns or blank lines after the last line of valid text. The script I have runs very well except for this one issue. I have exhausted all of my normal research venues, but to no avail.

Thanks,

Z
 
use TRIM to get rid of whitespace and use LEN to check the length...if LEN(arrayName)<=0 then move to the next line or whatever

Bastien

Any one have a techie job in Toronto, I need to work...being laid off sucks!
 
Bastien,

Thanks for the tips on TRIM and LEN, but this is not exactly what I am after. I need to be able to remove blank lines at the end of the .csv file, I don't have an issue with spacing in the lines of the .csv file that contain texts. Looking for other suggestions.

Thanks,

Z
 
are you parsing your csv line at a time?

if so simply check to see if any text in line - is there is carry on processing as you are doing now, if not goto next line!



Jamie Gillespie
j-gillespie@s-cheshire.ac.uk
 
JGillespie,

Yes I am reading in the csv file a line at a time then using the split function to create an array that the script can use. I don't have a problem with blank lines throughout the script. It is one blank line at the very end of the .csv file that I am trying to get rid of. For some reason this blank line causes the Do While Loop to rerun the last line of text in the file. I know this is causing this because If I open and remove the blank line at the end manually the script runs great with no issues. But if there is a blank line at the end of the .csv it causes it to rerun the last vaild line of text. I don't really understand why it is doing that, if any one knows I would love to hear it.

Thanks though,

Z
 
post the script and we can try and help!

Jamie Gillespie
j-gillespie@s-cheshire.ac.uk
 
I really can't post the entire script as it is over 600 lines and it is run on a DOD network, too much infrastructure specific things in it. But here is the part of the script I use for formatting the .csv.

Code:
	Set objFSO = CreateObject(&quot;Scripting.FileSystemObject&quot;)
	Set TF = objFSO.OpenTextFile(strInputFilePath,1)
	inp = TF.ReadAll
	out1=Replace(inp,Chr(34),&quot;&quot;)
	out2=Replace(out1,&quot;,&quot;,&quot;;&quot;)
	out3=Replace(out2,&quot;Albuquerque; NM&quot;,&quot;Albuquerque, NM&quot;)
	out4=Replace(out3,&quot;Anniston; AL&quot;,&quot;Anniston, AL&quot;)
	out5=Replace(out4,&quot;Arden Hills; MN&quot;,&quot;Arden Hills, MN&quot;)
	out6=Replace(out5,&quot;Arlington Heights; IL&quot;,&quot;Arlington Heights, IL&quot;)
	out7=Replace(out6,&quot;Bamberg; Germany&quot;,&quot;Bamberg, Germany&quot;)
	out8=Replace(out7,&quot;Baumholder; GERMANY&quot;,&quot;Baumholder, GERMANY&quot;)
	out9=Replace(out8,&quot;Beaumont; TX&quot;,&quot;Beaumont, TX&quot;)
	out10=Replace(out9,&quot;Camp Carroll; KOREA&quot;,&quot;Camp Carroll, KOREA&quot;)
	out11=Replace(out10,&quot;Camp Casey; KOREA&quot;,&quot;Camp Casey, KOREA&quot;)
	out12=Replace(out11,&quot;Camp Henry; KOREA&quot;,&quot;Camp Henry, KOREA&quot;)
	out13=Replace(out12,&quot;Camp Hovey; KOREA&quot;,&quot;Camp Hovey, KOREA&quot;)
	out14=Replace(out13,&quot;Camp Parks; CA&quot;,&quot;Camp Parks, CA&quot;)
	out15=Replace(out14,&quot;Camp Zama; JAPAN&quot;,&quot;Camp Zama, JAPAN&quot;)
	out16=Replace(out15,&quot;Chievres AB; Belgium&quot;,&quot;Chievres AB, Belgium&quot;)
	out17=Replace(out16,&quot;Columbus; OH&quot;,&quot;Columbus, OH&quot;)
	out18=Replace(out17,&quot;Concord; NC&quot;,&quot;Concord, NC&quot;)
	out19=Replace(out18,&quot;Coraoplis; (Pittsburgh); PA&quot;,&quot;Coraoplis, (Pittsburgh), PA&quot;)
	out20=Replace(out19,&quot;Decatur; GA&quot;,&quot;Decatur, GA&quot;)
	out21=Replace(out20,&quot;Des Moines; IA&quot;,&quot;Des Moines, IA&quot;)
	out22=Replace(out21,&quot;Fort Myer; VA&quot;,&quot;Fort Myer, VA&quot;)
	out23=Replace(out22,&quot;Hanau; GERMANY&quot;,&quot;Hanau, GERMANY&quot;)
	out24=Replace(out23,&quot;Harrisburg; PA&quot;,&quot;Harrisburg, PA&quot;)
	out25=Replace(out24,&quot;Independence; MO&quot;,&quot;Independence, MO&quot;)
	out26=Replace(out25,&quot;Jacksonville; FL&quot;,&quot;Jacksonville, FL&quot;)
	out27=Replace(out26,&quot;Kaizerslautern; Germany&quot;,&quot;Kaizerslautern, Germany&quot;)
	out28=Replace(out27,&quot;Kings Mills; OH&quot;,&quot;Kings Mills, OH&quot;)
	out29=Replace(out28,&quot;Lexington; KY&quot;,&quot;Lexington, KY&quot;)
	out30=Replace(out29,&quot;Little Rock; AR&quot;,&quot;Little Rock, AR&quot;)
	out31=Replace(out30,&quot;Los Angeles; CA&quot;,&quot;Los Angeles, CA&quot;)
	out32=Replace(out31,&quot;Louisville; KY&quot;,&quot;Louisville, KY&quot;)
	out33=Replace(out32,&quot;Mannheim; GERMANY&quot;,&quot;Mannheim, GERMANY&quot;)
	out34=Replace(out33,&quot;Millington; TN&quot;,&quot;Millington, TN&quot;)
	out35=Replace(out34,&quot;Milwaukee; WI&quot;,&quot;Milwaukee, WI&quot;)
	out36=Replace(out35,&quot;Mobile; AL&quot;,&quot;Mobile, AL&quot;)
	out37=Replace(out36,&quot;Monterey; CA&quot;,&quot;Monterey, CA&quot;)
	out38=Replace(out37,&quot;Montgomery; AL&quot;,&quot;Montgomery, AL&quot;)
	out39=Replace(out38,&quot;Nashville; TN&quot;,&quot;Nashville, TN&quot;)
	out40=Replace(out39,&quot;North Charleston; SC&quot;,&quot;North Charleston, SC&quot;)
	out41=Replace(out40,&quot;Oakdale; (Pittsburg); PA&quot;,&quot;Oakdale, (Pittsburg), PA&quot;)
	out42=Replace(out41,&quot;Perrine (WestMiami); FL&quot;,&quot;Perrine (WestMiami), FL&quot;)
	out43=Replace(out42,&quot;Schweinfurt; Germany&quot;,&quot;Schweinfurt, Germany&quot;)
	out44=Replace(out43,&quot;Seagoville; TX&quot;,&quot;Seagoville, TX&quot;)
	out45=Replace(out44,&quot;Sharonville; OH&quot;,&quot;Sharonville, OH&quot;)
	out46=Replace(out45,&quot;Shreveport;(Bossier) LA&quot;,&quot;Shreveport,(Bossier) LA&quot;)
	out47=Replace(out46,&quot;Southfield; Mi.&quot;,&quot;Southfield, Mi.&quot;)
	out48=Replace(out47,&quot;St. Petersburg; FL&quot;,&quot;St. Petersburg, FL&quot;)
	out49=Replace(out48,&quot;Tampa; FL&quot;,&quot;Tampa, FL&quot;)
	out50=Replace(out49,&quot;Torri Station; JAPAN&quot;,&quot;Torri Station, JAPAN&quot;)
	out51=Replace(out50,&quot;Vicenza; Italy&quot;,&quot;Vicenza, Italy&quot;)
	out52=Replace(out51,&quot;Vilseck; GERMANY&quot;,&quot;Vilseck, GERMANY&quot;)
	out53=Replace(out52,&quot;Wiesbaden; Germany&quot;,&quot;Wiesbaden, Germany&quot;)
	out54=Replace(out53,&quot;Willow Grove(Phil); PA&quot;,&quot;Willow Grove(Phil), PA&quot;)
	out55=Replace(out54,&quot;Wuerzburg; GERMANY&quot;,&quot;Wuerzburg, GERMANY&quot;)
	TF.close
	Set TF = ObjFSO.CreateTextFile(strInputFilePath,True)
	TF.Writeline out55
	TF.Close

Thanks for the help,

Z
 
you say you are parsing the file line at a time, but from what you have posted here you are clearly reading the entrire file into a variable?



Jamie Gillespie
j-gillespie@s-cheshire.ac.uk
 
I am reading the file in a line at a time, the code above is only for formatting the file for use. Here is the snippet that actually reads the file into the script for use.

Code:
Const ForReading = 1

Set objFSO = CreateObject(&quot;scripting.filesystemobject&quot;)
Set objTF = objFSO.OpenTextFile(inputfile,ForReading,True)
Do While objTF.AtEndOfStream <> True
sLine = objTF.ReadLine

'// Split the line to create an array //

aLine = split(sLine, &quot;;&quot;,-1,1)

'// The CSV file has 6 columns like this //
'// username, firstname, lastname, expirationDate, Ticket #, Location //
'// This can easily be added to or truncated to fit the need //

sCN = aLine(0)
FirstName = aLine(1)
LastName = aLine(2)
eDate = aLine(3)
sLocation = aLine(5)

Thanks,

Z
 
OK

so when you parse the file line by line you are NOT checking line contents BEFORE you apply the split function.

Do you by any chance have an On Error Resume Next BEFORE this point in the script - if so your variable allocations from the aline array will still be holding the previous lines values.

either set aline = &quot;&quot; after each line has been processed or check for text on the line before split function.

Jamie Gillespie
j-gillespie@s-cheshire.ac.uk
 
JGillespie,

Thanks for the response. Yes I do have an On Error Resume Next right before this snip, I use some error handling stuff in the Do While Loop. I'm not sure I follow what you mean by setting the aLine = &quot;&quot;. Do you mean to set the variables like aLine = nothing? I know it is best practice to set your variables to nothing after they are done being used to prevent buffer overflows and such. As far as checking to see if there is text on the line Isn't that what the Do While objTF.AtEndofStream <> True line is for. So if I am following your thought process here, The Do While line is going to read in a line whether it is blank or has text, and the way to prevent it from reading blank lines is to release my variables at the end of the loop?
Sorry, just trying to wrap my head around this.

Thanks for the help,

Z
 
What is the original problem?

It almost looks like you have an input file that is comma delimited and has quotes around string (or text) values, and you want to strip the quotes.

Then you have the problem of the empty record/line at the end. Is the line truly empty, does it have blanks, or what?

It looks like you have things like:

&quot;City, ST&quot;

And you want them to end up as:

City, ST

These would be two fields in your new output file instead of one as in the input - or is it your intent that the output file use &quot;;&quot; as the field delimiter while keeping the &quot;,&quot; within the city/state field?

Something like:

Code:
&quot;BlowJ&quot;,&quot;Joe&quot;,&quot;Blow&quot;,11/29/2004,&quot;City, ST&quot;

Becomes:

Code:
BlowJ;Joe;Blow;11/29/2004;City, ST

???


Where I'm going is that there may be an easier way to accomplish what you are after than doing all of those Split( ) and Replace( ) operations.
 
After this line
aLine = split(sLine, &quot;;&quot;,-1,1)
you may consider to test UBound(aLine)

Hope This Help, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884
 
dilletante and PHV,

Thanks for your responses. dilletante, you are absolutely correct in your description, basically it has all the elements you mention and yes I am retaining the formatting of the site by using a &quot;;&quot; as the delimeter.

The original problem is that the .csv file has a blank line or lines at the end of the file, and it causes the last valid entry in the file to be run twice.

PHV, I am not that familiar with the Ubound function, could you elaborate.

dilletante, if you have a better way than using the split and replace routine I am using, I am always open to learn better ways to accomplish things.

Thanks,

Z
 
Something like this:
If UBound(aLine) = 5 Then ' Check for 6 columns (0..5)
...
End If


Hope This Help, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884
 
Sorry, I got incredibly busy today.

I'll assume your original and new logfiles might be almost anyplace. So I'll start by copying the original file into a work directory. Next I'll reformat the file into a new file with the changes in this work directory. Then I'll copy the new file out to where it needs to end up. Finally I'll delete the temporary files.

If you don't need this &quot;Curly Shuffle&quot; you can just omit it.

I start by creating the following directory structure:
Code:
Reformat (dir)
   |
   +--Reformat.vbs     {script to do the work}
   |
   +--Data (dir)
         |
         +--Schema.ini {Jet Text Driver control file}
         |
         +--log.txt    {a sample input log}
         |
         +--orig.csv   {where I copy the input file,
         |              gets deleted}
         +--new.csv    {reformatted data file,
         |              gets deleted}
         +--newlog.txt {final reformatted file}
Both log.txt and newlog.txt might be anywhere, or called anything. There are Const assignments in the script to set these.

The files orig.csv and new.csv have to be called this. These names are hard-coded both in Reformat.vbs and Schema.ini. They do not need to be in a subdirectory of the Reformat directory, and it does not need to be called Data. But these temporary files and Schema.ini need to be in some common directory somewhere, and the path needs to be defined in a Const in the script.

The layout and format of these files is spelled out in this Jet Text Driver control file:

Schema.ini
Code:
[orig.csv]
ColNameHeader=False
Format=CSVDelimited
TextDelimiter=&quot;
MaxScanRows=1
Col1=UserName    Char
Col2=Firstname   Char
Col3=LastName    Char
Col4=ExpDate     Char
Col5=TicketNum   Char
Col6=Location    Char
[new.csv]
ColNameHeader=False
Format=Delimited(;)
TextDelimiter=none
MaxScanRows=1
Col1=UserName    Char
Col2=Firstname   Char
Col3=LastName    Char
Col4=ExpDate     Char
Col5=TicketNum   Char
Col6=Location    Char
This file describes all of the details Jet will need to process the files. Note that new.csv is described as being &quot;;&quot; delimited and that text fields have no delimiters at all (such as the &quot; symbol).

I also made a test input file, though it is tiny. It has two good records, and one empty one. I use $ here to show where the CRLFs are:

log.txt
Code:
&quot;JohnsonR&quot;,&quot;Richard&quot;,&quot;Johnson&quot;,&quot;12/30/2004&quot;,&quot;03347&quot;,&quot;Little Rock, AR&quot;$
&quot;WilliamsK&quot;,&quot;Kurt&quot;,&quot;Williams&quot;,&quot;11/23/2004&quot;,&quot;03346&quot;,&quot;Indianapolis, IN&quot;$
$
The script itself:

Reformat.vbs
Code:
Option Explicit

'Working directory, relative to script or absolute,
'where schema.ini is and work files go.
Const WorkPath = &quot;Data&quot;
'Original file, relative to script or absolute.
Const OrigFile = &quot;Data\log.txt&quot;
'New file, relative to script or absolute.
Const NewFile = &quot;Data\newlog.txt&quot;

'ADO constants.
Const adExecuteNoRecords = &amp;H80
Const adCmdText = &amp;H1

Dim objFSO, strSQL, objCon

Set objFSO = CreateObject(&quot;Scripting.FileSystemObject&quot;)
objFSO.CopyFile OrigFile, WorkPath &amp; &quot;\orig.csv&quot;, True
Set objCon = CreateObject(&quot;ADODB.Connection&quot;)

strSQL = &quot;SELECT * INTO new.csv&quot; _
       &amp; &quot; FROM orig.csv WHERE UserName IS NOT NULL&quot; _
       &amp; &quot;  ORDER BY TicketNum;&quot;

objCon.Open  &quot;Provider=Microsoft.Jet.OLEDB.4.0;&quot; _
           &amp; &quot;Extended Properties='Text';&quot; _
           &amp; &quot;Data Source='&quot; &amp; WorkPath &amp; &quot;';&quot;

objCon.Execute strSQL, , adExecuteNoRecords Or adCmdText
objCon.Close
Set objCon = Nothing

objFSO.CopyFile WorkPath &amp; &quot;\new.csv&quot;, NewFile, True
objFSO.DeleteFile WorkPath &amp; &quot;\new.csv&quot;, True
objFSO.DeleteFile WorkPath &amp; &quot;\orig.csv&quot;, True
Set objFSO = Nothing
MsgBox &quot;Done&quot;
The trick here is in the SQL statement, where I only include records where UserName isn't null. I threw in a sort as well, but you can leave it out or change it. I also processed the dates as strings (see Schema.ini) because I assume they are quoted too.

Final results?

newlog.txt
Code:
WilliamsK;Kurt;Williams;11/23/2004;03346;Indianapolis, IN$
JohnsonR;Richard;Johnson;12/30/2004;03347;Little Rock, AR$
As before, I use $ to represent a CRLF. Notice there isn't any blank (empty) line in the new file.

This can be radically simplified if you can have your input and output and the Jet control file all in some existing directory. Then you can:
[ul][li]Strip out the FSO and related operations,[/li]
[li]Put Schema.ini in that work directory.[/li]
[li]Correct the names to your names in Schema.ini and the script's SQL statement, and[/li]
[li]Update the constant WorkPath to point to your work directory.[/li][/ul]
Do not forget that you can't use filenames like &quot;input.csv&quot; or &quot;output.csv&quot; or DOS device names like &quot;con&quot; and so on. Seems to be a security thing.

This ought to be a lot speedier than mucking about with scripted file I/O, Split( ) and Replace( ) calls, etc.

As far as I know it can handle pretty large files.
 
WOW!!! Man, I am blown away! Awesome Script!

I never thought of using the Jet OLEDB in this way. I already tested the files you have in your example and they work great. Tomorrow I am going to take this puppy in the lab and take it for a spin on about 1000 users.

dilettante, I thank you so much for taking the time to put all this down, I know it took a bit. Thank You. Defiantely a star for you, wish I could give you more. Every one that reads this post should give you a star, come on everyone, you got to admit, THIS THING ROCKS!!!

Thank you so much,

Z
 
Thanks for the kind words zmann.

It isn't something you want for every occasion, but where it does fit I think it's a great little tool for the scripter's bag of tricks. I've been blown away myself by how many littles tasks I've been able to apply Jet and ADO to lately. There is a lot more to learn, I've barely scratched the surface.

If I can find a good set of stable &quot;Text Driver&quot; links I'd love to post them in a FAQ or something. Good information is hard to come by. Here are some I have found:

schema.ini

text file format details

fundamental Jet SQL (links at end of article to intermediate and advanced too)

registry key governing text file extensions
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top