Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parse CSV file with quotes (sometimes) 2

Status
Not open for further replies.

Bigsin

Programmer
Jan 17, 2009
82
NL
Hi,

I know there is a lot of information about parsing csv files on the internet, but i did not find a usable solution for my problem.

I have a CSV files with quotes. The qoutes are not always on fixed places.

for example
Name,Department,Phone number,
"Smith, John",Sales,555-123456,
Brown,"Sales, UK",+4412345678,
Johnson,sales,555-432156,

If i'm using the split method using ",", it will be a mess in my datagrid.

Any ideas to solve my problem?

A part of my code

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim fName As String = "C:\test.csv"


Dim TextLine As String = ""

Dim SplitLine() As String

If System.IO.File.Exists(fName) = True Then

Dim objReader As New System.IO.StreamReader(fName)


Do While objReader.Peek() <> -1

TextLine = objReader.ReadLine()

SplitLine = Split(TextLine, ",")



Me.DataGridView1.Rows.Add(SplitLine)

Loop

Else

MsgBox("File Does Not Exist")

End If

End Sub



Thanks a lot in advance

Bigsin

 
Bigsin,

Change this line of your code:
TextLine = objReader.ReadLine()

to this:
TextLine = objReader.ReadLine.Replace(Chr(34), "")

This will remove all the " chars before populating TextLine so parsing with the , will work nicely.

Regards,
Terry
 
Thanks, but this will not work.
If I,m replacing " for nothing, this will not solve my problem.

The number off commas is different per line then. The reason for using quotes is the comma in the name field for example.

The result will be.

The name John is now in the wrong column Department


Name Department Phone number
Smith John Sales 555-123456,
Brown Sales UK +4412345678,
Johnson sales 555-432156

That's not what I want


 
You could also use a much more flexible Replace() function (found in System.Text.RegularExpression, include it as an Imports or use the fully qualified path as I have in the example):
Code:
Do While objReader.Peek() <> -1

                TextLine = objReader.ReadLine()

                [red]TextLine = System.Text.RegularExpressions.Regex.Replace(TextLine, "("")(.*?)(,)(.*?)("")", "$2$4", RegexOptions.IgnoreCase)[/red]

                SplitLine = Split(TextLine, ",")

                Me.DataGridView1.Rows.Add(SplitLine)

            Loop
That parses all the fields in the example correctly into 3 fields.

Hope this helps

HarleyQuinn
---------------------------------
Black coat, white shoes, black hat, cadillac. The boy's a timebomb!

You can hang outside in the sun all day tossing a ball around, or you can sit at your computer and do something that matters. - Eric Cartman

Get the most out of Tek-Tips, read FAQ222-2244: How to get the best answers before post
 
How about using the TextFieldParser?
Code:
        Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\testfile.txt")
            MyReader.TextFieldType = FileIO.FieldType.Delimited
            MyReader.SetDelimiters(",")
            Dim currentRow As String()
            While Not MyReader.EndOfData
                Try
                    currentRow = MyReader.ReadFields()

                    DataGridView1.Rows.Add(currentRow)

                Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                    MsgBox("Line " & ex.Message & "is not valid and will be skipped.")
                End Try
            End While
        End Using
 
You can use Regex to process it. This shows you the pattern and the process to produce an array for the use of your datagridview. It is in c# (importing System.Text.RegularExpressions) and ready to be translated to vb.net word-for-word.
[tt]
//having these "dimensioning" somewhere above
//MatchCollection cm;
//string pattern;
//string[] [blue]SplitLine[/blue];

//main processing pattern for Regex
pattern=@"(?<=(^|,\s*))(""?)([^""]*?("""")*[^""]*?)(\2)(?=(\s*,|$))";

cm=Regex.Matches([blue]TextLine[/blue],pattern);
[blue]SplitLine[/blue]=new string[cm.Count];
//using Group[3] will take out the outer quotes of the data as well
for (int i=0; i<cm.Count; i++) {
SplitLine=cm.Groups[3].ToString();
}

//SplitLine will contain the field data and be ready for processing further
[/tt]
 
tsuji - I may be missing something here (it's quite probable), but isn't that just a more complex way to do what my reply does (just without using Split())?

HarleyQuinn
---------------------------------
Black coat, white shoes, black hat, cadillac. The boy's a timebomb!

You can hang outside in the sun all day tossing a ball around, or you can sit at your computer and do something that matters. - Eric Cartman

Get the most out of Tek-Tips, read FAQ222-2244: How to get the best answers before post
 
Please ignore that, I see what you're doing in a slightly different way to the way mine manipulates the textline [blush]

HarleyQuinn
---------------------------------
Black coat, white shoes, black hat, cadillac. The boy's a timebomb!

You can hang outside in the sun all day tossing a ball around, or you can sit at your computer and do something that matters. - Eric Cartman

Get the most out of Tek-Tips, read FAQ222-2244: How to get the best answers before post
 
I think that replace is not very desirable. Also it is defective in the sense that this jumps to my mind, for instance, data of a field "Sales, Marketing, UK" with multiple comma inside a field. Just some opinion. Mine is not a criticism against yours - not when I consider the problem.
 
Cheers tsuji.

It's a typical scenario, I had a proper look at yours after I posted my first response and saw how the approach you were using differed in terms of multiple commas (and returning commas in the output string).

HarleyQuinn
---------------------------------
Black coat, white shoes, black hat, cadillac. The boy's a timebomb!

You can hang outside in the sun all day tossing a ball around, or you can sit at your computer and do something that matters. - Eric Cartman

Get the most out of Tek-Tips, read FAQ222-2244: How to get the best answers before post
 
The other question is do you have any control on how the txt file is setup? Fixing the way the txt file loads data would be the easiest way to remedy the situation. Also anything that starts with Microsoft.VisualBasic is not .Net so would consider other options.
 
I've test DavidInIowa's textfieldparser and works great!

@ VB4Life you typed "Also anything that starts with Microsoft.VisualBasic is not .Net so would consider other options"

Can you be more specific? is it a problem using this?

Bigsin
 
Can be, like with VbCrLf does not always work(linux), but Environment.NewLine will. The same thing goes for file locations if you store stuff in the localAppdata file(which is common) the path may fail, but Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData) will work for all versions. The best way to not use these is to remove the addReference of the VisualBasic namespace. You will now have to change things like CInt to Convert.ToInt32 or Integer.Parse. The thing is you will be .Net so where you require an app to have the .Net framework installed specific to your build your clients apps will work correctly. We moved to .Net for a reason, right? Small soapbox, but I am not picking on you, just an opinion.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top