
Using RegExp to catch an issue

Status
Not open for further replies.

developer155

Programmer
Jan 21, 2004
US
Is someone good with regexp?
I have a comma-delimited file in which text is enclosed in double quotes ("").
So it looks like this:
"aaa","bbbb","cccccc","","ggg"
"aaa","bbbb","cccccc","","ggg"
"aaa","bbbb","cccccc","","ggg"

I use SSIS to parse the file, and sometimes the parser crashes when there is a quote inside a text-qualified field (the text qualifier is a double quote).
So this line would fail:
"aaa","bbbb","cccc"cc","","ggg"
Or this one also:
"aaa","bbb""b","cccccc","","ggg"

Basically, we can't have double quotes inside a double-quote-delimited field. I can write a custom script to check for this, and I am thinking of using RegExp. Does anyone know how to check for a double quote inside "..." with regular expressions?

thanks!!!
 
Well, this doesn't use regular expressions, but it will remove any extraneous double quotes in your string:

Dim s As String

'the input string
s = """aaa"",""bbb""""b"",""cccccc"","""",""ggg"""

MsgBox(s)

'remove all double quotes
s = s.Replace("""", "")

MsgBox(s)

'replace pattern
Dim p As String = """" & "," & """"

'rebuild string with only appropriate double quotes
s = """" & s.Replace(",", p) & """"

MsgBox(s)

I used to rock and roll every night and party every day. Then it was every other day. Now I'm lucky if I can find 30 minutes a week in which to get funky. - Homer Simpson

Arrrr, mateys! Ye needs ta be preparin' yerselves fer Talk Like a Pirate Day! Ye has a choice: talk like a pira
 
Will it also fix
s = """aaa"",""bbb""""b"",""cc""cccc"","""",""ggg"""

which has cc"cccc,

basically a quote in the middle of a field?

thanks!
 
Yes. What this code does is remove ALL of the double quotes, then add double quotes back to the string, but only where they are desired.



 
' Input.txt contains
'
' "aaa","bbbb","cccccc","","ggg"
' "aaa","bbbb","cccccc","","ggg"
' "aaa","bbbb","cccccc","","ggg"
' "aaa","bbbb","cccc"cc","","ggg"
' "aaa","bbb""b","cccccc","","ggg"
' aaa","bbbb","cccccc","","ggg"
' "aaa,"bbbb","cccccc","","ggg"
' aaa, "bbbb", "cccccc", "", "ggg"

Imports System.Text.RegularExpressions

Public Class Form1
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim line As String = ""

        FileOpen(1, "Input.txt", OpenMode.Input)
        Do While Not EOF(1)
            line = LineInput(1)
            Debug.Print(line & " " & Valid_Line(line))
        Loop
        FileClose(1)
    End Sub

    Private Function Valid_Line(ByVal line As String) As Boolean
        Dim i As Integer, valid As Boolean = True
        Dim fields() As String = Split(line, ",")

        For i = 0 To fields.Length - 1
            ' Pattern : ^"[^"]*"$
            ' Logic : Begins with a " followed by anything other than a " and ends with a "
            If Not Regex.IsMatch(fields(i), "^" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & "$") Then
                valid = False
                Exit For
            End If
        Next i

        Return valid
    End Function
End Class
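
The per-field check above splits on every comma, so a legitimate comma inside a quoted field would be reported as invalid. As a rough sketch only, the same idea can be applied to the whole line with one pattern, which avoids the split. The module name QuoteCheck and the function IsCleanLine are hypothetical names made up for this illustration, and the pattern assumes every field is quoted and that there are no spaces between fields.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuoteCheck
    ' Pattern : ^"[^"]*"(,"[^"]*")*$
    ' Logic   : one quoted field containing no inner quotes, followed by
    '           zero or more ,"field" groups, and nothing else on the line.
    ' (Hypothetical helper, not part of the code posted in this thread.)
    Private ReadOnly LinePattern As New Regex("^""[^""]*""(,""[^""]*"")*$")

    Function IsCleanLine(ByVal line As String) As Boolean
        Return LinePattern.IsMatch(line)
    End Function

    Sub Main()
        ' Well-formed line
        Console.WriteLine(IsCleanLine("""aaa"",""bbbb"",""cccccc"","""",""ggg"""))   ' True
        ' Stray quote inside the third field: cccc"cc
        Console.WriteLine(IsCleanLine("""aaa"",""bbbb"",""cccc""cc"","""",""ggg""")) ' False
        ' Doubled quote inside the second field: bbb""b
        Console.WriteLine(IsCleanLine("""aaa"",""bbb""""b"",""cccccc"","""",""ggg""")) ' False
        ' Comma inside a quoted field is accepted
        Console.WriteLine(IsCleanLine("""aaa"",""bb,b"",""cccccc"""))                ' True
    End Sub
End Module
```

One limitation to keep in mind: a field whose text happens to contain the exact sequence "," cannot be told apart from a field boundary, so this check (like the others in this thread) treats it as two fields.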
 
What if a comma was in one of the fields as well as (or instead of) a spurious double quote symbol?

Code:
		Dim source As String = """aaa"",""bbb""""b"",""cc,""cccc"","""",""ggg"""
		MessageBox.Show(source)

		Dim tmp() As String

		Dim splitpattern() As String = {""","""}
		tmp = source.Split(splitpattern, StringSplitOptions.None)
		For a As Integer = 0 To tmp.Length - 1
			tmp(a) = tmp(a).Replace("""", "")
		Next

		Dim target As String = """" + String.Join(""",""", tmp) + """"
		MessageBox.Show(target)

is a safer option.


Hope this helps.
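
If the goal is only to check for the problem rather than repair it, the same split on "," can be used to report which fields contain a stray quote. This is a minimal sketch under the assumption that every field is quoted and fields are separated by exactly ","; the module StrayQuoteReport and the function FindBadFields are hypothetical names used for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Module StrayQuoteReport
    ' Returns the zero-based indexes of fields that still contain a double
    ' quote after the enclosing quotes have been accounted for.
    ' (Hypothetical helper, not part of the code posted in this thread.)
    Function FindBadFields(ByVal line As String) As List(Of Integer)
        Dim bad As New List(Of Integer)
        Dim sep() As String = {""","""}
        Dim parts() As String = line.Split(sep, StringSplitOptions.None)
        Dim last As Integer = parts.Length - 1

        ' Drop the opening quote of the first piece and the closing quote of the last.
        If parts(0).StartsWith("""", StringComparison.Ordinal) Then
            parts(0) = parts(0).Substring(1)
        End If
        If parts(last).EndsWith("""", StringComparison.Ordinal) Then
            parts(last) = parts(last).Substring(0, parts(last).Length - 1)
        End If

        ' Any quote left inside a piece is a stray one.
        For i As Integer = 0 To last
            If parts(i).Contains("""") Then bad.Add(i)
        Next
        Return bad
    End Function

    Sub Main()
        ' Stray quote in the third field (index 2): cccc"cc
        Dim line As String = """aaa"",""bbbb"",""cccc""cc"","""",""ggg"""
        Console.WriteLine(String.Join(",", FindBadFields(line)))   ' prints 2
    End Sub
End Module
```

This only looks for quotes inside a field. A line that is missing its very first or very last quote would not be flagged by this sketch and would need a separate check.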

 