Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Excel Import issue using Split() 1

Status
Not open for further replies.

BJZeak

Programmer
May 3, 2008
230
CA
stumped with excel data import ... excel creates a CSV file that for the most part works up until it hits numbers larger then 999 when using the split() function ... exporting as Tab txt has the same issue ... excel wraps cell data in quotes when the cell contains a comma.

I was expecting split to recognize the quotes as meaning start of text but it appears to not understand them

Public Function gfnEntry(inPosition, chStr, Optional chDelim) As String
Dim chAry() As String
If IsMissing(chDelim) Then chDelim = ","
chAry = Split(chString, chDelim)
gfnEntry = chAry(inPosition - 1)
End Function

abc,100,0,xyz creates an array 0-3
abc,101,0,xyz ditto
.
.
.
abc,"1,000",0,xyz creates an array 0-4

x[0] = abc
x[1] = "1
x[2] = 000"
x[3] = 0
x[4] = xyz



 
Try something like
Code:
Public Function gfnEntry(inPosition, ByVal chStr, Optional chDelim) As String
Dim chAry() As String
Dim m As Integer
Dim n As Integer

If IsMissing(chDelim) Then chDelim = ","

Do
   n = InStr(1, chStr, """")
   If n > 0 Then
      m = InStr(n + 1, chStr, """")
      chStr = Left(chStr, n - 1) & _
              Replace(Mid(chStr, n + 1, m - n - 1), chDelim, "") & _
              Mid(chStr, m + 1)
    End If
Loop Until n = 0

chAry = Split(chStr, chDelim)
gfnEntry = chAry(inPosition - 1)
End Function
 
THX

Just a slight mod ... to keep it from running forever :)

m = 1
do
n = InStr(m,chStr,"""")


Sometimes the simplest solution is the best ... I was thinking in terms of situations where the delimiter may be required in the quoted string ... in order to satisfy that the line delimiter would need to be temporarily overloaded which is not a trivial piece of logic to cover all possible combinations of Quote.


 
Not necessary. It won't run forever as it is ... and that modification could screw up the processing by skipping characters.

The code finds pairs of quotes and removes them on each pass. It will stop running when there are no more quotes in the string.

If you want to preserve the delimiters then
Code:
Public Function gfnEntry(inPosition, ByVal chStr, Optional chDelim) As String
Dim chAry() As String
Dim m As Integer
Dim n As Integer

If IsMissing(chDelim) Then chDelim = ","

Do
   n = InStr(1, chStr, """")
   If n > 0 Then
      m = InStr(n + 1, chStr, """")
      [COLOR=black cyan]' Get out if there is no matching end quote.[/color]
      If m = 0 Then Exit Do
      chStr = Left(chStr, n - 1) & _
              Replace(Mid(chStr, n + 1, m - n - 1), chDelim, [red]Chr(6)[/red]) & _
              Mid(chStr, m + 1)
    End If
Loop Until n = 0

chAry = Split(chStr, chDelim)
gfnEntry = [red]Replace([/red]chAry(inPosition - 1)[red],Chr(6),chDelim)[/red]
End Function
 
Ok Sorry ... after rereading your code again I see you were deleting both the embedded delimiter as well as the quotes which would work fine (by adjusting your pointers) ... I had read it as just removing the delimiter in which case "n" would never go to zero. Using "m" would not miss anything as the instr starting position is either 1 or the first character after the last quote.

My approach when making functions like this is to attempt to build them as generic as possible. Some possible solutions:
a) remove embedded delimiters from quoted elements
b) remove embedded delimiters and matching quotes
c) overload the line delimiter
d) overload the line delimiter and remove matching quotes

Option a and b are simple but could affect the data especially if the delimiter is required in the data

Option c is one step closer but Option "d", I believe ultimatly is the one that truly satisfies a totally generic function

Some considerations:
- there could be more then one quoted element
- there could be quotes which are single ended (not constituting a quoted element)
- quotes could be single "'" or double '"'
- element quotes don't don't have to be consistant on a line (one element could be using single quotes and another double)
- delimiters and overloaded delimiters could be contained in quoted elements
- delimiters could be followed by spaces

Using:
chStr = <element.1><delim><elememt.2><delim> ... <delim><element.N>
chQuotes = chr(34) & chr(39)
chDelim = ","

The following rules might be applied to determine a valid quoted element:

i) The first character in Elememt.1 is a quote followed by
a matching quote as:
- the last character in Element.N
OR
- that is followed by a delimiter

ii) The first character following a Delimiter is a quote
followed by a matching quote as:
- the last character in Element.N
OR
- that is followed by a delimiter

To simplify the above we can add a delim in front and back of the string before processing:

<delim><element.1><delim><element.2><delim> ... <delim><element.N><delim>

Then the quoted elements simplify as beginning with <delim><quote> and ending with <quote><delim>

The code would need to identify all the quoted elements, choose a valid overloaded delimiter test to ensure the overloaded delimiter is not also contained in the quoted elements, and remove the matching quotes, everything inside the matching quotes would then remain untouched





 
Would Regular Expressions have a place here?

thread707-988937
 
I looked at this a bit more and came up with
Code:
Function HandleQuotes(ByVal StartString As String, _
                      Optional Quote As String = """", _
                      Optional Delim As String = ",", _
                      Optional PreserveDelim As Boolean = True) As String()

    Dim a()                         As String
    Dim TempChar                    As String
    Dim NewString                   As String
    Dim i                           As Integer
    Dim m                           As Integer
    Dim n                           As Integer

    TempChar = IIf(PreserveDelim, Chr(6), "")
    NewString = StartString

    Do
        n = InStr(1, NewString, Quote)
        If n > 0 Then
            m = InStr(n + 1, NewString, Quote)
            If m = 0 Then Exit Do
            NewString = Left(NewString, n - 1) & _
                        Replace(Mid(NewString, n + 1, m - n - 1), Delim, TempChar) & _
                        Mid(NewString, m + 1)
        End If
    Loop Until n = 0

    a = Split(NewString, Delim)
    
    If PreserveDelim Then
        NewString = Replace(NewString, TempChar, Delim)
        For n = LBound(a) To UBound(a)
            a(n) = Replace(a(n), TempChar, Delim)
        Next
    End If

    HandleQuotes = a

End Function

If you set "PreserveDelim" to TRUE then Embedded delimiters are retained in the returned string. When it's FALSE then they are eliminated.
 
Very eloquent bait and switch … sometimes its hard to see the forest for the trees … Thank-you for the insight.

I took a look at the RegExp syntax the other day and wasn't able to muscle my brain cells to comprehend the logic almost another language in itself. I truly enjoy KISS solutions.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top