Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Split CSV text in one column to multiple rows with count 1

Status
Not open for further replies.

1DMF

Programmer
Jan 18, 2005
8,795
GB
Hi,

I'm having trouble understanding how I split a CSV text column so I have multiple rows, I found many threads with 'Text To Column', so I was looking at trying that first.

An example I found shows
Code:
split("a,b,c,d,e",",")(0)

So i tried to apply that to the query as an expression with
Code:
A : split([Missing],",")(0)
But Access gives the following error when trying to exit the expression
The expression you entered has an invalid .(dot) or !operator or invalid parentheses

I don't think the above will help as I cannot guarantee the the first item in the CSV is always the same, so perhaps I should forget about spliting text to columns and just aim for what i actually am trying to achieve..

What I want to do is take the following data structure.

ID | Missing
1 | a,b,c
1 | b,c
2 | b,c,d,e
3 | c,d,e
3 | d,e


and end up with a count....

a | 1
b | 3
c | 4
d | 3
e | 3

How would I do this with a query in access?

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
Look at this faq701-6293. It has a method for splitting multi-valued fields into separate records. It should produce
[pre]
ID Missing
1 a
1 b
1 c
1 b
1 c
2 b
2 c
2 d
[blue]etc.[/blue]
[/pre]
Then the SQL is simply
Code:
Select Missing, Count(*) As myCount
From myTable
GROUP BY Missing
ORDER BY Missing
 
Wow that looks rather conveluted and crazy creating those extra tables. I also don't understand why we are inserting integers 0 - 9 or creating 20,000 records?

I might as well use VBA to loop the record set and simpy do a count using instr and then spit results to a table for reporting.

Unless you think that would be even more resourse hungry?

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
I guess we all do what we know how to do.

In tests in another context I did a cross-join (which this is) that produced 44,187 cross-join records and then reduced that number to 13,798 records and it did it in 1.063 seconds. The VBA code, looping through a recordset to produce those same 13,798 records, ran in 8.234 seconds. SQL, even with it's apparent overhead, is usually much quicker than VBA.

The Integers table just has the digits 0-9 and a cross join of two of them in qryInt2 produces a range 0-99. You need enough digits to cover the length of your longest string + 2. We are using those numeric values to index our way through the string of values in the multi-valued field (e.g. b,c,d,e in your case).

What we need is every possible combination of starting points (qryInt2 AS S1) and ending points (qryInt2 AS S2) in the string. We then eliminate all the combinations that we don't want

- The end is before the start
- character positions beyond the actual length of the string.
- character positions before the start of the string
- strings defined by start and end that contain a comma.

What we are left with is just the sub-strings between commas and we extract those in the SELECT clause.

I hope that clears up your confusion somewhat. If not, ask more questions.
 
Thanks for the reply, however, I still don't understand at all what this does
Code:
SELECT EmployeeID, 
       Mid("," & E.Children & ",", 
            S1.N + 1 , 
            S2.N - S1.N - 1) AS [Child]

FROM tblEmployees AS E, qryInt2 AS S1, qryInt2 AS S2

WHERE Mid$("," & E.Children & "," ,S1.N , 1 ) = ','
  and Mid$("," & E.Children & "," ,S2.N , 1 ) = ','
  and S1.N > 0 AND S2.N > 0
  and S2.N <= Len(E.Children) + 2
  and S1.N + 1 < S2.N
  and InStr(1,Mid("," & E.Children & ",", 
                        S1.N + 1 , 
                        S2.N - S1.N - 1),",")=0

ORDER BY 1, 2;
It doesn't make any sense to me, nor the purpose of the number table / query with 1,000 of records in them? Plus if I can't understand it, I can't use it!

I start a 10 month SQL course in January for my diploma, perhaps then it might make more sense, until then, I've created a stored procedure to get the main records and will loop them to get the counts I want, at least I understand all the code involved should I need to make an alteration at a later date.

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
Do you have a table with all possible values for Missing ?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Good luck in your SQL class. Show this example to your instructor. Perhaps (s)he can explain it better than I can.
 
PHV -> No, I have a CSV string of all needed docs based on case type.

Golom -> Thanks, looking forward to it, one more year and i'll finally have letters after my name - woohoo! (well if I pass that is!)

The entire complication in all this I haven't actually shown so it's not as easy as my request implies.

Typically when I wrote the docs queue system as per the specification they wanted a string that contained all required docs that were existing when the case was checked.

Now they want a report of the most common docs that are missing!

So in my vba I have a global function that you pass in a CSV string of existing docs and it returns a CSV string of those that are missing.

I have no idea how I would port that to T-SQL, and as that's the quickest way of gettig the initial records between a date range, I think the SP is the best way (or view) because as we all know using Access / JET and Joins, it gets all records from all tables then performs the join locally so the query was taking a while anyway, especially as the server is remote located over a crappy VPN tunnel! Though at least I could use the function for the missing docs as an expression within the query.

Because I've outed the query to an SP, I no longer have the string of missing docs because the function isn't available, just the column with existing docs!

(though can i use standard VB in a UDF in SQL ?)

Currently I need to loop the returned recordset to calculate the missing docs, so while I'm there I might as well do the count for the report!

this is my current vba function & helper class, could this be ported to a SQL UDF?

Code:
Public Function DocsMissing(ByVal sCategory As String, ByVal sStatus As String, ByVal sRecType As String, ByVal sDocs As String) As String

    Dim oDocsMissing As clsDocsMissing
    Set oDocsMissing = New clsDocsMissing
    
    oDocsMissing.Category = sCategory
    oDocsMissing.Status = sStatus
    oDocsMissing.RecType = sRecType
    oDocsMissing.Docs = sDocs
    
    DocsMissing = oDocsMissing.Missing
    
    Set oDocsMissing = Nothing
    
End Function

Class ->
Code:
Option Explicit

Private sCategory As String
Private sStatus As String
Private sRecType As String
Private sDocs As String

Public Property Let Category(aValue As String)
    sCategory = aValue
End Property
Public Property Let Status(aValue As String)
    sStatus = aValue
End Property
Public Property Let RecType(aValue As String)
    sRecType = aValue
End Property
Public Property Let Docs(aValue As String)
    sDocs = aValue
End Property
Public Function Missing() As String
       
    If IsNull(sCategory) Or IsNull(sStatus) Or IsNull(sRecType) Or IsNull(sDocs) Then
        MsgBox "You must set Category,Status,RecType & Docs before running this method"
        Missing = ""
        Exit Function
    End If
    
    Dim oDoc As Variant
    
    For Each oDoc In getDocs()
        If Nz(InStr(1, sDocs, oDoc, vbTextCompare), 0) = 0 Then
            If Missing <> "" Then
                Missing = Missing & ","
            End If
            Missing = Missing & oDoc
        End If
    Next
    
End Function

Private Function getDocs() As Collection

    ' sets required docs based on case type - helper method

    Set getDocs = New Collection

        If (sRecType = "MORT") Then
        
            getDocs.Add ("FactFind")
            getDocs.Add ("Research")
            getDocs.Add ("KFI")
            getDocs.Add ("Suitability")
            getDocs.Add ("AppForm")
            getDocs.Add ("ID")
            getDocs.Add ("PoA")
            getDocs.Add ("PoI")
        
            If (sCategory = "Non-Reg Mortgage") Then
                getDocs.Add ("TOB")
            Else
                getDocs.Add ("IDD")
            End If
            
            If (sStatus = "COMP") Then
                getDocs.Add ("Offer")
            End If
            
        Else
        
            getDocs.Add ("IDD")
            getDocs.Add ("FactFind")
            getDocs.Add ("Research")
            getDocs.Add ("Quote")
            getDocs.Add ("KFD")
            getDocs.Add ("D&N")
            getDocs.Add ("AppForm")
            
            If (sCategory = "Protection" And sStatus = "COMP") Then
                getDocs.Add ("LifeAcc")
            End If
            
      End If

End Function

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
So, create a table named, say, tblDocs with a single column named Doc.
Populate it with the doc names (AppForm,D&N,FactFind,ID,IDD,KFD,KFI,LifeAcc,Offer,PoA,PoI,Quote,Research,Suitability,TOB)
And then:
SQL:
SELECT D.Doc, COUNT(*) AS NumOfMiss
FROM tblMissing M, tblDocs D
WHERE ',' & M.Missing & ',' LIKE '*,' & D.Doc & ',*'
GROUP BY D.Doc

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
This is the final report function that puts it all together...
Code:
Public Sub Common_Missing_Docs()

    ' Declare vars
    Dim rs As DAO.Recordset
    Dim qd As DAO.QueryDef
    Dim vDoc As Variant
    Dim sStart As String
    Dim oDocs As New clsDocsMissing
    Dim dDocs As Object
    
    sStart = Format(Now(), "HH:MM:SS")
    
    'set up counter dictionary coz VB collections suck!
    Set dDocs = CreateObject("Scripting.Dictionary")
    For Each vDoc In oDocs.AllDocs
        dDocs.Add vDoc, 0
    Next
   
    ' set up query
    Set qd = CurrentDb.QueryDefs("spPassThroughWeb")
    qd.SQL = "spRPT_Documents '" & Nz([Forms]![switchboard].Date1, "2004/01/01") & "','" & Nz([Forms]![switchboard].Date1, Format(Now(), "yyyy/mm/dd")) & "','" & Nz([Forms]![rptQueryTool].Officer, "%") & "'"
    
    'get recordset
    Set rs = qd.OpenRecordset
       
    ' clear missing docs reporting table
    CurrentDb.Execute ("UPDATE rptDocuments SET Cnt = 0 WHERE 1=1")
    
    ' loop records and get missing docs plus update counters
    Do While Not rs.EOF
        For Each vDoc In Split(DocsMissing(rs.Fields("Category"), rs.Fields("Status"), rs.Fields("Rec_Type"), rs.Fields("Docs_Present")), ",", , vbTextCompare)
            dDocs.Item(vDoc) = dDocs.Item(vDoc) + 1
        Next
        rs.MoveNext
    Loop
    
    ' update reporting table
    For Each vDoc In oDocs.AllDocs
        CurrentDb.Execute ("UPDATE rptDocuments SET Cnt = " & dDocs.Item(vDoc) & " WHERE Doc = '" & vDoc & "'")
    Next
    
    ' clear variables
    Set rs = Nothing
    Set dDocs = Nothing
    Set oDocs = Nothing
    Set vDoc = Nothing
    
    MsgBox "Start: " & sStart & " , Finish : " & Format(Now(), "HH:MM:SS")
    
    ' run report
    DoCmd.OpenReport "Common_Missing_Docs", acViewPreview
        

End Sub

Additonal helper method via a public property (AllDocs)
Code:
Private Function AllDocuments() As Collection

    ' sets colletion with all possible docs

    Set AllDocuments = New Collection
      
    AllDocuments.Add "FactFind"
    AllDocuments.Add "Research"
    AllDocuments.Add "KFI"
    AllDocuments.Add "Suitability"
    AllDocuments.Add "AppForm"
    AllDocuments.Add "ID"
    AllDocuments.Add "PoA"
    AllDocuments.Add "PoI"
    AllDocuments.Add "TOB"
    AllDocuments.Add "IDD"
    AllDocuments.Add "Offer"
    AllDocuments.Add "Quote"
    AllDocuments.Add "KFD"
    AllDocuments.Add "D&N"
    AllDocuments.Add "LifeAcc"

End Function
This months processing (28 records) took 0 seconds to run - sweet!

The entire database (1,023 records) since the app went live took 0 seconds to run also - damn awesome!

OK I created an additional helper method in the clsMissingDocs class to provide a collection of all possible docs via a public property, and there is a little messing about with a dictionary, but it's so fast it isn't even registering with the Start->Finish counters.

Do I need to change anything?

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
Sorry PHV just noticed your post.

What's 'tblMissing' ? Do you mean I keep my old query where I use the function to calculate the missing docs?

The probglem with that is the slowness of the table joins :-(

I could have a look at running the SP with the funciton bound to it in query and then do the additional query you show above.

Dunno, with current speed of my VBA version, is it worth refactoring?

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
tblMissing is the table you showed on your 1rst post (ID, Missing).

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
yeah , thought so, unfortunately that data doesn't exist. I was using the UDF in the access query to create it!

I actually only have (ID, Existing) - [banghead]

Once I moved to the stored procedure, the returned recordset doesn't have (ID, Missing), unless I can create a SQL UDF to calculate it?

This is why the report request is such a pain in the butt, they asked for a system that only stores 'existing' then request a report for 'missing'.

Because the data is spread across more than one table the joins grind Access to a halt, don't need to tell you that , I believe it was you and George who turned me onto SP's & Views after explaining how JET does table joins and why my apps were so slow!

They are unlikely moving forward to want to run the report for all data ever collected and probably do a month 'snap shot' , which isn't going to be more than a few 1,000 records, which looks like with my current method will take litterally seconds.

Perhaps this is a case of 'if it aint broke' ?

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
1DMF said:
Perhaps this is a case of 'if it aint broke' ?

I quite agree. The critical difference between your situation and the situation being addressed in the FAQ I referenced is that you know all the possible values that the "missing" docs can assume. That means that the SQL approach that I proposed is overkill on a massive scale.

As to your code, there are a couple of nit-picky issues ... which translates to "I wouldn't have done it that way" ... but generally it's well structured and should be easily readable. About the only thing that makes me uneasy is the Function GetDocs. I just have problems with data values embedded in code. Maybe these things are like days of the week or month names ... they will never change. But if they aren't you need to remember where you stashed them and modify the code accordingly.
 
Hi Golom,

Thanks for your input, as you say your solution seemed to me so OTT, that was part of why I couldn't wrap my head around it, it made no sense what it seemed to be doing for my situation, plus my T-SQL skills need improving!

I agree with you regarding hardcoded data types and perhaps if I was going to do that maybe I should have enumerated them?

Though you are only seeing half the picture as far as the usage and application goes, there are a bunch of forms with checkboxes on them, depending on the 'Rec_Type', 'Status' & 'Category' from the actual case record, and the app loads the appropriate form, for the case checker to then tick which documents were present on file at the time of check.

Now maybe I could refactor so there is a simple table with doc names, description (for the email that goes to the broker to request they complete the missing docs), and a counter (for the reporting), that could be used and then have one form that loads controls based on the docs table and then there would be no hard coding anywhere regarding the docs.

When you are given a specification on the back of a napkin and told they want a working version in 3 days, you have to compromise and take the shortest route, which sometimes involves hard coding.

Now I have a better grasp of the overall application and what they want from it, perhaps I might refactor, but as these regulatory required docs that haven't changed in 8 years and aren't likely to change in the forseable future, why add such dynamic ability and additional load on the SQL server for constantly looking up doc names and descriptions on every case accessed by the checkers (which are a lot), it seems over kill and an additional unwanted strain on our already overloaded VPN tunnel to constantly query SQL for something that can be hardcoded and left for the client machine to process not our SQL server nor our VPN tunnel thsat hooks up to SQL. If the list was likely to change regularly then a dynamic solution would be the only way to go, as it is unlikely to change year in year out, hardcoding seems the most efficient option. Well to me, would you agree?

Sometimes the right way isn't always the best way, when you weigh everything up.

Hey I know there is an argument that you should normalise your DB and never store multiple values in a single column anyway, and even that FAQ suggests using that method to refactor your DB and normalise it.

Though I see nothing wrong with CSV strings, especially as the most common usage in the app is to simply display the column on the history screen so you can see an audit trail easily of what docs were present on each check, as per the app specification.

The screen to display the data would use more resources (especially as it's via JET) if I had to link to another table to look up the 1-many records to get the existing docs (and there are several entries per case, as each time they chase the broker another entry is made in the history table), you could end up with 1 record linking to 100 records if the CSV string was normalised to another table, then having to join 3 tables together geting hundreds of records over JET, then having to concatenate them so the history form displays them as a CSV string on screen.

That to me seems well inefficient and so having a single CSV column in the history table, to me seems the better approach.

Perhaps I will be persuaded otherwise when I start my SQL course, but again due to the difference between T-SQL and JET, normalisation (especially 3rd normal form) can grind your Access app to a halt if done through standard Access queries & JET!

Perhaps I should try to refactor all queries to a T-SQL stored procedure or create more views?

But redesigning an entire DB and all applications that have been built around it over 10+ years, is a huge task, fraught with likely downtime and data corruption if not implemented correctly and as the IT department is just me, man power is also a premium!

Anyway enough of me wittering on, I got a report to finish ;-)

Though I must admit, the speed of my solution is so awesome, I'm not going to let a little hardcoding and a non-normalised table structure stop me from being chuffed!

As to your code, there are a couple of nit-picky issues ... which translates to "I wouldn't have done it that way"
As an aside, I am interested regarding what you would do differently and why, I'm always trying to improve what I do, (hence sending myself back to university), so I would appreciate your input.




"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
I guess my first nit-picky thing in your class module is
Code:
If IsNull(sCategory) Or IsNull(sStatus) Or _
   IsNull(sRecType) Or IsNull(sDocs) Then

All those variables are defined as strings. Although, in a RDBMS, most fields can contain NULL regardless of their data type, in VB only a Variant can be NULL. That means that IsNull will always return FALSE. You may want something like

Code:
If Trim$(sCategory) = "" Or Trim$(sStatus) = "" Or _
   Trim$(sRecType) = "" Or Trim$(sDocs) = "" Then

Besides, if you always use Function DocsMissing to invoke the class then you are guaranteed to have all the attributes set so validating them is probably pointless. Whether or not they are set to correct values is a different issue.

Second, I'm uneasy about class methods having direct communications with the user unless they are designed for that purpose. Specifically a MsgBox call inside the class module implies that, when this is called, it will be at a point in the code where the user can do something about it. That message box is really a message to you, the programmer, that you haven't set values for the attributes before invoking the method. A user doesn't need to see it and likely can't do much about it if he does.

We've already had the discussion about hard-coding and I fully accept that in the less than perfect world of slow connections and relatively static values, this is not at or even near the top of your To-Do list. As you said ... "If it ain't broke ..."

I like your code in "Common_Missing_Docs". Obvious and elegant*. No wasted motions.

* - A colleague long ago, upon my commenting that some code was "elegant", corrected me.
"Women are elegant. Code is merely efficient.
 
Dang nice spot,

Flicking between VBA / Perl / JavaScript / SQL plus what ever 'course code' language my OU course is currently using sometimes has me using the wrong testing operators!

Remembering if it's undefined, null or blank ("") sends my head in a spin sometimes!

I also accept you are probably correct, that the message is for me not the user, no user would be instatiating any objects, and the contract for correct funtionality should really be described in Pre / Post conditions rather than popup messages!

Thanks for the kind words on the reporting function, first time anyone has ever paid me such a compliment regarding my code, I strive to write the best and most efficient code I can, and usually fall flat on my face and am way of the mark, so it's nice being told when someone thinks you got it right for a change!

I have removed the validation checking from the clsDocsMissing class for the 'Missing' public method, and although I don't always use the DocsMissing function to call the class method (which is why I refactored to allow optional arguments), I will simply have to deal with errors while testing if I forget to ensure these values are set prior to calling the 'Missing' method. The DocsMissing global function is really so I can use it as a UDF within a query, which I use for another report.

I guess I should add some Pre/Post condition comments to make it 'OU' course acceptable code, which I can be guilty of ommiting on occassion! (ok, quite alot!).

On that note, is there a type of comment in VBA like Perl or JavaDocs that gives you a simple way of viewing the class description, attributes, message protocol and each method's contract pre/post conditions?

And if so how do you then view these details regarding a class?

"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
What? You mean you don't religiously comment your code as you're creating it? Scandalous!

If you have MZ-Tools there is an option to generate XML Documentation for your system. Unfortunately it does the whole system, not just a specific part of it. The output from that is, to put it mildly, verbose. Generating it is simple ... using it effectively in your comments probably isn't. I don't speak Perl or JavaDocs so I have no basis for comparison.
 
What? You mean you don't religiously comment your code as you're creating it? Scandalous!
Do I detect a hint of sarcasm [tongue]

The output from that is, to put it mildly, verbose. Generating it is simple ... using it effectively in your comments probably isn't.
Hmm, perhaps it is simpler to just comment my code better and then read the pre/post conditions when I go to use the methods again.

Thanks for all your input, much appreciated.



"In complete darkness we are all the same, it is only our knowledge and wisdom that separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"

Free Dance Music Downloads
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top