Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Find Repeating Patterns of Numbers 1

Status
Not open for further replies.

GKIL67

Technical User
Dec 1, 2009
44
Hello,
I have the output of a crosstab query that has finite number of columns [BALL] as numbers (i.e. 1, 2, 3... 45) and
each row [CNT] has a unique ascenting number as a name (i.e. 137, 138, 139...).
The value [SKIPS] can be any number, including 0. Each column could start or end by a NULL.

I want to find for each column [BALL] if there are ANY series of combinations of numbers [SKIPS] that
repeat (I don't know it in advance) and if THERE ARE, then how many times they repeat.
The search starts from the lowest [CNT] to the highest [CNT].
The series of combination of numbers (how many numbers in a combination at a time) has to be controled
by an X parameter, i.e. X=2 means to check for repeating patterns of 2 numbers.

I've found very few functions around that process strings of letters for this purpose and in my case there
are numbers to deal with... moreover, I dont have them in a string!

I understand that that's not an easy thing to solve, morever on a query level.
I would really appreciate any help that could put in the right track.

Below is an example of the output of the crosstab query, with columns 3 and 35:
cnt 3 35

137 28
138 8
139 11
140 6
141 0 11
142 2 3
143 0 3
144 17 14
145 4 8
146 1 5
147 21 13
148 34
149 4
150 4
151 4
152 1
153 8
154 17
155 13
156 34
157 14
158 6
159 11
160
161

Below is the expected outcome, assuming that it writes in a table:
i.e. tblPatRep for X=2 and columns 3 and 35:
clmn Location Pattern X

3 0 0 2
35 147 13, 34 2
35 155 13, 34 2

Thank you in advance!
 
1 2 3 4 1 2 3 4 1 2 3
What about patterns and sub patterns?
is that two patterns
1234,1234
or five
123,123,123,1234,1234
or more
12,12,12
23,23,23
123,123,123
234,234
34,34
 
That is actually the role of parameter X.
Based on your sample,
For X=2, acceptable are 12,12,12 and 23,23,23 and so on...
For X=4, acceptable are 1234, 1234.

Thank you MajP, do you think you have a way for me to handle it?

 
I can do this, and it would be pretty easy for me, but I am very proficient in working with recordsets, building custom classes, building custom class collections, and executing sql from code. I would use all of that to leverage the power of sql. The code would be reuseable and scalable. Although I know how to do it, I still think it would take me about atleast an hour to write it and get it to work. Not sure how proficient you are with these areas.

I do not think there is any silver bullet pure Sql. If someone comes up with it, I will definately be impressed. You could probably write it in sql, but I doubt you could make it very reuseable and flexible. In code I can write this to any original query with any amount of columns and any names for the columns.

So here is the basic logic.
Not sure what I would do in queries, new tables, or pure code. Can be done with each, but I would have to try to see what is the easiest to resuse

1) Assume you have a main routine that loops through each of the data columns numbers and put into a new table/query.
The new table would start with Cnt and the Column 3

2) I add a new column 3_1 and fill it by offsetting one record
At this point column 3_2 does not exist (ignore it)

4)

Cnt 3 3_1
141 11 3
142 3 3
143 3 14
144 14 8
145 8 5
146 5 13
147 13 34
148 34 4
149 4 4
150 4 4
151 4 1
152 1 8
153 8 17
154 17 13
155 13 34
156 34 14
157 14 6
158 6 11
159 11 3
160 3 3
161 3

5) then I add a conatenated field to the table/query. I think I am writing to the table and not building a calculated field. This would make it more reuseable.

Cnt 3 3_1 Concat
141 11 3 11,3
142 3 3 3,3
143 3 14 3,14
144 14 8 14,8
145 8 5 8,5
146 5 13 5,13
147 13 34 13,4
148 34 4 34,4
149 4 4 4,4
150 4 4 4,4
151 4 1 4,1
152 1 8 1,8
153 8 17 8,17
154 17 13 17,13
155 13 34 13,34
156 34 14 34,14
157 14 6 14,6
158 6 11 6,11
159 11 3 11,3
160 3 3 3,3
161 3

6) Now do a group by query on concat and return the count(concat) > 1. That would tell you which combinations are repeated.
Then you could find those combinations and add to a table. Here again I am probably going to do this in code. I am going to build two collections or (possibly dictionaries). Read down the concat column and add to the first collection. The key is the concat value. When I try to add a duplicate key it throws an error. I add that key to the second collection. This collection holds the keys of duplicates.

7) Now loop the collection of duplicates
read the key
query the recordset for that key. Rtn Cnt
do something with the results (you know, cnt, concat, number in series)
repeat

8) do until duplicate collection is empty

in other words if you got patterns of 2, you could also have patterns of 3 so then you add another column and repeat

Cnt 3 3_1 3_2
141 11 3 3
142 3 3 14
143 3 14 8
144 14 8 5
145 8 5 13
146 5 13 34
147 13 34 4
148 34 4 4
149 4 4 4
150 4 4 1
151 4 1 8
152 1 8 17
153 8 17 13
154 17 13 34
155 13 34 14
156 34 14 6
157 14 6 11
158 6 11 3
159 11 3 3
160 3 3
161 3

9) do above for each column in the original query.



Now that I look at it, I will probably skip tables and queries and do it mostly in code. A main routine can pass in the original query name, the column number, and the amount in sequence. It could build a collection/dictionary holding the values I show in the concat field by looping a recordset. Then i would push those into another collection, and the third collection holds the duplicates. Read the recordset one more time recording the duplicate information. I would probably insert those results into a new table. The main routine would cylce through each column. Sounds complicated, but if you do it in pieces each class, procedure is relatively simple.
 
I really couldn't follow, programming wise, what you post in the beginning as "So here is the basic logic."

But at the end, you say "I will probably skip tables and queries and do it mostly in code. A main routine can pass in the original query name, the column number, and the amount in sequence. It could build a collection/dictionary holding the values I show in the concat field by looping a recordset..." which makes more sense to me. I found this piece of code ( but it doesn't deal with numbers, neither looks in a query. So, I wouldn't know how to rearrange it to fit my needs...
Code:
Option Base 1
Dim Patterns(255, 2) As String
Dim PatternFound As Boolean

Private Sub Command1_Click()
Dim iStep As Integer
Dim  J As Integer
Dim PatternsPosCount As Long

PatternsPosCount = 0
For iStep = 2 To 10
    For     = 1 To Len(Text1) Step iStep
        'Obtain the new pattern
        strpattern = Mid(Text1,  J, iStep)
        If Len(strpattern) <> iStep Then GoTo NextStep
        
        'search if this pattern is not currently in the array
        PatternFound = False
                        
        For i = 1 To PatternsPosCount
            If strpattern = Patterns(i, 1) Then
                Patterns(i, 2) = Val(Patterns(i, 2)) + 1
                PatternFound = True
                Exit For
            End If
        Next i
                
        If Not PatternFound Then
            PatternsPosCount = PatternsPosCount + 1
            Patterns(PatternsPosCount, 1) = strpattern
            Patterns(PatternsPosCount, 2) = Val(Patterns(PatternsPosCount, 2)) + 1
            
        End If
         
    Next    
NextStep:

Next iStep

For i = 1 To PatternsPosCount
    MsgBox "Pattern: " & Patterns(i, 1) & " have " & Patterns(i, 2) & " repeats"
Next i

End Sub
 
Hello, still can get it... Maybe MajP, could you take me a step further? If you think that it is too hard, maybe to look only at a function that searches a comma delimited string of numbers and returns in a table the repeated patterns, as specified by X?
i.e. strNumbers = "11,3,3,14,8,5,13,34,4,4,4,4,8,17,13,34,4,4,11" we get:
X PATTERN Repeats
2 13,34 2
2 4,4 3
3 13,34,4 2

Of course this is not my original issue but I'm trying to "steal" as little as possible of an expert's time... Thank you!
 
Ok. I am going to get counseled for posting code, but this is a real challenging problem. I just wanted to see if I could do it. This was hard even for me. Also did not see a lot of other people chiming in. I do not think I could ever explain it without writing it.

So here is the function you need to call. It returns a custom "Sequences" object. This is a custom collection. So you can loop the collection and each item has information about a repeated sequence.
It returns every sequence, of every length, for every column. You just pass it a query name. This should work for any query of any size with and type of location fields with any column names. I imagine someonelse could come up with a simpler tailored answer, I doubt anyone else could come up with a general function like this.

Code:
Public Function GetAllSequencesForAllColumns(QueryName As String) As Sequences
'You have to push in a query where the first column is a location
'All subsequent columns are returned
Dim rs As DAO.Recordset
Dim allSequences As New Sequences
Dim tempSequences As Sequences
Dim I As Integer
Dim j As Integer
Dim ts As Sequence
Set rs = CurrentDb.OpenRecordset(QueryName)
For I = 1 To rs.Fields.Count - 1
  Set tempSequences = New Sequences
  Set tempSequences = GetAllRepeatedSequences(GetColumn(QueryName, I))
  For j = 1 To tempSequences.Count
    Set ts = tempSequences.Item(j)
    allSequences.Add ts.ColumnName & " " & ts.Location, ts.Sequence, ts.ColumnName
  Next j
Next I
Set GetAllSequencesForAllColumns = allSequences
End Function

A sequence itm has the following properties.
The ColumnName
The Location in column
The Sequence string

So you can return this collection and then insert the values into a table. This would be helpful because the result is not sorted.
Here is an example of using it
Code:
Public Sub GetAll()
  Dim allSequences As Sequences
  Dim I As Integer
  Set allSequences = GetAllSequencesForAllColumns("tblData")
  For I = 1 To allSequences.Count
    Debug.Print allSequences.Item(I).ToString
  Next I
End Sub

input
Code:
ID	ColOne	ColTwo	ColThree
1			
2			
3	11	3	
4	3	4	
5	3	5	
6	14	3	
7	8	4	
8	5	5	
9	13	6       13
10	34	6       34
11	4      34       4
12	4               13
13	4               14
14	1               4
15	8               3
16	17              4
17	13              5
18	34              3
19	14              4
20	6               5
21	11		
22	17		
23	13		
24	34		
25	17		
26	13		
27	34		
28	14

output
Code:
Location: ColOne 17 (2)  Sequence: 13,34
Location: ColOne 9 (2)  Sequence: 13,34
Location: ColOne 22 (2)  Sequence: 17,13
Location: ColOne 16 (2)  Sequence: 17,13
Location: ColOne 23 (2)  Sequence: 13,34
Location: ColOne 25 (2)  Sequence: 17,13
Location: ColOne 26 (2)  Sequence: 13,34
Location: ColOne 27 (2)  Sequence: 34,14
Location: ColOne 18 (2)  Sequence: 34,14
Location: ColOne 22 (3)  Sequence: 17,13,34
Location: ColOne 16 (3)  Sequence: 17,13,34
Location: ColOne 25 (3)  Sequence: 17,13,34
Location: ColOne 26 (3)  Sequence: 13,34,14
Location: ColOne 17 (3)  Sequence: 13,34,14
Location: ColOne 25 (4)  Sequence: 17,13,34,14
Location: ColOne 16 (4)  Sequence: 17,13,34,14

Location: ColTwo 6 (2)  Sequence: 3,4
Location: ColTwo 3 (2)  Sequence: 3,4
Location: ColTwo 7 (2)  Sequence: 4,5
Location: ColTwo 4 (2)  Sequence: 4,5
Location: ColTwo 6 (3)  Sequence: 3,4,5
Location: ColTwo 3 (3)  Sequence: 3,4,5

Location: ColThree 18 (2)  Sequence: 3,4
Location: ColThree 15 (2)  Sequence: 3,4
Location: ColThree 19 (2)  Sequence: 4,5
Location: ColThree 16 (2)  Sequence: 4,5
Location: ColThree 18 (3)  Sequence: 3,4,5
Location: ColThree 15 (3)  Sequence: 3,4,5

So in order to do all this, you need all of this behind the scenes.



class ColumItem
Code:
Option Compare Database
Option Explicit

Private mLocation As Variant
Private mItemValue As Variant
Private mColumnName As String
Public Property Get Location() As Variant
  Location = mLocation
End Property
Public Property Let Location(ByVal Value As Variant)
 mLocation = Value
End Property
Public Property Get ItemValue() As Variant
  ItemValue = mItemValue
End Property
Public Property Let ItemValue(ByVal Value As Variant)
  mItemValue = Value
End Property
Public Property Get ColumnName() As String
  ColumnName = mColumnName
End Property
Public Property Let ColumnName(ByVal Value As String)
  mColumnName = Value
End Property
Public Sub Initialize(Location As Variant, ItemValue As Variant, Optional ColumnName As String = "")
  Me.Location = Location
  Me.ItemValue = ItemValue
  Me.ColumnName = ColumnName
End Sub
Public Function ToString()
  ToString = "Column Name: " & Me.ColumnName & "   Location: " & Me.Location & "   Value: " & Me.ItemValue
End Function
class ColumnItems
Code:
Option Compare Database
Option Explicit

Private mCIs As New Collection
Private mColumnName As String
' Add a SQ object to the collection.
Public Sub Add(ByVal Location As Variant, ByVal ItemValue As Variant, Optional ColumnName As String = "")
    Dim CI As New ColumnItem
    mColumnName = ColumnName
    CI.Initialize Location, ItemValue, ColumnName
    mCIs.Add CI, CStr(Location)
End Sub
' Return the number of items in the collection.
Public Function Count() As Long
    Count = mCIs.Count
End Function
' Remove an SQ from the collection.
Public Sub Remove(ByVal Index As Variant)
    mCIs.Remove Index
End Sub
' Return a SQ.
Public Function Item(ByVal Index As Variant) As ColumnItem
    Set Item = mCIs(Index)
End Function
Public Property Get ColumnName() As String
  ColumnName = mColumnName
End Property
Public Property Let ColumnName(ByVal Value As String)
  mColumnName = Value
End Property
class Sequence
Code:
Option Compare Database
Option Explicit

Private mLocation As Variant
Private mSequence As String
Private mColumnName As String
Public Property Get Location() As Variant
  Location = mLocation
End Property
Public Property Let Location(ByVal Value As Variant)
 mLocation = Value
End Property
Public Property Get Sequence() As String
  Sequence = mSequence
End Property
Public Property Let Sequence(ByVal Value As String)
  mSequence = Value
End Property
Public Sub Initialize(Location As Variant, Sequence As String, Optional ColumnName As String = "")
  Me.Location = Location
  Me.Sequence = Sequence
  Me.ColumnName = ColumnName
End Sub
Public Property Get ColumnName() As String
  ColumnName = mColumnName
End Property
Public Property Let ColumnName(ByVal Value As String)
  mColumnName = Value
End Property

Public Property Get sequenceLength()
  sequenceLength = UBound(Split(Sequence, ",")) + 1
End Property
Public Function ToString()
  ToString = "Location: " & Me.Location & "  Sequence: " & Me.Sequence
End Function

class Sequences

Code:
Option Compare Database
Option Explicit

Private mSQs As New Collection
Private mColumnName As String
' Add a SQ object to the collection.
Public Sub Add(ByVal Location As Variant, ByVal Sequence As String, Optional ColumnName As String = "")
    Dim SQ As New Sequence
    Me.ColumnName = ColumnName
    SQ.Initialize Location, Sequence, ColumnName
    mSQs.Add SQ, CStr(Location)
End Sub
Public Sub Add2(SQ As Sequence)
    mSQs.Add SQ, SQ.Location
End Sub
' Return the number of items in the collection.
Public Function Count() As Long
    Count = mSQs.Count
End Function
' Remove an SQ from the collection.
Public Sub Remove(ByVal Index As Variant)
    mSQs.Remove Index
End Sub
' Return a SQ.
Public Function Item(ByVal Index As Variant) As Sequence
    Set Item = mSQs(Index)
End Function
Public Property Get ColumnName() As String
  ColumnName = mColumnName
End Property
Public Property Let ColumnName(ByVal Value As String)
  mColumnName = Value
End Property

mdlSequences
Code:
Option Compare Database
Option Explicit
Public Function getSequences(CIs As ColumnItems, sequenceLength As Integer) As Sequences
  Dim AllSQs As New Sequences
  Dim ConcatenatedKey As String
  Dim I As Integer
  Dim j As Integer
  Dim aItms() As String
  
  On Error GoTo errlbl
  
  For I = 1 To CIs.Count
    j = 1
    ConcatenatedKey = CStr(CIs.Item(I).ItemValue)
    Do While I + j < CIs.Count + 1 And j < sequenceLength
      If Not CIs.Item(I).ItemValue = CIs.Item(I + j).ItemValue Then
        ConcatenatedKey = ConcatenatedKey & "," & CStr(CIs.Item(I + j).ItemValue)
      End If
      j = j + 1
    Loop
    aItms = Split(ConcatenatedKey, ",")
    'This accounts for the end of the list where you add less then the sequence length
    If UBound(aItms) = sequenceLength - 1 Then
       AllSQs.Add CIs.Item(I).Location, ConcatenatedKey, CIs.Item(I).ColumnName
    End If
  Next I
  AllSQs.ColumnName = CIs.ColumnName
  Set getSequences = AllSQs
  Exit Function
errlbl:
    MsgBox Err.Number & " " & Err.Description
End Function

Public Function getRepeatedSequences(columnData As ColumnItems, sequenceLength As Integer) As Sequences
  Dim RepeatedDataDict As New Dictionary
  Dim NonRepeatedDataDict As New Dictionary
  Dim RepeatedSequences As New Sequences
  Dim allSequences As New Sequences
  Dim SQ As Sequence
  Dim strKey As String
  Dim strLocation As String
  Dim I As Integer
  Dim dictItm As Variant
  
  On Error GoTo errlbl
  Set allSequences = getSequences(columnData, sequenceLength)
   For I = 1 To allSequences.Count
     Set SQ = allSequences.Item(I)
     strKey = SQ.Sequence
     If Not NonRepeatedDataDict.Exists(strKey) Then
       'This dictionary hold keys representing the Sequences
       NonRepeatedDataDict.Add strKey, SQ.Location
     Else
       'This dictionary holds keys representing the location
       RepeatedDataDict.Add CStr(SQ.Location), SQ.Location
       'If you find the second occurence add the first occurence
       If Not RepeatedDataDict.Exists(NonRepeatedDataDict(strKey)) Then
         RepeatedDataDict.Add NonRepeatedDataDict(strKey), SQ.Location
       End If
     End If
   Next I
  
   For Each dictItm In RepeatedDataDict
      strLocation = dictItm
      RepeatedSequences.Add strLocation, allSequences.Item(CStr(dictItm)).Sequence
   Next
  Set getRepeatedSequences = RepeatedSequences
  Exit Function
errlbl:
    MsgBox Err.Number & " " & Err.Description
End Function
Public Function GetAllRepeatedSequences(CIs As ColumnItems) As Sequences
  Dim SQs As New Sequences
  Dim tempSQs As Sequences
  Dim SQ As Sequence
  Dim sequenceLength As Integer
  Dim I As Integer
  sequenceLength = 2
  Do
    Set tempSQs = getRepeatedSequences(CIs, sequenceLength)
    For I = 1 To tempSQs.Count
      Set SQ = tempSQs.Item(I)
       SQs.Add SQ.Location & " (" & SQ.sequenceLength & ")", SQ.Sequence, CIs.ColumnName
    Next I
    sequenceLength = sequenceLength + 1
  Loop Until tempSQs.Count = 0
 SQs.ColumnName = CIs.ColumnName
 Set GetAllRepeatedSequences = SQs
End Function
Public Function GetColumn(QueryName As String, ColumnNumber As Integer, Optional LocationColumnNumber As Integer = 0) As ColumnItems
 'One column has to have a location such as a primary key
  Dim col As New ColumnItems
  Dim rs As DAO.Recordset
  Dim locationFld As String
  Dim columnFld As String
  Dim strSql As String
  'get the names of the fields
  Set rs = CurrentDb.OpenRecordset(QueryName)
  locationFld = rs.Fields(LocationColumnNumber).Name
  columnFld = rs.Fields(ColumnNumber).Name
  rs.Close
  'load the collection
  strSql = "Select " & locationFld & " AS Location, " & columnFld & " FROM " & QueryName & " WHERE NOT " & columnFld & " IS NULL"
  ' Debug.Print strSql
  Set rs = CurrentDb.OpenRecordset(strSql)
  Do While Not rs.EOF
    col.Add rs!Location, rs.Fields(columnFld).Value, rs.Fields(1).Name
    rs.MoveNext
  Loop
  rs.Close
  Set GetColumn = col
End Function
Public Function GetAllSequencesForAllColumns(QueryName As String) As Sequences
'You have to push in a query where the first column is a location
'All subsequent columns are returned
Dim rs As DAO.Recordset
Dim allSequences As New Sequences
Dim tempSequences As Sequences
Dim I As Integer
Dim j As Integer
Dim ts As Sequence
Set rs = CurrentDb.OpenRecordset(QueryName)
For I = 1 To rs.Fields.Count - 1
  Set tempSequences = New Sequences
  Set tempSequences = GetAllRepeatedSequences(GetColumn(QueryName, I))
  For j = 1 To tempSequences.Count
    Set ts = tempSequences.Item(j)
    allSequences.Add ts.ColumnName & " " & ts.Location, ts.Sequence, ts.ColumnName
  Next j
Next I
Set GetAllSequencesForAllColumns = allSequences
End Function

Public Sub GetAll()
  Dim allSequences As Sequences
  Dim I As Integer
  Set allSequences = GetAllSequencesForAllColumns("tblData")
  For I = 1 To allSequences.Count
    Debug.Print allSequences.Item(I).ToString
  Next I
End Sub
Normally I would wrap all of this into one big class and hide all the working from the user. I did not do that yet. But my final solution would simply be a class the encapsulates all of this.
 
I've spent the last 3-4 hours with it. No doubt on the logic and the perfect structure used, but I'm an Access 2003 user and besides the "User-defined type not defined" and other errors I get, I'm not so familiar with classes and etc. mentioned.

So, MajP's approach is a star, but I can't really implement it into my current level of programming...
Anyway we could go a bit back and implement a function that searches a comma delimited string of numbers and returns in a table the repeated patterns, as specified by X?
i.e.
strNumbers = "11,3,3,14,8,5,13,34,4,4,4,4,8,17,13,34,4,4,11"
We get:
X PATTERN Repeats
2 13,34 2
2 4,4 3
3 13,34,4 2

What do you think?
 
The user defined type is my fault. You have to add a reference to microsoft scripting runtime in order to use the dictionary.

Here is the link to the demo database



OK I added a tbl called tblOutPut.
I added two procedures to write to the table. It deletes everything out of it first.

Code:
Public Sub LoadTableWithSequences(qryName As String)
  Dim allSequences As Sequences
  Dim I As Integer
  Dim location As String
  Dim columnName As String
  Dim sequence As String
  Dim sequencelength As String
  Dim SI As sequence
  Dim strSql As String
  Set allSequences = GetAllSequencesForAllColumns(qryName)
  strSql = "Delete * from tblOutPut"
  CurrentDb.Execute strSql
  For I = 1 To allSequences.Count
   Set SI = allSequences.Item(I)
   columnName = "'" & SI.columnName & "'"
   location = "'" & SI.location & "'"
   sequencelength = SI.sequencelength
   sequence = "'" & SI.sequence & "'"
   strSql = "INSERT INTO tblOutPut ( ColumnName, Location, Sequence, SequenceLength ) VALUES (" & columnName & ", " & location & ", " & sequence & ", " & sequencelength & ")"
   CurrentDb.Execute strSql
  Next I
End Sub

Public Sub LoadFromString(strNumbers)
  Dim aNumbers() As String
  Dim CI As New ColumnItems
  Dim I As Integer
  Dim allSequences As Sequences
  Dim location As String
  Dim columnName As String
  Dim sequence As String
  Dim sequencelength As String
  Dim SI As sequence
  Dim strSql As String
  
  aNumbers = Split(strNumbers, ",")
  For I = LBound(aNumbers) To UBound(aNumbers)
    CI.Add I, aNumbers(I), "No ColumnName"
  Next I
  

  
  Set allSequences = GetAllRepeatedSequences(CI)
  strSql = "Delete * from tblOutPut"
  CurrentDb.Execute strSql
  
  For I = 1 To allSequences.Count
   Set SI = allSequences.Item(I)
   columnName = "'" & SI.columnName & "'"
   location = "'" & SI.location & "'"
   sequencelength = SI.sequencelength
   sequence = "'" & SI.sequence & "'"
   strSql = "INSERT INTO tblOutPut ( ColumnName, Location, Sequence, SequenceLength ) VALUES (" & columnName & ", " & location & ", " & sequence & ", " & sequencelength & ")"
   CurrentDb.Execute strSql
  Next I
End Sub


Although there is a whole lot of code you only need to pass either a string or a name of the query to one of the functions. Everything else I could wrap in an mde and have hidden from view. That is the idea behind object oriented programming. It should be encapsulated in the classes .

Below is now the only code you should worry about. If you pass in a query or table it looks at all fields. Or you can pass a string.

Code:
Public Sub TestTable()
  LoadTableWithSequences ("tblData")
End Sub

Public Sub TestString()
  Dim strNumbers As String
  strNumbers = "11,3,3,14,8,5,13,34,4,4,4,4,8,17,13,34,4,4,11"
  LoadFromString (strNumbers)
End Sub


If I run it on your string I get the following in my table
Code:
ColumnName	Location	Sequence   SequenceLength
No ColumnName	14 (2)	        13,34	      2
No ColumnName	6  (2)          13,34	      2
No ColumnName	15 (2)          34,4	      2
No ColumnName	7  (2)          34,4	      2
No ColumnName	14 (3)          13,34,4	      3
No ColumnName	6  (3)          13,34,4	      3
No ColumnName	15 (3)          34,4,4	      3
No ColumnName	7  (3)          34,4,4	      3
No ColumnName	14 (4)          13,34,4,4     4
No ColumnName	6  (4)          13,34,4,4     4


Just look at mdlMain. Those are the only two procedures to concern with.
 
Can't get it...
I get error 3265 "Item not found in this collection" and hangs at: Public Function GetColumn(QueryName As String, ColumnNumber As Integer, Optional LocationColumnNumber As Integer = 0) As ColumnItems
Code:
Do While Not rs.EOF
[COLOR=red] col.Add rs!location, rs.Fields(columnFld).Value, rs.Fields(1).Name [/color]
    rs.MoveNext
  Loop
The only difference among your input table and mine is that mine
has numbers as column names. i.e. your [ColOne] in mine is [1].
Is this the reason?
 
Try changing the table I put in there "tblData" to match your table. I did not try it but I should have wrapped my column names in [] because this may be an issue. If the user has field names with spaces or reserved words it could fail.

If by changing the names it works, you may be able to fix it with this line of code in the same function
after
columnFld = rs.Fields(ColumnNumber).Name
add
columnFld = "[" & columnFld & "]
 
I made the change, Cool!
I have to say it's lighting fast! I fed 75 rows by 45 columns and it was ready as soon as I hit [enter]. Great MajP!

Now, would you please satisfy my curiosity and help me develop my programming with the function that searches a comma delimited string of numbers and returns in a table the repeated patterns, as specified by X?
i.e. strNumbers = 11,3,3,14,8,5,13,34,4,4,4,4,8,17,13,34,4,4,11"
we get:
X PATTERN Repeats
2 13,34 2
2 4,4 3
3 13,34,4 2
...

Don't spent any time with the output table, debug print is fine, it's the way of parsing the string that bugs me... days now...

Much, very much obliged!

 
I tested it with 5 rows and 1000 random integers. It returned about 700 answers very quickly. Faster then I would have expected, because the logic was designed to be compartmentalized and reuseable, and not optimized.

The code I wrote was based on the idea, that the data was stored in a table in columnar order. So my solution for a string was to force it back into a column. However, there is a whole lot more code to handle multiple columns etc. Store the found sequences, put them all together from all the columns, and then print it out.

The bottom line you need to first find the sequence and then determine if the sequence is a duplicate.



If you look at the getsequence function the logic would be the same if it is a string or column.

Pass in sequence length. Start at 2 and add 1 until no repeated sequences found
11,3,3,14,8,5,13,34,4,4,4,4,8,17,13,34,4,4,11
for each itm in string
read item
for i = itm position to itm position + sequencelength
sequence = sequence & itm(i)
next
next itm

You read 11 then go out 2
11,3,3
then read 3 and go out 2
3,3,14
...
when you get to second to last item you cannot read out two more
so you have to account for that somehow if not you error out.

4,11

If you can digest this function and focus on the logic used and not the data structures you could modify it to return the sequences from a string.

It is hard enough just to create all the sequences. So I first just focus on getting and saving all the sequences. Then it is fed to another function to find which are duplicates.

Code:
Public Function getSequences(CIs As ColumnItems, sequenceLength As Integer) As Sequences
  Dim AllSQs As New Sequences
  Dim ConcatenatedKey As String
  Dim I As Integer
  Dim j As Integer
  Dim aItms() As String
  
  On Error GoTo errlbl
  
  For I = 1 To CIs.Count
    j = 1
    ConcatenatedKey = CStr(CIs.Item(I).ItemValue)
    Do While I + j < CIs.Count + 1 And j < sequenceLength
      If Not CIs.Item(I).ItemValue = CIs.Item(I + j).ItemValue Then
        ConcatenatedKey = ConcatenatedKey & "," & CStr(CIs.Item(I + j).ItemValue)
      End If
      j = j + 1
    Loop
    aItms = Split(ConcatenatedKey, ",")
    'This accounts for the end of the list where you add less then the sequence length
    If UBound(aItms) = sequenceLength - 1 Then
       AllSQs.Add CIs.Item(I).Location, ConcatenatedKey, CIs.Item(I).ColumnName
    End If
  Next I
  AllSQs.ColumnName = CIs.ColumnName
  Set getSequences = AllSQs
  Exit Function
errlbl:
    MsgBox Err.Number & " " & Err.Description
End Function
 
BTW, I did not try to optimize this, in fact there is a lot of repetition. But I did choose a Dictionary for a reason. In order to find duplicates there is a lot of searching. If you have an array or collection to search it to see if sequence 1,5,6 exists you have to start at 1 and enumerate the whole list. This can take N iterations. With a dictionary you can immediately check if it contains the key "1,5,6". Which take 1 iteraion Since you are doing this for each possible sequence this greatly speeds your search.
 
This may help. I still broke it down into two functions. I could have worked directly with the string and used string functions. For me I decided to split it

The first simply is passed a string and puts it into a colection of all the sequences
Code:
Public Function GetSequenceFromString(strList As String, Optional LengthSequence = 2, Optional seperator As String = ",") As Collection
  On Error GoTo errLbl
  Dim colAll As New Collection
  Dim I As Integer
  Dim J As Integer
  Dim aStr() As String
  Dim strSequence As String
  aStr = Split(strList, seperator)
  For I = LBound(aStr) To UBound(aStr) - LengthSequence + 1
    For J = 0 To LengthSequence - 1
      strSequence = strSequence & " " & aStr(I + J)
    Next J
    strSequence = Trim(strSequence)
    colAll.Add strSequence, CStr(I + 1)
    strSequence = ""
  Next I
  Set GetSequenceFromString = colAll
  Exit Function
errLbl:
     MsgBox Err.Number & Err.Description
  End Function

Once you have the sequences. Pass it to this function.
It takes the first item compares it to the second item all the way to the last item.
if you find a match
First time you have to add both the one you are comparing and the one found
if you find a third or fourth .. match you only want to add the newly found match

After comparing the first to all items greater, compare the second to all items greater. You do not have to compare it against the first. You only have to do this until the midpoint of the list. At that time you compared each item to every other item.

Code:
Public Function GetDuplicateSequences(Sequences As Collection) As Collection
  Dim colDuplicates As New Collection
  Dim I As Integer
  Dim J As Integer
  Dim firstFound As Boolean
  Dim strSequence As String
  For I = 1 To Fix(Sequences.Count / 2) + Sequences.Count Mod 2
   strSequence = Sequences(I)
    firstFound = False
    
    For J = I + 1 To Sequences.Count
      If Sequences(I) = Sequences(J) Then
        'First time add both
        If Not firstFound Then
          colDuplicates.Add Sequences(J) & "  Location: " & I
          colDuplicates.Add Sequences(J) & "  Location: " & J
          firstFound = True
        Else
          colDuplicates.Add Sequences(J) & "  Location: " & J
        End If
      End If
    Next J
  Next I
  Set GetDuplicateSequences = colDuplicates
End Function
test
Code:
Public Sub TestString()
  Dim strNumbers As String
  Dim Sequences As Collection
  Dim I As Integer
  strNumbers = "4,11,3,3,14,8,5,13,34,4,4,4,4,8,17,13,34,4,4,11,3"
  Set Sequences = GetSequenceFromString(strNumbers, 3)
  Set Sequences = GetDuplicateSequences(Sequences)
  For I = 1 To Sequences.Count
    Debug.Print Sequences(I)
  Next
End Sub

This can be optimized a bit, but I did not show that. Here is the list of all sequences of three.
once you compare the first sequence 4,11,3 you find a match (*). The next step is to compare 11 3 3 against everything later than 11 3 3 including the record with a *. I could have removed * when I found it making my search shorter. If the list was very large with lots of matches that could make a difference.

4 11 3
11 3 3
3 3 14
3 14 8
14 8 5
8 5 13
5 13 34
13 34 4
34 4 4
4 4 4
4 4 4
4 4 8
4 8 17
8 17 13
17 13 34
13 34 4
34 4 4
4 4 11
4 11 3 *
 
That's a great help and a lot of good homework... it does help to break it down to two functions!

Thank you MajP, once again! [thumbsup2]
 
For a SQL solution, I'd probably re-arrange your starting point to NOT be a crosstab and have the [3] and [35] columns as values in a column:

Code:
ID   Code  Count
140    3    11 
141    3    12
140   35     6
141   35     7
142    3    11
143    3    12
142   35     6
143   35     7

Then do a crosstab joining this query to itself with a criteria that A.Code=B.Code and Where the Rank of B.ID (count the IDs between B.ID and A.ID for the same code) is between 1 and X. This should include for a crosstab row the A.ID and the next X-1 IDs. Use the Rank you generate as the crosstab heading prefixd with a character like R1, R2 (makes it easier to work with the columns). Output should look like this:

Code:
ID  Code  R1  R2
140  35    6   7  (R1 from ID 140, R2 from ID 141)
140   3   11  12  ("")
141   3   12  11  (R1 from ID 141, R2 from ID 142)
141  35    7   6  ("")

Finally, do a group-by query on the crosstab query of the code and columns R1, R2, etc with count > 1 to find the sequential repeats. If you want the IDs for each repeat, join back to the crosstab.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top