Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to write unique records to a text file 1

Status
Not open for further replies.

victora

Programmer
Feb 11, 2002
118
0
0
US
I just started with VB 6.0 couple of weeks ago. I have this problem of creating a text file based on a raw data from one of our vendors. My input is a text file with duplicate cert#. How can I create a new text file with only the unique cert#?

Thank you very much for all your help.
 
For this you should know how to work with files, for loops and arrays, because all that you have to do is keep an array of cert#'s that you have already encountered and compare each new value with those in the array. If the value has not been encountered, write the new cert# to file, put it into the array and move on. Otherwise, ignore the value.
-Max
 
You could always use the dictionary object. My example uses basic file I/O and the dictionary object which is basically an associative array. I would recommend buying a book and researching the following code.

' Add a reference to Microsoft Scripting Runtime
Dim Dict As Dictionary
Dim InputData As String
Dim Counter As Long
Dim UniqueCounter As Long

Private Sub Command1_Click()
' Opens files for processing
Open "C:\Dupe-certs.txt" For Input As #1
Open "C:\Unique-certs.txt" For Output As #2

' Initializes the dictionary and counters
Set Dict = New Dictionary
Counter = 0
UniqueCounter = 0

' Loops through the file
Do
Line Input #1, InputData
Counter = Counter + 1
' Finds mismatches and adds to dictionary and output file if unique
If Dict.Exists(InputData) = False Then
UniqueCounter = UniqueCounter + 1
Print #2, InputData
Dict.Add InputData, UniqueCounter
End If
Loop Until EOF(1)

' Closes files and destroys objects from memory
Close
Set Dict = Nothing

' Promps user with completion
MsgBox Counter & " Cert#'s Read" & vbNewLine & _
UniqueCounter & " Unique Cert#'s found", vbInformation
End Sub

Swi
 
I'm proven wrong once again - dictionary is a much better idea - good job Swi.
-Max
 
Thanks. I stumbled across it about 2 years ago. Although, victoria would also benefit greatly by taking your advice and reasearching other various ways to accomplish the same goal since she is just starting out.

Swi
 
Depending on how the data is stored in the file, you may be able to use ADO to open the file and select the distinct records.

will assist you in creating a connection string for text files.

Here is some sample code to get you started.

Code:
Dim DB As ADODB.Connection
Dim RS As ADODB.Recordset

Set DB = CreateObject("ADODB.Connection")
DB.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\;Extended Properties=""text;HDR=Yes;FMT=Delimited"""
Call DB.Open

Set RS = CreateObject("ADODB.Recordset")
Call RS.Open("Select Distinct * From Certs.txt", DB, adOpenStatic, adLockBatchOptimistic)
Call MsgBox("Distinct Records:  " & RS.RecordCount)
RS.Close
Set RS = Nothing

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
DISTINCT can be hazardous unless your duplicate records are complete duplicates... or you really want records with the same cert# but otherwise different to be delivered to the output.

modMain.bas
Code:
Option Explicit
'Add a reference to Microsoft ActiveX Data Objects Library.

'Text files are in the same directory as the program.
'Substitute the program's actual Path for "$Path$" at
'runtime.

Const strTextConn As String = _
    "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "  Data Source=""$Path$"";" & _
    "  Extended Properties = ""Text;HDR=NO;FMT=Delimited"""
Const strDistinctSQL As String = _
    "SELECT DISTINCT *" & _
    "  INTO [distinct.txt]" & _
    "  FROM [origdata.txt]"
Const strFirstSQL As String = _
    "SELECT [Cert]," & _
    "       First([Name]) AS [Name]," & _
    "       First([Date]) AS [Date]," & _
    "       First([Count]) AS [Count]" & _
    "  INTO [first.txt]" & _
    "  FROM [origdata.txt]" & _
    "  GROUP BY [Cert]"

Function InsPath(ByVal S As String) As String
    InsPath = Replace$(S, "$Path$", App.Path)
End Function

Sub Main()
    Dim objCmd As New ADODB.Command
    Dim lngDistinctRecords As Long
    Dim lngFirstRecords As Long
    
    'Set connection string.
    objCmd.ActiveConnection = InsPath(strTextConn)
    
    'Peform DISTINCT query.
    objCmd.CommandText = strDistinctSQL
    objCmd.Execute lngDistinctRecords, _
                   , _
                   adCmdText Or adExecuteNoRecords
                   
    'Perform Aggregate ("first") query.
    objCmd.CommandText = strFirstSQL
    objCmd.Execute lngFirstRecords, _
                   , _
                   adCmdText Or adExecuteNoRecords
    
    MsgBox "DISTINCT resulted in " & _
           CStr(lngDistinctRecords) & _
           " records." & vbNewLine & _
           "First() resulted in " & _
           CStr(lngFirstRecords) & _
           " records.", _
           vbOKOnly, _
           "Filter textfile records via Jet SQL"
End Sub

I supplement this with a Jet text schema file placed into the same directory as the data files (which is also the directory containing my VB EXE in this case):

schema.ini
Code:
[distinct.txt]
ColNameHeader=False
Format=CSVDelimited
DateTimeFormat=d-mmm-yyyy
TextDelimiter=none
Col1=Cert Char Width 255
Col2=Name Char Width 255
Col3=Date Date
Col4=Count Integer

[first.txt]
ColNameHeader=False
Format=CSVDelimited
DateTimeFormat=d-mmm-yyyy
TextDelimiter=none
Col1=Cert Char Width 255
Col2=Name Char Width 255
Col3=Date Date
Col4=Count Integer

[origdata.txt]
ColNameHeader=False
Format=CSVDelimited
DateTimeFormat=d-mmm-yyyy
TextDelimiter=none
Col1=Cert Char Width 255
Col2=Name Char Width 255
Col3=Date Date
Col4=Count Integer

A short sample file in this layout is here:

origdata.txt
Code:
01234567,Bob,11-Jan-1997,345
07654321,Fred,3-Dec-1998,2
12345678,Ted,21-Nov-1999,600
12345678,Ted,21-Nov-1999,33
87654321,Jed,21-Nov-1999,182
00001111,Ned,1-Aug-2000,32
00001111,Ned,1-Aug-2000,32

Using the two techniques gives me differing results:

distinct.txt
Code:
00001111,Ned,1-Aug-2000,32
01234567,Bob,11-Jan-1997,345
07654321,Fred,3-Dec-1998,2
12345678,Ted,21-Nov-1999,33
12345678,Ted,21-Nov-1999,600
87654321,Jed,21-Nov-1999,182

Using DISTINCT will only "collapse" duplicate records that are 100% duplicates. Thus both cert# 12345678 records here get sent to the output. Again, this might be desireable behavior in your application.

A SELECT DISTINCT query can be "neater" because you don't need to enumerate each field you want in the output file. Of course if you only want a subset of the original fields you'll end up naming them anyway.

first.txt
Code:
00001111,Ned,1-Aug-2000,32
01234567,Bob,11-Jan-1997,345
07654321,Fred,3-Dec-1998,2
12345678,Ted,21-Nov-1999,600
87654321,Jed,21-Nov-1999,182

Using GROUP BY combined with First() allows me to filter out duplicate cert#s, though the rest of the values sent to the output are from whatever record was "first" internally to Jet after grouping. This can be controlled somewhat by adding an ORDER BY clause as well though.


Jet's text processing features can come in handy for this kind of file processing. Often there is a lot less coding to write correctly, and the speed is good even on fairly large files.
 
Thanks to everybody who replied to my post.

Especial mention to Swi. YOU ARE THE MAN!

The program works like a charm. I checked the output file and sure thing! No more duplicates!

Again thank you
 
So give him a star. Just click "Thank swi for this valuable post." :)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top