Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Scrubbing bad data in .DAT files (Loop)

Status
Not open for further replies.

MrTrue

Technical User
Jul 28, 2008
46
US
Ok, here is the issue I've encountered... I've been using some DAT files to track some user usage information. Due to an incorrect input, some improperly formatted data has been inserted into my files.

I need to loop through all the DAT files search them line by line and make corrections to the incorrect input.

Here is a sample of the data
'--------------------------------------------------------------
"05006","15:10:09","04/08/2011","Pre-D_CYM Remark(Year only)"
"05006","15:28:57","04/08/2011","Pre-D_CYM Remark(Year only)"
"05006","15:29:08","04/08/2011","R983-Pred"
"05006","15:29:12","04/08/2011","R983-Pred"
"05006","15:31:05","04/08/2011","R983-Pred"
"05006","15:31:06","04/08/2011","Pre-D_CYM Remark(Year only)"
"05006",#1899-12-30 15:35:05#,#2011-04-08#,"Ortho PRE-D"
"05006","15:37:57","04/08/2011","Pre-D_CYM Remark(Year only)"
"05006","15:40:30","04/08/2011","Pre-D_CYM Remark(Year only)"
"05006","15:40:47","04/08/2011","Pre-D_CYM Remark(Year only)"
"05006","15:43:22","04/08/2011","Pre-D_CYM Remark(Year only)"
"05006","15:44:45","04/08/2011","R983-Pred"
'--------------------------------------------------------------

If you notice, line number 7 has an incorrect date and time format (fields 2 & 3)

I use a vbs file to compile the DAT files into a single file for some analysis I do, and I'm assuming I could use the structure to loop through each line and scrub this data, but I'm stumped as to how I would proceed from there, Should I be using a regex function just to search each file for all instances in that pattern and replace? Does anyone have any example of how to do this? The VBS loop I was trying to modify is below... Thanks in advance for reviewing this!

Code:
Public Sub AppendFiles()

      Dim SourceNum As Integer
      Dim DestNum As Integer
      Dim Temp As String
      Dim sOriginFolder As String
      Dim sDestinationFile As String
      Dim sFile, sFile2, oFSO
      Set oFSO = CreateObject("Scripting.FileSystemObject")
      
On Error GoTo ErrHandler

       sOriginFolder = "S:\Dental Claims Lmtd Access\DataTrack\DAT FILES"
       sDestinationFile = "S:\Dental Claims Lmtd Access\DataTrack\WordTemp\MacroData.dat"
      
      
      For Each sFile In oFSO.GetFolder(sOriginFolder).Files

      DestNum = FreeFile()
      Open sDestinationFile For Append As DestNum

      SourceNum = FreeFile()
      Open sFile For Input As SourceNum

      Do While Not EOF(SourceNum)
         Line Input #SourceNum, Temp
         Print #DestNum, Temp
      Loop

CloseFiles:

      ' Close the destination file and the source file.
      Close #DestNum
      Close #SourceNum

Next


Exit Sub

ErrHandler:

      'MsgBox "Error # " & Err & ": " & Error(Err)
      Resume CloseFiles

End Sub
 
Yes I would recommend using regex, however, I'm not familiar enough with the syntax to be giving advice. Given the DAT file from above, this theoretically should work.

Code:
set objFSO = WScript.CreateObject("Scripting.FileSystemObject")
set objDAT = objFSO.OpenTextFile("file.dat", 1, true 0)
set objOutput = objFSO.OpenTextFile("output.dat", 2, true, 0)

do while not (objDAT.AtEndOfStream)
   strLine = objDAT.readline
   if (inStr(strLine, """,#")) then  'look for ",#
      'don't output the line
   else
      'output the line
      objOutput.WriteLine strLine
   end if
loop

objDAT.close
objOutput.close

-Geates

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live."
- Martin Golding

"There are seldom good technological solutions to behavioral problems."
- Ed Crowley, Exchange guru and technology curmudgeon
 
Are you looking to fix the data that is wrong? How consistent is the data, does it vary greatly from what you posted?

If you are looking to fix the data, Regular Expressions sounds like the best way to go, though I also can't really help you with that. The tricky part is extracting data from the 2nd and 3rd fields. There are a few ways to go; Since the dat file really appears to be a CSV (comma-delimited) file, you can open it up as a recordset and extract the fields.. then fix what needs to be fixed. Geates' solution is fine as well, you will just have more string manipulation to do. Functions like Left, Mid, Right, InStr will all help.

Also note that there is a separate forum for VB6: forum222
The syntax is similar to VBScript, but I recommend also posting there (and referring to this thread by pasting the text [ignore]thread329-1644999[/ignore] in the new thread), as someone there may have a solution more specific to VB6.
 
Here is a regex solution if anyone is still interested. I created a txt file from the example data and named it 'data_to_scrub.txt', change the reference in the code to something appropriate for you. The script works for the example data pattern given; if there is other 'bad' data that does not follow this pattern, the script will have to be tweaked.

No warranties or guarantees, use at your own risk, blah blah blah

Code:
[COLOR=green]'[URL unfurl="true"]http://msdn.microsoft.com/en-us/library/ms974570.aspx[/URL]
'[URL unfurl="true"]http://msdn.microsoft.com/en-us/library/yab2dx62.aspx[/URL]
'[URL unfurl="true"]http://www.regular-expressions.info/index.html[/URL]
'[URL unfurl="true"]http://msdn.microsoft.com/en-us/library/6wzad2b2(v=vs.85).aspx[/URL][/color]

Set objFSO = CreateObject("Scripting.FileSystemObject")

dim filetxt
Const ForReading = 1, ForWriting = 2, ForAppending = 8
Const fmtUnicode = -1, fmtASCII = 0, fmtDefault = -2

set re = new regexp
dim strInput
dim Matches
dim Match


re.IgnoreCase = true

[COLOR=green]'change the following file path to point to your file[/color]
Set filetxt = objFSO.OpenTextFile("H:\data_to_scrub.txt", ForReading, False, fmtDefault)

Dim arrFileLines()
i = 0
Do Until filetxt.AtEndOfStream
Redim Preserve arrFileLines(i)
arrFileLines(i) = filetxt.ReadLine

strInput = arrFileLines(i)
[COLOR=green]'wscript.echo(strInput)

'create a pattern to find incorrect data such as #1899-12-30 15:35:05#
'group items of interest with parentheses
'group 1 is the entire expression, group 2 is the timestamp portion
'ie <group1> <group2> </group2> </group1>[/color]
re.Pattern = "(#\d{4}-\d{2}-\d{2} (\d{2}:\d{2}:\d{2})#)"
if re.Test(strInput) then
	[COLOR=green]'if the input matches the pattern, replace the matching portion with group 2 and enclose group 2 with quotes[/color]
	strOutput = re.Replace(strInput, """$2""")
	wscript.echo("Step 1: scrub the timestamp" & vbcrlf & strInput & vbcrlf & strOutput)
	strInput = strOutput
end if
[COLOR=green]
'create a pattern to find incorrect data such as #2011-04-08#
'group items of interest with parentheses
'group 1 is the entire expression, group 2 is the year, group 3 is the month, group 4 is the day[/color]
re.Pattern = "(#(\d{4})-(\d{2})-(\d{2})#)"
if re.Test(strInput) then
	[COLOR=green]'if the input matches the pattern, rearrange the data from YYYY/MM/DD to MM/DD/YYYY and surround the date with quotes[/color]
	strOutput = re.Replace(strInput, """$3" & "/" & "$4" & "/" & "$2""")
	wscript.echo("Step 2: rearrage the date format" & vbcrlf & strInput & vbcrlf & strOutput)
	strInput = strOutput
end if
[COLOR=green]
'write strInput to file[/color]

i = i + 1
Loop
filetxt.Close
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top