VB Text Comparison

Status
Not open for further replies.

TrevHatib

Programmer
Nov 27, 2000
6
GB
I need to read in two strings and output a third to an RTF control that shows the differences. This is so that changes made to documents can be checked. Essentially, I'd like the RTF text to look like Word's 'Compare Documents'. Unfortunately, the user does not have Word, so I cannot use automation.

Any ideas, please? Presumably there's an algorithm to do this, but I don't really have the time to reinvent the wheel. Alternatively, are there any freeware programs that I could call in the background to output the differences into a text file, which could then be pulled into the RTF control?
 
Trev,

This is, in my opinion, a large-scale undertaking. It is easy to identify the first occurrence of a difference. After that, it becomes more complex. The "program" needs to find out where the changed version re-syncs with the original. This can become a rather involved exercise. I believe that Word (and other commercial word processors) only do this in a special mode, where they keep track of individual keystrokes and insert special codes into the edited document to mark the change. Even this arrangement noticeably slows down the program's response when in use.

There are some UNIX system functions which do comparisons, and I would think someone has ported them to MS platforms, but probably in C or C++.



MichaelRed
mred@duvallgroup.com
There is never time to do it right but there is always time to do it over
 
This will take a little work but the basics are all here.
Save a file as "File1.txt", add some lines, delete some lines and save it as "File2.txt".
Create a form and add three Rich Text controls. Call them RTF1, RTF2 and RTF3. Add two command buttons.

The following code will compare the contents of RTF1 and RTF2, pausing whenever it finds a difference and writing a notation in RTF3 with the line number and the contents of the discrepant line.

Hope this helps....
[tt]
Dim GoAgain As Boolean  ' set by Command2 to resume after a reported difference

Private Sub Form_Load()
    RTF1.LoadFile "File1.txt"
    RTF2.LoadFile "File2.txt"
    RTF1.SelStart = 0
    RTF2.SelStart = 0
    Command1.Caption = "Search for Changes"
    Command2.Caption = "Click to Continue"
End Sub

Private Sub Command1_Click()
    ' Pass 1: report every line of RTF1 that cannot be found in RTF2.
    Do
        RTF1.Span vbCrLf, True, True    ' select up to the next line break
        Srch$ = RTF1.SelText
        Fp = RTF2.Find(Srch$, 0)        ' -1 means the text is not in RTF2
        lineNo = RTF1.GetLineFromChar(RTF1.SelStart)
        GotError = False
        If Fp = -1 Then
            If Trim$(Srch$) <> "" Then
                RTF3.Text = RTF3.Text & vbTab _
                    & "****Deleted from File#1 - Line #: " _
                    & Str$(lineNo) & "****" _
                    & vbCrLf & Srch$ & vbCrLf
                GotError = True
            End If
            RTF1.SetFocus
            RTF2.SelLength = 0
        End If
        ' Skip past the selected line and its CR/LF pair.
        NextPos = RTF1.SelStart + RTF1.SelLength + 2
        If NextPos >= Len(RTF1.Text) Then Exit Do
        GoAgain = False
        If GotError = True Then
            ' Pause until the user clicks Command2.
            Do
                DoEvents
                If GoAgain = True Then Exit Do
            Loop
        End If
        RTF1.SelStart = NextPos
    Loop

    ' Reset both selections before the second pass.
    RTF1.SelStart = 0
    RTF1.SelLength = 0
    RTF2.SelStart = 0
    RTF2.SelLength = 0

    ' Pass 2: report every line of RTF2 that cannot be found in RTF1.
    Do
        RTF2.Span vbCrLf, True, True
        Srch$ = RTF2.SelText
        Fp = RTF1.Find(Srch$, 0)
        lineNo = RTF2.GetLineFromChar(RTF2.SelStart)
        GotError = False
        If Fp = -1 Then
            If Trim$(Srch$) <> "" Then
                RTF3.Text = RTF3.Text & vbTab _
                    & "****Added to File#2 - Line #: " _
                    & Str$(lineNo) & "****" & vbCrLf _
                    & Srch$ & vbCrLf
                GotError = True
            End If
            RTF2.SetFocus
            RTF1.SelLength = 0
        End If
        NextPos = RTF2.SelStart + RTF2.SelLength + 2
        If NextPos >= Len(RTF2.Text) Then Exit Do
        GoAgain = False
        If GotError = True Then
            Do
                DoEvents
                If GoAgain = True Then Exit Do
            Loop
        End If
        RTF2.SelStart = NextPos
    Loop
    MsgBox "Finished"
End Sub

Private Sub Command2_Click()
    GoAgain = True
End Sub
[/tt]


Alt255@Vorpalcom.Intranets.com

"What this country needs is more free speech worth listening to."
Hansell B. Duckett
 
Alt255,

There's nothing wrong with your code, but it doesn't do what I want.
To get the difference between two strings like "Hello World" and "Hello the World", it's not as simple as marking one as deleted and the other as new. Instead we'd want to see:

Hello <new> the <end_new> World

where the <new> and <end_new> tags would enable us to highlight the added text (or whatever). The checking needs to be done on both strings, so it's probably going to involve some brain-numbing recursion (which I haven't done since college days). The Diff program uses a Longest Common Subsequence algorithm. The source code is available in Perl, but it's not the easiest script to get to grips with.
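
For what it's worth, the Longest Common Subsequence idea can be done without recursion by filling a table with two nested loops. Below is a minimal VB6 sketch, assuming the strings are split on single spaces. The function name WordDiff and the <del>/<end_del> markers are made up for illustration (only <new>/<end_new> come from the example above), and this is the textbook table version, not necessarily what the Diff program does internally.

[tt]
' Sketch only: word-level diff using a Longest Common Subsequence table.
' WordDiff and the <del>/<end_del> markers are hypothetical names.
Public Function WordDiff(ByVal OldText As String, ByVal NewText As String) As String
    Dim a() As String, b() As String
    Dim n As Long, m As Long, i As Long, j As Long
    Dim L() As Long
    Dim result As String

    a = Split(OldText, " ")
    b = Split(NewText, " ")
    n = UBound(a) + 1
    m = UBound(b) + 1
    ReDim L(0 To n, 0 To m)

    ' L(i, j) = length of the LCS of words a(i..) and b(j..), filled from the end.
    For i = n - 1 To 0 Step -1
        For j = m - 1 To 0 Step -1
            If a(i) = b(j) Then
                L(i, j) = L(i + 1, j + 1) + 1
            ElseIf L(i + 1, j) >= L(i, j + 1) Then
                L(i, j) = L(i + 1, j)
            Else
                L(i, j) = L(i, j + 1)
            End If
        Next j
    Next i

    ' Walk the table from the start, emitting words in their original order.
    i = 0: j = 0
    Do While i < n And j < m
        If a(i) = b(j) Then
            result = result & a(i) & " "
            i = i + 1: j = j + 1
        ElseIf L(i + 1, j) >= L(i, j + 1) Then
            result = result & "<del> " & a(i) & " <end_del> "
            i = i + 1
        Else
            result = result & "<new> " & b(j) & " <end_new> "
            j = j + 1
        End If
    Loop
    ' Whatever is left over on either side is a trailing deletion or addition.
    Do While i < n
        result = result & "<del> " & a(i) & " <end_del> "
        i = i + 1
    Loop
    Do While j < m
        result = result & "<new> " & b(j) & " <end_new> "
        j = j + 1
    Loop
    WordDiff = RTrim$(result)
End Function
[/tt]

Calling WordDiff("Hello World", "Hello the World") should come back as Hello <new> the <end_new> World, which can then be turned into RTF highlighting. Each added or removed word gets its own pair of tags; merging adjacent runs is left out to keep the sketch short. The table needs (n+1)*(m+1) Longs, so for whole documents it is probably wiser to run it per changed line than over the entire text.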



Trev
 
As I noted, those are only the basics. "Deleted" and "Added" were the wrong words to use. When they show up in RTF3:
    Deleted From File#1 - Line #10
She said he was nuts, so I said "Why?"
    Added to File#2 - Line #10
She said he was nuts, so I said "Why is that?"

it only means that there was a change in line #10. Your program will have to determine what was changed in line #10 so it can report
She said he was nuts, so I said "Why <new>is that<end_new>?"


I'm not going to write that part for you.

The task doesn't seem insurmountable... instead of writing the changes to RTF3, place them in an array and find the differences.
Deleted lines will be found in the first Do/Loop. Added lines will be found in the second Do/Loop. Changed lines found in the first Do/Loop will show corresponding changes in the second Do/Loop. Match them up.

I don't think recursion is needed here. The code I provided finds the differences between two text files. All you have to do is find a way to interpret the differences.
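
One non-recursive way to interpret a matched pair of changed lines, sketched under the assumption that each changed line contains a single edit: strip the characters the old and new line share at the start and at the end, and whatever is left in the middle of the new line is the change. The function name MarkChange is hypothetical, and the <new>/<end_new> markers simply follow the notation used earlier in the thread.

[tt]
' Sketch only: mark the changed middle of NewLine by trimming the
' prefix and suffix it shares with OldLine. MarkChange is a made-up name.
Public Function MarkChange(ByVal OldLine As String, ByVal NewLine As String) As String
    Dim p As Long, s As Long, maxLen As Long

    maxLen = Len(OldLine)
    If Len(NewLine) < maxLen Then maxLen = Len(NewLine)

    ' p = number of leading characters the two lines share.
    Do While p < maxLen
        If Mid$(OldLine, p + 1, 1) <> Mid$(NewLine, p + 1, 1) Then Exit Do
        p = p + 1
    Loop

    ' s = number of trailing characters they share, never overlapping the prefix.
    Do While s < maxLen - p
        If Mid$(OldLine, Len(OldLine) - s, 1) <> Mid$(NewLine, Len(NewLine) - s, 1) Then Exit Do
        s = s + 1
    Loop

    MarkChange = Left$(NewLine, p) _
        & "<new>" & Mid$(NewLine, p + 1, Len(NewLine) - p - s) & "<end_new>" _
        & Right$(NewLine, s)
End Function
[/tt]

For the "Why?" / "Why is that?" pair this marks " is that" including its leading space, so the opening tag lands one character earlier than in the hand-written example. If a line holds two separate edits, everything between the first and last difference is marked as one change, and a pure deletion yields an empty <new><end_new> pair, so this is a starting point rather than a full comparison.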

Alt255@Vorpalcom.Intranets.com

 
Visit for full documentation and a trial download. I am the author. It is line-oriented, not word-oriented, but it should not be difficult to break the words into records and then reassemble them. It may be a stretch, but it will only cost a little "eyeballin" to check. Frankly, I had not thought of word comparison as an option.
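
Breaking the words into records could be as simple as the sketch below, which assumes VB6's Split and Join and words separated by single spaces. The function names WordsToRecords and RecordsToWords are invented for illustration and are not part of any tool mentioned here.

[tt]
' Sketch only: put one word on each line so that a line-oriented
' comparison treats every word as a record. Names are hypothetical.
Public Function WordsToRecords(ByVal Text As String) As String
    WordsToRecords = Join(Split(Text, " "), vbCrLf)
End Function

' Reverse step: join the records back into a single space-separated string.
Public Function RecordsToWords(ByVal Records As String) As String
    RecordsToWords = Join(Split(Records, vbCrLf), " ")
End Function
[/tt]

Any line breaks already in the text come back as spaces after the round trip, so this suits comparing one paragraph or line at a time.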
 