Comparing strings, finding differences 2

Status
Not open for further replies.

hotmailisforloosers (Programmer, US), Jun 22, 2004
Hi, I've got a database of articles that are being manipulated through an ASP page using VBScript. When someone updates an article, the administrator can go to a page where the old article and the updated version line up side by side. What I need to be able to do is find the differences between those strings and highlight them on the page.

So can someone help me, especially with finding the differences in the string? All I know how to do right now is something like MyComp = StrComp(R1_old, R1_new, 1), which tells me whether the whole string is different. But how can I find each individual difference inside a string?
 
Take a look at the Mid and Len functions and the For .. Next instruction.

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
Thanks, that's a starting point at least. I'll probably be back with more questions though, since I am very green with this stuff.
 
Okay, I have another question already. Can I still work with this as a string, or does it have to be converted to an array to be able to compare and highlight specific parts?
 
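Here is an example that compares the 5th character of each string (no array needed, Mid works directly on the string):
i = 5
If Mid(R1_old, i, 1) <> Mid(R1_new, i, 1) Then
  ' MsgBox is not available in server-side ASP; use Response.Write there
  MsgBox "Char#" & i & " differs"
End If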
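
The same check can be run over every position by wrapping it in a For .. Next loop. What follows is only a minimal sketch: DiffPositions is a hypothetical helper name (not a built-in), and it assumes a plain case-sensitive, position-by-position comparison is good enough.

```vbscript
' Hypothetical helper: returns a comma-separated list of the 1-based
' positions at which oldText and newText differ.
Function DiffPositions(oldText, newText)
  Dim i, maxLen, result
  ' Walk to the end of the longer string. Mid returns "" past the end of
  ' the shorter one, so the extra characters count as differences.
  maxLen = Len(oldText)
  If Len(newText) > maxLen Then maxLen = Len(newText)
  result = ""
  For i = 1 To maxLen
    If Mid(oldText, i, 1) <> Mid(newText, i, 1) Then
      If result <> "" Then result = result & ","
      result = result & i
    End If
  Next
  DiffPositions = result
End Function
```

For example, DiffPositions("cat", "cot") gives "2", and two identical strings give an empty string.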

Hope This Helps, PH.
 
What you are thinking about doing is quite complex. Let's say that these two strings need to be compared:
str1 = "THis is string one."
str2 = "This is 2."

What do you highlight? Do you consider 'THis' to be equal to 'This'? If you start looking at the sequence of words, what does the missing 'string' in str2 do to the sequence? These are very simple strings. It sounds like you are talking about much more complicated ones.
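
Both questions can be tried out directly. StrComp takes a compare mode as its third argument: 0 is a binary (case-sensitive) comparison and 1 is a text (case-insensitive) comparison. The sketch below is only illustrative; SameWordAt is a hypothetical helper, and it assumes words are separated by single spaces.

```vbscript
' Hypothetical helper: True when word number idx (0-based) is the same in
' both strings. compareMode is handed to StrComp:
' 0 = binary (case-sensitive), 1 = text (case-insensitive).
Function SameWordAt(s1, s2, idx, compareMode)
  Dim w1, w2
  w1 = Split(s1, " ")
  w2 = Split(s2, " ")
  SameWordAt = False
  ' Only compare when both strings actually have a word at idx
  If idx <= UBound(w1) And idx <= UBound(w2) Then
    SameWordAt = (StrComp(w1(idx), w2(idx), compareMode) = 0)
  End If
End Function

Dim str1, str2
str1 = "THis is string one."
str2 = "This is 2."
' SameWordAt(str1, str2, 0, 0) is False: "THis" vs "This", case-sensitive
' SameWordAt(str1, str2, 0, 1) is True:  same word when case is ignored
' SameWordAt(str1, str2, 2, 1) is False: "string" vs "2."
' SameWordAt(str1, str2, 3, 1) is False: str2 has no fourth word
```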

"... isn't sanity really just a one trick pony anyway?! I mean, all you get is one trick, rational thinking, but when you are good and crazy, oooh, oooh, oooh, the sky is the limit!" - The Tick
 
PHV: Thanks, I'll play around with that and see what I'm able to do. That's definitely a help to get started.

Tom: Yeah, I know it's pretty complex. What I would like to be able to do is highlight "string one" on the first and "2" on the second, but I'll take whatever I can get. If the capitalization gets highlighted too then that's fine.

I'm just trying to take this one itty bitty step at a time. I definitely realize it's just going to get harder, but I really don't want to think about that right now :)
 
One way that you may want to look at it is to find a way to determine 'what do I need to do to string one to make it equal string two?' If you can find a way to answer this programmatically then you are done.
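
One simple way to start answering that question is to strip off whatever the two strings share at the beginning and at the end; what is left in the middle is the part that has to change. This is only a sketch under that assumption (it handles a single changed region, not several), and CommonPrefixLen, CommonSuffixLen and ChangedPart are hypothetical helper names.

```vbscript
' Number of leading characters the two strings share.
' compareMode: 0 = case-sensitive, 1 = case-insensitive (as for StrComp).
Function CommonPrefixLen(a, b, compareMode)
  Dim n, limit
  limit = Len(a)
  If Len(b) < limit Then limit = Len(b)
  n = 0
  Do While n < limit
    If StrComp(Mid(a, n + 1, 1), Mid(b, n + 1, 1), compareMode) <> 0 Then Exit Do
    n = n + 1
  Loop
  CommonPrefixLen = n
End Function

' Number of trailing characters the two strings share, never reaching
' back into the first skip characters (the already matched prefix).
Function CommonSuffixLen(a, b, skip, compareMode)
  Dim n, limit
  limit = Len(a) - skip
  If Len(b) - skip < limit Then limit = Len(b) - skip
  n = 0
  Do While n < limit
    If StrComp(Mid(a, Len(a) - n, 1), Mid(b, Len(b) - n, 1), compareMode) <> 0 Then Exit Do
    n = n + 1
  Loop
  CommonSuffixLen = n
End Function

' What remains of s after dropping the prefix and suffix it shares with other.
Function ChangedPart(s, other, compareMode)
  Dim p, q
  p = CommonPrefixLen(s, other, compareMode)
  q = CommonSuffixLen(s, other, p, compareMode)
  ChangedPart = Mid(s, p + 1, Len(s) - p - q)
End Function
```

With the two example strings and a case-insensitive comparison, ChangedPart("THis is string one.", "This is 2.", 1) gives "string one" and ChangedPart("This is 2.", "THis is string one.", 1) gives "2".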

 
"There is no spoon" - got it :) ... no, I think I understand what you're saying. Thanks.
 
Okay, I've run into a bit of trouble here. I've got a very basic comparison going that highlights the areas that are the same in cyan and the areas that are different in pink. The problem right now is that somehow a space keeps showing up between each of the letters (totally unhighlighted). I t i s l i k e t h i s, with all the letters highlighted as I would expect, but with those darned unhighlighted spaces in between. I can't figure out what's causing it either. Here is the code:

<%
artLength = Len(R1_new)

For i = 1 to artLength

If Mid(R1_old, i, 1) <> Mid(R1_new, i, 1) Then %>

<font style="color:black; background-color:pink"><%=Mid(R1_new,i,1)%></font>

<% else %>
<font style="color:black; background-color:cyan"><%=Mid(R1_new,i,1)%></font>

<%end if
Next%>

Any ideas what might be causing that?
 
Well, it seems to have something to do with the CSS highlighting, but why the heck would that be?
 
It is probably the "white space" you are inserting in between the highlighted substrings. In this case it is due to your indented HTML, the newlines at the end, or both.

Try viewing the source at the web browser to see what exactly you are emitting.

My guess is you really want something more like:
Code:
<%    
  artLength = Len(R1_new)    

  For i = 1 to artLength
    If Mid(R1_old, i, 1) <> Mid(R1_new, i, 1) Then
      Response.Write _
        "<font style=""color:black; background-color:pink"">"
    Else
      Response.Write _
        "<font style=""color:black; background-color:cyan"">"
    End If
    Response.Write Mid(R1_new,i,1) & "</font>"
  Next
%>
But maybe I'm missing something else here.

Personally I'd also ditch the <font> tags and replace them with <span> tags instead. Give one class="hlSame" and the other class="hlDiff" and define the styles for these classes within a <style> block in the page's <head> block, or inside whatever other style block or external stylesheet your page uses.
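
A rough sketch of what that could look like follows. HighlightChars is a hypothetical helper name, the hlSame and hlDiff classes are the ones suggested above, and the CSS shown in the comment is assumed to be defined somewhere in the page. Building the whole string first also avoids emitting any stray whitespace between the spans.

```vbscript
' Hypothetical helper: wraps each run of matching or differing characters
' of newText in a <span> with class hlSame or hlDiff. Assumed CSS:
'   .hlSame { color: black; background-color: cyan; }
'   .hlDiff { color: black; background-color: pink; }
Function HighlightChars(oldText, newText)
  Dim i, cls, prevCls, html
  html = ""
  prevCls = ""
  For i = 1 To Len(newText)
    If Mid(oldText, i, 1) <> Mid(newText, i, 1) Then
      cls = "hlDiff"
    Else
      cls = "hlSame"
    End If
    ' Start a new span only when the class changes, so neighbouring
    ' characters with the same status share one span.
    If cls <> prevCls Then
      If prevCls <> "" Then html = html & "</span>"
      html = html & "<span class=""" & cls & """>"
      prevCls = cls
    End If
    html = html & Mid(newText, i, 1)
  Next
  If prevCls <> "" Then html = html & "</span>"
  HighlightChars = html
End Function
```

In the page this would be emitted with something like Response.Write HighlightChars(R1_old, R1_new). The sketch does not escape the text, so articles containing characters such as < or & would also need each character passed through Server.HTMLEncode.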
 
Wow, very helpful post.... getting me all fixed up. I never even thought to view the source from the browser, that can be pretty helpful for understanding.

Thanks!
 
Okay, I've gotten along alright and now I'm to the next really big step.... but thinking it through I can't come up with a logical way to achieve it.

The document is split up into sentences. The problem is that if you add a whole new sentence (as you can see with "Ooobie doobie" in my example), it throws off everything that follows: everything after "Ooobie doobie" is reported as modified, when really it's just been pushed back.

Here is my code... sorry it's a bit sloppy:

Code:
R1_new = "Hello, my name is jimbo.  The world will end NOW.  You smells.  Ooobie doobie.  Die evil doers."

R1_old = "Hello, my name is jimbo.  The world will end soon.  You stink.  Die evil doers."

artNew = Split(R1_new, ".")
artOld = Split(R1_old, ".")

For i = 0 To UBound(artNew) - 1
  tempNew = artNew(i)
  tempOld = artOld(i)
  If StrComp(tempNew, tempOld, 0) <> 0 Then
    Dim oldStr, newStr
    If Len(tempNew) = Len(tempOld) Then ' Old and new are the same length
      For j = 1 To Len(tempOld)
        If StrComp(Mid(tempOld, j, 1), Mid(tempNew, j, 1), 0) <> 0 Then
          Response.Write _
            "<span style='color:black; background-color:pink'>" & Mid(tempNew, j, 1) & "</span>"
        Else
          Response.Write _
            "<span style='color:black; background-color:lightgreen'>" & Mid(tempNew, j, 1) & "</span>"
        End If
      Next
    ElseIf Len(tempNew) > Len(tempOld) Then ' New is longer than old
      For j = 1 To Len(tempNew)
        If StrComp(Mid(tempOld, j, 1), Mid(tempNew, j, 1), 0) <> 0 Then
          Response.Write _
            "<span style='color:black; background-color:pink'>" & Mid(tempNew, j, 1) & "</span>"
        Else
          Response.Write _
            "<span style='color:black; background-color:lightgreen'>" & Mid(tempNew, j, 1) & "</span>"
        End If
      Next
      'temp = Left(tempNew, Len(tempNew) - (Len(tempNew) - Len(tempOld)))
      'Response.Write _
      '  "<span style='color:black; background-color:pink'>" & temp & "</span>"
    ElseIf Len(tempNew) < Len(tempOld) Then ' Old is longer than new
      For j = 1 To Len(tempOld)
        'For j = 1 To UBound(newStr)
        If StrComp(Mid(tempOld, j, 1), Mid(tempNew, j, 1), 0) <> 0 Then
          Response.Write _
            "<span style='color:black; background-color:pink'>" & Mid(tempNew, j, 1) & "</span>"
        Else
          Response.Write _
            "<span style='color:black; background-color:lightgreen'>" & Mid(tempNew, j, 1) & "</span>"
        End If
      Next
      'temp = Left(tempOld, Len(tempOld) - (Len(tempOld) - Len(tempNew)))
      'Response.Write _
      '  "<span style='color:black; background-color:pink'>" & temp & "</span>"
    Else ' Should be no other possibilities; error character if something gets through
      Response.Write("*")
    End If
  Else
    ' Sentence is unchanged
    Response.Write _
      "<span style='color:black; background-color:cyan'>" & artNew(i) & "</span>"
  End If

  ' Just added...
  Response.Write(". ")
  Response.Write "<br />"
Next


Any suggestions how I could skip past sentences that are inserted like that?
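
One general way to keep an inserted sentence from shifting everything after it is to align the two lists of sentences first and only then compare the pairs that line up. The sketch below uses a longest-common-subsequence table for that alignment; it is an illustration of that technique rather than a drop-in fix, and AlignPieces is a hypothetical helper name. It returns one label per piece of the new text: "=" for a piece that also appears, in order, in the old text and "+" for a piece that was added or changed.

```vbscript
' Hypothetical helper: labels each delim-separated piece of newText as
' "=" (present, in order, in oldText) or "+" (added or changed).
Function AlignPieces(oldText, newText, delim)
  Dim a, b, n, m, i, j, t, out
  a = Split(oldText, delim)
  b = Split(newText, delim)
  n = UBound(a) + 1
  m = UBound(b) + 1

  ' t(i, j) = length of the longest common subsequence of a(i..) and b(j..)
  ReDim t(n, m)
  For i = 0 To n
    For j = 0 To m
      t(i, j) = 0
    Next
  Next
  For i = n - 1 To 0 Step -1
    For j = m - 1 To 0 Step -1
      If a(i) = b(j) Then
        t(i, j) = t(i + 1, j + 1) + 1
      ElseIf t(i + 1, j) >= t(i, j + 1) Then
        t(i, j) = t(i + 1, j)
      Else
        t(i, j) = t(i, j + 1)
      End If
    Next
  Next

  ' Walk the table from the start, emitting one label per piece of newText
  out = ""
  i = 0
  j = 0
  Do While j < m
    If i < n Then
      If a(i) = b(j) Then
        out = out & "="
        i = i + 1
        j = j + 1
      ElseIf t(i + 1, j) >= t(i, j + 1) Then
        i = i + 1          ' old piece with no counterpart: skip it
      Else
        out = out & "+"
        j = j + 1
      End If
    Else
      out = out & "+"      ' old text is used up; the rest is new
      j = j + 1
    End If
  Loop
  AlignPieces = out
End Function
```

For instance, AlignPieces("a|b|d", "a|c|x|d", "|") gives "=++=": the final "d" is still recognised as unchanged even though two pieces were put in front of it. The pieces labelled "+" could then be handed to a character-by-character comparison like the one above.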
 
Try this...

Code:
' *******************************************************************
' Function to show "word" differences between two strings
'
Function highlight_difference(ByRef source, ByRef change, ByVal delim)
	On Error Goto 0
	Dim a1, a2
	a1 = Split(source, delim)
	a2 = Split(change, delim)
	
	Dim sb, se
	sb = "<span style=""background: gainsboro; color: orangered"">"
	se = "</span>"
	
	Dim a1ub, a2ub
	a1ub = UBound(a1)
	a2ub = UBound(a2)

	Dim r, i1, i2, j, b, f, x
	ReDim r(100)
	i1 = 0
	i2 = 0
	j = 0
	b = False
	f = 0

	Do
		'
		' check if current word is match
		'
		If a2(i2) = a1(i1) Then
			'
			' Turn off matching or process word
			'
			If b Then
				r(j) = se
				j = j + 1
				b = False
			Else
				r(j) = a2(i2)
				j = j + 1
			End If
		Else
			'
			' Turn on non-match and process word
			'
			If Not b Then
				r(j) = sb
				j = j + 1
				b = True
				r(j) = a2(i2)
				j = j + 1
			End If
		End If

		'
		' Verify output array size
		'
		If j - 1 > UBound(r) - 10 Then
			ReDim Preserve r(UBound(r) * 2)
		End If
		
		'
		' If in non-match, search forward to next match
		'
		If b = True And i1 < a1ub Then
			'
			' Search forward through change for next match to source
			'
			Dim m1, m2
			m1 = i1 + 1
			For m2 = i2 + 1 To a2ub
				If a2(m2) = a1(m1) Then
					'
					' Search backward through change for common word in source
					'
					Dim n1, n2, o1, o2, o3
					o1 = -1
					o2 = -1
					o3 = a1ub
					
					For n2 = m2 - 1 To i2 + 1 Step -1
						For n1 = m1 + 1 To o3
							If a2(n2) = a1(n1) Then
								o1 = n1
								o2 = n2
								o3 = n1
								
								Exit For
							End If
						Next
					Next
					
					If o1 > -1 Then m1 = o1
					If o2 > -1 Then m2 = o2
					
					'
					' process through match point
					'
					
					For i2 = i2 + 1 To m2 - 1
						r(j) = a2(i2)
						j = j + 1
					Next
					
					r(j) = se
					j = j + 1
					
					b = False
					
					r(j) = a2(m2)
					j = j + 1
					
					i2 = m2
					i1 = m1
					
					Exit For
				End If
			Next
		End If

		If i1 = a1ub Then
			Dim k
			For k = i2 + 1 To a2ub
				r(j) = a2(k)
				j = j + 1
			Next
			
			Exit Do
		ElseIf i2 = a2ub Then
			Exit Do
		End If

		i1 = i1 + 1
		If Not b Then i2 = i2 + 1
	Loop 
	
	If b Then
		r(j) = se
		j = j + 1
		b = False
	End IF

	j = j - 1
	ReDim Preserve r(j)
	
	Dim result
	result = Join(r, delim)
	
	Erase r
	
	'
	' clean result string from extra delims
	'
	
	Dim re
	Set re = New RegExp
	
	re.Global = True
	re.Pattern = "(\<[^\>]*\>)(\" & delim & "+)(.*?)(?!\" & delim & "\<\/)?(\" & delim & "+)(\<[^\>]+?\>)"
	result = re.Replace(result, "$1$3$5")
	re.Pattern = "(\<[^\>]+?\>)(\" & delim & "+)(\<[^\>]*\>)"
	result = re.Replace(result, "$2")
	
	Set re = Nothing	
	
	highlight_difference = result 
End Function
 
Hmm... found a bug. Change the line "m1 = i1 + 1" to just "m1 = i1".
 