
Highlighting differences... yes, I'm still on this 2

Status
Not open for further replies.

hotmailisforloosers

Programmer
Jun 22, 2004
US
Okay, well, I thought I was going to be taken off this task, but I never was... so I need to fix this process of highlighting the differences between two versions of an article that line up side by side: the old version and the modified one.

I went back and basically started over with a different perspective. Now I'm splitting it up by sentences, and just highlighting the entire sentence if there is a difference. I'm actually highlighting where there aren't differences in another color too, for testing purposes.

Where I run into problems is where a sentence is totally deleted, or a whole new one is added..... that throws off the comparison order and causes one insertion or deletion to make the entire rest of the page highlight as changed.

I have been able to get around that some... the code I have now seems to find additions like that, but it totally skips over them... flat out removes them from the text, rather than highlighting them in a different color. So I think I've made a big step in isolating the differences... but I can't for the life of me figure out how to keep it from skipping over them like that (or inserting a whole new duplicate string for each sentence).

So I'll paste my code so far here, and maybe an experienced programmer will have a suggestion. Here:

Code:
<%
    ' Trim off spaces from beginning and end while setting the strings... since they do seem to
    '   leave a lot of spaces around the article.
    R1_old = Trim(Recordset1.Fields.Item("CHarticle").Value)  ' Article before modification
    R1_new = Trim(Recordset1.Fields.Item("CHarticle2").Value) ' Article after modification

    ' Disable HTML so that the HTML code doesn't interfere with highlighting, and so that
    '   admin can see code changes as well........... kills the layout of a page though,
    '   and is not as easy to look at... you could also remove all tags, or specific tags.
    R1_old = Replace(R1_old, "<", "&lt;")
    R1_new = Replace(R1_new, "<", "&lt;")

    ' Splitting the articles up into sentences.  artNew and Old become arrays of strings
    artNew = Split(R1_new, ".")
    artOld = Split(R1_old, ".")

    For i = 0 To (UBound(artNew) - 1)

        If i > UBound(artNew) Then
            tempNew = "xxxx" ' Inserts a string to compare to (will show there is a
                             '   difference), and keep from running off the array
        Else
            tempNew = artNew(i)
        End If

        If i > UBound(artOld) Then
            tempOld = "xxxx" ' Inserts a string to compare to (will show there is a
                             '   difference), and keep from running off the array
        Else
            tempOld = artOld(i)
        End If

        i2 = i
        If i <= UBound(artOld) And i <= UBound(artNew) Then

            ' Look ahead in the new article for a sentence matching artOld(i)
            If i <= UBound(artOld) And i2 <= UBound(artNew) And artNew(i2) <> artOld(i) Then
                Do While i <= UBound(artOld) And i2 <= UBound(artNew) And artNew(i2) <> artOld(i)
                    i2 = i2 + 1
                    If i >= UBound(artOld) Or i2 >= UBound(artNew) Then
                        Exit Do
                    End If
                Loop
            End If

            ' Still no match: walk back through the new article
            If i <= UBound(artOld) And i2 <= UBound(artNew) And artNew(i2) <> artOld(i) Then
                Do While i < UBound(artOld) And artNew(i2) <> artOld(i)
                    i2 = i2 - 1
                    If i <= LBound(artOld) And i2 <= LBound(artNew) Then
                        Exit Do
                    End If
                Loop
            End If

            If i <= UBound(artOld) And i2 <= UBound(artNew) And artNew(i2) = artOld(i) Then
                Response.Write _
                    "<span style='color:black; background-color:skyblue'>" & artNew(i2) & "</span>"
            ElseIf artNew(i2) <> artOld(i) Then
                Response.Write _
                    "<span style='color:black; background-color:gold'>" & artNew(i) & "</span>"
            Else
                Response.Write _
                    "<span style='color:black; background-color:red'>" & artNew(i) & "</span>"
            End If

        Else
            Response.Write _
                "<span style='color:black; background-color:cyan'>" & artNew(i) & "</span>"
        End If

        ' Put the periods back in.
        Response.Write(".")

    Next
%>
 
From the 10,000 foot view, what I think you want to do is this. Read each file one line at a time until you come to a line where the left file does not match the right file. Then put the left file line that didn't match into a temp var and do the same for the right file line that didn't match. Now continue reading each line from each file. Now for each line, you need to make three comparisons:
1) Current left file line to current right file line
2) Current left file line to the temp right file line
3) Current right file line to the temp left file line

One of four things will come from these comparisons:
1) Match one will happen. This means that differing amounts were deleted from each file. Highlight from the temp line in each file to this line in each file.
2) Match two will happen. This means that lines were deleted from the right file. Highlight the right file up to this point.
3) Match three will happen. This means that lines were deleted from the left file. Highlight the left file up to this point.
4) No matches occur. Keep comparing.

I did all of this in my head, so there are probably several logic flaws, but if this were my problem, this is how I would approach it.
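
For anyone who wants to see the three comparisons written out, here is a rough sketch in Python rather than ASP. It is only one reading of the steps above, not tested against real articles. The function name `three_way_diff` and the tags `same`, `changed`, `left_only` and `right_only` are made up for this example, and the inputs are assumed to be lists of sentences already split out of the two articles.

```python
def three_way_diff(left, right):
    """Return a list of (tag, left_block, right_block) tuples.

    left and right are lists of sentences (hypothetical inputs).
    """
    ops = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            ops.append(("same", [left[i]], [right[j]]))
            i += 1
            j += 1
            continue
        # Mismatch: left[i] and right[j] play the role of the "temp" lines.
        k = 1
        while True:
            li, rj = i + k, j + k
            have_l, have_r = li < len(left), rj < len(right)
            if not have_l and not have_r:
                # Ran off both ends without resyncing.
                ops.append(("changed", left[i:], right[j:]))
                i, j = len(left), len(right)
                break
            # Comparison 1: current left line vs current right line.
            if have_l and have_r and left[li] == right[rj]:
                ops.append(("changed", left[i:li], right[j:rj]))
                i, j = li, rj
                break
            # Comparison 2: current left line vs the temp right line.
            if have_l and left[li] == right[j]:
                ops.append(("left_only", left[i:li], []))
                i = li
                break
            # Comparison 3: current right line vs the temp left line.
            if have_r and right[rj] == left[i]:
                ops.append(("right_only", [], right[j:rj]))
                j = rj
                break
            k += 1
    # Whatever is left over exists on one side only.
    if i < len(left):
        ops.append(("left_only", left[i:], []))
    if j < len(right):
        ops.append(("right_only", [], right[j:]))
    return ops
```

A block tagged `left_only` or `right_only` gets its own highlight color on that side only, so an inserted sentence no longer pushes everything after it out of step. Note that comparison 1 only catches replaced blocks of equal length, which is one of the flaws you would probably hit in practice.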

"... isn't sanity really just a one trick pony anyway?! I mean, all you get is one trick, rational thinking, but when you are good and crazy, oooh, oooh, oooh, the sky is the limit!" - The Tick
 
I have written difference utilities, and the first thing I would say is: it's a loser's game. There are enough difference programs out there. Most are line-oriented, so I suggest that if your requirements are sentence-oriented, you transform your text into one sentence per line and call one of:

- Windows Cmd FC
- WinDiff (Microsoft)
- CSDiff (
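
To give an idea of the "one sentence per line, then hand it to an existing diff" route, here is a small Python sketch. Python's standard `difflib` module stands in for the external tools listed above, which is a substitution made for this example. The helper name `to_sentence_lines` and the sample text are invented, and the sentence splitter is deliberately naive (it will trip over abbreviations like "e.g.").

```python
import difflib
import re


def to_sentence_lines(text):
    """Split text into sentences, one per list entry (naive splitter)."""
    # Break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Made-up sample articles: the new one has one sentence inserted.
old_text = "The cat sat. The dog barked. The end."
new_text = "The cat sat. A bird sang. The dog barked. The end."

# ndiff prefixes each line with "  " (same), "- " (old only) or "+ " (new only).
diff = list(difflib.ndiff(to_sentence_lines(old_text),
                          to_sentence_lines(new_text)))
for line in diff:
    print(line)
```

The prefixes map straight onto highlight colors, and the resync problem is left to the library.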
OK, so you want to be a hairy-chested programmer and do it yourself? Resync is the problem. The worst case is a line/sentence that has moved. I know of two strategies:

1. Look-ahead. At the point something doesn't match, you look for a match with the current left-side line by reading ahead on the right. Then vice versa. But you might have to look backwards too. You may or may not limit how far ahead you will look.

OR
2. Do a first pass that sorts the lines (sentences), so you can do an ordered merge to get all the matching ones (you have two inputs, always advance the lesser one). Then you simply report the ones that don't match (in original order), and decide whether you want to worry about matched sentences in a different order.
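
The second strategy might look something like this in Python. Treat it as a sketch under assumptions: the function name `unmatched_by_sorted_merge` is invented for the example, the inputs are assumed to be lists of sentences, and matching is by exact string equality.

```python
def unmatched_by_sorted_merge(left, right):
    """Return (left_only, right_only) as lists of original positions."""
    # Keep each sentence's original position, then sort by sentence text.
    ls = sorted((s, i) for i, s in enumerate(left))
    rs = sorted((s, j) for j, s in enumerate(right))
    a = b = 0
    left_only, right_only = [], []
    # Ordered merge: always advance the input with the lesser sentence.
    while a < len(ls) and b < len(rs):
        if ls[a][0] == rs[b][0]:
            a += 1
            b += 1
        elif ls[a][0] < rs[b][0]:
            left_only.append(ls[a][1])
            a += 1
        else:
            right_only.append(rs[b][1])
            b += 1
    # Anything left in either input has no partner.
    left_only.extend(i for _, i in ls[a:])
    right_only.extend(j for _, j in rs[b:])
    # Report in original document order.
    return sorted(left_only), sorted(right_only)
```

The returned positions are the sentences to highlight on each side. A sentence that merely moved counts as matched here, so spotting reordering would need a separate check on the positions of the matched pairs.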
 
Thanks Clay. I'll probably keep working on it based on what you guys have told me, but I'm going to tell the bosses that this is out of my league..... I'm definitely NOT a hairy-chested programmer... my chest is bald... intern bald :)
 