Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations dencom on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Transferring a Value that is contained in a field for 250,000 records 1

Status
Not open for further replies.

amerifax

Technical User
Dec 28, 2006
37
It's hard to describe what I need to do but I put together a text file that I am attaching to try to help explain. But bottom line I have a field called PropertyDescription_1 where there is a high probability of a Subdivision Name contained within if certain conditions are met.

I hope I'm not asking for the moon but I guess when you think about it...we've already been there. lol

Bob
 
Can you provide the actual contents of a single property description field. I am having a hard time understanding what is a field value, what is your note, and what is the next value. I have the general idea, but I prefer it without the notes. Need to understand the data.
 
I'm attaching a accdb that has added field's to show what the results should be. They all start with the name value.

I also might mention that records 12,13,14 seem so close. I will be running the results against my master file that will pull the three records so they show "Harmony Groved". So as long as I have a fair piece of it I can get a match based on Sub and another field, not shown called SC.

Bob
 
 http://www.home2000.com/Example_of_Subdivision.accdb
Is there a reason that the PropertyDescription field is truncated? Is your data really like that, or is this a bad example?
 
This is about as good as you are going to get. I say it is about 85% accurate and it would be pretty easy to fix from there. I did not do the final steps I leave that to you, but it parses the data pretty well. There is no real rules so any solution needs to make it easy for the user to verify. Also this works for this given data set, but there is so many exceptions it will have to get tweaked for a bigger data set. if Possible post something with a couple thousand record. These kind of problems have no real solution, I would focus on additional features to make verifying easier.

Try the demo.

Here is some code to see the thought process. Bottom line the code is the code is extremely modular so each function can be refined.

Code:
Option Compare Database
Option Explicit
Public Function getLot(aDesc() As String) As String
  Dim I As Integer
  For I = LBound(aDesc) To UBound(aDesc)
    If InStr(1, Trim(aDesc(I)), "Lot") > 0 Then
      getLot = Trim(aDesc(I))
    End If
  Next I
  If getLot = "" Then getLot = "N/A"
End Function
Public Function getSubDivision(aDesc() As String) As String
  Dim I As Integer
  For I = LBound(aDesc) To UBound(aDesc)
    If InStr(1, Trim(aDesc(I)), "SubDivision") > 0 Then
      getSubDivision = Trim(aDesc(I))
    End If
  Next I
  
  If getSubDivision = "" Then
  'pick the field without block or lot
     For I = LBound(aDesc) To UBound(aDesc)
       If Not InStr(1, Trim(aDesc(I)), "Lot") > 0 And Not InStr(1, Trim(aDesc(I)), "Block") > 0 And Not InStr(1, Trim(aDesc(I)), "Condo") > 0 Then
         getSubDivision = Trim(aDesc(I))
       End If
      Next I
  End If
  
  If getSubDivision = "" Then
    getSubDivision = "N/A"
  Else
    getSubDivision = cleanSubDivision(getSubDivision)
  End If
End Function
Public Function getBlock(aDesc() As String) As String
  Dim I As Integer
  For I = LBound(aDesc) To UBound(aDesc)
    If InStr(1, Trim(aDesc(I)), "Block") > 0 And Not Trim(aDesc(I)) = "Block" Then
      getBlock = Trim(aDesc(I))
      Exit Function
    End If
  Next I
  If getBlock = "" Then getBlock = "N/A"
  
End Function
Public Function getCondo(aDesc() As String) As String
  Dim I As Integer
  For I = LBound(aDesc) To UBound(aDesc)
    If InStr(1, Trim(aDesc(I)), "Condo") > 0 Then
      getCondo = "Condominium"
    End If
  Next I
  If getCondo = "" Then getCondo = "N/A"
End Function
Public Function getDescriptionArray(strDescription As String) As String()
  getDescriptionArray = Split(strDescription, ",")
End Function
Public Function cleanSubDivision(strSubDivision) As String
  Dim cleanAfter As Integer
  cleanSubDivision = Trim(Replace(strSubDivision, "Subdivision", ""))
  
  'Find Addition
  cleanAfter = InStr(cleanSubDivision, "ADDITION TO")
  If cleanAfter > 0 Then
    Debug.Print cleanSubDivision
    cleanSubDivision = Trim(Mid(cleanSubDivision, cleanAfter + 11))
    Debug.Print cleanSubDivision
  End If
  
  'Find ADDN
  cleanAfter = InStr(cleanSubDivision, "ADDN")
  If cleanAfter > 0 Then
    Debug.Print cleanSubDivision
    cleanSubDivision = Trim(Mid(cleanSubDivision, cleanAfter + 4))
    Debug.Print cleanSubDivision
  End If
  
  'Find Continuation OF
  cleanAfter = InStr(cleanSubDivision, "Continuation Of")
  If cleanAfter > 0 Then
    Debug.Print cleanSubDivision
    cleanSubDivision = Trim(Mid(cleanSubDivision, cleanAfter + 15))
    Debug.Print cleanSubDivision
  End If
  
  'Find Condominium at end
  cleanAfter = InStr(cleanSubDivision, "Condominium")
  If cleanAfter > 0 Then
    Debug.Print cleanSubDivision
    cleanSubDivision = Trim(Mid(cleanSubDivision, 1, cleanAfter - 1))
    Debug.Print cleanSubDivision
  End If
  
  'Find CONDO at end
  cleanAfter = InStr(cleanSubDivision, "Condo")
  If cleanAfter > 0 Then
    Debug.Print cleanSubDivision
    cleanSubDivision = Trim(Mid(cleanSubDivision, 1, cleanAfter - 1))
    Debug.Print cleanSubDivision
  End If
  
End Function
Public Sub CreateSubDivisionFields()
  Const tblSubDivision = "Subdivision"
  Const fldDescription = "PropertyDescription_1"
  Const fldLot = "Lot"
  Const fldBlock = "Block"
  Const fldSubDivision = "Sub"
  Const fldCondo = "Condo"
  Const fldID = "ID"
  
  Dim strLot As String
  Dim strSubDivision As String
  Dim strCondo As String
  Dim strBlock As String
  Dim rs As DAO.Recordset
  Dim strDesc As String
  Dim aDesc() As String
  Set rs = CurrentDb.OpenRecordset(tblSubDivision, dbOpenDynaset)
  
  Do While Not rs.EOF
    If Not IsNull(rs.Fields(fldDescription)) Then
     strDesc = rs.Fields(fldDescription)
     strDesc = cleanDesc(strDesc)
     aDesc = getDescriptionArray(strDesc)
      strLot = getLot(aDesc)
      strBlock = getBlock(aDesc)
      strCondo = getCondo(aDesc)
      strSubDivision = getSubDivision(aDesc)
      rs.Edit
        rs.Fields(fldLot) = strLot
        rs.Fields(fldBlock) = strBlock
        rs.Fields(fldCondo) = strCondo
        rs.Fields(fldSubDivision) = strSubDivision
      rs.Update
    End If
    rs.MoveNext
    
  Loop
  
End Sub


Public Function cleanDesc(strDesc) As String
  'remove comma from end
  cleanDesc = Trim(strDesc)
  If Right(cleanDesc, 1) = "," Then
    cleanDesc = Trim(Left(cleanDesc, Len(cleanDesc) - 1))
    Debug.Print cleanDesc
  End If
End Function
 
Wow! That is amazing. I will sit down and look it all over. I can't thank you enough for this help. I know we don't live in a perfect world and the field I am dealing with here is...well you saw it. That is the way the data actually looks. And it probably gets worse than that. I just sampled a small amount of records to try and get all the different criteria covered that I am looking for.

I will download the file and check it out and try to give it a go. I am sure I will have more questions...so until then.

Thanks soooo much for your help!

Bob
 
Again, I want to thank you so much for all your help. I have just a few more things and I hope I am not pushing it, you have done so much to help me already. One thing for the PropertyDescription_1 Field, if it contains CSM, Section, Township or Parcel, we don't want to transfer any data to the Sub field. Where we are able to make a transfer we would like to have a field called "Sub-Ck" and give it a 'X' to signal that we made a successful transfer. Where PropertyDescrption_1 contains 'SEE ATTACHED' we would like to give 'Sub-Ck' an 'A' and the last criteria would be if PropertyDescription_1 contains Sub(space) or Sub. transfer the full line the the 'Sub" field and give 'Sub-Ck' a 'M'. Where we are using Lot for a criteria, Lt would also be short for Lot. In the Lot and Block fields that you are creating, is it possible not to include the actual words Lot and Block?

And I have one new one. I am going to be attaching a sample of the actual data we are using. There are 4 fields in this data that we need to combine into 1 field so we can make a relation to our other data files. The fields are CityYN, VillageYN, TownYN, TVCname. In the city, village and town fields they either contain a 1 or 2. If the field contains a 1 that is what the municipality is. The TVCname field is the name of the City, Village or Town. We would like to combine these fields into a field called Typecity with a value of CITY OF APPLETON, VILLAGE OF SHERWOOD, TOWN OF HARRISON etc. In some cases TVCname appears as SHEBOYGAN, CITY OF. It isn't consistent though. Some have the ", *** Of" and others don't. Hopefully the sample I send you will show you enough. Or maybe you can suggest the best way to handle it.

The help you have given us is going to be a tremendous help and make it a little easier to process all these data files we are working on. So we are going to be trying to fit this module into many different data files. After you created this, we sit down and study each and every piece to see how each part is looking at the data and pulling what it needs and separating it into the necessary fields. So as we go along I may have a few more questions for you, I hope you don't mind.

I am attaching the data, let me know if you have any questions for me.

Bob
 
 http://www.home2000.com/2006retr_sample.accdb
I got your database, I recommend you remove it from that link. There appears to be confidential data in it.
 
At this point, I will not do it in code but do update queries. The remaining, and additional requirements can be done by simple update queries. I will put a button for each query so you can see it run.
 
Can you explain the "dirty" data? For example these fields all likely are supposed to say "See Attached" or "See Addendum".
I cannot imagine this many typos, but it makes searching for things basically impossible.

Sub
SE5 AT7ACHED
SEB ATTACHBD
SEB ATTACHI3D
SEB ATTACIIED
SECOND ADD. TO JOHN HOPPE
SECTION 13
SEE A MACIMMVR
SEE ADDEND W A
SEE AIJACHED
SEE AITACHED LEGAL DESCRIMICJN
SEE AMACFGD
SEE AMACHM M
SEE ARTACL#D
SEE ATACHED
SEE ATIIACHED
SEE ATJ77CHED
SEE ATT14CHED
SEE ATTA -
SEE ATTAC.HED
SEE ATTACFNVMNT
SEE ATTACH MENT
SEE ATTACH MENT
SEE ATTACH MENT
SE'E ATTACH&D
SEE ATTACHBD
SEE ATTACHCD
SEE ATTACHEO
SEE ATTACHMENT
SEE ATTACHMENT
SEE ATTACHMENT
SEE ATTACHMENT
SEE ATTACHMENT
SEE ATTACHMENT
SEE ATTACHMENT
SEE ATTACHMENT
SEE ATTACHMENT
SEE ATTACHMNT
SEE ATTACHTD
SEE ATTACIIMENT
SEE ATTACILED
SEE ATTACIM
SEE ATTACIM1ENT
SEE ATTACIM1ENT.
SEE ATTACIMENT
SEE ATTACIMM4NT
SEE ATTHCH6D
SEE ATTHCHE17
SEE ATTR4CHED
SEE LEGAL DESMI MICJN AMAQ4ED HME1O
SEF ARTACHED
SFF. A MAC MD
SI3I3 ATTACHIGD.
SI3IG ATTAC -
SIBI3 ATTACHI3D
SIGB ATTACTMMNT


You could use things like SOUNDEX to find like sounding words, but only the human eye or advance artificial intelligence can determine that
"SE5 AT7ACHED" is really "See Attached"
 
Se At" would be close ok. Where the value has <20 characters with a "space" between the 2 values, I can't see that getting much better.

I got the data from state of Wisconsin ,many of the files got ocr , for older files. With the flag Field-sub-ck and a few other fields we will be able to complete some fields by a semi auto approach. one of the most important is field - "SC". That breaks it down to the city, village or town. Know if we do a contains on whole words in sub field against sub in our subdivision data base we have almost a match. Because based on the town we might only have 50 subdivisions. And if this large data based locks in on town of Brookfield containing the word Stonehenge I most likely will have a match. The "SC" is the most perfect field in the data base.

Bob
 
I was just thinking since they all are most likely "See Attachment" would it be a good idea to treat them as that. The "sub-CK" "A" would only be used to exclude the record when doing select searches and fixing of records.

Bob
 
Would it be a better idea to search on those similar values and replace them all with the proper words to keep things a little more simple. The older data we received from the state was all scanned in so I'm sure they just ran an OCR against it so that's why there are so many misspellings and problems in that field. As the data gets newer it gets better. But for now, this is what we have to deal with unfortunately.

I just wanted to tell you again how much I appreciate your help with this.

Bob
 
Bob,
The original code was based on the original data set. In that set most of the data was relatively clean and the information was seperated by commas. So it may look something like
Subdivision information, block information, lot information, etc.
The strategy was to split these blocks and read them. if they contained block information then stick that in the block field, lot info stick in the lot field, etc.

This strategy is still good, but only part of a bigger strategy. Your data is about as raw as it gets. It looks like it was scanned with lots of error. So if this was me I would apply lots of different approaches to narrow the data down and continously clean it. There is no way to do this all at once. But lots of update queries and continue refinement may take what would be 100s of hours into a few hours.

To make this easier to work with I would create a new table that links to your primary table. This can be done with a make table query.

1) I would make a new table with

tblCleanProperties
saleNumber (foreign key)
descriptionOld (property description field)

Then I would add these fields
descriptionClean (clean version of the description)
blnLot (These are all yes no boolean fields)
blnBlock
blnSubdivision
blnCSM
blnTownShip
blnParcel
blnCondo
blnSection
blnAttachment
strLot (these are all the text fields containing that word)
strBlock
strSubdivision
strCondo
strAttachment
Sub-CK
valLot
valBlock

This way I can do lots of work on this table without messing with the main table. I then can then relate back to the main table by the primary key (which appears to be saleNumber)

1) Do lots of update queries to clean up the old description and put in descriptionClean. This is preprocessing. Doing it this way I will eventually work with the clean data, but have the old data available. Here are some examples.

Update the attachement field
Code:
UPDATE qrySubDivision SET qrySubDivision.descriptionNew = "SEE ATTACHED", qrySubDivision.blnAttachment = True
WHERE (((qrySubDivision.descriptionOld) Like "*Atta*"));

This puts "SEE ATTACHED" in the new description field and puts a check mark in the yes no "blnAttachment" field
Now it will be a lot easier later to look for attachments

Code:
descriptionOld	               descriptionNew
LIGGAL DESCRIPTION ATTACHED    SEE ATTACHED
SBB ATTACHBD                   SEE ATTACHED
SDEE LEGAL ATTALHED            SEE ATTACHED
SEB ATTACHBD                   SEE ATTACHED

So now go back and do some looking at the data and see what you missed. I still see some that have "Addendum", and I assume logically get treated the same way
So
descriptionOld
SEE ADDENDUM A
SEE ADDEND W A
SEE ADDENDUM
ADDENDLLWL

Code:
UPDATE qrySubDivision SET qrySubDivision.descriptionNew = "SEE ATTACHED", qrySubDivision.blnAttachment = True
WHERE (((qrySubDivision.descriptionOld) Like "*ADDEN*"));

So look again trying "*See A"
and still
Code:
descriptionOld
SEE ATACHED
SEE ATTHCHE17
SEE ATTR4CHED
SEE AMACFGD
SEE AMACHM M
SEE ARTACL#D
SEE ATJ77CHED
SEE ATT14CHED
SEE AIJACHED
SEE ATTHCH6D
SEE AITACHED LEGAL DESCRIMICJN
SEE ATIIACHED
SEE A MACIMMVR
run another update

Get the idea? Search, Update.

Eventually I will have much cleaner description field to work with. Plus a boolean field to quickly query. But this may require a couple of hours, but it will be worth it in the long run.
Some may not be able to get clean so then just dump the old description field into the cleanDescription field (where the clean description field is Null)

Then Start identifying if it contains block, lot, subdivision etc.
Now you will have a boolean field that can be used to determine if you need to worry about it.

Code:
UPDATE tblCleanProperties SET tblCleanProperties.blnSubDivision = True
WHERE (((tblCleanProperties.descriptionOld) Like "*SUBDI*"));
Code:
UPDATE tblCleanProperties SET tblCleanProperties.blnBlock = True
WHERE (((tblCleanProperties.descriptionOld) Like "* block *" Or (tblCleanProperties.descriptionOld) Like "* blk *"));

after doing all that you can run the code. Has to be modified a little now, but basic idea. Now however if there is a yes in subdivision and an NA in the text field you can query and see the problem.
Once done you can use the boolean fields to do update queries on the sub division field. (if blnCSM = true then strSubdivision becomes N/A)

Then you can do a series of update queries to get
strBlock: "Block (14)"
to
valBlock: "14"
You will see that will take a lot of work as well.
 
Once again...WOW!! I will sit down and give this a try. You make it seem so simple. You have no idea how much you have helped me. I am going to give this a try. So I should create a table with the fields you specified and then run each update query against the data to clean it up. I just change the bln field I want to put the check in and the criteria I am searching for right? I think I can handle it!?

Bob
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top