Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations derfloh on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Intelligent row combine in Excel 1

Status
Not open for further replies.

VBAjedi

Programmer
Dec 12, 2002
1,197
KH
This is a messy data problem that I'm not sure can be solved with VBA. But I'll give the forum a crack at it anyway.

I need to create mailing labels from a spreadsheet (a .csv export from an online dbase I can't access) of names and addresses. The spreadsheet has one person per row, columns as follows:

FirstName | LastName | StreetAddress | City | State | ZIP

There's a few other columns in there that aren't relevant and I can easily adjust any code you give me to match the actual spreadsheet.

The problem is that the data contains cases where two to six family members live at the same address, and I want one combined mailing label for them. So I need to combine those six rows into one for that address. Now, I know how to write a code loop to iterate through the rows from bottom to top, combining rows based on an exact match in a specific column.

The complicating factor is that this data is entered by those individuals, and the street address field is often not an exact match for those individuals at that address ("Street" versus "St" versus "St.").

So can anybody think of a way to do an intelligent comparison of street addresses? I thought if the data was sorted by last name, then by street address, I could get away with using Left() function to just look for matches in the first five characters, but some unrelated addresses (like PO Boxes) match in the first five characters...

I even thought of pushing the address out to the web to be geocoded, but that seems like an excessively bulky workaround... (right?)

Scratching my head...

VBAjedi [swords]
 


Hi,

How will you "combine" 6 different names into one? Please state the LOGIC RULES you intend to use.

I'd be using MS Query.

I would clean up the data, step by step, using progressive queries like

Where StreetAddress Like '%street%'

You look at the results and then CHANGE the data using FIND > REPLACE. It must be done in an orderly way, in order to effect the FINAL resluts in a series of refinements. THIS CAN BE A SLOW TEDIOUS PROCESS, because it is a manual one. Lots of ANALYSIS of the data and a logical series of steps to get from point A to point Z.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Good to see you're still here Skip! I haven't been in the forum much in the last several years.

Ugh. Geocoding sounds easier than normalizing the data! Reading through a sample of this data, it's amazing how many ways people screw up their address. :)

I'm now considering parsing the address string to find the first contiguous number string (which should be the street/PO box number) and matching on that combined with last name. The odds of someone with the same last name living at the same house number on a different street are small enough to be acceptable to me. :) What do you think?

The actual logic I'm using is pretty simplistic (has a few flaws but works):

1) Sort data by street address and last name columns.
2) Starting at bottom of data and working up:
a) See if address and last name in current row match the row above. If so, append first name from the current row to the first name of the row above (separated by a comma and space), then delete current row.
3) Move to row above and repeat. If the row above THAT also has the same last name and address, the result will be THREE first names in the first name column. And so on indefinitely for as many people at that address with the same last name.

One flaw of course (aside from my difficulties in matching the address) is that I have no way to list the names in the culturally expected order (head of household, spouse, child 1, child 2, etc). They just get listed in whatever order the spreadsheet rows happen to be in. But there's no way around that without having family relationship data in the .csv export, so I can live with that.


VBAjedi [swords]
 

Yes, I see you have virtually been AWOL since Jan 2005. ;-)

Good approch.

Where do you need help?

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Was in Sub-saharan Africa for a couple of years, then came back to a job that didn't involve much VBA. I miss it! :)

The main thing I need help with is finding the best simple but high-percentage way to match street addresses that aren't entered exactly the same.

VBAjedi [swords]
 


Short of "correcting" the data, removing double spaces, standardizing, comparing using CAPS, etc. I would not have experience with anything else. I've done those kind of things in the past, but only on rare occasions and not repeatedly.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Yeah. I hate data that is inconsistently entered...

Well, have a star for taking the time to check my logic and matching approach!

:)



VBAjedi [swords]
 


Thanks.

Stay in touch and please post a resolution to this thread, whenever it becomes appropriate.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
I even thought of pushing the address out to the web to be geocoded, but that seems like an excessively bulky workaround... (right?)
Actually, that sounds like the only sound approach (or having an equally robust local database)

_________________
Bob Rashkin
 


I agre with Bob. Let us know what process and results if you pursue that option.

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top