Scanning Text for Mailing Addresses 2

Status
Not open for further replies.

jihanemo

Technical User
Aug 20, 2007
7
US
I need a program that scans documents or web pages for mailing addresses and copies them into a database of some sort (such as MS Access, or even just MS Word).

I'm contacting all the daycare centers in my town and I have access to hundreds of daycare centers in the area. I need to copy and paste the mailing address for each center into my label-maker program. It's not a select-all and copy job because there is content in between each mailing address which would need to be deleted. I need some kind of text-scanning program that will scan the web page for mailing addresses and copy them into an MS Office program or something similar.

Anyone have an idea of where to find this? And if there exists such a program, what is it called? A text mining program?
 
Hi jihanemo,

If you have a document with email addresses in Word's hyperlink format (not plain text), you can use a wildcard Find/Replace (i.e. make sure the 'Use wildcards' option is checked), with the following as the Find string:
<[^1-^31\!-\?A-ÿ]{1,}[!\@][^1-\?A-ÿ]{1,}
and nothing as the Replace string.

Afterwards, you should have a file with only email addresses in it (and maybe the odd @ and the following word if you have any loose @ characters in the document).



Cheers
[MS MVP - Word]
 
I think the OP wants mailing addresses, not email addresses.
 
^ Right! I need mailing addresses. Any ideas?

I have a strong feeling that I'll need to pay a programmer to create a program for me...
 

Hi,

Where are you getting these addresses?

What is the format or are there different formats?

Please answer BOTH questions.

Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
Hi jihanemo & Skip,

OK, put it down to a mis-read on my part.

jihanemo: Somehow, I suspect you're not going to find a reliable software-based solution for retrieving mailing addresses from documents & webpages, especially if they're in different formats.

Skip: as per jihanemo's first post, the addresses are to be sourced from documents & webpages, which answers your first question and implies the probability of multiple formats, which might answer your second question.


Cheers
[MS MVP - Word]
 



Well here's a suggestion.

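The City/State/ZIP line probably contains your two-character STATE code followed by the SCF portion of your ZIP (the first 3 digits). Within one area, those SCF prefixes will either all be the same or there will be no more than about 4 of them.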

So if it were the DFW area of Texas, I'd be looking for
[tt]
TX 761
TX 760
TX 750
[/tt]
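
A rough sketch of that idea in Python, for anyone who wants to try it on saved text. The prefix list is just the DFW example above, and the function names are made up for illustration; it only flags the lines that look like a City/State/ZIP line, it does not find the rest of the address.

```python
import re

# Illustrative prefixes from the DFW example: state code + first 3 ZIP digits (SCF).
SCF_PREFIXES = ["TX 761", "TX 760", "TX 750"]


def build_pattern(prefixes):
    """Build a regex matching e.g. 'TX 761' plus the last two ZIP digits and optional ZIP+4."""
    parts = []
    for prefix in prefixes:
        state, scf = prefix.split()
        parts.append(re.escape(state) + r"\s+" + re.escape(scf))
    return re.compile(r"\b(?:" + "|".join(parts) + r")\d{2}(?:-\d{4})?\b")


def find_city_state_zip_lines(text, prefixes=SCF_PREFIXES):
    """Return (line_index, stripped_line) for every line containing a known prefix."""
    pattern = build_pattern(prefixes)
    return [
        (index, line.strip())
        for index, line in enumerate(text.splitlines())
        if pattern.search(line)
    ]


if __name__ == "__main__":
    sample = "Happy Kids Daycare\n123 Main St\nFort Worth, TX 76102\n"
    for index, line in find_city_state_zip_lines(sample):
        print(index, line)
```

Anything outside the listed prefixes is ignored, so the list has to be adjusted for your own area.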


Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
Hi mintjulep,

Interesting. I note that the input files must contain known state abbreviations for extraction. That suggests the software may only work for one country (probably the US).

Unfortunately, the version in the link looks like crippleware:
· Only the first contact is saved.
· Nag Screen

The accompanying screenshots on the softpedia site show that the software doesn't:
· work with Word documents or web sites - you have to save the documents or relevant web pages to .txt or html files first;
· use any algorithm to determine what an address is - it simply requires the user to tell it whether to extract just the line containing the state abbreviation or that line plus a range of lines above/below. So, to be sure of getting complete addresses, you're liable to end up with an output file containing lots of irrelevant text.
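
To illustrate the second point, here is a minimal Python sketch of the "matching line plus a fixed number of lines above it" approach. It is my own approximation, not the product's code; the pattern (any two capital letters followed by a 5-digit ZIP) and the names are assumptions.

```python
import re

# Assumption: two capital letters followed by a 5-digit ZIP (optionally ZIP+4)
# marks the last line of a US mailing address.
STATE_ZIP = re.compile(r"\b[A-Z]{2}\s+\d{5}(?:-\d{4})?\b")


def extract_blocks(text, lines_above=2):
    """For each line with a state/ZIP match, return that line plus `lines_above` lines before it."""
    lines = text.splitlines()
    blocks = []
    for index, line in enumerate(lines):
        if STATE_ZIP.search(line):
            start = max(0, index - lines_above)
            blocks.append([item.strip() for item in lines[start:index + 1]])
    return blocks
```

Because the window is fixed, a value large enough for the longest address also drags in unrelated lines that sit above shorter ones.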

Curiously, the softpedia site also claims the product has been rated as 'fair' by 21 users, but there are 0 user reviews ...



Cheers
[MS MVP - Word]
 
Thank you for your replies everyone. The shareware from softpedia looks like it may be worth a shot. I'll try it.
 
Hi jihanemo,

Would you mind giving us a handful of links to these websites so we can try to figure out a suitable pattern?

Cheers,
MiS

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
In that case, it might be a little easier than actually identifying the mailing address ...

This example is written for Excel, and assumes that you have added a project reference to the Microsoft Internet Controls. It also assumes that you have made a local copy of one of the pages that you are interested in (which obviously would not be the way to do it in reality):
Code:
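[blue]Public Sub example()
    ' Requires a project reference to Microsoft Internet Controls
    Dim myBrowser As InternetExplorer
    
    Dim TableCellCollection As Object
    Dim TableCell As Object
    
    Dim myRow As Long
    
    myRow = 1
    
    Set myBrowser = New InternetExplorer
    
    myBrowser.Navigate2 "c:\localcopyofpage.html"
    
    ' Wait until the page has finished loading (statusText depends on locale and is unreliable)
    Do While myBrowser.Busy Or myBrowser.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    
    Set TableCellCollection = myBrowser.document.body.getElementsByTagName("TD")
    
    ' Copy each "facility" cell that is the first cell in its row into column A
    For Each TableCell In TableCellCollection
        If TableCell.className = "facility" And TableCell.PreviousSibling Is Nothing Then
            ActiveSheet.Range("A" & myRow).Value = TableCell.innerText
            myRow = myRow + 1
        End If
    Next
    ActiveSheet.Range("A1").EntireColumn.WrapText = False
    ActiveSheet.Range("A1").EntireColumn.AutoFit
    
    ' Close the hidden browser instance
    myBrowser.Quit
End Sub[/blue]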
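
For anyone without Excel to hand, the same idea can be sketched in Python using only the standard library. This is an approximation rather than a translation: it assumes, as the macro does, that each address sits in a table cell with the class "facility" that is the first cell in its row, and the class name FacilityCellParser is made up for illustration. Nested tables are not handled.

```python
from html.parser import HTMLParser


class FacilityCellParser(HTMLParser):
    """Collect the text of <td class="facility"> cells that are first in their row."""

    def __init__(self):
        super().__init__()
        self.cells = []          # extracted cell texts
        self._cell_index = 0     # number of <td> tags seen in the current <tr>
        self._capturing = False  # True while inside a wanted cell
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cell_index = 0
        elif tag == "td":
            self._cell_index += 1
            classes = (dict(attrs).get("class") or "").split()
            self._capturing = self._cell_index == 1 and "facility" in classes
            self._buffer = []
        elif tag == "br" and self._capturing:
            # Keep line breaks so the address lines stay separate.
            self._buffer.append("\n")

    def handle_endtag(self, tag):
        if tag == "td" and self._capturing:
            self.cells.append("".join(self._buffer).strip())
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)


def extract_facility_cells(html_text):
    """Return the text of every first-in-row 'facility' cell in the given HTML."""
    parser = FacilityCellParser()
    parser.feed(html_text)
    return parser.cells
```

Feed it the contents of a saved copy of the page; the returned list has one entry per matching cell, ready to be written to a file or spreadsheet.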

 
How do I use this code? I'm not familiar with scripts, macro, or any kind of programming. What kind of code is it and how do I use it?
 
Excellent, strongm.
That is exactly what I was thinking of.
[thumbsup]


[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 


strongm,

Very nice! ==> *

Don't have a present need, but that nugget's squirreled away.

Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 