
Problems parsing addresses

Status
Not open for further replies.

ehicks727 (Programmer, US), Oct 26, 2004
Hi, I'm having a problem thinking through this logically and could sure use some help.

I've got addresses, but the address elements are variable: any given address may contain any or all of the following elements:

Business name
Address 1
Address 2
City/St/Zip
County

The problem is that I don't necessarily have all of those elements every time. Sometimes it's biz name, addr1, csz, county; sometimes it's addr1, addr2, csz. What I do know is that there is ALWAYS at least an addr1 and a csz, and that addr1, addr2, and csz always appear in that order. Beyond that, anything goes.

I've written an if...else if chain that seemed to work, but then I ran into inputs that crash it. I describe those cases below the code.

I'm just having problems getting my brain around this problem and could use some help please.

Code:
public class AddressTest {

	public static void main(String[] args) {
		String test = "Lake Mary Primary Care, LLC<br />4106 W. Lake Mary Boulevard<br />Suite 100<br />Lake Mary, FL&nbsp;&nbsp;32746<br />Seminole County<br />";

		// other possibilities
		// String test = "Lake Mary Primary Care, LLC<br />4106 W. Lake Mary Boulevard<br />Lake Mary, FL&nbsp;&nbsp;32746<br />Seminole County<br />";
		// String test = "Lake Mary Primary Care, LLC<br />4106 W. Lake Mary Boulevard<br />#100<br />Lake Mary, FL&nbsp;&nbsp;32746<br />Seminole County<br />";
		// String test = "4106 W. Lake Mary Boulevard<br />Suite 100<br />Lake Mary, FL&nbsp;&nbsp;32746<br />Seminole County<br />";

		// split() drops the trailing empty string left by the final "<br />"
		String[] tA = test.split("<br />");

		String addr = "", addr2 = "", csz = "", city = "", st = "", zipcode = "";

		// flags: address1 already detected (a), csz already detected (c)
		boolean a = false, c = false;

		for (int x = 0; x < tA.length; x++) {
			// if it starts with a number, then it is most likely addr1
			if (isNumeric(tA[x].substring(0, 1))) {
				System.out.println("This is an address");
				a = true;
				addr = tA[x];

			// if the last 5 characters are numeric, then it's most likely a csz
			// (isNumeric accepts a leading '-', so a ZIP+4 ending like "-1234" also passes)
			// NOTE: throws StringIndexOutOfBoundsException if tA[x] is shorter than 5 characters
			} else if (isNumeric(tA[x].substring(tA[x].length() - 5))) {
				System.out.println("This is a csz");
				c = true;

				// split up the csz into city, state, and zip
				csz = tA[x].replaceAll("&nbsp;&nbsp;", " ");
				if (csz.charAt(csz.length() - 5) == '-') {
					// ZIP+4, e.g. "Lake Mary, FL 32746-1234"
					zipcode = csz.substring(csz.length() - 10);
					st = csz.substring(csz.length() - 13, csz.length() - 11);
				} else {
					// 5-digit ZIP, e.g. "Lake Mary, FL 32746"
					zipcode = csz.substring(csz.length() - 5);
					st = csz.substring(csz.length() - 8, csz.length() - 6);
				}
				city = csz.substring(0, csz.indexOf(","));

			// if element is between addr1 and csz, then it is addr2
			} else if (a && !c) {
				System.out.println("This is address2");
				addr2 = tA[x];

			} else {
				System.out.println("Not address or csz");
			}
		}
	}

	static public boolean isNumeric(String string) {
		return string.matches("^[-+]?\\d+(\\.\\d+)?$");
	}
}


One problem occurs in my numeric tests. For instance, if I'm given "#100" as the addr2, the zip code test (last 5 characters numeric) crashes with a StringIndexOutOfBoundsException, because "#100" is only four characters long and substring(length() - 5, ...) gets a negative start index.

So I guess I just don't know how to pre-validate things like spaces and numeric values BEFORE I run each line through this if...else if chain. Does that make sense? There is probably a better solution, so I'm willing to think outside the box and entertain a completely different approach.
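One way to pre-validate without any substring arithmetic is to test each line against a regular expression with `java.util.regex`, since `Matcher.find()` never indexes past the end of the string. The sketch below is an illustration, not code from this thread: the class `AddressChecks` and the methods `startsWithDigit` and `endsWithZip` are hypothetical names, and the ZIP pattern assumes US-style 5-digit or ZIP+4 codes.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative, not from the original post.
class AddressChecks {

    // A line "looks like" addr1 if its first non-blank character is a digit.
    private static final Pattern STARTS_WITH_DIGIT = Pattern.compile("^\\s*\\d");

    // A line "looks like" a csz if it ends in a 5-digit ZIP or ZIP+4,
    // optionally followed by whitespace. Assumes US-style ZIP codes.
    private static final Pattern ENDS_WITH_ZIP = Pattern.compile("\\b\\d{5}(-\\d{4})?\\s*$");

    static boolean startsWithDigit(String line) {
        // find() is safe on "" and on short strings such as "#100"
        return STARTS_WITH_DIGIT.matcher(line).find();
    }

    static boolean endsWithZip(String line) {
        return ENDS_WITH_ZIP.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] samples = { "4106 W. Lake Mary Boulevard", "#100", "", "Lake Mary, FL&nbsp;&nbsp;32746" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> addr1? " + startsWithDigit(s) + ", csz? " + endsWithZip(s));
        }
    }
}
```

Because these checks return false instead of throwing, they can replace the `substring` tests in the if...else if chain directly. They are still heuristics: an addr1 such as "PO Box 12345" would also look like a csz.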

Any help is GREATLY appreciated. Thank you.

 
Could you evaluate the number of lines? That is, if there are 2 lines, then it's Addr1 and csz; if there are 3 lines, look for slashes (unique to csz?) in the second line, etc.
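The line-count idea can be sketched as a first pass that settles the unambiguous cases and defers the rest. This is an illustrative sketch, not code from the thread: `LineCountDispatch` and `labelsByCount` are hypothetical names, it assumes the input has already been split on "<br />", and for five lines it assumes business name first and county last, as in the sample data.

```java
import java.util.Arrays;

// Hypothetical sketch of the "count the lines first" idea.
class LineCountDispatch {

    // Returns a label for each line when the count alone decides it,
    // or null when more checks (e.g. a ZIP regex) are needed.
    static String[] labelsByCount(String[] lines) {
        switch (lines.length) {
            case 2:
                // Only the two guaranteed elements can be present.
                return new String[] { "addr1", "csz" };
            case 5:
                // Every element is present, in the order seen in the samples.
                return new String[] { "business", "addr1", "addr2", "csz", "county" };
            default:
                // 3 or 4 lines: several layouts are possible, so defer.
                return null;
        }
    }

    public static void main(String[] args) {
        String[] two = "4106 W. Lake Mary Boulevard<br />Lake Mary, FL&nbsp;&nbsp;32746<br />".split("<br />");
        System.out.println(Arrays.toString(labelsByCount(two)));
    }
}
```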

_________________
Bob Rashkin
 
Your 'algorithm' is really a heuristic, so don't be disappointed if you don't get a 100% strike rate. There is always a trade-off between false positives and false negatives.

For example, after you split the string, if there are five items, you know what they are. Likewise, if there are two, you know they must be address1 and csz. For other counts, test the penultimate line with a regex (using Matcher's lookingAt() method) for a trailing zip code. If it matches, you know what the last two lines are. Check the last line in the same way. Use a similar scheme to check the first and second lines for leading numbers (although this isn't necessarily reliable). Regex is the way to go: it looks for patterns, not literals.
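To illustrate the "patterns, not literals" point, a single regex with capturing groups can both recognize a csz line and split it into city, state, and zip. This replaces the fixed-offset `substring` arithmetic in the original question. It is a sketch under stated assumptions: `CszParser` and `parse` are hypothetical names, the format is assumed to be US-style "City, ST 12345" or "City, ST 12345-6789", and `&nbsp;` entities are turned into spaces before matching. It uses `Matcher.matches()`, which tests the whole line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse a City/State/ZIP line with one regex.
class CszParser {

    // group 1 = city, group 2 = two-letter state, group 3 = ZIP or ZIP+4
    private static final Pattern CSZ =
            Pattern.compile("^\\s*(.+?),\\s*([A-Za-z]{2})\\s+(\\d{5}(?:-\\d{4})?)\\s*$");

    // Returns {city, state, zip}, or null if the line is not a csz.
    static String[] parse(String line) {
        // Convert HTML non-breaking spaces from the input to plain spaces.
        String normalized = line.replace("&nbsp;", " ");
        Matcher m = CSZ.matcher(normalized);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("Lake Mary, FL&nbsp;&nbsp;32746");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

Lines like "Lake Mary Primary Care, LLC" or "#100" simply fail to match (the method returns null), so no length check is needed before calling it.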

If you fall through all your code without arriving at a concrete answer, print out the lines in question, then revisit your code.
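Putting the suggestions in this thread together, one possible heuristic anchors on the csz line (the most distinctive pattern), treats a single line after it as the county, labels the lines before it by count and leading digit, and prints anything it cannot resolve. This is a sketch, not the poster's method: `AddressClassifier`, `classify`, and `report` are hypothetical names, and it assumes exactly one csz line, at most one line after it, and that addr1 starts with a digit whenever there are exactly two lines before the csz and no business name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical heuristic classifier; names and labels are illustrative.
class AddressClassifier {

    private static final Pattern CSZ =
            Pattern.compile("^\\s*(.+?),\\s*([A-Za-z]{2})\\s+(\\d{5}(?:-\\d{4})?)\\s*$");
    private static final Pattern STARTS_WITH_DIGIT = Pattern.compile("^\\s*\\d");

    static Map<String, String> classify(String html) {
        String[] lines = html.replace("&nbsp;", " ").split("<br />");
        Map<String, String> out = new LinkedHashMap<>();

        // 1. Anchor on the first line that matches the csz pattern.
        int csz = -1;
        for (int i = 0; i < lines.length; i++) {
            if (CSZ.matcher(lines[i]).matches()) {
                csz = i;
                break;
            }
        }
        if (csz < 0) {
            report("no csz line found", lines);
            return out;
        }
        out.put("csz", lines[csz].trim());

        // 2. After the csz: expect zero or one line (the county).
        int after = lines.length - csz - 1;
        if (after == 1) {
            out.put("county", lines[csz + 1].trim());
        } else if (after > 1) {
            report("unexpected lines after csz", lines);
        }

        // 3. Before the csz: 1, 2 or 3 lines, always ending with addr1 [addr2].
        switch (csz) {
            case 1:
                out.put("addr1", lines[0].trim());
                break;
            case 2:
                if (STARTS_WITH_DIGIT.matcher(lines[0]).find()) {
                    out.put("addr1", lines[0].trim());
                    out.put("addr2", lines[1].trim());
                } else {
                    out.put("business", lines[0].trim());
                    out.put("addr1", lines[1].trim());
                }
                break;
            case 3:
                out.put("business", lines[0].trim());
                out.put("addr1", lines[1].trim());
                out.put("addr2", lines[2].trim());
                break;
            default:
                report("unexpected number of lines before csz", lines);
        }
        return out;
    }

    // Fallback: print the lines that could not be resolved, so the rules can be revisited.
    private static void report(String why, String[] lines) {
        System.out.println("Unresolved (" + why + "): " + String.join(" | ", lines));
    }

    public static void main(String[] args) {
        System.out.println(classify("4106 W. Lake Mary Boulevard<br />#100<br />Lake Mary, FL&nbsp;&nbsp;32746<br />Seminole County<br />"));
    }
}
```

The two-line case is where the heuristic is weakest: an addr1 that does not start with a digit (for example a PO box) will be mislabeled as a business name, which is the kind of false positive the reply above warns about.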

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
 
