
Regular expressions 1

Status
Not open for further replies.

ozmex

IS-IT--Management
Nov 26, 2004
5
MX
having trouble with a regular expression. i have an ugly text file to parse that includes the following line

5..... LASTNAME LASTNAME, FIRSTNAME MIDDLENAME TITLE 32 R

in the 5....., the 1st character is always a 1-digit number, followed by 1 to 9 '.' characters
the 32 is age,
the R is a 1 char location
TITLE can be Dr/Mr/Mrs/Ms

variations include:
- 1 or 2 LASTNAME
- 0 to 2 MIDDLENAME
- FIRSTNAME may include a period ('.')

i currently have:

(^[1-9]{1})\.{1,9} ([A-Z]\S[A-Z])\s(Dr|Mr|Mrs|Ms)\s([0-9]{1,2}\s([A-Z]{1})

but it's not happening. any ideas?
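
Two things stand out in the pattern above: the fourth group's opening parenthesis is never closed, and `[A-Z]\S[A-Z]` matches exactly three characters rather than a run of names. A minimal sketch of a single pattern for the layout described in this post follows. The names `RECORD` and `parseRecord`, the returned field names, and the sample line are hypothetical, and it assumes the title is one of Dr/Mr/Mrs/Ms in any letter case, optionally followed by a period.

```javascript
// Sketch only: RECORD, parseRecord and the field names are hypothetical.
// Assumed layout: digit, 1-9 dots, last name(s), comma, first/middle names,
// title, 1-2 digit age, 1-character location.
const RECORD = /^(\d)\.{1,9}\s+([^,]+),\s*(.+?)\s+(Dr|Mrs|Mr|Ms)\.?\s+(\d{1,2})\s+([A-Z])\s*$/i;

function parseRecord(line) {
  const m = RECORD.exec(line);
  if (m === null) return null; // line does not follow the assumed layout
  return {
    number: m[1],
    lastName: m[2].trim(),
    firstMiddle: m[3].trim(), // lazy (.+?) stops at the first title token
    title: m[4],
    age: Number(m[5]),
    location: m[6],
  };
}

// Made-up sample line in the layout from the post.
console.log(parseRecord("5..... SMITH JONES, JOHN Q. MR 32 R"));
```

Because the gaps are matched with `\s+`, the sketch does not depend on how many spaces separate the fields.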
 
what do u want to parse out of it? do u want only some fields or all the values???

Known is handfull, Unknown is worldfull
 
trying to extract the fields as follows:

5
LASTNAME LASTNAME
FIRSTNAME MIDDLENAME
TITLE
AGE
LOCATION
 
o.k here is a script that i wrote in JS:
Code:
<script>
str="5..... LASTNAME LASTNAME, FIRSTNAME MIDDLENAME          TITLE  32     R"

TheNumber=str.replace(/^(\d+).*/,"$1")
TheLastName=str.replace(/^.*\.(.*),.*/,"$1")
TheFristMiddleName=str.replace(/^.*,(.*) {10}.*/,"$1")

TheRe=new RegExp(".*"+TheFristMiddleName+" {10}([^\\s]*)\\s+.*")
TheTitle=str.replace(TheRe,"$1")

TheAge=str.replace(/.*\s+(\d+).*$/,"$1")

TheLocation=str.replace(/.*\s+(.*)$/,"$1")

document.write("TheNumber - "+TheNumber+"<br>")
document.write("TheLastName - "+TheLastName+"<br>")
document.write("TheFristMiddleName - "+TheFristMiddleName+"<br>")
document.write("TheTitle - "+TheTitle+"<br>")
document.write("TheAge - "+TheAge+"<br>")
document.write("TheLocation - "+TheLocation)
</script>


this is meant only for test purposes. try changing the value of str and see if the answers are correct; if they are, then i will give u one in PHP...

 
this is great ... almost! works for about 80%.

the problem is the first name / middle name and title pattern.

- there can actually be 1, 2 or even 3 names
- the whitespace between the last of the names and the title varies (not always 10)
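
Since the gap before the title varies, one option is to collapse every run of whitespace to a single space before matching, so that later patterns no longer need a fixed ` {10}`. A minimal sketch, assuming plain JavaScript string methods; `collapseWhitespace` is a hypothetical helper name. Note that `replace` with a regex only changes the first match unless the `g` flag is set.

```javascript
// Hypothetical helper: collapse each run of whitespace to one space
// and drop any leading or trailing whitespace.
function collapseWhitespace(line) {
  return line.replace(/\s+/g, " ").trim();
}

const raw = "5..... LASTNAME LASTNAME, FIRSTNAME MIDDLENAME          TITLE  32     R";
console.log(collapseWhitespace(raw));
// prints: 5..... LASTNAME LASTNAME, FIRSTNAME MIDDLENAME TITLE 32 R
```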

i'm now removing all the excess whitespace, so the string to parse now looks more like this:

"5..... LASTNAME LASTNAME, FIRSTNAME MIDDLENAME MIDDLENAME2 TITLE 32 R"

the problem i'm having is knowing where the title begins. i know the title is always 3 characters, and i know it is always Mr./Mrs/Ms./Dr.

the logic i'm trying to use to parse the first & middlenames is "grab everything from the ',' until we find (Mr. OR Mrs OR Dr. OR Ms.)"

i'm trying ^.*,.* (Mr\.|Mrs|Ms\.|Dr\.) but no luck :-(
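
The "grab everything from the ',' until a title" idea can be sketched as a lazy capture followed by the title alternation. Two possible reasons the attempt above returns nothing useful, offered as guesses because the test data is not shown: the names are not inside a capture group, and the titles in the data may be upper case (`DR.`) while the pattern is mixed case. `NAMES_THEN_TITLE` and `splitNamesAndTitle` are hypothetical names, and the `i` flag is an assumption about the data.

```javascript
// Sketch: capture everything between the comma and the title.
// (.+?) is lazy, so it stops at the first " Mr." / " Mrs" / " Ms." / " Dr.".
// The lookahead requires the title to be followed by whitespace or end of line.
const NAMES_THEN_TITLE = /,\s*(.+?)\s+(Mr\.|Mrs|Ms\.|Dr\.)(?=\s|$)/i;

function splitNamesAndTitle(line) {
  const m = NAMES_THEN_TITLE.exec(line);
  return m ? { firstMiddle: m[1], title: m[2] } : null;
}

console.log(
  splitNamesAndTitle("5..... LASTNAME LASTNAME, FIRSTNAME MIDDLENAME MIDDLENAME2 DR. 32 R")
);
```

The same pattern handles 1, 2 or 3 given names, since the lazy group simply grows until a title follows.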

 
thanks vbkris, with your help i worked it out.

solution

Code:
<script>
//original string
str1="5..... LASTNAME LASTNAME, FIRSTNAME MIDDLENAME NAME2        DR.    32    R"

//remove excess whitespace (the g flag replaces every run, not just the first one)
str=str1.replace(/\s+/g," ")

TheNumber=str.replace(/^(\d+).*/,"$1")
TheLastName=str.replace(/^.*\.(.*),.*/,"$1")
TheFristMiddleName=str.replace(/^.*\.(.*),(.*) (MR\.|MRS|DR\.|MS\.).*/,"$2")

//TheRe=new RegExp(".*"+TheFristMiddleName+" (MR\.|MRS|DR\.|MS\.)([^\\s]*)\\s+.*")
//TheTitle=str.replace(TheRe,"$1")

TheTitle=str.replace(/^.*\.(.*),(.*) (MR\.|MRS|DR\.|MS\.).*/,"$3")
TheAge=str.replace(/.*\s+(\d+).*$/,"$1")
TheLocation=str.replace(/.*\s+(.*)$/,"$1")

document.write("String - "+str1+"<br>")
document.write("Clean String - "+str+"<br>")
document.write("TheNumber - "+TheNumber+"<br>")
document.write("TheLastName - "+TheLastName+"<br>")
document.write("TheFristMiddleName - "+TheFristMiddleName+"<br>")
document.write("TheTitle - "+TheTitle+"<br>")
document.write("TheAge - "+TheAge+"<br>")
document.write("TheLocation - "+TheLocation)
</script>
 
no problem, have u converted that into PHP???

 
yep, converted to php, 1 line of code with
preg_match($pattern, $line, $match_array)

(a few more lines with the loop around it and the other pattern matches)

my favorite bit, the echo at the end of the function that returns:
.... cleaning .... 24396 records processed .... 1.1162s
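
The PHP code itself is not shown in the thread. As a rough JavaScript analogue of the loop described above (one pattern applied per line, with a summary printed at the end), a sketch might look like this. `LINE`, `cleanRecords`, and the sample input are assumptions for illustration, not the poster's code.

```javascript
// Assumed pattern, not the poster's: number, last name(s), first/middle names,
// title, age, location. \s+ tolerates any amount of padding between fields.
const LINE = /^(\d)\.+\s+([^,]+),\s*(.+?)\s+(MR\.|MRS|MS\.|DR\.)\s+(\d+)\s+(\S)\s*$/;

// Hypothetical helper: parse every line that matches and skip the rest.
function cleanRecords(lines) {
  const records = [];
  for (const line of lines) {
    const m = LINE.exec(line);
    if (m) records.push(m.slice(1)); // keep only the six captured fields
  }
  return records;
}

const started = Date.now();
const records = cleanRecords([
  "5..... LASTNAME LASTNAME, FIRSTNAME MIDDLENAME NAME2        DR.    32    R",
  "this line is not a record",
]);
const seconds = ((Date.now() - started) / 1000).toFixed(4);
console.log(".... cleaning .... " + records.length + " records processed .... " + seconds + "s");
```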

thx again vbkris
 