Regular Expressions

Status
Not open for further replies.

mainmast

Programmer
Jun 26, 2003
176
US
Hi all,

I have a text file with several hundred lines. I need to go through each line and check whether it matches the following pattern:

*,*,*,"*, *"

where each asterisk stands for one or more characters other than spaces and quotation marks. I think I can do this with regular expressions?

Example:
Good: Smith,John,smithj,"Smith, John"
Bad: Smith, John,smithj,"Smith, John"
Bad: Sm ith,John,smithj,"Smith, John"

I also need to tack a string onto the end of each line in the text file. Any ideas?

Thanks!
Brandon
 
Sure,

That's not such a difficult pattern.
I would try something like this:
Code:
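<%
 ' Build the regular expression object
 Set regx = New RegExp
 regx.Global = True   ' return every match, not just the first one
 ' Note: the pattern is unanchored, so it can also match part of a line
 regx.Pattern = "[^ ""]+,[^ ""]+,[^ ""]+,""[^ ""]+, [^ ""]+"""
 ' Sample data, one record per line
 Source = "Good: Smith,John,smithj,""Smith, John""" & vbCrLf & _
          "Bad: Smith, John,smithj,""Smith, John""" & vbCrLf & _
          "Bad: Sm ith,John,smithj,""Smith, John"""
 Set matches = regx.Execute(Source)
 For Each match In matches
   Response.Write match.Value & "<br>"
 Next
%>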

I hope that helps. Feel free to ask for explanations of any part you don't understand.


Travis Hawkins
BeachBum Software
 
A note or two on the detail of the proposed pattern.
[1] It is too permissive: the negated character set excludes only the space and the quotation mark, not the comma, so a single field can swallow commas. Hence the proposed pattern:
[tt] regx.Pattern = "[^ ""]+,[^ ""]+,[^ ""]+,""[^ ""]+, [^ ""]+"""[/tt]
would also accept this (VBScript) string literal:
[tt] s="Smith,John,smithj,smithj2,smithj3,""Smith, John"""[/tt]
which it should not.
[2] If the data is read line by line, then anchoring the pattern to the beginning (^) and the end ($) of the line will strengthen it further.
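
To see both points in action, here is a minimal sketch that runs the originally proposed pattern against two strings that ought to be rejected. It assumes a standalone script run with cscript.exe (in ASP, swap WScript.Echo for Response.Write). Both tests succeed: the first because the fields absorb the extra commas, the second because the unanchored pattern simply starts matching at "ith".

```vbscript
' Sketch only: demonstrates why the unanchored pattern is too permissive.
Option Explicit
Dim regx, s
Set regx = New RegExp
regx.Pattern = "[^ ""]+,[^ ""]+,[^ ""]+,""[^ ""]+, [^ ""]+"""

' Too many fields, yet the test succeeds because the fields can contain commas
s = "Smith,John,smithj,smithj2,smithj3,""Smith, John"""
WScript.Echo CStr(regx.Test(s)) & vbTab & s

' A space inside the first field, yet the pattern matches from "ith" onwards
s = "Sm ith,John,smithj,""Smith, John"""
WScript.Echo CStr(regx.Test(s)) & vbTab & s
```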

Hence, to strengthen the pattern, either add the comma to the excluded character set, like this:
[tt] regx.Pattern = "[red]^[/red][^ [red],[/red]""]+,[^ [red],[/red]""]+,[^ [red],[/red]""]+,""[^ [red],[/red]""]+, [^ [red],[/red]""]+""[red]$[/red]"[/tt]
This is probably acceptable: a bare comma is unlikely to be admissible in an RDN unless it is escaped with a backslash, and it may also be ruled out by the organization's own naming convention.
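
As a quick check, the sketch below tests the anchored pattern against the three sample lines from the original question plus the extra-field string from note [1]. It again assumes a standalone script run with cscript.exe; the array and variable names are only for illustration. Only the first sample should pass.

```vbscript
' Sketch only: anchored pattern with the comma excluded from every field.
Option Explicit
Dim regx, samples, textLine
Set regx = New RegExp
regx.Pattern = "^[^ ,""]+,[^ ,""]+,[^ ,""]+,""[^ ,""]+, [^ ,""]+""$"

samples = Array( _
    "Smith,John,smithj,""Smith, John""", _
    "Smith, John,smithj,""Smith, John""", _
    "Sm ith,John,smithj,""Smith, John""", _
    "Smith,John,smithj,smithj2,smithj3,""Smith, John""")

' Print the test result next to each sample line
For Each textLine In samples
    WScript.Echo CStr(regx.Test(textLine)) & vbTab & textLine
Next
```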

The alternative is to spell out the admissible character set explicitly. For the special case of \w ([a-zA-Z0-9_]) only, it looks like this:
[tt] regx.Pattern = "^\w+,\w+,\w+,""\w+, \w+""$"[/tt]
You can expand the set by replacing \w with a bracketed class such as [a-zA-Z0-9_%@-] (keep the hyphen last so that it is taken literally rather than as a range), as long as no comma is admitted. If you want to admit a bare comma (,) or an escaped comma in an RDN-type scheme (\,), then you have to account for the greediness of the pattern and adjust it accordingly.
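
For the other half of the original question (checking the file line by line and tacking a string onto the end of each line), one possible approach is sketched below using Scripting.FileSystemObject. The file names and the suffix are placeholders, not anything given in this thread, and the script assumes it is run with cscript.exe; under ASP you would resolve the paths with Server.MapPath and report with Response.Write instead.

```vbscript
' Sketch only: validate each line, report the bad ones, append a suffix to every line.
Option Explicit
Const ForReading = 1
Const IN_FILE = "users.txt"        ' hypothetical input file
Const OUT_FILE = "users_out.txt"   ' hypothetical output file
Const SUFFIX = ",example"          ' hypothetical string to append

Dim fso, inStream, outStream, regx, textLine, lineNo
Set fso = CreateObject("Scripting.FileSystemObject")
Set regx = New RegExp
regx.Pattern = "^[^ ,""]+,[^ ,""]+,[^ ,""]+,""[^ ,""]+, [^ ,""]+""$"

Set inStream = fso.OpenTextFile(IN_FILE, ForReading)
Set outStream = fso.CreateTextFile(OUT_FILE, True)   ' True = overwrite if present
lineNo = 0
Do Until inStream.AtEndOfStream
    textLine = inStream.ReadLine
    lineNo = lineNo + 1
    ' Report lines that do not follow the expected layout
    If Not regx.Test(textLine) Then
        WScript.Echo "Line " & lineNo & " does not match: " & textLine
    End If
    ' Every line gets the suffix, matching or not
    outStream.WriteLine textLine & SUFFIX
Loop
inStream.Close
outStream.Close
```

Writing to a separate output file keeps the original intact; if only the valid lines should receive the suffix, move the WriteLine call into an Else branch.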
 