Tek-Tips is the largest IT community on the Internet today!


Reg exps help pls!

Status
Not open for further replies.

ianpri (Technical User, GB)
Apr 4, 2002
Hi all,

I'm a bit of a learner when it comes to reg exps, so a little help would be appreciated.

I have a string: the,cat,sat,"on,the",mat

And I wish to remove all commas between the double quotes, then remove the double quotes, ending up with:

the,cat,sat,on the,mat

Here's what I have so far, but it doesn't seem to change anything:

<%@ Language=VBScript %>
<%
strIn = "the,cat,sat,""on,the"",mat,"
Set objRE = New RegExp
With objRE
.Global = True
.Pattern = "("" )(\w+)(,)(\w+)("" )"
strOut = .Replace(strIn, "$2 $4")
End With
Set objRE = Nothing
Response.Write strIn & " => " & strOut
%>


Thanks in advance,
 
The issue here is that the spaces in your regular expression pattern are treated as part of the pattern: each one has to match a literal space. Therefore the string:
the,cat,sat,"on,the",mat

doesn't have any matches in it. If the string was instead:
the,cat,sat," on,the" ,mat

then you would get a match and the replacement you were expecting.

If you remove the two spaces from your regular expression, it should work as you expected.
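A minimal sketch of both points, assuming the code runs under Windows Script Host (cscript) rather than ASP, so WScript.Echo stands in for Response.Write. The helper names MatchesSpacedPattern and JoinQuotedPair are made up for illustration. The first helper uses the original pattern with its spaces; the second is the same pattern with the two spaces removed.

```vbscript
Option Explicit

' The original pattern, spaces and all: each space must match a literal space
Function MatchesSpacedPattern(strIn)
    Dim objRE
    Set objRE = New RegExp
    objRE.Pattern = "("" )(\w+)(,)(\w+)("" )"
    MatchesSpacedPattern = objRE.Test(strIn)
End Function

' The same pattern with the two spaces removed
Function JoinQuotedPair(strIn)
    Dim objRE
    Set objRE = New RegExp
    objRE.Global = True
    ' Inside a VBScript string, "" is one literal quote character
    objRE.Pattern = "("")(\w+)(,)(\w+)("")"
    ' Keep the two words (groups 2 and 4), drop the quotes and the comma
    JoinQuotedPair = objRE.Replace(strIn, "$2 $4")
End Function

WScript.Echo CStr(MatchesSpacedPattern("the,cat,sat,""on,the"",mat"))    ' False
WScript.Echo CStr(MatchesSpacedPattern("the,cat,sat,"" on,the"" ,mat"))  ' True
WScript.Echo JoinQuotedPair("the,cat,sat,""on,the"",mat")  ' the,cat,sat,on the,mat
```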

-Tarwn

01000111 01101111 01110100 00100000 01000011 01101111 01100110 01100110 01100101 01100101 00111111
tiernok.com
The never-completed website
 
Thanks for the help Tarwn, but I should have used a more real-world example. Here's one of the CSV entries I need to parse:

"R Lenko BDS, LDS, RCS",43 High St,,,Burnham

and another

Hays Senior Finance,Suite 4,"3rd Floor, Maple House",95 High St

I've amended my pattern to "("")(\w+)(,)(\w+)("")", which works as you suggested with the cat-on-a-mat example, but not for the entries above. Any ideas where I'm going wrong?
 
The problem with the second example is the spaces again. One way of looking at it is to map each word to \w:
Code:
Hays Senior Finance,Suite 4,"3rd Floor, Maple House",95 High St
\w \w \w,\w \w,"\w \w, \w \w",\w \w \w
Since your pattern doesn't allow spaces, the spaces inside the quotes cause it not to match. One way to alter your pattern to make it work would be:
Code:
.Pattern = "("")([\w\s]+)(,)([\w\s]+)("")"
This now matches a quote, any combination of word characters and spaces, a comma, any combination of word characters and spaces, and a closing quote.
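Here is a sketch of that pattern applied to the Hays line, assuming the same "$2 $4" replacement string as the original script and a cscript host; ReplaceQuoted is a hypothetical helper. One thing to watch: group 4 starts right after the comma, so it keeps the space that followed it, and the result has two spaces between "Floor" and "Maple". The second call is my own tweak, not from the thread: moving \s* into the comma group swallows that space.

```vbscript
Option Explicit

' Hypothetical helper: run one pattern over a line with the "$2 $4" replacement
Function ReplaceQuoted(strIn, strPattern)
    Dim objRE
    Set objRE = New RegExp
    objRE.Global = True
    objRE.Pattern = strPattern
    ReplaceQuoted = objRE.Replace(strIn, "$2 $4")
End Function

Dim strHays
strHays = "Hays Senior Finance,Suite 4,""3rd Floor, Maple House"",95 High St"

' Pattern as suggested: group 4 is " Maple House" (leading space included)
WScript.Echo ReplaceQuoted(strHays, "("")([\w\s]+)(,)([\w\s]+)("")")
' prints: Hays Senior Finance,Suite 4,3rd Floor  Maple House,95 High St

' Tweak (assumption): let group 3 also take any whitespace after the comma
WScript.Echo ReplaceQuoted(strHays, "("")([\w\s]+)(,\s*)([\w\s]+)("")")
' prints: Hays Senior Finance,Suite 4,3rd Floor Maple House,95 High St
```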

Now, the first sample string you gave has another issue: it has two commas inside the quotes. Since there could be any number of commas inside the quotes, it is harder to strip them all out with a single fixed pattern.
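A quick way to confirm this, sketched with a hypothetical HasTwoPartQuotedField helper (again assuming cscript): Test finds the two-part field in the Hays line but nothing in the Lenko line, because after " LDS" the pattern expects a closing quote and finds another comma instead.

```vbscript
Option Explicit

' Hypothetical helper: does the line contain a quoted field with exactly one comma?
Function HasTwoPartQuotedField(strIn)
    Dim objRE
    Set objRE = New RegExp
    objRE.Pattern = "("")([\w\s]+)(,)([\w\s]+)("")"
    HasTwoPartQuotedField = objRE.Test(strIn)
End Function

' Two parts inside the quotes: matches
WScript.Echo CStr(HasTwoPartQuotedField("Suite 4,""3rd Floor, Maple House"",95 High St"))  ' True
' Three parts inside the quotes: no match anywhere in the line
WScript.Echo CStr(HasTwoPartQuotedField("""R Lenko BDS, LDS, RCS"",43 High St,,,Burnham"))  ' False
```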

Another option, instead of the RegExp object's Replace method, would be to use its Execute method to find the matches and then VBScript's Replace function on each one.

For example, you could use a pattern such as
Code:
.Pattern = """([\w\s]+,)+([\w\s]+)"""
to find all the matches in a string, then use the Replace function to remove the commas and quotes.
Code:
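' myStr holds the line of CSV data
Dim objRE, matches, match
Set objRE = New RegExp
objRE.Global = True
objRE.Pattern = """([\w\s]+,)+([\w\s]+)"""

' Execute returns a MatchCollection object, so it needs Set
Set matches = objRE.Execute(myStr)
For Each match In matches
   ' Swap each quoted field for its de-quoted version
   myStr = Replace(myStr, match.Value, DeQuote(match.Value))
Next

' Remove the quotes, then turn ", " and any remaining "," into a space
Function DeQuote(aStr)
  DeQuote = Replace(aStr, """", "")
  DeQuote = Replace(DeQuote, ", ", " ")
  DeQuote = Replace(DeQuote, ",", " ")
End Function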

The function replaces any quotes with nothing, comma-spaces with a single space, and any remaining commas with a space. The reason I have two replaces for the commas is to keep you from getting something with a double space in the middle.
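Putting the pieces together, here is a sketch that wraps the Execute loop in a function and runs it on both sample lines. FlattenQuotedFields is a hypothetical name, DeQuote does the same three replaces as above (with a local variable instead of the function name), and the script assumes a cscript host.

```vbscript
Option Explicit

' Same logic as DeQuote above: drop quotes, then ", " and "," become one space
Function DeQuote(aStr)
    Dim s
    s = Replace(aStr, """", "")
    s = Replace(s, ", ", " ")
    DeQuote = Replace(s, ",", " ")
End Function

' Hypothetical wrapper: find each quoted field holding one or more commas
' and replace it with its de-quoted version
Function FlattenQuotedFields(strLine)
    Dim objRE, colMatches, objMatch, strOut
    Set objRE = New RegExp
    objRE.Global = True
    objRE.Pattern = """([\w\s]+,)+([\w\s]+)"""
    Set colMatches = objRE.Execute(strLine)
    strOut = strLine
    For Each objMatch In colMatches
        strOut = Replace(strOut, objMatch.Value, DeQuote(objMatch.Value))
    Next
    FlattenQuotedFields = strOut
End Function

WScript.Echo FlattenQuotedFields("""R Lenko BDS, LDS, RCS"",43 High St,,,Burnham")
' prints: R Lenko BDS LDS RCS,43 High St,,,Burnham
WScript.Echo FlattenQuotedFields("Hays Senior Finance,Suite 4,""3rd Floor, Maple House"",95 High St")
' prints: Hays Senior Finance,Suite 4,3rd Floor Maple House,95 High St
```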

There is a way to handle any number of comma-separated parts inside the quotes dynamically in the regular expression, but I can't think of any way to do that and also get rid of the internal commas. The closest I got was:
Code:
.Pattern = "("")([\w\s]+)(,)([\w\s]+(,[\w\s]+)*)("")"
which of course leaves in any commas after the first.
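One way to remove any number of internal commas in a single Replace call, swapped in here and not something from the thread, is a lookahead: a comma is inside quotes when an odd number of quote characters follows it on the line. The sketch below assumes VBScript 5.5 or later (for (?=...) and (?:...) support), balanced quotes, and no escaped "" inside a field; FlattenWithLookahead is a hypothetical name, and the host is assumed to be cscript.

```vbscript
Option Explicit

' Hypothetical helper: replace every comma that sits inside quotes, then drop the quotes
Function FlattenWithLookahead(strLine)
    Dim objRE
    Set objRE = New RegExp
    objRE.Global = True
    ' ,\s*                 the comma plus any spaces after it
    ' (?=[^"]*"            only if: the rest of the line has one quote...
    ' (?:[^"]*"[^"]*")*    ...then zero or more further pairs of quotes...
    ' [^"]*$)              ...and no other quotes up to the end (odd total)
    objRE.Pattern = ",\s*(?=[^""]*""(?:[^""]*""[^""]*"")*[^""]*$)"
    FlattenWithLookahead = Replace(objRE.Replace(strLine, " "), """", "")
End Function

WScript.Echo FlattenWithLookahead("""R Lenko BDS, LDS, RCS"",43 High St,,,Burnham")
' prints: R Lenko BDS LDS RCS,43 High St,,,Burnham
WScript.Echo FlattenWithLookahead("Hays Senior Finance,Suite 4,""3rd Floor, Maple House"",95 High St")
' prints: Hays Senior Finance,Suite 4,3rd Floor Maple House,95 High St
```

Because ,\s* also consumes the space after each comma, no double spaces appear. The lookahead rescans the rest of the line for every comma, which should be fine for short CSV lines.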

-Tarwn

 