Remove many carriage returns

Status
Not open for further replies.

Quintios

Technical User
Mar 7, 2002
482
US
I searched but the other two threads on the same subject didn't really fit.

I have various bits of text that are encapsulated in parens, like this (xxx). I want to get them each on their own lines, like this:

(xxx)
(xxx)
(xxx)

The problem is, the file looks like this:

(xxx)(xxx)
(xxx)
(xxx)(xxx)

I've tried this code:
Code:
Dim RegX, SearchPattern, ReplacedText, ReplaceString
Set RegX = NEW RegExp

SearchPattern = "\)"
ReplaceString = ")" & vbCrLf
RegX.Pattern = SearchPattern
RegX.Global = True
RegX.IgnoreCase = True
CurrentLine = RegX.Replace(CurrentLine, ReplaceString)

But that gives me:

(xxx)

(xxx)
(xxx)

(xxx)... and so on...

The hex dump shows 0A 0D 0A 0D where the extra blank line shows up, but so far I have been unable to replace the two carriage returns with just one. I've tried using the ASCII codes and I've tried using the "\n" character, but to no avail. Truthfully, I am not very familiar with RegExp.Replace and I'm probably using the wrong syntax, so if you know the correct syntax I'd love to get another opinion.

Yes, one solution is to close the file, open it back up, and copy every non-blank line into a new file, but, well, yuk. I should be able to accomplish this without doing that using the RegExp object.

Any ideas?


Onwards,

Q-
 
Have you tried using Match instead of Replace?

 
In the code I listed, there is a match pattern. When a match is found, then the replace should take place. So far I've been unable to get a match to work, or it turns it into an odd file with the black block characters (visible in Notepad).

Do you know what code to use to take

(xxx)(xxx)
(xxx)
(xxx)(xxx)

and turn it into

(xxx)
(xxx)
(xxx)
(xxx)
(xxx)

?????



Onwards,

Q-
 
Have you tried just

Replace(TextFromFile, ")", ")" & VbCrLf)
 
The problem with spazman's Replace(TextFromFile, ")", ")" & VbCrLf) is that it will match ")" even when there is already a new line right after it, and insert another one.

How about this:
Code:
strFixedText = Replace(TextFromFile, ")(", ")" & VbCrLf & "(")

that should work for you :)


Posting code? Wrap it with code tags: [code]CodeHere[/code].
 
Well, I guess I should have been more specific. I'm sorry.

The elements are actually like this:

AI(XXX)DI(XXX)
AC(XXX)
DC(XXX)

But actually, the first character will always be an A or a D, so I could run it through twice (it runs pretty quick even on a PentiumII 400), with the pattern set first to )A and then to )D, replacing them with ) & vbCrLf & A and ) & vbCrLf & D respectively.

I'll check and see if it works. Good suggestions! Stars for y'all!

Onwards,

Q-
 
If you wanted to generalize it, i.e. not hardcode )A and )D, you will probably have to use regular expression matching as mentioned above.

Just use a reg exp pattern to find [close bracket NOT followed by carriage return] and replace it with [close bracket carriage return]. I know it's possible with reg exps, but I'm afraid the correct syntax escapes me, hence my [] natural language notation :) Good luck
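
One way the natural-language pattern above might be written is with a lookahead, which checks the character after the parenthesis without consuming it. This is only a sketch: it assumes the VBScript 5.5 or later regex engine (which supports lookahead), and the function name SplitAfterParens is made up for illustration.

```vbscript
' Hypothetical helper: insert a CRLF after every ")" that is directly
' followed by a character other than CR or LF.
Function SplitAfterParens(strText)
    Dim re
    Set re = New RegExp
    ' \) matches a literal close paren; (?=[^\r\n]) requires the next
    ' character to be something other than CR or LF, without consuming it.
    re.Pattern = "\)(?=[^\r\n])"
    re.Global = True    ' replace every match, not just the first
    SplitAfterParens = re.Replace(strText, ")" & vbCrLf)
End Function
```

Because the lookahead consumes nothing, nothing has to be put back in the replacement string, and a ")" at the very end of the text is left alone.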


 
I may have to do that. However, I got an error when attempting to use:

ReplaceString = ")" & vbCrLf & "A"

Can't see a syntax error there. The error is:

Microsoft VBScript runtime error: Syntax error in regular expression

and it errors on this line:

CurrentLine = RegX.Replace(CurrentLine, ReplaceString)

Oh well. I've been copying the entire file from one tmp file to another and removing the empty lines. Completely inefficient, but nonetheless it works. I'm getting a bit better with the RegExp pattern stuff, but the "\n" character still escapes me...

Onwards,

Q-
 
Dang it! Forgot to escape the parens:

")A"

needs to be

"\)A"

Now it works!

Onwards,

Q-
 
If you are only matching )A and )D, you can just use Replace twice instead of a reg exp, which is much simpler:
Code:
strFixedText = Replace(Replace(TextFromFile, ")A", ")" & VbCrLf & "A"), ")D", ")" & VbCrLf & "D")

In an effort to get up to speed with reg exps, I'm trying to get my earlier natural language thing to work with a reg exp. I can get it to match every ) where there is NOT a new line after it, but when I replace them to include the newline it's eating the following character.. :(

I'll let you know if I succeed.


 
OK, I think I have it working with reg exps now...

(this was done in an ASP page, but you should be able to see what it's doing):
Code:
	Dim RegX, SearchPattern, strReplaceWith, strFileContents
	
	' Sample input: the first line holds two items that need splitting
	strFileContents = "AB(xxxx)DA(xxx)" & vbCrLf & "ADDA(xx)" & vbCrLf & "D(x)"
	Set RegX = New RegExp
	' A close paren followed by one character that is not CR or LF;
	' that character is captured so it can be put back as $1
	SearchPattern = "\)([^\r\n])"
	strReplaceWith = ")" & vbCrLf & "$1"
	RegX.Pattern = SearchPattern
	RegX.Global = True
	RegX.IgnoreCase = True
	
	Response.Write(strFileContents & "->")
	strFileContents = RegX.Replace(strFileContents, strReplaceWith)
	Response.Write(strFileContents)

the extra character that was being 'eaten' is being put back in with that $1, as I understand it :p

let me know how it works for you..


 
Man, that works like a charm! That took out about 60 lines of repetitive code; nice.

How did you come up with that? What do all the characters mean? I have the reference paper here from the WSH documentation.

\) -> escapes the parens
( -> starts a pattern sequence
[ -> no idea why that is there
^ -> Matches the beginning of input?
\r -> Carriage Return
\n -> linefeed
] -> well, you had one at the beginning, guess you gotta close
) -> closing parens

I just don't understand...

Now the only problem I'm having is that sometimes there are two or three carriage returns in a row. I would imagine that I could replace all of them like this:

SearchPattern = "([^\r\n]+)"
ReplaceString = vbCrLf

I'll check and see if that'll work...

Nope. All that did was erase everything..

I wish I could give you more than one star. I'm almost there! Just one more little trick and I've got it.

Any ideas how to remove more than one CRLF in a row?

Onwards,

Q-
 
Try this:
Code:
SearchPattern = "([\r\n]+)"
ReplaceString = vbCrLf
The ^ just after a [ means NOT any of the characters that follow (up to the ]).

Hope this helps
PH.
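
For reference, a sketch of how that pattern could be wrapped up. It assumes the whole file has already been read into a single string, since the pattern can only see line breaks that are actually present in the string it is given. CollapseLineBreaks is an illustrative name, not something from this thread.

```vbscript
' Hypothetical helper: collapse any run of CR and/or LF characters
' into a single CRLF.
Function CollapseLineBreaks(strText)
    Dim re
    Set re = New RegExp
    re.Pattern = "[\r\n]+"    ' one or more CR or LF characters in a row
    re.Global = True          ' without this, only the first run is replaced
    CollapseLineBreaks = re.Replace(strText, vbCrLf)
End Function
```

Note the difference from the earlier attempt: [^\r\n]+ matches runs of everything except line breaks, which is why it erased the text, while [\r\n]+ matches only the line breaks themselves.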
 
Your explanation is helpful! But for some reason I'm still getting blank lines. I'm not too sure what's going on, but I'm sure that I'll eventually figure it out.

Thank you!

Onwards,

Q-
 
try this then

TextFromFile = Replace(Replace(TextFromFile, vbCrLf, ""), ")", ")" & VbCrLf)
 
Code:
TextFromFile = Replace(Replace(TextFromFile, vbCrLf, ""), ")", ")" & VbCrLf)

Just shows it's always worth getting a new pair of eyes to look at your problem :)

Nice solution


 
And to explain my reg exp search pattern (as I understand it!):

SearchPattern = "\)([^\r\n])"

\) : matches a close parenthesis )
( ) : saves whatever matches inside these two parentheses for later reference ($1 in my replace) - this is what was being eaten in my earlier attempts
[ ] : matches one of the characters inside the square brackets - in my case \r or \n (CR or LF)
^ : placed right after the [, this inverts the set so that it matches any one character that is NOT one of those two

I get very confused by Reg Exp pattern syntax, so the above was found by .. quite a lot of trial & error.
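
To see what the parentheses capture, a small sketch like the following lists each match together with its $1 value. It assumes the script is run under Windows Script Host with cscript (WScript.Echo is not available in ASP), and the sample string is invented for illustration.

```vbscript
Dim re, m, strSample

' Made-up sample: two items on the first line, one on the second
strSample = "AI(XXX)DI(XXX)" & vbCrLf & "AC(XXX)"

Set re = New RegExp
re.Pattern = "\)([^\r\n])"
re.Global = True    ' find all matches, not just the first

' Execute returns a collection of Match objects; SubMatches(0) is the
' text captured by the first pair of parentheses, i.e. what $1 refers to.
For Each m In re.Execute(strSample)
    WScript.Echo "matched: " & m.Value & "   captured ($1): " & m.SubMatches(0)
Next
```

With this sample only the first ")" qualifies, so the match is ")D" and the captured character is "D". The second ")" is followed by a carriage return and the third is at the end of the string, so neither matches.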

Enjoy


 
Actually, I got a nice solution from *another* website.

Read the entire file into one string variable. Remove *all* vbCrLf's. Then for every ), replace it with ) & vbCrLf.

Works like a champ. Of course, if it's a large file there could be issues.

Lots of ways to do this. Thank you all!

Onwards,

Q-
 
Just to clarify, Quintios - that is exactly what spazman's final post does.

TextFromFile = Replace(Replace(TextFromFile, vbCrLf, ""), ")", ")" & VbCrLf)

The inner Replace (the one whose replacement is the empty string "") removes all vbCrLf's, then the outer one adds a vbCrLf after every ).

:) Actually I'm surprised we didn't think of it earlier
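
Written out as two separate steps, the nested call might look like the sketch below. The sample string and the names strNoBreaks and strFixedText are assumptions for illustration; in practice TextFromFile would hold the whole file.

```vbscript
Dim TextFromFile, strNoBreaks, strFixedText

' Made-up sample: doubled-up items plus a blank line
TextFromFile = "AI(XXX)DI(XXX)" & vbCrLf & vbCrLf & "AC(XXX)"

' Step 1 (the inner Replace): delete every existing CRLF pair
strNoBreaks = Replace(TextFromFile, vbCrLf, "")

' Step 2 (the outer Replace): add a CRLF after every close paren
strFixedText = Replace(strNoBreaks, ")", ")" & vbCrLf)
```

Two things to keep in mind with this approach: the result always ends with a CRLF after the last ")", and step 1 only removes CR+LF pairs, so a lone CR or LF in the input would survive.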


 
All you guys rule. Thanks for your help!

Onwards,

Q-
 