Substitution regex question

rickgerdes

IS-IT--Management
Feb 11, 2003
44
US
Should be simple, I'm just an idiot.

I have a file I'm trying to read, no problems there.
I dump it into an array and skim through line by line, no problem there.
I try to match an expression to the line- not _usually_ a problem there.

The issue is, the lines have spaces between each character, so Status: reads as S t a t u s :.
I can't seem to match to the expanded text. I tried a:
$line=s/\s//;
But EVERYTHING goes away, not just white space.

Help?
 
Hi,

if you just want the whitespaces to be removed, use this:

$line=~s/ //;

Greetings

Smash your head on keyboard to continue...
 
I tried that one...no joy.
Best luck I've had so far is
$line =~ s/\s+//g;

Which works in testing; I dredged this up from a search:

$t=" test test ";
print $t;
s/^\s+//, s/\s+$// for $t;
print "\n";
print '|'."$t".'|';

I tweaked it a hair to: $t =~ s/\s+//g; and expanded $t to " t e s t t e s t "; and it works. But on the lines I'm reading in, it doesn't.

The files I'm reading are in xml if that helps anyone.
 
Oops, sorry.

I forgot the g option. Have you tried this?

$line=~s/ //g;

I wrote a test script where it works:

$t=" test tes t";
$t=~s/ //g;
print "-$t-";

Output:-testtest-

Greetings
 
Yeah, did that one too- Here's where I am, thinking wise:

open(BENT, "< path to xml file") || print "cannot open";
binmode BENT; #Tried with and w/o this
@belog=<BENT>;
close BENT;
$test = $belog[0];

print $test; #prints fine
print length($test); #83 characters length
$test = s/\s+//g; #should strip all whitespace
print "\nProcessed:\n".$test."\nDone.\n";
print length($test); #length is now zero. ANY s/ regex sets it to 0. Not just whitespace.

So I'm stumped. I'm looking for:

c o m p l e t i o n s t a t u s :

Looking at the file with notepad doesn't show the spaces, but opening it with Perl does...

 
hi,

Caution:

$test = s/\s+//g; #should strip all whitespace

This strips nothing from $test, because it should be =~ instead of =. With plain =, the substitution runs on $_ and its return value is assigned to $test.
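
The difference between = and =~ can be seen in a minimal sketch like the one below. The variable names and sample strings are made up for illustration. It assumes $_ holds nothing the pattern matches, which is presumably close to what happened in the script above and would explain the length of zero.

```perl
use strict;
use warnings;

my $line = " t e s t ";

# Assignment: s/// runs against $_ (not $line) and its return value is stored.
$_ = "nospaceshere";          # hypothetical content of $_
my $assigned = $line;
$assigned = s/\s+//g;         # no match in $_, so s/// returns an empty string

# Binding: s/// runs against the variable on the left of =~.
my $bound = $line;
$bound =~ s/\s+//g;

print "assigned: '$assigned' (length ", length($assigned), ")\n";
print "bound:    '$bound' (length ", length($bound), ")\n";
```

With =, $assigned ends up empty no matter what the pattern is, because it only receives the result of a failed substitution on $_. With =~, $bound becomes "test".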
 
Point. :) Unfortunately, after correcting my obvious, glaring error, I no longer have the extraneous spaces, but I still have spaces between each letter. My test output now reads like this:
 ■< ? x m l v e r s i o n = " 1 . 0 " e n c o d i n g = " U T F - 1 6 " ? >

83
Processed:
 ■< ? x m l v e r s i o n = " 1 . 0 " e n c o d i n g = " U T F - 1 6 " ? >
Done.
79

There are four characters removed, double spaces only. All singles are still in place. If I drop the \s+ to \s, no change.
 
weird...

You mentioned that you can't see any whitespace in Notepad? Are you sure there really are whitespace characters between the letters, or could it just be a display problem on the command line? (Just a question. After all, once all the possible answers are ruled out, only the impossible ones are left ;-) )
 
I considered just ignoring the whitespace, but the regex for finding a match doesn't catch, ever, regardless of what I try to find.

I'm going to try parsing the file with some of the XML parsing tools...<sigh> now I have to start asking about hashes. :)
 
Have you tried looking at the file in a hex editor? Maybe they're control characters that Perl doesn't recognize as whitespace.
If you don't have a hex editor, I'm not sure how to convert a string to hex to view it, but I think you can do it with the sprintf function.
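
A minimal sketch of the sprintf idea follows. The sample string is hypothetical: it assumes the "spaces" are really the zero bytes that UTF-16 places next to each ASCII character, which fits the encoding="UTF-16" declaration in the output above but is not confirmed in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample: "Status:" as UTF-16LE bytes (each letter followed by a NUL).
my $line = "S\0t\0a\0t\0u\0s\0:\0";

# Turn every byte into a two-digit hex number so invisible bytes become visible.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $line;

print "$hex\n";   # 53 00 74 00 61 00 74 00 75 00 73 00 3a 00
```

A real space would show up as 20 in this dump. A 00 between the letters is a NUL byte, which \s does not match.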

To match hex values in a regex, precede it with '\x'

this would strip spaces in ascii (hex 20)
s/\x20//g;
 
Hi rickgerdes,

if you still haven't worked around this, maybe you could try this:

$line=~s/[^!-~]//g;

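That should remove everything outside the printable ASCII range ! through ~, so even unknown control characters go away. Note that it removes ordinary spaces too.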
(If you know what characters may appear in the text, you may even make stricter constraints)
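
As a rough illustration of what that character class does, here is a sketch on a made-up byte string. It assumes the input looks like UTF-16LE with a byte order mark. The second substitution is one possible "stricter" variant that starts the range at the space character so ordinary spaces survive.

```perl
use strict;
use warnings;

# Hypothetical input: BOM (FF FE) followed by "Status: ok" as UTF-16LE bytes.
my $raw = "\xFF\xFES\0t\0a\0t\0u\0s\0:\0 \0o\0k\0";

# Keep only ! through ~ : drops the BOM, the NUL bytes and the real space.
(my $clean = $raw) =~ s/[^!-~]//g;

# Keep space through ~ : drops the BOM and the NUL bytes, keeps the real space.
(my $spaced = $raw) =~ s/[^ -~]//g;

print "$clean\n";    # Status:ok
print "$spaced\n";   # Status: ok
```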

Hope that helps
 
And that did it. Thanks liuwt! Hex and control characters are making my life difficult.
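
Since the XML declaration in the test output says encoding="UTF-16", another option is to let Perl decode the file while reading it, so that no characters need to be stripped at all. The sketch below assumes Perl 5.8 or later (for the :encoding layer) and a file that starts with a byte order mark. The temporary file and its contents are made up so the example runs on its own.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Build a small UTF-16 file with a BOM to stand in for the real XML log.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xFF\xFE" . encode('UTF-16LE', "completion status: ok\n");
close $out;

# Read it back through the decoding layer; the BOM tells Perl the byte order.
open(my $in, '<:encoding(UTF-16)', $path) or die "cannot open $path: $!";
my $line = <$in>;
close $in;

# The decoded line has no extra bytes between letters, so a plain regex matches.
print "matched\n" if $line =~ /completion status:/;
```

Decoding on read keeps real spaces and non-ASCII text intact, which the strip-everything regex does not.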
 