removing unwanted characters from a file

Status
Not open for further replies.

bcdixit

Technical User
Nov 11, 2005
64
US
Suppose I have a file called sample.sql that I open in a .pl script.
The file looks something like this when opened in vi (the Unix editor):

šæselect * from a
where tablename = 'x';<some white space characters>
šæselect * from b
where tablename = 'x';<some white space characters>
šæselect * from c
where tablename = 'x';<some white space characters>
šæselect * from d
where tablename = 'x';<some white space characters>
šæselect * from e
where tablename = 'x';<some white space characters>
šæselect * from f;
where tablename = 'x'<some white space characters>
I want to clean this file.
I want to remove the first two non-word characters.
Also, each line in the file has some whitespace characters after the ';', before the newline, that I want to remove.

How do I use regular expressions to clean the contents of this file and WRITE TO THE SAME FILE?


This file is generated by some other part of the code.

I have tried to use the following to read and write to the same file, but it doesn't seem to work:
open(myfile,"+<$myfile");

Also, any help with the regular expression will be highly appreciated.

thanks..
bcdixit
 
Dude, I've already given you the solution to this one.
Did you even bother to try it?
Code:
perl -i.bak -lne 's/^\W\W//; s/\s+$//; print' sample.sql


Trojan.
 

Where do I put this line of code in my .pl file?

As I already told you yesterday, I didn't understand your solution.

I thought I would explain my problem in a better way by starting a new thread.

I am a novice in Perl.

Thanks anyway
bcdixit
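
For anyone wondering how a -i.bak one-liner maps onto a .pl file: one possible way is to set the special variable $^I inside the script, which turns on the same in-place editing for files read through <>. This is only a sketch. The sub name clean_file_in_place is made up for illustration, and the two substitutions assume "clean" means dropping leading non-word characters and trailing whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): edit $file in place,
# keeping the original as "$file.bak".
sub clean_file_in_place {
    my ($file) = @_;

    # $^I switches on in-place editing for files read through <>,
    # like -i.bak on the command line; @ARGV lists the files to read.
    local ($^I, @ARGV) = ('.bak', $file);

    while (my $line = <>) {
        chomp $line;
        $line =~ s/^\W+//;    # drop leading non-word characters
        $line =~ s/\s+$//;    # drop trailing whitespace
        print "$line\n";      # while <> is active, print goes to the rewritten file
    }
    return;
}
```

Calling clean_file_in_place('sample.sql') from the script would then rewrite sample.sql and leave sample.sql.bak behind.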
 
I also offered you two suggestions in the other thread. I see now you may not know how to use Tie::File, but did you try the other suggestion I posted for you?
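
In case Tie::File is unfamiliar, here is a rough sketch of how it could be used for this kind of cleanup. Tie::File ships with Perl and presents the file as an array, one element per line, so changing an element changes the file. The sub name clean_with_tie_file is hypothetical, and the two substitutions are an assumption about what needs to be removed.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper (name is an assumption): clean every line of $file
# through a tied array.
sub clean_with_tie_file {
    my ($file) = @_;

    # Each element of @lines is one line of the file, without its newline.
    tie my @lines, 'Tie::File', $file
        or die "Cannot tie $file: $!";

    for my $line (@lines) {
        $line =~ s/^\W+//;    # drop leading non-word characters
        $line =~ s/\s+$//;    # drop trailing whitespace
    }

    untie @lines;             # flush pending changes and release the file
    return;
}
```

Unlike the -i.bak approach, this does not keep a backup copy, so it may be worth copying the file first while experimenting.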
 
KevinADC,
Yeah, the other suggestion you gave me works well.
But I am still having trouble building a regular expression to clean up the file.

Thanks
bcdixit
 
OK, then let's see. Try this:

Code:
my $tables = '/opt/teradata/results.txt';
open(FH, '<', $tables) or die "$!";   # open for reading
chomp(my @data = <FH>);               # slurp all lines, minus newlines
close FH;

open(FH, '>', $tables) or die "$!";   # open for overwriting
for (@data) {
   s/^šæ//;     # strip the leading "šæ"
   s/\s+$//;    # strip trailing whitespace
   print FH "$_\n";
}
close FH;
 
The s/^šæ//; works,
but unfortunately, when I looked at the input file, there are characters other than 'šæ' at the start of each line.
That's where I am having trouble.

When I use s/^W+//; it eliminates only one non-word character from each line.
 
Your solution seems to eliminate just one weird character from the start of each line.
OK, is there a way I can eliminate all the weird characters from the start of each line until a word character appears, maybe using \b or \B?

thanks
bcdixit
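
One way to read "all the weird characters until a word character appears" is an anchored \W+ (with the backslash), which matches one or more non-word characters at the start of the line. The sketch below shows that, plus a narrower variant that removes only leading bytes outside printable ASCII, since \W would also eat a legitimate leading quote or parenthesis. Both sub names are hypothetical, and which variant fits depends on what the junk in the real file actually is.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every leading non-word character,
# then any trailing whitespace.
sub strip_line {
    my ($line) = @_;
    $line =~ s/^\W+//;    # \W+ = one or more non-word characters at the start
    $line =~ s/\s+$//;    # trailing whitespace
    return $line;
}

# Narrower variant (an assumption about what counts as "weird"):
# remove only leading bytes outside the printable ASCII range.
sub strip_non_ascii_prefix {
    my ($line) = @_;
    $line =~ s/^[^\x20-\x7E]+//;
    return $line;
}
```

The difference shows up on a line such as "(select 1)" with junk in front: strip_line would also remove the opening parenthesis, while strip_non_ascii_prefix would keep it.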
 
You may find that although you only SEE two characters, they may be made up from more than two BYTES.


Trojan.
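
To see how many bytes are really at the start of a line, it can help to dump them in hex. The sketch below assumes, purely for illustration, that the file is UTF-8 encoded (nothing in this thread confirms that); in that case the two visible characters "šæ" occupy four bytes, so a pattern that removes two bytes would remove only the first visible character. The sub name leading_bytes_hex is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show the first $n bytes of a line as hex pairs.
sub leading_bytes_hex {
    my ($line, $n) = @_;
    return join ' ', unpack '(H2)*', substr($line, 0, $n);
}

my $chars = "\x{161}\x{e6}";            # the two characters "šæ"
my $bytes = encode('UTF-8', $chars);    # what a UTF-8 encoded file would contain

printf "%d characters, %d bytes: %s\n",
    length($chars), length($bytes), leading_bytes_hex($bytes, 4);
# prints "2 characters, 4 bytes: c5 a1 c3 a6"
```

Running leading_bytes_hex on the first line read from the real file would show what is actually there, whatever the encoding turns out to be.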
 
Hmmm... considering W+ means one or more W's, I don't see how it works.
 
I am new to all this, but I think if you put just the W+, it memorizes the non-word character. After going through a Perl book I found that if you put the ?:, it won't memorize the character.

 
Yes, ?: tells Perl not to store the match inside the parentheses in memory, but a bare W without a backslash in front of it is just a W, not the non-word character class, which is \W.
 
bcdixit

It does not matter whether it captures or not - that is simply an issue of code optimization. If a capture is made and memorised, then $1 can be used to refer to that capture. This will not affect what the regex finds.

Unescaped (i.e. no backslash before the uppercase W) your regex s/^(?:W+)//; will remove an unlimited number of capital W's from the beginning of a line - and nothing else. I promise you.

Kind Regards
Duncan
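
A quick way to check this is to run the three patterns side by side. The sketch below uses a made-up helper, strip_with, and two made-up sample lines: one starting with literal W's and one starting with non-word characters.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a pattern as s/PATTERN// to a copy of $text.
sub strip_with {
    my ($pattern, $text) = @_;
    (my $copy = $text) =~ s/$pattern//;
    return $copy;
}

for my $line ('WWselect * from a', '#!select * from a') {
    print "input:    $line\n";
    print "^(?:W+):  ", strip_with(qr/^(?:W+)/, $line), "\n";  # literal W's only
    print "^(W+):    ", strip_with(qr/^(W+)/,   $line), "\n";  # same match, also sets $1
    print "^\\W+:     ", strip_with(qr/^\W+/,    $line), "\n";  # non-word characters
}
```

The first two patterns give identical results on both lines and only ever remove capital W's; only the backslashed \W+ touches the "#!" prefix.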
 
i.e. I completely agree with Kevin's comment:

Hmmm... considering W+ means one or more W's, I don't see how it works.

Kind Regards
Duncan
 