removing unwanted characters from a file

bcdixit · Nov 16, 2005

Suppose I have a file called sample.sql that I open in a .pl file.
The file looks something like this when opened in vi (the Unix editor):

šæselect * from a
where tablename = 'x';<some white space characters>
šæselect * from b
where tablename = 'x';<some white space characters>
šæselect * from c
where tablename = 'x';<some white space characters>
šæselect * from d
where tablename = 'x';<some white space characters>
šæselect * from e
where tablename = 'x';<some white space characters>
šæselect * from f;
where tablename = 'x'<some white space characters>
I want to clean this file.
I want to remove the first two non-word characters.
Also, each line in the file has some whitespace characters after the ';', before the new line starts, that I want to remove.

How do I use regular expressions to clean the contents of this file and WRITE TO THE SAME FILE?

This file is generated by some other part of the code.

I have tried to use the following to read and write to the same file, but it doesn't seem to work:
open(myfile,"+<$myfile");

Also, any help with the regular expression will be highly appreciated.

thanks..
bcdixit

TrojanWarBlade · Nov 17, 2005

Dude, I've already given you the solution to this one.
Did you even bother to try it?

Code:

perl -i.bak -lne 's/^..(.*\w)\s*$/$1/;print' sample.sql

Trojan.

bcdixit · Nov 17, 2005

Where do I put this line of code in my .pl file?

As I had already told you yesterday, I didn't understand your solution.

Thought I would explain my problem in a better way by starting a new thread.

I am a novice in perl.

Thanks anyway
bcdixit
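
Note: a one-liner like the one above is typed at the shell prompt rather than placed inside a .pl file. As a rough sketch of the same in-place idea inside a script, assuming you only want to strip trailing whitespace, something like the following could work. The sub name is hypothetical and not from this thread; `$^I` is the in-script counterpart of the `-i` switch.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): edits $file in place and keeps
# a backup named "$file.bak", much like `perl -i.bak` on the command line.
sub strip_trailing_whitespace_in_place {
    my ($file) = @_;
    local ($^I, @ARGV) = ('.bak', $file);   # $^I is the in-script form of -i
    while (my $line = <>) {
        $line =~ s/\s+$//;    # removes trailing whitespace, newline included
        print "$line\n";      # while $^I is set, print writes to the file
    }
    return;
}
```

Calling `strip_trailing_whitespace_in_place('sample.sql')` from the script would then rewrite sample.sql and leave the original contents in sample.sql.bak.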

KevinADC · Nov 17, 2005

I also offered you two suggestions in the other thread. I see now you may not know how to use Tie::File, but did you try the other suggestion I posted for you?
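
For anyone who has not used Tie::File before, here is a minimal sketch of how it might be applied to this problem. Tie::File ships with Perl and presents the lines of a file as an array, so changing an element changes the file. The sub name is hypothetical, and the sketch assumes that only trailing whitespace needs to be removed.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper (not from the thread): strips trailing whitespace
# from every line of $file by editing the file through a tied array.
sub strip_trailing_whitespace_tied {
    my ($file) = @_;
    tie my @lines, 'Tie::File', $file
        or die "Cannot tie $file: $!";
    s/\s+$// for @lines;    # each changed element is written back to the file
    untie @lines;           # flush pending changes and release the file
    return;
}
```

No explicit open, print, or close is needed here, which is the main appeal of this approach for small clean-up jobs.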

bcdixit · Nov 17, 2005

KevinADC,
Yeah, the other suggestion you gave me works well.
But I am still having trouble building a regular expression to clean up the file.

Thanks
bcdixit

KevinADC · Nov 17, 2005

OK, then let's see, try this:

Code:

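my $tables = '/opt/teradata/results.txt';
open(my $in, '<', $tables) or die "Cannot read $tables: $!";   # open for reading
chomp(my @data = <$in>);
close $in;

open(my $out, '>', $tables) or die "Cannot write $tables: $!"; # open for overwriting
for (@data) {
   s/^šæ//;    # remove the two unwanted leading characters
   s/\s+$//;   # remove trailing whitespace
   print $out "$_\n";
}
close $out;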

bcdixit · Nov 17, 2005

The s/^šæ//; works..
But unfortunately, when I looked at the input file, there are characters other than 'šæ' at the start of each line.
That's where I am having trouble.

When I use s/^W+//; it eliminates only one non-word character from each line.

duncdude · Nov 17, 2005

Code:

s/^\W{2}//;

Kind Regards
Duncan

bcdixit · Nov 17, 2005

Your solution seems to eliminate just one weird character from the start of each line.
OK, is there a way I can eliminate all the weird characters from the start of each line until a word character appears, maybe using \b or \B?

thanks
bcdixit

TrojanWarBlade · Nov 17, 2005

You may find that although you only SEE two characters, they may be made up from more than two BYTES.

Trojan.
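
To illustrate the point about bytes, here is a minimal sketch. It assumes the file is UTF-8 encoded and is read as raw bytes (no decoding layer), in which case "šæ" occupies four bytes. The sub names are hypothetical.

```perl
use strict;
use warnings;

# "šæ" written out as raw UTF-8 bytes (assumed encoding): C5 A1 C3 A6.
my $line = "\xC5\xA1\xC3\xA6select * from a";

# Hypothetical helper: removes exactly two leading non-word items.
# On an undecoded byte string each item is one byte, so only "š" goes.
sub strip_two_nonword {
    my ($text) = @_;
    $text =~ s/^\W{2}//;
    return $text;
}

# Hypothetical helper: removes every leading non-word byte,
# however many bytes the visible characters happen to occupy.
sub strip_leading_nonword {
    my ($text) = @_;
    $text =~ s/^\W+//;
    return $text;
}

printf "after \\W{2}: %d bytes left\n", length strip_two_nonword($line);
printf "after \\W+:   %d bytes left\n", length strip_leading_nonword($line);
```

Under these assumptions, \W{2} leaves the two bytes of "æ" behind, while \W+ does not depend on how many bytes there are.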

bcdixit · Nov 17, 2005

I tried
s/^(?:W+)//;
and it worked..

KevinADC · Nov 17, 2005

hmmm.... considering W+ means one or more W I don't see how it works.

bcdixit · Nov 17, 2005

I am new to all this, but I think if you put just the W+, it memorizes the non-word character. After going through a Perl book I found that if you put the ?:, it won't memorize the character.

KevinADC · Nov 17, 2005

yes, ?: tells perl not to store the match inside the parentheses in memory, but a bare W without a backslash in front of it is just a W, not the non-word character class: \W

duncdude · Nov 17, 2005

bcdixit

it does not matter whether it captures it or not - that is simply an issue of code optimization - if a capture is made and it is memorised, then $1 can be used to refer to that capture. This will not affect what the regex finds.

Unescaped (i.e. no backslash before the uppercase W) your regex s/^(?:W+)//; will remove an unlimited number of capital W's from the beginning of a line - and nothing else. I promise you.

Kind Regards
Duncan
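
The difference described above can be checked with a short sketch. The sub names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: bare W, as in s/^(?:W+)//; removes literal capital W's.
sub strip_bare_w {
    my ($text) = @_;
    $text =~ s/^(?:W+)//;
    return $text;
}

# Hypothetical helper: escaped \W in a non-capturing group.
sub strip_nonword {
    my ($text) = @_;
    $text =~ s/^(?:\W+)//;
    return $text;
}

# Hypothetical helper: escaped \W in a capturing group. It matches exactly
# the same text; the only difference is that $1 is set as well.
sub strip_nonword_capturing {
    my ($text) = @_;
    $text =~ s/^(\W+)//;
    return $text;
}

for my $sample ("WWWselect * from a", "##select * from a") {
    print "$sample\n";
    print "  bare W:       ", strip_bare_w($sample), "\n";
    print "  \\W:           ", strip_nonword($sample), "\n";
    print "  \\W capturing: ", strip_nonword_capturing($sample), "\n";
}
```

The bare W version only changes the first sample, the \W versions only change the second, and the capturing and non-capturing \W versions always return the same result.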

duncdude · Nov 17, 2005

i.e. i completely agree with Kevin's comment:

hmmm.... considering W+ means one or more W I don't see how it works.

Kind Regards
Duncan
