Removing non-ASCII from large .txt file

Status
Not open for further replies.

clayw584

IS-IT--Management
Jun 13, 2015
4
US
I'm an awk virgin, but a system we already have in place uses this program with other scripts. I need a command to remove all non-ASCII characters from a 180 MB .txt file.

One of the commands is this:

gawk -f scpfiles\Test.scp TEST.TXT > TEST.list

Which points to this script file:

{
plate=substr($0,1,8)
rest=substr($0,9,length($0)-8)
gsub(/ +/," ",rest)
gsub(/ ,/,",",rest)
rest=substr(rest,1,91)
printf"%s%s\n", plate,rest
}


I'd like a simple command if possible, but either way I'd appreciate any assistance. Thanks!


EDIT:

Okay, I just realized that gawk.exe and the above script are used to remove certain information from the file. I still need something using awk.exe or, if possible, gawk.exe to remove non-ASCII characters. Sorry in advance for being a dumbass.
 
Something like this?
gsub(/[^ -~]/,"",$0)

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
I created a .scp file and ran it. It appears to run the script, but the output file is empty. Here's what I've got:

gawk -f scpfiles\nonascii.scp file1.txt > file2.txt

In the .scp file I've got:

{
gsub(/[^ -~]/,"",$0)
}


The txt file I'm working with has 2.3 million lines.
 
{
gsub(/[^ -~]/,"",$0);print
}


Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
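The difference from the earlier attempt is the `print` statement: an awk action block that only modifies `$0` writes nothing, which would explain the empty output file. A minimal sketch of the two variants follows. The function names and the sample line are made up for illustration, and plain `awk` is used on the assumption that it behaves like `gawk` here.

```shell
# Hypothetical helpers for illustration; the names are not from the thread.

# Action block without print: $0 is modified, but nothing is written out.
strip_no_print() {
    awk '{ gsub(/[^ -~]/, "") }'
}

# Same substitution followed by print, which writes the modified line.
strip_and_print() {
    awk '{ gsub(/[^ -~]/, ""); print }'
}

# Sample line: "cafe 123" with an accented e given as two UTF-8 bytes in octal.
printf 'caf\303\251 123\n' | strip_no_print     # writes nothing
printf 'caf\303\251 123\n' | strip_and_print    # writes the cleaned line
```

With two arguments, `gsub` operates on `$0`, so the third argument in the original script is optional.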
 
Thanks again for replying. When I run the above script the output file is this:

#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , / ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,
#, , , ,

My data is missing (numbers, letters, etc.). I just wanted to remove the non-ASCII characters and leave the numbers, letters, and so on.
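One possible explanation, which the thread does not confirm: gawk releases before 4.0 interpreted bracket ranges such as ` -~` by the locale's collation order rather than by byte value. In some locales, letters and digits sort outside the span from space to tilde, which would leave only punctuation, as in the output above. The sketch below forces the C locale so that the range means byte values 32 to 126. The function name is hypothetical, and whether a given Windows build of gawk.exe honors `LC_ALL` is an assumption to check (in cmd it would be set with `set LC_ALL=C` before running gawk).

```shell
# Hypothetical wrapper: the thread's gsub, run with byte-wise range semantics.
strip_non_ascii_bytes() {
    # LC_ALL=C (an assumption about the cause) makes " -~" mean bytes 32..126.
    LC_ALL=C awk '{ gsub(/[^ -~]/, ""); print }'
}

# Sample line with one stray high byte (octal 377) between ASCII characters.
printf 'AB-123\377, x\n' | strip_non_ascii_bytes
```

If letters and digits survive with this setting but not without it, the locale was the likely culprit.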
 
I finally figured it out. This may help someone in the future, so I'm posting what I came up with:

tr -cd '\11\12\15\40-\176' <Filebefore.txt > Fileafter.txt
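For anyone reading the command above: `-c` complements the listed set and `-d` deletes, so every byte not listed is removed. The octal values are `\11` (tab), `\12` (line feed), `\15` (carriage return), and `\40-\176` (space through tilde). A small sketch with the same set, wrapped in a hypothetical function and fed a made-up sample line:

```shell
# Hypothetical wrapper around the tr command from the post above.
keep_ascii_text() {
    # Delete (-d) the complement (-c) of: tab, LF, CR, and space..tilde.
    tr -cd '\11\12\15\40-\176'
}

# Sample: a tab, a two-byte UTF-8 character (octal 303 251), then "c".
printf 'a\tb\303\251c\n' | keep_ascii_text
```

Unlike the `[^ -~]` pattern tried earlier, this set keeps tabs and carriage returns, and because `tr` works on the whole byte stream it never needs a `print` step.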
 