Processing File with return character "\r" in field using AWK

Status
Not open for further replies.

lookers

Programmer
Aug 2, 2005
14
EU
Hi,
I have a file that contains a set of fields.
The fields are enclosed in double quotes ("") and separated by commas. The problem is that one of the fields is an address field that contains carriage return characters, and AWK treats each of these as the start of a new line.
Any ideas how to handle this? I have tried removing them using

Code:
gsub("\r", "")
and
Code:
gsub(/\r/, "")

but this has not worked.
Am I using the right approach with the wrong syntax, or is there another way to solve this? Can AWK handle this at all? I am using the comma as the field separator.

Here is an example of the text:
"first", "second" , "no prob" , "forth"
"first", "second" , "third addres

problem

", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"

Any suggestions?
 
Thanks for replying

The \r carriage return character comes after the "addres" string in the sample data. It is hard to represent this on a web page. When I open the text in Excel there are two square boxes. I checked the integer value of these characters and it corresponds to \r.

The code is:

Code:
awk -F "," '{gsub("\r","") print $0 >> output}' input


Here is an example of the input:

"first", "second" , "no prob" , "forth"
"first", "second" , "third addres

problem

", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"


What I want:


"first", "second" , "no prob" , "forth"
"first", "second" , "third addres problem", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"

It could have something to do with the encoding used when the file was created (Unix or Windows), but I have no control over that.

 
Sorry, the code is:

Code:
awk -F "," '{gsub("\r",""); print $0 >> "output"}' input

I was missing a semicolon.
 
Hi

Seems there are \n characters too, so you have to join the broken lines. The while loop is the new part:
Code:
awk -F "," '{gsub(/\r/,"");while($0!~/"$/&&getline s)$0=$0 s;print}' /input/file
Tested with gawk.

Feherke.
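
A quick way to check whether line feeds are splitting the quoted fields is to list the lines that contain an odd number of double quotes. This is only a rough sketch: the function name show_broken_records is made up for this example, and it assumes that fields never contain escaped or doubled quotes.

```shell
# Hypothetical helper: print the line number and text of every input line
# that holds an odd number of double quotes.
show_broken_records() {
  awk '{
    n = gsub(/"/, "&")           # count the double quotes; "&" puts each one back unchanged
    if (n % 2) print NR ": " $0  # an odd count suggests a quoted field is split across lines
  }' "$@"
}
```

Calling show_broken_records input should list both the line where the address field opens and the line where it closes. If nothing is printed, the records are probably not split by line feeds.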
 
That fixed it.
Thanks a lot.
Could you explain the code in red (the while loop) to me, or point me to a good tutorial or book?

Thanks
 
Hi

Code:
while (       # repeat while the condition is true
  $0!~/"$/    # the current record does not end with " ...
  &&          # ... and ...
  getline s   # ... reading the next record into variable s succeeds
)
  $0=$0 s;    # append the newly read record to the current record

Feherke.
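
For reference, the same join can be written out as a small shell function that can be run as is. Treat it as a sketch under a few assumptions: a record counts as unfinished when it does not end with a double quote, and the function name join_broken_lines is made up for this example. It differs from the one-liner in two small ways: it also strips \r from the appended lines, and it tests (getline s) > 0 so that a read error cannot keep the loop running.

```shell
# Hypothetical helper: join lines until the record ends with a double quote.
join_broken_lines() {
  awk '{
    gsub(/\r/, "")                           # strip carriage returns from the current record
    while ($0 !~ /"$/ && (getline s) > 0) {  # record is unfinished and another line was read
      gsub(/\r/, "", s)                      # strip carriage returns from the new line too
      $0 = $0 s                              # append it to the current record
    }
    print
  }' "$@"
}
```

Usage would be join_broken_lines input > output. The pieces are concatenated without a separator, so the broken address comes out as one run of text.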
 
One more question.
How would I remove the part of the field that comes after the new line? E.g.


Input

"first", "second" , "no prob" , "forth"
"first", "second" , "third addres

problem

", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"


output


"first", "second" , "no prob" , "forth"
"first", "second" , "third addres", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"

Thanks for the help
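
One possible approach, building on the join loop above: instead of appending every continuation line, skip lines until one contains a double quote, then keep only that quote and whatever follows it. This is a sketch, not a tested solution for the real file. It assumes at most one broken field per record and that the text to discard never contains a double quote. The function name drop_continuation_text is made up for this example.

```shell
# Hypothetical helper: keep the first line of a broken field and discard
# the continuation text up to the closing quote.
drop_continuation_text() {
  awk '{
    gsub(/\r/, "")                  # strip carriage returns from the current record
    if ($0 !~ /"$/) {               # a quoted field is still open
      while ((getline s) > 0) {
        gsub(/\r/, "", s)
        q = index(s, "\"")          # position of the first double quote, 0 if none
        if (q > 0) {                # closing quote found: keep it and the rest of the line
          $0 = $0 substr(s, q)
          break
        }
        # lines without a quote are part of the broken field and are dropped
      }
    }
    print
  }' "$@"
}
```

Usage would be drop_continuation_text input > output. For the sample above, the second record should come out with the address field ending right after "third addres".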
 