Processing File with return character "\r" in field using AWK

lookers · Aug 21, 2008

Hi,
I have a file that contains a set of fields.
The fields are enclosed in double quotes ("") and separated by commas. The problem is that one of the fields is an address field that contains carriage return characters. AWK treats this as a new line.
Any ideas on how to handle this? I have tried removing them using

Code:

gsub("\r", "")

and

Code:

gsub(/\r/, "")

but this has not worked.
Am I using the right approach with the wrong syntax, or is there another way to solve this? Can AWK handle this at all? I am using the comma as the field separator.

Here is an example of the text:
"first", "second" , "no prob" , "forth"
"first", "second" , "third addres

problem

", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"

Any suggestions?

feherke · Aug 21, 2008

Hi

Where in your sample data is the \r? What code processed that sample data?

Feherke.

http://rootshell.be/~feherke/

lookers · Aug 21, 2008

Thanks for replying.

The \r carriage return character comes after the "addres" string in the sample data. It is hard to represent this on a web page. When I open the text in Excel there are two square boxes. I looked up the integer value of these characters and it corresponds to \r.

The code is:

Code:

awk -F "," '{gsub("\r","") print $0 >> output}' input

Here is an example of the input:

"first", "second" , "no prob" , "forth"
"first", "second" , "third addres

problem

", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"

What I want:

"first", "second" , "no prob" , "forth"
"first", "second" , "third addres problem", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"

It could have something to do with the encoding used when the file was created (Unix or Windows), but I don't have control over that.
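
One way to confirm which control characters are really in the file is to dump its bytes. The sketch below is only an illustration: the sample record and the variable names `sample` and `cr_lines` are made up for the example, and it assumes an awk that understands the `\r` escape in a regular expression (gawk and mawk do).

```shell
# Hypothetical sample: one record with a CR LF pair inside the quoted address field.
sample=$(mktemp)
printf '"a", "third addres\r\nproblem", "d"\n' > "$sample"

# od -c prints every byte; a carriage return appears as \r and a line feed as \n.
od -c "$sample"

# Count the physical lines that still contain a carriage return.
cr_lines=$(awk '/\r/ { n++ } END { print n + 0 }' "$sample")
echo "$cr_lines"

rm -f "$sample"
```

If `od -c` shows `\n` as well as `\r` inside the field, awk has already split the record at the `\n` before any `gsub` runs, so removing `\r` alone cannot rejoin it.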

lookers · Aug 21, 2008

Sorry, the code is:

Code:

awk -F "," '{gsub("\r",""); print $0 >> output}' input

I was missing a semicolon.

feherke · Aug 21, 2008

Hi

It seems there are \n characters too, so you have to join the broken lines:

Code:

awk -F "," '{gsub(/\r/,"");[red]while($0!~/"$/&&getline s)$0=$0 s;[/red]print}' /input/file

Tested with gawk.

Feherke.

http://rootshell.be/~feherke/
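
The one-liner above strips `\r` only from the first physical line of a record and treats any non-zero result from `getline` as success. A slightly more defensive sketch of the same joining idea follows. It is not the code from the post: the sample file and the names `input` and `joined` are made up for the example, and it assumes, as in the sample data, that every complete record ends with a double quote.

```shell
# Hypothetical input modelled on the sample data, with CR LF breaks inside the address field.
input=$(mktemp)
printf '%s\n' '"first", "second" , "no prob" , "forth"' > "$input"
printf '"first", "second" , "third addres\r\n\r\nproblem\r\n\r\n", "forth field"\n' >> "$input"
printf '%s\n' '"first", "second" , "no prob" , "forth"' >> "$input"

joined=$(awk '
{
    gsub(/\r/, "")                  # drop carriage returns from the current line
    # keep reading while the record does not yet end with a double quote
    while ($0 !~ /"$/ && (getline s) > 0) {
        gsub(/\r/, "", s)           # a continuation line may carry a CR as well
        $0 = $0 s                   # append it to the current record
    }
    print
}' "$input")

printf '%s\n' "$joined"
rm -f "$input"
```

As in the original, the pieces are joined without a separator, so the broken field comes out as "third addresproblem". Appending `" " s` instead of `s` would insert a space, at the cost of extra spaces for the empty continuation lines.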

lookers · Aug 21, 2008

That fixed it.
Thanks a lot.
Could you explain the code in red to me, or point me to a good tutorial or book?

Thanks

feherke · Aug 21, 2008

Hi

Code:

while (       # repeat while the condition is true
  $0!~/"$/    # the current record does not end with " ...
  &&          # ... and ...
  getline s   # ... reading the next record into variable s succeeds
)
  $0=$0 s;    # append the newly read record to the current record

Feherke.

http://rootshell.be/~feherke/
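
To see what `getline s` does on its own, here is a small illustration. The three-line input and the names `status` and `result` are made up for the example. `getline var` returns 1 when a line was read, 0 at end of input and -1 on error; it stores the line in the variable and advances NR, but leaves $0 untouched.

```shell
result=$(printf 'one\ntwo\nthree\n' | awk '
NR == 1 {
    status = (getline s)        # reads the NEXT line into s; $0 still holds "one"
    print status, $0, s, NR     # return value, current record, line just read, record count
}')
echo "$result"
```

Because -1 also counts as true in a bare `while (getline s)`, comparing the result with `> 0` is the safer habit.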

lookers · Aug 21, 2008

One more question.
How would I remove the part that comes after the new line? E.g.

Input

"first", "second" , "no prob" , "forth"
"first", "second" , "third addres

problem

", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"

output

"first", "second" , "no prob" , "forth"
"first", "second" , "third addres", "forth field"
"first", "second" , "no prob" , "forth"
"first", "second" , "no prob" , "forth"

Thanks for the help

feherke · Aug 21, 2008

Hi

Probably could be done nicer...

Code:

awk -F "," '{gsub(/\r/,"");if($0!~/"$/){while(getline s&&s!~/"$/);$0=$0 s;}print}' /input/file

Feherke.

http://rootshell.be/~feherke/
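
The same logic can be laid out over several lines, which may be easier to follow. This is a sketch rather than the code from the post: the sample file and the names `input` and `trimmed` are made up for the example, it checks `getline` with `> 0`, and it assumes that every broken record is eventually closed by a line ending with a double quote.

```shell
# Hypothetical input: one broken record followed by one intact record.
input=$(mktemp)
printf '%s\n' '"first", "second" , "third addres' '' 'problem' '' \
    '", "forth field"' '"first", "second" , "no prob" , "forth"' > "$input"

trimmed=$(awk '
{
    gsub(/\r/, "")
    if ($0 !~ /"$/) {
        # skip continuation lines until one ends with a double quote
        while ((getline s) > 0) {
            gsub(/\r/, "", s)
            if (s ~ /"$/) break
        }
        $0 = $0 s               # keep only the line that closes the broken field
    }
    print
}' "$input")

printf '%s\n' "$trimmed"
rm -f "$input"
```

Everything between the first physical line and the closing line is discarded, so "problem" no longer appears in the output.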

lookers · Aug 21, 2008

That's great. You're a legend.
