Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Data transformation; problem with newline in field

Status
Not open for further replies.

kb2001

MIS
May 22, 2006
31
0
0
US
Hello-

I have an interesting problem. We are extracting data from one DB (Oracle) and need to insert it into another DB (MSSQL). Data transformation is not possible in this manner. B/c of the situation, we will need to dump the data to a spreadsheet/text file and load it into the MSSQL database. My problem is this. In ODB, the street_address field is one field. In the SDB, the street address field is address1 and address2, if there are two lines of address.

example is:
123 business way
ste. 456

My problem is in transforming this data to two fields. The problem where I think AWK can help me is that the data in the street_address field is separated by a newline (\n). At the end of each row is a dos newline/carriage return (\n\r). I need to change the \n that exists within the street address into a tab (\t) in order to use it as two fields. Is this possible with AWK, is there a better way to do this? I would appreciate some help with this one, or somebody to point me in the right direction to accomplish this task. I am using cygwin at work, but I can always do this at home on a BSD or Linux machine.

Note- trust me when I say this is the only way we are able to move the data from one to another.

data sample

Kevin\t123 cherry way\nApt 21\thouston\tTX\n\r

In the database this is 4 fields (name, street, city, state). This is obviously not how it appears in notepad or anything, just an idea of the tab, newline, and carriage return situation. I have about 2200 rows to move so manually is not an option.

-Kevin
 
Also,

I can pull out this one field and organize the data that way if will be easier,

123 cherry way\nApt 21\n\r234 apple way\napt 23

If that will make it easier
 
This changes any "\n" to "\t" in $2...
Code:
$ od -c file1
0000000   K   e   v   i   n  \t   1   2   3       c   h   e   r   r   y
0000020       w   a   y  [highlight]\n[/highlight]   A   p   t       2   1  \t   h   o   u   s
0000040   t   o   n  \t   T   X  \n  \r
0000050

$ gawk 'BEGIN{ORS=RS="\n\r";OFS=FS="\t"}{gsub("\n","\t",$2);print}' file1|od -c
0000000   K   e   v   i   n  \t   1   2   3       c   h   e   r   r   y
0000020       w   a   y  [highlight]\t[/highlight]   A   p   t       2   1  \t   h   o   u   s
0000040   t   o   n  \t   T   X  \n  \r
0000050
The trouble is, you now have 5 tab-separated fields instead of 4.
 
The trouble is, you now have 5 tab-separated fields instead of 4, or you may have 6 or 7, depending on the address data.

What you would need to do is define an output record with a fixed number of address lines, e.g. assuming a maximum of 3 lines....
Code:
gawk '
  BEGIN {
      ORS = RS = "\n\r"
      OFS = FS = "\t"
  }
  {
    Name = $1
    Addr = $2
    City = $3
    State = $4
    split(Addr, Addr_Line, "\n")
    print Name, Addr_Line[1], Addr_Line[2], Addr_Line[3], City, State
  }
' file1 > file2
 
The trouble is, you now have 5 tab-separated fields instead of 4."

That actually is what I'm seeking. The ODB has the info in 4 fields. The SDB has it in 5 fields, the address is two fields instead of one.

"123 cherry way" should be field address1
"Apt 21" should be field address2

The way it is stored in the ODB has the two address lines in the same field with the \n in the middle. The target database, the SDB has a field for each address line. I will give these a shot tomorrow to see if it does the job, and of course post back to let you know.

Thanks for your help guys, I appreciate it.

Kevin
 
Thanks Ygor-

First script did the trick. It inserted extra newlines at the end of each row ( \r\n\n ) but it got the job done and I cleaned that up with excel

Thanks

-Kevin
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top