Joining multiple lines

Status
Not open for further replies.

GoBillsBN

Technical User
Jul 3, 2001
2
US
Can anybody help me with a program to join lines together? Here's kinda what I have:

>(14-3-3)
CCACGCGTCCGCCTTGG
GCTGTCTTTGTATGACT
CTGGTCCACAATCCCTT
>(6CKine)
CTTGTCCTGGTCCTGGC
TCAGGTACAGCCGAAAG
TTGCGCTATGCCAGCTA

....and so on. Here's what I want it to look like:

>(14-3-3)
CCACGCGTCCGCCTTGGGCTGTCTTTGTATGACTCTGGTCCACAATCCCTT
>(6CKine)
CTTGTCCTGGTCCTGGCTCAGGTACAGCCGAAAGTTGCGCTATGCCAGCTA

Basically I want to join all the lines with the DNA sequence into one line.

I couldn't think of any way to accomplish this....but I was just introduced to awk this week....so let me know if anyone has any ideas.

Thanks!
 
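Maybe some of the other guys can help you more, but this is very basic.
Assuming that each header line starts with either a ">" or a parenthesis "(", you could write this:

awk '{
    if ($0 ~ /^[>(]/) {          # header line, e.g. ">(14-3-3)"
        ++s_line
        hdr[s_line] = $0
        # newline before every header except the first, to end the previous sequence
        printf "%s%s\n", (s_line > 1 ? "\n" : ""), hdr[s_line]
    } else {                     # sequence line
        ++code
        arr[code] = $0
        printf "%s", arr[code]   # no newline, so the sequence lines run together
    }
}
END { printf "\n" }' file

(The reason I made these array variables, kept separate so the two counters cannot overwrite each other, is that I was thinking you could manipulate the data more easily this way later, but you don't have to.)

This will give you the () line followed by its code on one line, and then the next () line and its code.
(Hopefully; it did for me.)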
 
What happened there?
That was supposed to be arr[code], the #2 var.
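
For anyone curious about the array idea mentioned above (keeping the data in awk arrays so it can be worked on later), here is a rough sketch of how that might look. The file names dna_sample.txt and dna_lengths.txt and the array names header and seq are made up for illustration; the sample data is copied from the original question. Treat it as one possible approach, not the only way to do it.

```shell
# Hypothetical sample file built from the data in the original question.
cat > dna_sample.txt <<'EOF'
>(14-3-3)
CCACGCGTCCGCCTTGG
GCTGTCTTTGTATGACT
CTGGTCCACAATCCCTT
>(6CKine)
CTTGTCCTGGTCCTGGC
TCAGGTACAGCCGAAAG
TTGCGCTATGCCAGCTA
EOF

awk '
/^>/ { ++n; header[n] = $0; next }   # new record: remember its header line
n    { seq[n] = seq[n] $0 }          # append sequence line to the current record
END {
    # Everything is in memory now, so it can be post-processed freely.
    # Here: print each header with the length of its joined sequence.
    for (i = 1; i <= n; i++)
        printf "%s %d\n", header[i], length(seq[i])
}' dna_sample.txt > dna_lengths.txt

cat dna_lengths.txt
```

With the sample data, each header should come out followed by 51, since every record is three lines of 17 bases. Any lines that appear before the first ">" header are skipped, which is an assumption about what you would want there.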
 
Hello, GoBillsBN!

You can try this awk solution:


/^>/ {    # finds line with > at the beginning of the line
    if (dnaRow != "") {
        print dnaRow
        dnaRow = ""
    }
    print    # prints line with >
    next     # skips to the next line
}

{ dnaRow = dnaRow $0 }

END { print dnaRow }


Congratulations, GoBillsBN! Awk is a good choice.

Bye!

KP.
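
In case it helps anyone trying this out, here is one way the same pattern/action logic could be run as a one-off shell command against the sample data from the question. The file names dna_sample.txt and dna_joined.txt are made up for illustration, and the extra check in END (so that an empty input does not print a blank line) is a small addition, not part of the script as posted.

```shell
# Hypothetical sample file built from the data in the original question.
cat > dna_sample.txt <<'EOF'
>(14-3-3)
CCACGCGTCCGCCTTGG
GCTGTCTTTGTATGACT
CTGGTCCACAATCCCTT
>(6CKine)
CTTGTCCTGGTCCTGGC
TCAGGTACAGCCGAAAG
TTGCGCTATGCCAGCTA
EOF

awk '
/^>/ {                       # header line
    if (dnaRow != "") {      # flush the sequence collected so far
        print dnaRow
        dnaRow = ""
    }
    print                    # print the header itself
    next                     # skip to the next input line
}
{ dnaRow = dnaRow $0 }       # append sequence line to the buffer
END { if (dnaRow != "") print dnaRow }   # flush the last sequence
' dna_sample.txt > dna_joined.txt

cat dna_joined.txt
```

The result should be four lines: each header followed by its sequence joined into a single line, as in the desired output shown in the question.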
 
You could try this, which is simple and makes use of nawk's pattern matching capability.

nawk '/[CGTA]$/ {printf("%s",$0)} /^>/ {printf("\n%s\n",$0)} END {printf("\n")}' dnafile

Greg.
 
Thanks for the help everyone, all of your suggestions seem to work great for me!

Awk was definitely a good choice for me for these kinds of uses. I'm still learning though, so can anyone suggest any helpful websites or books to learn awk from?
 
Hi again, GoBillsBN!

Please see this answer:
good awk/sed reference book needed

Also you can check O'Reilly's site. There is a good awk reference text, "Chapter 11 The awk Programming Language:"


My awk reference is an old overview article by the original authors of Awk, because I use classic awk; see this site:


Awk is a beautiful and simple scripting language.

Bye!

KP.
 
Hi GoBillsBN,

Here is my two cents worth!


awk '
/^>\(/ {
    if ( flag ) printf ("\n")
    print
    flag = 1
}

/^[A-Z]+$/ { printf ("%s", $0 ) }

END { printf ("\n") }' inputfile > outputfile



flogrr
flogr@yahoo.com

 