Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

help shape data

Status
Not open for further replies.

zack30

Technical User
Apr 18, 2007
10
DE
Hello;
I have been stuck with this problem for a while.
I have data like this:


id1 marker1 marker2 response

1 CC AG 0
2 CT AA 1
3 CC GG 0
4 TT AA 1

....

the desired output is:

M a b c d e f
marke1 2 0 0 0 1 1
marker2 0 1 1 2 0 0


M is the name if the markers


for marker1
a = number of CC in 0
b= number of CC in 0
c= number of CT in 0

d = number of CC in 1
e= number of CT in 1
f= number of TT in 1

for marker2
a = number of AA in 0
b= number of AG in 0
c= number of GG in 0

d = number of AA in 1
e= number of AG in 1
f= number of GG in 1



In the dataset I have several markers

I tried some proc freq followed by proc transpose but was stuck.

I would appreciate some help
Thanks

Z




 
The part that makes more difficult is the need to have zero's when there isn't a match. For example, the number of CT's in 0. A count for that wouldn't show up in the Proc Freq because that combination isn't in the dataset.

Below is one possible way to get the data reshaped like you are looking for.

Code:
data have;
input id1 marker1 $ marker2 $ response;
datalines;
1 CC AG 0
2 CT AA 1
3 CC GG 0
4 TT AA 1
;
run;

proc sql;
create table want as
select 
	'Marker1' as M,
	sum(case when marker1 = 'CC' and response = 0 then 1 else 0 end) as a,
	sum(case when marker1 = 'CT' and response = 0 then 1 else 0 end) as b,
	sum(case when marker1 = 'TT' and response = 0 then 1 else 0 end) as c,
	sum(case when marker1 = 'CC' and response = 1 then 1 else 0 end) as e,
	sum(case when marker1 = 'CT' and response = 1 then 1 else 0 end) as d,
	sum(case when marker1 = 'TT' and response = 1 then 1 else 0 end) as f
from have
group by 1

Union

select 
	'Marker2' as M,
	sum(case when marker2 = 'AA' and response = 0 then 1 else 0 end) as a,
	sum(case when marker2 = 'AG' and response = 0 then 1 else 0 end) as b,
	sum(case when marker2 = 'GG' and response = 0 then 1 else 0 end) as c,
	sum(case when marker2 = 'AA' and response = 1 then 1 else 0 end) as e,
	sum(case when marker2 = 'AG' and response = 1 then 1 else 0 end) as d,
	sum(case when marker2 = 'GG' and response = 1 then 1 else 0 end) as f
from have
group by 1
;
quit;

This might not be the best method if you have have a lot of marker-to-response combinations, but at least it's a start.

-Dave
 
oops press submit too early, sory:
Hi Dave,
Thanks for the reply.
As you mentioned the difficult part is " The part that makes more difficult is the need to have zero's when there isn't a match. For example, the number of CT's in 0.A count for that wouldn't show up in the Proc Freq because that combination isn't in the dataset."

I guess I will need to work with your program and put it in a macro; get markers and genotpyes (AA) as parameteres? but how to get the 3 different type when they are not in the dataset?
If it is of any help, there are always 3 form (AA AG GG, or CC, CT, TT etc...). Maybe recoding 0 1 2 for all markers?


Thanks anyway that is pointing to the right direction.
Cheers
Z


 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top