Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to chnage the tabular format 1

Status
Not open for further replies.

blyssz

Technical User
Nov 18, 2008
49
0
0
US
Hello All,
I want to change the format of the dataset:
MY input dataset:

Subject mon name1 name2 name3 Score1 Score2 Score3
Math 01-06Mon Avi6 John6 Becky6 0 107.67 221.67
Math 07-12Mon Avi12 John12 Becky12 97 102 200.46
Math 13-18Mon Avi18 John18 Becky18 55 114 211
Math Tot Avitot Johntot Beckytot 152 324 633
Sci 01-06Mon Matt6 Linda6 110 115
Sci 07-12Mon Matt12 Linda12 212 119
Sci 13-18Mon Matt18 Linda18 101 190
Sci Tot Matttot Lindatot 423 424
Stat 01-06Mon Jim6 Eric6 Anu6 201 111.67 222.45
Stat 07-12Mon Jim12 Eric12 Anu12 100.55 119.25 207
Stat 13-18Mon Jim18 Eric18 Anu18 85.54 115 219
Stat Tot Jimtot Erictot Anutot 388 346 648

I want my output dataset to look like:
01-06Mon 07-12Mon 13-18Mon Total
Math Avi 0 97 55 152
John 108 102 114 324
Becky 222 200 211 633

Sci Matt 110 212 101 423
Linda 115 119 190 424

Stats Jim 201 101 86 388
Eric 112 119 115 346
Anu 222 207 219 648

I tried to use proc tabulate but could not find the way to change the format of my output.
Any suggestion would be helpful.I am using PC sas 9.2.

Thanks in advance,
Blyssz

 
Hi,

I don't think that you are going to be able to achieve what you want using SAS.

A couple of points on your stating dataset

1) Each name and score has a separate variable e.g. name1, score1 which prevents them from being grouped together.
2) The data in the table has unique values for the name variable e.g. Avi6, Avi12, Avi18 which also prevents them from being grouped together.

It looks like your starting point is an output from another piece of work so it might be an idea to see if you can have access to the source data for this which would make your life easier.

For shifting data around in the manner which you have described, I think that you would use the PROC TRANSPOSE function rather than PROC TABULATE.

Sorry for not being able to offer more help.

Regards,

Fat Captain.
 
Thank you FatCaptain for your response.
After using proc transpose, I got the following format

Subject

Math mon 01-06mon 7-12mon 13-18mon Total
Math score1 0 97 55 152
Math score2 107.67 102 114 324
Math score3 221.67 200.46 211 633

Sci mon 01-06mon 7-12mon 13-18mon Total
Sci score1 110 212 101 423
Sci score2 115 119 190 424
Sci score3


Stat mon 01-06mon 7-12mon 13-18mon Total
Stat score1 201 100.55 85.54 388
Stat score2 111.67 119.25 115 346
Stat score3 222.45 207 219 648


I did not use name var in transpose but if that I use label for score1, score2, score3 as name as Avi, john and becky I will get the output in the desired format but Label statement is not working.

I would really appreciate any suggestion.

Thanks,
Blyssz
 
Hi Blyssz,

As you are now experiencing, having data with summary rows is hard to manipulate in SAS.

Good data management practice is to keep totals in columns and not in rows. The other issue with this data is that the name variables are used to keep track of the timepoints - not a good idea.

The code below should get you the data in the format you desire (ready for passing onto proc report).

HTH

Code:
data have ;
   infile cards missover ;
   input (Subject mon name1 name2)(:$8.) @33 name3 $8. Score1-Score3 ;
   array names name: ;
      do over names ;
         names = compress(names,,'ka') ;*Removes numbers from names ;
	     names = prxchange('s/tot\s*$//io',-1,names) ;*Removes tot from the end of names;
	     end ;
cards ;
Math 01-06Mon  Avi6    John6    Becky6      0   107.67    221.67
Math 07-12Mon  Avi12   John12   Becky12    97    102    200.46
Math 13-18Mon  Avi18   John18   Becky18    55    114    211
Math Tot       Avitot  Johntot  Beckytot  152    324    633
Sci  01-06Mon  Matt6   Linda6             110    115    
Sci  07-12Mon  Matt12  Linda12            212    119    
Sci  13-18Mon  Matt18  Linda18            101    190    
Sci  Tot       Matttot Lindatot           423    424    
Stat 01-06Mon  Jim6    Eric6    Anu6      201    111.67    222.45
Stat 07-12Mon  Jim12   Eric12   Anu12     100.55 119.25    207
Stat 13-18Mon  Jim18   Eric18   Anu18     85.54 115    219
Stat Tot       Jimtot  Erictot  Anutot     388    346    648
;;;;

proc transpose data=have out=trans ;
   by subject name: ;
   id mon ;
   var score: ;
   run ;
*Associate the correct name with the correct result ;
data final ;
   do _n_=1 by 1 until(last.subject) ;
      set trans ;
	  length name $10 ;
	  by subject ;
      array names name1-name3 ;
      name = names[_n_] ;
	  if not missing(name) then output ;
	  end ;
   keep subject name _numeric_ ;
   run ;
proc print ;run ;

output:

Code:
 Obs    Subject    01-06Mon    07-12Mon    13-18Mon    Tot    name

                            1      Math         0.00       97.00       55.00     152    Avi
                            2      Math       107.67      102.00      114.00     324    John
                            3      Math       221.67      200.46      211.00     633    Becky
                            4      Sci        110.00      212.00      101.00     423    Matt
                            5      Sci        115.00      119.00      190.00     424    Linda
                            6      Stat       201.00      100.55       85.54     388    Jim
                            7      Stat       111.67      119.25      115.00     346    Eric
                            8      Stat       222.45      207.00      219.00     648    Anu
 
Thanks a lot KDT82 for your response.
I am new to SAS so please bear with me.

The problem is that My Dataset consists of 3600 rows, and I already have dataset in the work directory in the given format.
How Is it possible to create array of names and scores from the existing dataset?

Thanks,
Blyssz
 
Hi blyssz,

The modifiers are a powerful way of specifying ranges. As was demonstrated in the previous example, if all your name variables begin with 'name' then you can use 'name:'. The same applies to score ('score:').

Other modifiers, that may be more appropriate to your data are:

All values between variable 'a' and 'z' (position wise, in the dataset) a--z
All values starting from variable 'a1' to 'a10' a1-a10
All numeric values _numeric_
All character values _character_

In short, it really depends on how your variables are named, and what their positions in the dataset are.

HTH
 
Thanks Kdt82 for your response.

I apologize for not explaining the problem clearly.
Actually I have 39000 recurring rows for subjects, as shown below.

Subject mon name1 name2 name3 Score1 Score2 Score3
Math 01-06 Mon Avi6 John6 Becky6 0 107.67 221.67
Math 07-12 Mon Avi12 John12 Becky 12 97 102 200.46
Math 13-18 Mon Avi18 John18 Becky 18 55 114 211
Math Total Avitot Johntot Beckytot 152 324 633
Sci 01-06 Mon Matt6 Linda6 110 115
Sci 07-12 Mon Matt12 Linda12 212 119
Sci 13-18 Mon Matt18 Linda18 101 190
Sci Total Matttot Lindatot 423 424
Stats 01-06 Mon Jim6 Eric6 Anu6 201 111.67 222.45
Stats 07-12 Mon Jim12 Eric12 Anu12 100.55 119.25 207
Stats 13-18 Mon Jim18 Eric18 Anu18 85.54 115 219
Stats Total Jimtot Erictot Anutot 388 346 648
Math 01-06 Mon Avi6 John6 Becky6 0 100 101
Math 07-12 Mon Avi12 John12 Becky 12 92 102 200.
Math 13-18 Mon Avi18 John18 Becky 18 59 119 205
Math Total Avitot Johntot Beckytot 151 321 506
Sci 01-06 Mon Matt6 Linda6 120 113
Sci 07-12 Mon Matt12 Linda12 208 119
Sci 13-18 Mon Matt18 Linda18 101 190
Sci Total Matttot Lindatot 429 422
Stats 01-06 Mon Jim6 Eric6 Anu6 202 111.67 200
Stats 07-12 Mon Jim12 Eric12 Anu12 100 119.25 207
Stats 13-18 Mon Jim18 Eric18 Anu18 85.54 110 219
Stats Total Jimtot Erictot Anutot 287 341 626


So the problem is that if I try to sort them by Subject then it gives me an error as


Data set _DATA1 is not sorted in ascending sequence. The current BY group has subject = Stats and the next BY group
has subject = Math.

Any suggestion, how can I resolve that problem?

 
Hi Blyssz,

The best solution would be to change the structure in which you store your data into a normalised form - this would at least allow for easier reporting in the long run.

Another peculiarity of your data is you have repeating groups of data (eg. Math) that differ only in the scores - this makes it difficult to distinguish between the groups.

I have slightly modified the code from my first post to add an extra step to make the groups unique. While this should allow you to report the data in your example, it is a poor substitute for following good practice and fixing the data structure.

HTH

Code:
options validvarname = any ;
data have ;
   infile cards missover ;
   input (Subject mon name1 name2)(:$8.) @35 name3 $8. Score1-Score3 ;
   array names name: ;
      do over names ;
         names = compress(names,,'ka') ;*Removes numbers from names ;
         names = prxchange('s/tot\s*$//io',-1,names) ;*Removes tot from the end of names;
         end ;
cards ;
Math    01-06Mon Avi6    John6    Becky6    0      107.67  221.67
Math    07-12Mon Avi12   John12   Becky12   97     102     200.46
Math    13-18Mon Avi18   John18   Becky18   55     114     211
Math    Total    Avitot  Johntot  Beckytot  152    324     633
Sci     01-06Mon Matt6   Linda6             110    115    
Sci     07-12Mon Matt12  Linda12            212    119    
Sci     13-18Mon Matt18  Linda18            101    190    
Sci     Total    Matttot Lindatot           423    424    
Stats   01-06Mon Jim6    Eric6    Anu6      201    111.67  222.45
Stats   07-12Mon Jim12   Eric12   Anu12     100.55 119.25  207
Stats   13-18Mon Jim18   Eric18   Anu18     85.54  115     219
Stats   Total    Jimtot  Erictot  Anutot    388    346     648
Math    01-06Mon Avi6    John6    Becky6    0      100     101
Math    07-12Mon Avi12   John12   Becky12   92     102     200
Math    13-18Mon Avi18   John18   Becky18   59     119     205
Math    Total    Avitot  Johntot  Beckytot  151    321     506
Sci     01-06Mon Matt6   Linda6             120    113    
Sci     07-12Mon Matt12  Linda12            208    119    
Sci     13-18Mon Matt18  Linda18            101    190    
Sci     Total    Matttot Lindatot           429    422    
Stats   01-06Mon Jim6    Eric6    Anu6      202    111.67  200
Stats   07-12Mon Jim12   Eric12   Anu12     100    119.25  207
Stats   13-18Mon Jim18   Eric18   Anu18     85.54  110     219
Stats   Total    Jimtot   Erictot Anutot    287    341     626
;;;;
**Create and ID variable to make groups unique  ;
data stats1V / view=stats1V ;
   set have ;
   by subject notsorted ;
   _id + first.subject ;
   run ;
proc transpose data=stats1V out=trans ;
   by _id subject name: notsorted  ;
   id mon ;
   var score: ;
   run ;
*Associate the correct name with the correct result ;
data final ;
   do _n_=1 by 1 until(last._id) ;
      set trans ;
      length name $10 ;
      by _id ;
      array names name1-name3 ;
      name = names[_n_] ;
      if not missing(name) then output ;
      end ;
   keep subject name _numeric_ ;
   run ;
proc print ;run ;
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top