Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Data massage/torture with Ruby for Excel

Status
Not open for further replies.

Trevoke

Programmer
Jun 6, 2002
1,142
US
I have a SQL table with:
Date | Name | Procedure | Count

Date and Name can be repeated ad nauseam, until we run out of procedures to be counted.
I want to separate it on a spreadsheet so that each separate sheet is a 'Name'.
I want one line per date, and I want the procedures to become column headings, with the count for each under that column for that day.

There may be an easy way to do this but I've been looking at this too long (I had dumb problems with the spreadsheet gem at first).
Of course, if the answer is "Do it in SQL", then.. er.. how do I do it in SQL? ;-)

Tao Te Ching Discussions : Chapter 9 (includes links to previous chapters)
What is the nature of conflict?
 
If I understand you right, you want for a given example table
Code:
Date     | Name | Procedure | Count |
-------------------------------------
20090101 |name1 | proc1     | 1     |
20090101 |name2 | proc1     | 3     |
20090101 |name2 | proc2     | 2     |
20090624 |name1 | proc1     | 1     |
20090624 |name1 | proc2     | 2     |
20090624 |name1 | proc3     | 3     |
20090624 |name2 | proc2     | 4     |
create 2 tables:
Code:
name1:

Date     |proc1| proc2 | proc3 |
--------------------------------
20090101 | 1   | 0     | 0     |
20090624 | 1   | 2     | 3     |
and
Code:
name2:

Date     |proc1| proc2 | 
------------------------
20090101 | 3   | 2     | 
20090624 | 0   | 4     |

Should every table contains the maximal number of columns (i.e. for all procedures), or only the number of columns, the given user really used (as in above example).

For solving above example I would use CSV-format, because it's text file, which could be processed with every scripting language. And CSV could be used in Excel.

First I would export the master table into CSV-text-file and then process the file line by line.
I would use hashes and/or lists as a data structure, e.g for the above example somthing like this
Code:
hash = {
  'name1' => {20090101 = > {'proc1' => 1, 'proc2' => 0, 'proc3' => 0 },
              20090624 = > {'proc1' => 1, 'proc2' => 2, 'proc3' => 3 }
             },
  'name2' => {20090101 = > {'proc1' => 3, 'proc2' => 2 },
              20090624 = > {'proc1' => 0, 'proc2' => 4 }
             }
}
or something simpler...
 
Hmm..

I see - no easy solution besides parse line by line and then sort the data according to how you want it.

Thanks! :)

Tao Te Ching Discussions : Chapter 9 (includes links to previous chapters)
What is the nature of conflict?
 
Trevoke said:
I see - no easy solution besides parse line by line and then sort the data according to how you want it.
Hi Trevoke,
It's not so difficult, how it looks at the first sight.
:)

Here is an example I created today evening:

split_tables.rb
Code:
[COLOR=#a020f0]require[/color] [COLOR=#6a5acd]'[/color][COLOR=#ff00ff]yaml[/color][COLOR=#6a5acd]'[/color]

[COLOR=#a020f0]def[/color] [COLOR=#008080]process_file_and_create_master_hash[/color]
  hash = {}
  [COLOR=#0000ff]# open file[/color]
  f = [COLOR=#2e8b57][b]File[/b][/color].open([COLOR=#6a5acd]"[/color][COLOR=#ff00ff]master_table.csv[/color][COLOR=#6a5acd]"[/color], [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]r[/color][COLOR=#6a5acd]"[/color])
  header = f.gets
  [COLOR=#0000ff]# process the master file line by line and create hash[/color]
  [COLOR=#804040][b]while[/b][/color] line=f.gets
    [COLOR=#0000ff]#print line[/color]
    [COLOR=#0000ff]# split line into list[/color]
    line_list=line.chomp.split([COLOR=#6a5acd]'[/color][COLOR=#ff00ff];[/color][COLOR=#6a5acd]'[/color])
    [COLOR=#0000ff]# extact fields[/color]
    date = line_list[[COLOR=#ff00ff]0[/color]].strip
    name = line_list[[COLOR=#ff00ff]1[/color]].strip
    procedure = line_list[[COLOR=#ff00ff]2[/color]].strip
    count = line_list[[COLOR=#ff00ff]3[/color]].strip
    [COLOR=#0000ff]# create hash[/color]
    [COLOR=#804040][b]if[/b][/color] !hash.keys.include?(name)
      hash[name] = {}
    [COLOR=#804040][b]end[/b][/color]  
    [COLOR=#804040][b]if[/b][/color] !hash[name].keys.include?(date)
      hash[name][date] = {}
    [COLOR=#804040][b]end[/b][/color]
    [COLOR=#804040][b]if[/b][/color] !hash[name][date].keys.include?(procedure)
      hash[name][date][procedure] = count
    [COLOR=#804040][b]else[/b][/color]
      hash[name][date][procedure] += count
    [COLOR=#804040][b]end[/b][/color]
  [COLOR=#804040][b]end[/b][/color]
  [COLOR=#0000ff]# close file[/color]
  f.close
  [COLOR=#0000ff]# return data structure[/color]
  [COLOR=#804040][b]return[/b][/color] hash
[COLOR=#a020f0]end[/color]

[COLOR=#a020f0]def[/color] [COLOR=#008080]create_procedure_hash_from_master_hash[/color](hash)
  prc_hash = {}
  hash.keys.sort.each { |[COLOR=#008080]name[/color]|
    [COLOR=#804040][b]if[/b][/color] !prc_hash.keys.include?(name)
      prc_hash[name] = []
    [COLOR=#804040][b]end[/b][/color]
    hash[name].keys.sort.each { |[COLOR=#008080]date[/color]|
      hash[name][date].keys.sort.each { |[COLOR=#008080]prc[/color]|
        [COLOR=#804040][b]if[/b][/color] !prc_hash[name].include?(prc)
          [COLOR=#0000ff]# push in list[/color]
          prc_hash[name] << prc
        [COLOR=#804040][b]end[/b][/color]
      }
    }
  }
  [COLOR=#0000ff]# return procedure hash[/color]
  [COLOR=#804040][b]return[/b][/color] prc_hash
[COLOR=#a020f0]end[/color]

[COLOR=#a020f0]def[/color] [COLOR=#008080]create_name_files[/color](data_hash, prc_hash)
  [COLOR=#804040][b]for[/b][/color] name [COLOR=#804040][b]in[/b][/color] prc_hash.keys.sort
    f = [COLOR=#2e8b57][b]File[/b][/color].open([COLOR=#6a5acd]"[/color][COLOR=#6a5acd]#{[/color]name[COLOR=#6a5acd]}[/color][COLOR=#ff00ff].csv[/color][COLOR=#6a5acd]"[/color], [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]w[/color][COLOR=#6a5acd]"[/color])
    [COLOR=#0000ff]# write header[/color]
    prc_cols = prc_hash[name].join([COLOR=#6a5acd]'[/color][COLOR=#ff00ff];[/color][COLOR=#6a5acd]'[/color])
    f.puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]Date;[/color][COLOR=#6a5acd]#{[/color]prc_cols[COLOR=#6a5acd]}[/color][COLOR=#6a5acd]"[/color]
    [COLOR=#0000ff]# write other lines[/color]
    [COLOR=#804040][b]for[/b][/color] date [COLOR=#804040][b]in[/b][/color] data_hash[name].keys.sort
      line_list = [date]
      [COLOR=#804040][b]for[/b][/color] prc [COLOR=#804040][b]in[/b][/color] prc_hash[name]
        [COLOR=#804040][b]if[/b][/color] data_hash[name][date].keys.include?(prc)
          line_list << data_hash[name][date][prc]
        [COLOR=#804040][b]else[/b][/color]
          line_list << [COLOR=#ff00ff]0[/color]
        [COLOR=#804040][b]end[/b][/color]
      [COLOR=#804040][b]end[/b][/color]
      [COLOR=#0000ff]#puts line_list.inspect [/color]
      [COLOR=#0000ff]# create line from list[/color]
      line = line_list.join([COLOR=#6a5acd]'[/color][COLOR=#ff00ff];[/color][COLOR=#6a5acd]'[/color])
      [COLOR=#0000ff]# write line to the file[/color]
      f.puts line
    [COLOR=#804040][b]end[/b][/color]   
    f.close
  [COLOR=#804040][b]end[/b][/color]
[COLOR=#a020f0]end[/color]

[COLOR=#0000ff]# main program[/color]
[COLOR=#804040][b]if[/b][/color] [COLOR=#ff00ff]__FILE__[/color] == [COLOR=#008080]$0[/color]
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]Reading file and creatinfg hash...[/color][COLOR=#6a5acd]"[/color]
    data_hash = process_file_and_create_master_hash
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]Done.[/color][COLOR=#6a5acd]"[/color]
    puts
    [COLOR=#0000ff]# print hash[/color]
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]data_hash:[/color][COLOR=#6a5acd]"[/color]
    puts data_hash.inspect
    [COLOR=#0000ff]# YAML print[/color]
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]in YAML print:[/color][COLOR=#6a5acd]"[/color]
    puts data_hash.to_yaml
    puts
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]Now creating the hash of used prodedures fo every name...[/color][COLOR=#6a5acd]"[/color]
    procedure_hash = create_procedure_hash_from_master_hash(data_hash)
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]Done.[/color][COLOR=#6a5acd]"[/color]
    [COLOR=#0000ff]# print hash[/color]
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]procedure_hash:[/color][COLOR=#6a5acd]"[/color]
    puts procedure_hash.inspect
    [COLOR=#0000ff]# YAML print[/color]
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]in YAML print:[/color][COLOR=#6a5acd]"[/color]
    puts procedure_hash.to_yaml
    puts
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]Now writing file for every name:[/color][COLOR=#6a5acd]"[/color]
    create_name_files(data_hash, procedure_hash)
    puts [COLOR=#6a5acd]"[/color][COLOR=#ff00ff]Done.[/color][COLOR=#6a5acd]"[/color]
[COLOR=#804040][b]end[/b][/color]

Consider you have an CSV-file, you created from your table:
master_table.csv
Code:
Date;Name;Procedure;Count;
20090101;name1;proc1;  1
20090101;name2;proc1;  3
20090101;name2;proc2;  2
20090624;name1;proc1;  1
20090624;name1;proc2;  2
20090624;name1;proc3;  3
20090624;name2;proc2;  4
Now after running the above script
Code:
C:\Users\Roman\Work>ruby split_tables.rb
Reading file and creatinfg hash...
Done.

data_hash:
{"name1"=>{"20090624"=>{"proc3"=>"3", "proc1"=>"1", "proc2"=>"2"}, "20090101"=>{
"proc1"=>"1"}}, "name2"=>{"20090624"=>{"proc2"=>"4"}, "20090101"=>{"proc1"=>"3",
 "proc2"=>"2"}}}
in YAML print:
---
name1:
  "20090624":
    proc3: "3"
    proc1: "1"
    proc2: "2"
  "20090101":
    proc1: "1"
name2:
  "20090624":
    proc2: "4"
  "20090101":
    proc1: "3"
    proc2: "2"

Now creating the hash of used prodedures fo every name...
Done.
procedure_hash:
{"name1"=>["proc1", "proc2", "proc3"], "name2"=>["proc1", "proc2"]}
in YAML print:
---
name1:
- proc1
- proc2
- proc3
name2:
- proc1
- proc2

Now writing file for every name:
Done.
you get as result 2 files:

name1.csv
Code:
Date;proc1;proc2;proc3
20090101;1;0;0
20090624;1;2;3
name2.csv
Code:
Date;proc1;proc2
20090101;3;2
20090624;0;4

Is this what you desired?

You can open the CSV-files in Excel to see better what they contain.

To see better how the program works, you can uncomment some printouts (puts).
For variables and function names I used long COBOL-like-names which are self-explanatory.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top