create matrix from csv data file

Status
Not open for further replies.

diera

Programmer
Mar 21, 2011
28
DE
Hi,

I have data as below:

-DATA-
tweetid, workerid
10115, user1
10115, user2
10190, user1
10190, user2
10193, user3
10320, user2
10320, user1

I have no idea how to write code that transforms this data into a 2D matrix.

My desired output is:

tweetid user1, user2, user3
10115 1, 1, 0
10190 1, 1, 0
10193 0, 0, 1
10320 1, 1, 0

Any help is much appreciated. Thank you.
 
I would read the data
Code:
tweetid, workerid
10115,    user1 
10115,    user2 
10190,    user1 
10190,    user2 
10193,    user3 
10320,    user2 
10320,    user1
and create a hash whose keys are the tweetids and whose values are sorted lists of the corresponding workerids
Code:
my %hash = (
  10115 => ['user1', 'user2'],
  10190 => ['user1', 'user2'],
  10193 => ['user3'],
  10320 => ['user1', 'user2'],
);
Then it's not very difficult to create the matrix you need.
Some time ago I posted a similar example in the Ruby forum. You can look at the working code here and use a similar approach in Perl for your needs.
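
As a rough illustration of the hash-building step described above, here is a minimal Perl sketch. The helper name build_hash and the inline sample lines are assumptions made for this example, not code from the thread or the linked Ruby post. It assumes each line is a plain "tweetid, workerid" pair with no quoted fields.

```perl
use strict;
use warnings;

# build_hash() is a hypothetical helper: it takes "tweetid, workerid" lines
# and returns a reference to a hash of tweetid => sorted list of workerids.
sub build_hash {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
        my ($tweetid, $workerid) = split /\s*,\s*/, $line;
        next unless defined $workerid;           # skip blank or malformed lines
        next if $tweetid eq 'tweetid';           # skip the header row
        $seen{$tweetid}{$workerid} = 1;          # duplicates collapse here
    }
    my %hash;
    for my $tweetid (keys %seen) {
        $hash{$tweetid} = [ sort keys %{ $seen{$tweetid} } ];
    }
    return \%hash;
}

my $hash = build_hash(
    'tweetid, workerid',
    '10115, user1', '10115, user2',
    '10190, user1', '10190, user2',
    '10193, user3',
    '10320, user2', '10320, user1',
);

# Print one line per tweetid, e.g. "10115 => [user1, user2]"
for my $tweetid (sort keys %$hash) {
    print "$tweetid => [", join(', ', @{ $hash->{$tweetid} }), "]\n";
}
```

The intermediate %seen hash is only there so that a repeated tweetid/workerid pair does not appear twice in the list.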
 
This will start you off:

Code:
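use strict;
my ($tweetid,$workerid,%HoH);

# Build a hash of hashes: $HoH{tweetid}{workerid} = 1
while (<DATA>) {
    chomp;
    # Note: the workerid keeps the leading spaces that follow the comma
    ($tweetid,$workerid)=split(/,/,$_);
    $HoH{ $tweetid }{ $workerid } = 1;
}

# Header row
print "tweetid\tuser1,\tuser2,\tuser3\n";

# One row per tweetid, listing the workerids seen for it
for my $k1 ( sort keys %HoH ) {
    print "$k1 ";
    for my $k2 ( sort keys %{$HoH{ $k1 }} ) {
        print "$k2";
    }
    print "\n";
}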

__DATA__
10115,    user1
10115,    user2
10190,    user1
10190,    user2
10193,    user3
10320,    user2
10320,    user1

This outputs:

Code:
tweetid user1,  user2,  user3
10115     user1    user2
10190     user1    user2
10193     user3
10320     user1    user2

You've just got to change the user1/2/3 etc. to digits and put in zeros where they aren't present.
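
One possible way to finish that last step, offered as a sketch and not as the code diera ended up with: collect every workerid seen, then print 1 or 0 for each tweetid depending on whether the pair exists in the hash of hashes. The helper name matrix_lines and the here-doc input are assumptions for illustration.

```perl
use strict;
use warnings;

# matrix_lines() is a hypothetical helper: it takes "tweetid, workerid" lines
# and returns the header plus one 0/1 row per tweetid, as a list of strings.
sub matrix_lines {
    my @lines = @_;
    my (%HoH, %users);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
        my ($tweetid, $workerid) = split /\s*,\s*/, $line;
        next unless defined $workerid;           # skip blank or malformed lines
        next if $tweetid eq 'tweetid';           # skip the header row
        $HoH{$tweetid}{$workerid} = 1;
        $users{$workerid} = 1;                   # remember every column name
    }
    # String sort: fine for user1..user9, but user10 would sort before user2
    my @users = sort keys %users;
    my @out = ( 'tweetid ' . join(', ', @users) );
    for my $tweetid (sort keys %HoH) {
        my @row = map { exists $HoH{$tweetid}{$_} ? 1 : 0 } @users;
        push @out, "$tweetid " . join(', ', @row);
    }
    return @out;
}

my @input = split /\n/, <<'END';
tweetid, workerid
10115, user1
10115, user2
10190, user1
10190, user2
10193, user3
10320, user2
10320, user1
END

print "$_\n" for matrix_lines(@input);
```

With the sample data from the question, this prints the header "tweetid user1, user2, user3" followed by the rows "10115 1, 1, 0", "10190 1, 1, 0", "10193 0, 0, 1" and "10320 1, 1, 0".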
 
Hi all,

Thanks for your suggestions. Actually, I had already written some code, but it seems I can't get the output I need.

My code so far:

Code:
#!/usr/bin/perl 
#use strict;
#use warnings;
use Text::CSV_XS;

# Store our CSV file name
my $file = 'apa2.csv';

open( CSV_XS, '<', $file )
  or die( 'Unable to open csv file ', $file, "\n" );

open MYFILE, ">matrix_user.txt";
select MYFILE;

my $csv = new Text::CSV_XS;
my %alldata=();    #at initialization

my @fields = ('tweetid', 'workerid');

#for each line data
#my $index = 0;			

foreach my $line (<CSV_XS>) {
    if ( $csv->parse($line) ) {
        my @columns = $csv->fields();
	
	# filter to skip from processing the headers
	if($columns[0] ne "tweetid"){
        	for ( my $i = 1 ; $i <= $#fields ; $i++ ) {
            		$alldata{ $columns[0]}[$i] = $columns[$i]; 
        	}
	}
    }
    else {
        print 'Unable to parse CSV line: ', $line, "\n";
    }
    $index++;		
}
close(CSV_XS);

# for each data column (separated by ,)
for ( my $i = 1 ; $i <= $#fields ; $i++ ) {
    for my $k ( sort keys %alldata ){
	# get tweetId
	my @tweetid = split(',',$k);
              print join( $tweetid[0], ",", $alldata{$k}[$i] ), "\n"
              if $alldata{$k}[$i];
        }
 }

tonykent: I will try to combine this with your code.

 
Hi tonykent,

I've got the right output now. I just had to edit it a little bit. Thank you so much.

best regards
diera
 