I'm trying to parse some data for my thesis, but my script isn't working. I'm not very fluent in Perl, so any help would be appreciated!
#!/usr/bin/perl
use strict;
use warnings;

# TODO: take the input file name from @ARGV instead of hard-coding it.

# Pass 1: copy the data to a temporary file with Unix line endings.
open(my $in, '<', 'data') or die "Cannot open data: $!";
open(my $tmp, '>', 'tempDataFile.txt') or die "Cannot open tempDataFile.txt: $!";
while (my $line = <$in>) {                 # for each line in the input file
    $line =~ s/\r\n?/\n/g;                 # replace Mac (\r) and Windows (\r\n) line endings with \n
    print $tmp $line;                      # print the line to the temporary data file
}
close($in);
close($tmp);

# Pass 2: keep only the lines that start with a digit, converting commas to tabs.
open($tmp, '<', 'tempDataFile.txt') or die "Cannot open tempDataFile.txt: $!";
open(my $out, '>', 'data_processed.txt') or die "Cannot open data_processed.txt: $!";
while (my $line = <$tmp>) {                # for each line of the temp data file
    chomp $line;                           # cut off the newline char
    my $firstchar = substr($line, 0, 1);   # first character of the line (length 1, not 0)
    if ($firstchar =~ /\d/) {              # if the first character of the line is a digit...
        $line =~ s/,/\t/g;                 # convert all commas in it to tabs
        print $out "$line\n";              # print it to the output file
    }
}
close($tmp);
close($out);
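The newline conversion step could also be done in memory rather than through a temporary file. This is only a minimal sketch, assuming the data is small enough to hold in a single string; `normalize_newlines` is a hypothetical helper name, not something from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: convert Windows (\r\n) and old Mac (\r) line endings
# to Unix (\n) in a string that holds the whole file.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # matches \r\n as one unit, or a lone \r
    return $text;
}

# Example with mixed line endings in one string.
my @lines = split /\n/, normalize_newlines("a,b\rc,d\r\ne,f\n");
print scalar(@lines), " lines\n";    # prints "3 lines"
```

Matching `\r\n?` rather than a bare `\r` keeps a Windows-style `\r\n` from turning into two newlines.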
Before writing this, I wrote some pseudocode to work through the process:
Take text file as input (data.txt)
Open a blank text file for output (data_formatted.txt)
    If data_formatted.txt already exists, prompt to overwrite (y/n)
        If 'n', prompt for a new name.
        Check if the new name already exists, and if so prompt for overwrite, etc. (recurse as necessary)
(Replace Mac newline characters with Unix ones, if needed to do the next part)
If the current line starts with "SubjectID:"
    Store the value of the next 'word' (e.g. 7) as the variable $subjectID.
For each line that begins with a number (0-9):
    Read it in
    Replace each comma with a tab
    Append as the next line of data_formatted.txt:
        $subjectID \t [current line of data] \n
Close data_formatted.txt
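The core of the pseudocode (remember the subject ID, then prefix it to every line that starts with a digit) might look roughly like the sketch below. It assumes the ID line looks like `SubjectID: 7`, with the value as the next whitespace-separated word. `format_lines` is a hypothetical helper name, and the sample input is abbreviated for illustration. File handling and the overwrite prompt are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: takes input lines (already chomped) and returns
# the formatted output lines, without trailing newlines.
sub format_lines {
    my @lines = @_;
    my $subject_id = '';               # stays empty until a "SubjectID:" line is seen
    my @out;
    for my $line (@lines) {
        if ($line =~ /^SubjectID:\s*(\S+)/) {
            $subject_id = $1;          # the next "word" after the label (assumed format)
        }
        elsif ($line =~ /^\d/) {       # line begins with a digit
            (my $row = $line) =~ tr/,/\t/;    # replace each comma with a tab
            push @out, "$subject_id\t$row";
        }
    }
    return @out;
}

# Abbreviated sample input, for illustration only.
my @formatted = format_lines(
    "SubjectID: 7",
    "Trial\tCondition",
    "4\t269474\tskunk,Animals,2.03",
);
print "$_\n" for @formatted;
```

The header line is skipped because it does not start with a digit, so only the data row is printed, prefixed with the subject ID.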
But the script doesn't seem to work.
Here's a snippet of the data file:
Trial Condition trial start ResponseLabel Time keys sequence mouse_down UBRelativeTS UBAbsoluteTS UBSystemTS UBDrift UBButtons UBVoice UBOptic UBIOPorts UBQueueLength TrialTime TrialLabel
4 269474 skunk,Animals,2.03342375548695,sk,1,5,1,1,1,15,1,0,0,Fam N/A [N/A] 0 [N/A] [N/A] [N/A] 46 00000000 0 X 0000000000000000 0 1691 RESPONSE
5 274746 chin,BodyParts,2.66838591669,J,1,3,17,1,2,16,2,0,0,Fam N/A [N/A] 0 [N/A] [N/A] [N/A] 47 00000000 0 X 0000000000000000 0 1088 RESPONSE
6 278615 bus,Transit,3.18949031369937,b,1,3,22,1,2.99,11,13,0,0,Fam N/A [N/A] 0 [N/A] [N/A] [N/A] 47 00000000 0 X 0000000000000000 0 638 RESPONSE
7 282031 cactus,Plants,1.93449845124357,k,2,6,0,1,3,2,4,0,0,Fam N/A [N/A] 0 [N/A] [N/A] [N/A] 48 00000000 0 X 0000000000000000 0 752 RESPONSE
8 285563 knife,Utensils,3.09898963940118,n,1,3,9,1,4,4,6,0,0,Fam N/A [N/A] 0 [N/A] [N/A] [N/A] 48 00000000 0 X 0000000000000000 0 572 RESPONSE
Thank you so much for your help.