extract sentence 1

Status
Not open for further replies.

sun9

Programmer
Dec 13, 2006
31
US
I am trying to extract data that has been separated into columns in my csv file using Tie::CSV_File. I am able to extract from all columns except one, which is actually a sentence: when I try to get $data[4][4] I get just the first word of the sentence. How can I extract the whole sentence?
Code:
tie my @data, 'Tie::CSV_File', $file ,WHITESPACE_SEPARATED;
 
is the file csv or white space delimited?


- Kevin, perl coder unexceptional!
 
It is a csv file with every value for one field starting at a particular column position
 
if it's a csv why are you using the optional WHITESPACE_SEPARATED delimiter to parse the lines?

# or to read a tabular, or a whitespace or a (semi-)colon separated file
tie my @data, 'Tie::CSV_File', 'xyz.dat', TAB_SEPARATED;
# or use instead COLON_SEPARATED, SEMICOLON_SEPARATED, PIPE_SEPARATED,
# or even WHITESPACE_SEPARATED


you should just be doing this I would think:

Code:
tie my @data, 'Tie::CSV_File', $file;

- Kevin, perl coder unexceptional!
 
If I don't use WHITESPACE_SEPARATED I get the following error when I try to access any row, column:

Use of uninitialized value in print at C:\StudyPerl\Projects\test.pl line 17, <$fh> line 4.
 
post some sample lines from the file you are working with.

- Kevin, perl coder unexceptional!
 
This is sample content from the csv file I am using. It has two rows and two fields, and the content of each field starts at the same column in both rows. I am unable to get the last word of the 2nd field to stay in the second field while posting, so kindly bear with me. I hope this helps in understanding the kind of file I am using.

Code:
"TEST1"  "Why doesnt this code work?"                                                      
"TEST2"  "What am I doing wrong here?"

 
that is not a CSV (comma separated values) file. At least what you posted isn't. If the last word breaks to a newline, Tie::CSV_File will not parse it correctly no matter what delimiter you try to use. Are you sure the file you are working with is exactly like the two lines you posted?

- Kevin, perl coder unexceptional!
 
The last word does not break to a new line in the file; for some reason the last line breaks into a newline only when I post it in this forum. The content of the first row, second field is "Why doesnt this code work?" and that of the second row, second field is "What is wrong?"
 
if that is how the file is don't even bother with the module:

Code:
open(FH,'yourfile.txt') or die "$!";
while(my $line = <FH>){
   my ($sentence) = $line =~ /^"[^"]*"\s*"([^"]+)"/;
   print "$sentence\n";
}
close(FH);

assumes there are no embedded " in the second column.



- Kevin, perl coder unexceptional!
 
That is a really strange file format. There appear to be multiple ways that you could approach parsing each record based on what you told us. Given that you said that each of the columns is at a fixed position, I would be strongly tempted to use substr to extract each record element. This would avoid any uncertainty about whether the quotes are always there, or whether there are ever embedded quotes and how those are escaped if they exist.

Nevertheless, the quotes are just too tempting a method for parsing, so I'll just simplify Kevin's code a little bit in my proposed solution given the available data:

Code:
open(FH, 'yourfile.txt') or die $!;
while (my $line = <FH>) {
   my @data = $line =~ /"(.*?)"/g;
   my $sentence = $data[1];
   print "$sentence\n";
}
close(FH);

Short and sweet.
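
The substr approach mentioned above could look roughly like the sketch below. The column offsets (first field in columns 0-10, second field from column 11 onward), the sample lines, and the helper name split_fixed are assumptions made for illustration; the real file's positions would need to be substituted.

```perl
use strict;
use warnings;

# Hypothetical layout: field 1 occupies columns 0-10, field 2 starts at column 11.
# These offsets are assumptions for the sample lines below, not taken from the real file.
my @lines = (
    '"TEST1"    "Why doesnt this code work?"',
    '"TEST2"    "What am I doing wrong here?"',
);

# Hypothetical helper: cut one line into its two fields by position.
sub split_fixed {
    my ($line) = @_;
    my $first  = substr $line, 0, 11;   # first 11 characters
    my $second = substr $line, 11;      # everything from column 11 to the end
    for ($first, $second) {
        s/\s+$//;          # trim trailing padding
        s/^"//; s/"$//;    # strip only the outermost quotes
    }
    return ($first, $second);
}

for my $line (@lines) {
    my ($id, $sentence) = split_fixed($line);
    print "$id: $sentence\n";
}
```

Because the fields are cut by position, quotes embedded inside the sentence are left untouched; only the outermost pair is removed.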
 
Thanks KevinADC and MillerH for your input. The reason I was using the module was that it allowed me to easily access a row, column value by just specifying the row and column number I required (until I came across the sentence issue, of course), since I have to place these values into another file. Moreover, my csv file does have embedded quotes :( . I tried using substr as such:

Code:
open (FILE, 'test.txt') or die $!;
while (<FILE>) { 
    chomp; 
    $firstfield = substr $_, 1, 11;  #this is where my first field starts and ends
    print "$firstfield" ;  
 } 
close (FILE);

But this just gives me all the first field values. I am thinking of looping through these values and extracting the first column, first row value. Is there a better option for this?

Thanks.
 
What separates the two fields, tab or spaces? If it's a tab, it would make things a lot easier.

Also, if you have embedded quotes, unless they are escaped in some way (by doubling them, for example) it could be tricky to parse.

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
 
The fields are separated by spaces not tabs like this

Code:
"TEST1"    "Sample sentence "here"?"    
"TEST2"    "Sample sentence2 "here"?"
 
Thanks MillerH, that was a useful link on substr. I am now able to pull out the entire second field, along with the embedded quotes, using substr. But what should I use to extract just the first element (or any particular row value) from this list? e.g.: Sample sentence "here"?


Thanks.
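
One possible way to keep row, column access without the module is to parse every line into an array of arrays. The sketch below assumes exactly two quoted fields per line, no quotes inside the first field, and nothing after the second field except padding. The helper name parse_two_fields and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: parse two quoted fields separated by spaces.
# The second field may contain unescaped quotes.
# Returns an empty list when the line does not match.
sub parse_two_fields {
    my ($line) = @_;
    # First field: no quotes inside. Second field: greedy, so it runs
    # up to the last quote on the line.
    return $line =~ /^"([^"]*)"\s+"(.*)"\s*$/;
}

my @rows = map { [ parse_two_fields($_) ] } (
    '"TEST1"    "Sample sentence "here"?"    ',
    '"TEST2"    "Sample sentence2 "here"?"',
);

print "$rows[0][1]\n";   # row 0, column 1 (both zero-based)
```

The greedy second capture is what lets the embedded quotes through; it would give wrong results if a third quoted field were ever added to the line.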
 
You could use Tie::File instead of Tie::CSV_File. It works like the other Tie module but doesn't do the parsing, so you just need to add your substr code:

Code:
use Tie::File;
tie my @data, 'Tie::File', $file or die "$!";
my $row20 = substr($data[20], .........);   # fill in your own offset and length



- Kevin, perl coder unexceptional!
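
A self-contained sketch of the Tie::File plus substr idea follows. The temporary sample file and the assumption that the second field starts at column 11 are made up for illustration; Tie::File and File::Temp both ship with Perl.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Write a small sample file so the sketch is self-contained.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} qq{"TEST1"    "Sample sentence "here"?"\n};
print {$fh} qq{"TEST2"    "Sample sentence2 "here"?"\n};
close $fh;

# Each element of @data is one line of the file, without its newline.
tie my @data, 'Tie::File', $file or die "Cannot tie $file: $!";

# Assumed layout: the second field starts at column 11.
my $row    = 1;                       # zero-based, so this is the second line
my $field2 = substr $data[$row], 11;
$field2 =~ s/^"|"\s*$//g;             # strip the outer quotes and trailing padding

print "$field2\n";

untie @data;
```

Tie::File reads lines on demand, so indexing a single row does not require loading the whole file first.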
 
Thanks so much KevinADC that module along with the substr really helped !!
 
hi,

I was able to extract the data from the file just like I needed and everything was working fine, but now suddenly I cannot extract the data using $data[3] or $data[$i], and I am not sure why. My print($mydata[3]) displays nothing. Is this because of my open statement?

use Tie::File ;
my $file = 'test.csv' ;
open(FILE, $file)|| die("$file\n$!\n") ;

my @mydata, 'Tie::File', $file ;
print ($mydata[3]);

Thanks.
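
One likely cause, judging only from the snippet as posted: the line that declares @mydata is missing the tie keyword, so it declares an empty array and never attaches it to the file. The separate open is not needed either, since Tie::File opens the file by name itself. A minimal sketch with a made-up sample file standing in for test.csv:

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Sample file standing in for test.csv (hypothetical content).
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "row$_\n" for 0 .. 4;
close $fh;

# Without the tie keyword this only declares an empty array:
#   my @mydata, 'Tie::File', $file;
# so $mydata[3] is undefined and print shows nothing.

tie my @mydata, 'Tie::File', $file or die "Cannot tie $file: $!";
my $fourth = $mydata[3];   # fourth line of the file (index is zero-based)
print "$fourth\n";
untie @mydata;
```

Running with use warnings enabled would flag the original line, since the extra list elements are evaluated in void context.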
 