
Perl: problem with pattern matching when comparing pairs of strings

Status
Not open for further replies.

diera

Programmer
Mar 21, 2011
28
DE
Hi all,
I have a problem counting the majority selectedresult for each pair of strings (sysA, sysB). For each query, I have 3 different combinations of string comparisons:
* comparison("lucene-std-rel","lucene-noLen-rr");
* comparison("lucene-noLen-rr","lucene-std-rel");
* comparison("lucene-noLen-rr","random");

My code seems to count only whether the user chose sysA, sysB or both, without taking the 3 different pairs into account.

Code:
while ( $file = <INFILE> ) {
@field = parse_csv($file);
chomp(@field);
@query = $field[1];

for($i=0;$i<@query;++$i) {
if ( ($field[2] eq $method) || ($field[3] eq $method)){
if ( $field[4] eq $field[2]) {
print "$query[$i]: $field[2], $field[3], $field[4]\n";
$counta++;
}
if ( $field[4] eq $field[3]) {
print "$query[$i]: $field[2], $field[3]: $field[4]\n";
$countb++;
}
if ( $field[4] eq ($field[2] && $field[3])) {
#print "$query[$i]: $field[2]$field[3]\n";
$countc++;
}
}
}
}

Data (5 fields): user,query,sysA,sysB,selectedresult
user1,male,lucene-std-rel,random,lucene-std-relrandom
user2,male,lucene-std-rel,random,lucene-std-rel
user3,male,lucene-std-rel,random,lucene-std-rel
user4,male,lucene-std-rel,random,lucene-std-rel
user5,male,lucene-std-rel,random,lucene-std-relrandom
user6,male,lucene-std-rel,random,lucene-std-rel
user7,male,lucene-std-rel,random,lucene-std-rel

Example of the required output:
query 1:male fitness models
lucene-std-rel:5, random:0, both:2 ---> majority:lucene-std-rel

Any help is very much appreciated.
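A minimal sketch of just the reporting step, assuming the three counts for a pair have already been collected. The helper name `majority_line` and the hash keys `sysA`, `sysB` and `both` are hypothetical; the numbers are the ones from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: format one pair's counts the way the question asks
# and name the majority choice. $counts holds keys sysA, sysB and both.
sub majority_line {
    my ( $sys_a, $sys_b, $counts ) = @_;
    my %c     = ( sysA => 0, sysB => 0, both => 0, %$counts );
    my %label = ( sysA => $sys_a, sysB => $sys_b, both => 'both' );

    # Highest count wins; on a tie the earlier key in (sysA, sysB, both)
    # is kept, which is an arbitrary choice made for this sketch.
    my $best = 'sysA';
    for my $key (qw(sysB both)) {
        $best = $key if $c{$key} > $c{$best};
    }
    return sprintf '%s:%d, %s:%d, both:%d ---> majority:%s',
        $sys_a, $c{sysA}, $sys_b, $c{sysB}, $c{both}, $label{$best};
}

print "query 1:male fitness models\n";
print majority_line( 'lucene-std-rel', 'random',
    { sysA => 5, sysB => 0, both => 2 } ), "\n";
```

How ties should be resolved is not stated in the question, so adjust the fixed key order if you need a different rule.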
 
Note: it would really help if you ran your code through something like Perl::Tidy before posting it, and enclosed it in [CODE] [/CODE] tags.

Here is the code for other experts if they want to take a crack at helping you.

Code:
while ( $file = <INFILE> ) {
    @field = parse_csv($file);
    chomp(@field);
    @query = $field[1];

    for ( $i = 0 ; $i < @query ; ++$i ) {
        if ( ( $field[2] eq $method ) || ( $field[3] eq $method ) ) {
            if ( $field[4] eq $field[2] ) {
                print "$query[$i]: $field[2], $field[3], $field[4]\n";
                $counta++;
            }
            if ( $field[4] eq $field[3] ) {
                print "$query[$i]: $field[2], $field[3]: $field[4]\n";
                $countb++;
            }
            if ( $field[4] eq ( $field[2] && $field[3] ) ) {
                #print "$query[$i]: $field[2]$field[3]\n";
                $countc++;
            }
        }
    }
}

I'll come back later and take a crack at it if someone hasn't already helped you.

- Miller
 
I don't understand what you want to do.
A couple of comments, however:
The statement
[tt]@query = $field[1];[/tt]
creates an array with a single element: so, why the [tt]for[/tt]?
The expression
[tt]$field[2]&&$field[3][/tt]
returns the second operand if the first one is true; otherwise it returns the first operand itself (a false value such as an empty string, [tt]0[/tt] or [tt]undef[/tt]). Since [tt]$field[2][/tt] is always a non-empty string here, the expression simply yields [tt]$field[3][/tt], so the test is the same as [tt]$field[4] eq $field[3][/tt]: is that what you really want?
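A quick way to see both points for yourself, using the first sample row from the question hard-coded as an array (in the real script the fields come from `parse_csv`):

```perl
use strict;
use warnings;

# First sample row from the question: user,query,sysA,sysB,selectedresult
my @field = ( 'user1', 'male', 'lucene-std-rel', 'random', 'lucene-std-relrandom' );

# && returns the last operand it evaluated. $field[2] is a non-empty
# string (true), so the result is simply $field[3].
my $and = $field[2] && $field[3];
print "\$field[2] && \$field[3] gives '$and'\n";

# Hence the original "both" test never sees the combined value.
print "original both test: ", ( $field[4] eq $and ? 'match' : 'no match' ), "\n";

# With a false left operand, that operand itself is returned.
my $empty = '';
my $short = $empty && $field[3];
print "'' && \$field[3] gives '$short'\n";

# Assigning one scalar to an array gives a one-element array,
# so a for loop over @query runs exactly once.
my @query = $field[1];
print "\@query has ", scalar(@query), " element(s)\n";
```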

Franco
 
@prex: I have the data in a CSV file with the fields user,query,sysA,sysB,selectedresult.
What I actually want is to loop over each query, which is in $field[1].
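One way to get a real per-query loop is to group the rows by `$field[1]` first and then loop over the groups. A sketch with the rows hard-coded as array references; the query value 'another query' is made up for illustration, and `%rows_for` / `@queries` are hypothetical names:

```perl
use strict;
use warnings;

# Rows already split into user,query,sysA,sysB,selectedresult.
# 'another query' is a made-up value for illustration.
my @rows = (
    [ 'user1', 'male',          'lucene-std-rel',  'random', 'lucene-std-relrandom' ],
    [ 'user2', 'male',          'lucene-std-rel',  'random', 'lucene-std-rel' ],
    [ 'user1', 'another query', 'lucene-noLen-rr', 'random', 'random' ],
);

# Group the rows by query ($field[1]), remembering first-seen order.
my ( %rows_for, @queries );
for my $row (@rows) {
    my $query = $row->[1];
    push @queries, $query unless exists $rows_for{$query};
    push @{ $rows_for{$query} }, $row;
}

# Now the outer loop really runs once per query.
for my $query (@queries) {
    printf "%s: %d row(s)\n", $query, scalar @{ $rows_for{$query} };
}
```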

$field[2]&&$field[3]:
this is meant to catch a selectedresult where the user chose both systems, e.g. lucene-std-rel (sysA) and random (sysB) give lucene-std-relrandom (sysA&sysB).
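Since a "both" answer appears to be sysA and sysB joined together (as in lucene-std-relrandom), the test needs string concatenation with `.`, not `&&`. A sketch; `classify_choice` is a hypothetical helper, and the order sysA-then-sysB is an assumption based on the sample data:

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption from the sample data: a "both" answer is
# sysA followed directly by sysB, e.g. 'lucene-std-rel' . 'random'.
sub classify_choice {
    my ( $sys_a, $sys_b, $selected ) = @_;
    return 'both' if $selected eq $sys_a . $sys_b;    # '.' concatenates
    return 'sysA' if $selected eq $sys_a;
    return 'sysB' if $selected eq $sys_b;
    return 'unknown';
}

for my $sel ( 'lucene-std-relrandom', 'lucene-std-rel', 'random' ) {
    printf "%-22s => %s\n", $sel, classify_choice( 'lucene-std-rel', 'random', $sel );
}
```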

if ( ( $field[2] eq $method ) || ( $field[3] eq $method ) ) {
This condition extracts only the rows that compare against my method (lucene-nolen-rr), because my data also contains other pairs of strings.

thanks.
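The same filter written with `grep`, as a sketch. Note that `eq` is case-sensitive, and the thread spells the method both `lucene-nolen-rr` and `lucene-noLen-rr` (and `lucene-nolen-rel` vs `lucene-noLen-rel`). This version lower-cases both sides with `lc`, which is an assumption you should drop if case is significant. The third row is hypothetical:

```perl
use strict;
use warnings;

my $method = 'lucene-noLen-rr';

# user,query,sysA,sysB,selectedresult; the third row is hypothetical.
my @rows = (
    [ 'user1', 'male', 'lucene-std-rel',  'lucene-noLen-rr', 'lucene-std-rel' ],
    [ 'user2', 'male', 'lucene-std-rel',  'random',          'random' ],
    [ 'user3', 'male', 'lucene-nolen-rr', 'random',          'random' ],
);

# Keep a row when either system is the method. lc() on both sides makes the
# comparison case-insensitive; remove it if case is significant.
my @mine = grep { lc( $_->[2] ) eq lc($method) || lc( $_->[3] ) eq lc($method) } @rows;

print join( ',', map { $_->[0] } @mine ), "\n";
```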
 
@Miller: thanks for your suggestion. Here is my new code:
Code:
#!/usr/bin/perl
#use strict;  
# @(#) p3

#$|++;    # do not buffer output

my $file            = "";
my @field           = ();
my $line            = 1;
my ($counta,$countb,$countc)= 0;

my ($method1) = "lucene-noLen-rr";
my ($method2)= "lucene-nolen-rel";
my ($method3)= "lucene-std-rel";
my ($method4)= "random";

open( INFILE, "compare.csv" )
  or die("Can not open input file: $!");

open MYFILE, ">output.txt";
select MYFILE;

 while ( $file = <INFILE> ) {
    @field = parse_csv($file);
    chomp(@field);
	@query = $field[1];
	
	for($i=0;$i<@query;++$i) {
		if ( ($field[2] eq $method1) || ($field[3] eq $method1)){
    	if ( $field[4] eq $field[2]) {
		print "$query[$i]: $field[2], $field[3], $field[4]\n";
		$counta++;
		} 
		if ( $field[4] eq $field[3]) {
		print "$query[$i]: $field[2], $field[3]: $field[4]\n";
		$countb++;
		}
		if ( $field[4] eq ($field[2] && $field[3])) {
		$countc++;
		#print "$query[$i]: $field[2]$field[3]\n";	
	}
	comparison("lucene-std-rel","lucene-noLen-rr");
	comparison("lucene-noLen-rr","lucene-std-rel");
	comparison("lucene-noLen-rr","random");
	comparison("random","lucene-noLen-rr");
	comparison("lucene-noLen-rr","lucene-noLen-rel");
	Comparison("lucene-noLen-rel","lucene-noLen-rr");

	sub comparison 
	{
        my($query,$method1, $method2, $method3, $method4) = $_;

        if ( $query eq $method1) 
        	{
               # print " $field[2] :$counta \t ";
       		} 
        if($query eq $method2)
        	{
              #  print " $field[3]: $countb \t ";
        	} 
        if ($query eq ($method2 && $method3))
        	{
        	#print "Both: $countc \n\n\n ";
        	}
        else 
        	{
        	#print "$method4";
		} 
	
	} 
	}
}
}
close(INFILE);

exit;

sub parse_csv {
    my $text = shift;
    my @new  = ();
    push( @new, $+ ) while $text =~ m{
       "([^\"\\]*(?:\\.[^\"\\]*)*)",?
           |  ([^,]+),?
           | ,
       }gx;
    push( @new, undef ) if substr( $text, -1, 1 ) eq ',';
    return @new;
}

Result: it is unable to extract one of the pairs, and the counts are for the entire file instead of for each individual pair of strings (or query). See below:

[Program output results]
male fitness models: lucene-noLen-rel, lucene-noLen-rr: lucene-noLen-rr
lucene-noLen-rel :2 lucene-noLen-rr: 4 Both: 4
male fitness models: lucene-std-rel, lucene-noLen-rr, lucene-std-rel
lucene-std-rel :6 lucene-noLen-rr: 5 Both: 5

[Expected output - manual calculation]
male fitness models,lucene-noLen-rel,2,lucene-noLen-rr,4,both,1
male fitness models,lucene-std-rel,4,lucene-noLen-rr,1,both,2
male fitness models,lucene-noLen-rr,5,random,1,both,1
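The manual calculation has one line per pair of systems, which suggests that comparison("A","B") and comparison("B","A") are meant to be counted together, and that the counters should be kept per query and per pair rather than for the whole file. A sketch under those assumptions: `pair_key` is a hypothetical helper, answers are counted by system name so the column order within a row does not matter, and the rows are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: one key per unordered pair, so (A,B) and (B,A) match.
sub pair_key {
    my ( $sys_a, $sys_b ) = @_;
    my @pair = sort { $a cmp $b } $sys_a, $sys_b;
    return join '|', @pair;
}

# Illustrative rows: user,query,sysA,sysB,selectedresult
my @rows = (
    [ 'u1', 'male fitness models', 'lucene-std-rel',  'lucene-noLen-rr', 'lucene-std-rel' ],
    [ 'u2', 'male fitness models', 'lucene-noLen-rr', 'lucene-std-rel',  'lucene-std-rel' ],
    [ 'u3', 'male fitness models', 'lucene-noLen-rr', 'lucene-std-rel',  'lucene-noLen-rrlucene-std-rel' ],
);

# $count{query}{pair key}{system name or 'both'}: separate counters for every
# query and pair, instead of three counters shared by the whole file.
my %count;
for my $f (@rows) {
    my ( $query, $sys_a, $sys_b, $sel ) = @{$f}[ 1 .. 4 ];
    my $choice = $sel eq $sys_a . $sys_b ? 'both' : $sel;
    $count{$query}{ pair_key( $sys_a, $sys_b ) }{$choice}++;
}

# Same layout as the manual calculation: query,system1,n,system2,n,both,n
my @lines;
for my $query ( sort keys %count ) {
    for my $key ( sort keys %{ $count{$query} } ) {
        my $c = $count{$query}{$key};
        my ( $s1, $s2 ) = split /\|/, $key;
        push @lines, join ',', $query,
            $s1, $c->{$s1} // 0, $s2, $c->{$s2} // 0, 'both', $c->{both} // 0;
    }
}
print "$_\n" for @lines;
```

Because the pair key is sorted, the system that appears first in each output line is the alphabetically smaller one, not necessarily the one you listed first in comparison().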

Thanks again for the assist.
 
Once again, here is the code run through Perl::Tidy to make it more readable:

Code:
#!/usr/bin/perl
#use strict;
# @(#) p3

#$|++;    # do not buffer output

my $method1 = "lucene-noLen-rr";
my $method2 = "lucene-nolen-rel";
my $method3 = "lucene-std-rel";
my $method4 = "random";

open(INFILE, "compare.csv")
  or die("Can not open input file: $!");

open MYFILE, ">output.txt";
select MYFILE;

my ($counta, $countb, $countc) = 0;

while (my $file = <INFILE>) {
	my @field = parse_csv($file);
	chomp(@field);
	my @query = $field[1];

	for (my $i = 0 ; $i < @query ; ++$i) {
		if (($field[2] eq $method1) || ($field[3] eq $method1)) {
			if ($field[4] eq $field[2]) {
				print "$query[$i]: $field[2], $field[3], $field[4]\n";
				$counta++;
			}
			if ($field[4] eq $field[3]) {
				print "$query[$i]: $field[2], $field[3]: $field[4]\n";
				$countb++;
			}
			if ($field[4] eq ($field[2] && $field[3])) {
				$countc++;

				#print "$query[$i]: $field[2]$field[3]\n";
			}
			#comparison("lucene-std-rel",  "lucene-noLen-rr");
			#comparison("lucene-noLen-rr", "lucene-std-rel");
			#comparison("lucene-noLen-rr", "random");
			#comparison("random",          "lucene-noLen-rr");
			#comparison("lucene-noLen-rr", "lucene-noLen-rel");
			#Comparison("lucene-noLen-rel", "lucene-noLen-rr");
		}
	}
}
close(INFILE);

exit;

sub comparison {
	my ($query, $method1, $method2, $method3, $method4) = $_;

	if ($query eq $method1) {
		# print " $field[2] :$counta \t ";
	}
	if ($query eq $method2) {
		#  print " $field[3]: $countb \t ";
	}
	if ($query eq ($method2 && $method3)) {
		#print "Both: $countc \n\n\n ";
	}
	else {
		#print "$method4";
	}
}


sub parse_csv {
	my $text = shift;
	my @new  = ();
	push(@new, $+) while $text =~ m{
       "([^\"\\]*(?:\\.[^\"\\]*)*)",?
           |  ([^,]+),?
           | ,
       }gx;
	push(@new, undef) if substr($text, -1, 1) eq ',';
	return @new;
}

And here it is again, with ### comments pointing out the problems:

Code:
#!/usr/bin/perl
### This line makes me sad.  Always use strict.  Fortunately your code mostly follows it.
#use strict;
# @(#) p3

#$|++;    # do not buffer output

my $method1 = "lucene-noLen-rr";
my $method2 = "lucene-nolen-rel";
my $method3 = "lucene-std-rel";
my $method4 = "random";

open(INFILE, "compare.csv")
  or die("Can not open input file: $!");

open MYFILE, ">output.txt";
select MYFILE;

my ($counta, $countb, $countc) = 0;

while (my $file = <INFILE>) {
	### Instead of rolling your own CSV parser, I suggest you look at Text::CSV
	### Also, what are you chomping here?  Do you know what chomp does?  I 
	### suspect what you really want is:
	###    chomp($file);
	###    my @field = parse_csv($file);
	my @field = parse_csv($file);
	chomp(@field);
	
	### Why are you creating a single element array and then looping on that
	### single element?  This looks like a bug.
	my @query = $field[1];

	### $i will only be 0, since @query just contains a single element.
	for (my $i = 0 ; $i < @query ; ++$i) {
		if (($field[2] eq $method1) || ($field[3] eq $method1)) {
			if ($field[4] eq $field[2]) {
				print "$query[$i]: $field[2], $field[3], $field[4]\n";
				$counta++;
			}
			if ($field[4] eq $field[3]) {
				print "$query[$i]: $field[2], $field[3]: $field[4]\n";
				$countb++;
			}
			### This looks like a bug.  Are you trying to test that all
			### the fields are equal?
			if ($field[4] eq ($field[2] && $field[3])) {
				$countc++;

				#print "$query[$i]: $field[2]$field[3]\n";
			}
			### Your comparison code did not actually do anything so I
			### commented it out.  However, note that in here you are
			### trying to pass 2 elements to the sub each time, but you
			### don't collect them properly from @_ in comparison.
			#comparison("lucene-std-rel",  "lucene-noLen-rr");
			#comparison("lucene-noLen-rr", "lucene-std-rel");
			#comparison("lucene-noLen-rr", "random");
			#comparison("random",          "lucene-noLen-rr");
			#comparison("lucene-noLen-rr", "lucene-noLen-rel");
			#Comparison("lucene-noLen-rel", "lucene-noLen-rr");
		}
	}
}
close(INFILE);

exit;

sub comparison {
	### You use $_ here instead of @_.  Also, your constants $method1-4
	### will be hidden by this call since you're my'ing them.  I assume
	### that you want them to be the values already set at the beginning
	### of the file.
	### Maybe you want this?
	###    my ($query, $query2) = @_;
	### Obviously I don't know what query2 is, but you pass 2 parameters
	### to this function every time.
	my ($query, $method1, $method2, $method3, $method4) = $_;

	if ($query eq $method1) {
		# print " $field[2] :$counta \t ";
	}
	if ($query eq $method2) {
		#  print " $field[3]: $countb \t ";
	}
	if ($query eq ($method2 && $method3)) {
		#print "Both: $countc \n\n\n ";
	}
	else {
		#print "$method4";
	}
}


sub parse_csv {
	my $text = shift;
	my @new  = ();
	push(@new, $+) while $text =~ m{
       "([^\"\\]*(?:\\.[^\"\\]*)*)",?
           |  ([^,]+),?
           | ,
       }gx;
	push(@new, undef) if substr($text, -1, 1) eq ',';
	return @new;
}
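A sketch of the two fixes suggested in the comments above: chomp the line once before splitting it, and read subroutine arguments from `@_`. `parse_csv` is copied unchanged from the thread; `is_pair` is a hypothetical helper, and the input line is the second sample row from the question:

```perl
use strict;
use warnings;

# parse_csv, copied unchanged from the thread.
sub parse_csv {
    my $text = shift;
    my @new  = ();
    push( @new, $+ ) while $text =~ m{
       "([^\"\\]*(?:\\.[^\"\\]*)*)",?
           |  ([^,]+),?
           | ,
       }gx;
    push( @new, undef ) if substr( $text, -1, 1 ) eq ',';
    return @new;
}

# Fix 1: chomp the whole line once, then split it into fields.
my $line = "user2,male,lucene-std-rel,random,lucene-std-rel\n";
chomp $line;
my @field = parse_csv($line);

# Fix 2: arguments arrive in @_ (not $_); unpack exactly what is passed.
# Hypothetical helper: true if this row compares exactly sysA vs sysB.
sub is_pair {
    my ( $row, $sys_a, $sys_b ) = @_;
    return $row->[2] eq $sys_a && $row->[3] eq $sys_b;
}

print scalar(@field), " fields, last one is '$field[4]'\n";
print is_pair( \@field, 'lucene-std-rel', 'random' ) ? "pair matches\n" : "pair differs\n";
```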

That's all I can help you with right now. You need to spend some time cleaning up your code and figuring out what your logic is really trying to do. Also, it would help to have some sample data from compare.csv to run it on if/when you need more help.

- Miller
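Following up on the strict and sample-data points, a minimal cleaned-up skeleton that runs on the sample rows from the question. It assumes the rows contain no quoted fields, so a plain `split` is enough here (use `parse_csv` or Text::CSV for real data), and it reads them through an in-memory filehandle in place of compare.csv. It also avoids a pitfall present in every version above: `my ($counta, $countb, $countc) = 0;` initialises only the first counter.

```perl
use strict;
use warnings;

# Sample rows from the question, standing in for compare.csv.
my $csv = <<'END';
user1,male,lucene-std-rel,random,lucene-std-relrandom
user2,male,lucene-std-rel,random,lucene-std-rel
user3,male,lucene-std-rel,random,lucene-std-rel
user4,male,lucene-std-rel,random,lucene-std-rel
user5,male,lucene-std-rel,random,lucene-std-relrandom
user6,male,lucene-std-rel,random,lucene-std-rel
user7,male,lucene-std-rel,random,lucene-std-rel
END

# Three-argument open with a lexical handle. \$csv opens the string above
# as an in-memory file; put 'compare.csv' there to read the real file.
open my $in, '<', \$csv or die "Cannot open input: $!";

# "my ($x, $y, $z) = 0;" assigns 0 to $x only and leaves $y and $z undef,
# so give every counter its own initial value.
my ( $count_a, $count_b, $count_both ) = ( 0, 0, 0 );

while ( my $line = <$in> ) {
    chomp $line;
    my @field = split /,/, $line;    # no quoted fields in this sample
    if    ( $field[4] eq $field[2] . $field[3] ) { $count_both++ }
    elsif ( $field[4] eq $field[2] )             { $count_a++ }
    elsif ( $field[4] eq $field[3] )             { $count_b++ }
}
close $in;

print "lucene-std-rel:$count_a, random:$count_b, both:$count_both\n";
```

On this sample the counts agree with the 5 / 0 / 2 the question expects.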
 
Thanks, Miller. I will rework my code.
 