Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Substitute Words for Words 1

Status
Not open for further replies.

Kirsle

Programmer
Jan 21, 2006
1,179
US
I'm programming this bot response scripting language called RiveScript, and one of its features is to substitute first- and second-person pronouns in the bot's reply.

For instance:
Code:
+ do you *
- Yes I {person}<star>{/person}

So when you would say "do you think i should talk to you?" it would reply "Yes I think you should talk to me", swapping the first- and second-person pronouns.

The code I currently have splits the text to be substituted into an array at the spaces, then substitutes each word one at a time, then joins it back together. This is to get past the possible error of:

Code:
$var =~ s/you/me/ig;
$var =~ s/me/you/ig;

That code would, of course, convert all "you's" to "me's" and then back again, cancelling itself out.

The problem is, it would be helpful if two-word pairs could be substituted around. But since the code currently splits it into words, this isn't possible.

For an example of why you'd want two-word pairs to be substituted is because, sometimes "you" should translate into "me" and sometimes into "I"

Code:
you are => i am
i am => you are

Here's the code I currently have for running these substitutions, to prevent it from contradicting itself:
Code:
# where $self->{person}->{you} = me
# etc.

sub person {
	my ($self,$msg) = @_;

	# Lowercase the string.
	$msg = lc($msg);

	# Get the words and run substitutions.
	my @words = split(/\s+/, $msg);
	my @new = ();
	foreach my $word (@words) {
		if (exists $self->{person}->{$word}) {
			$word = $self->{person}->{$word};
		}
		push (@new, $word);
	}

	# Reconstruct the message.
	$msg = join (' ',@new);

	return $msg;
}

So long story short, I'm looking for a way to substitute multiple-word items for other multiple-word items.
 
Following code:
Code:
$var = 'coffee, tea or me?';
$replace{'coffee'} = 'tea';
$replace{'tea'}   = 'me';
$replace{'me'}   = 'coffee';

print $var , "\n";
$var =~ s/(coffee|tea|me)/$replace{$1}/ig;
print $var;
will print:
Code:
coffee, tea or me?
tea, me or coffee?
;-)
 
simple substitution of words or phrases or pairs of words seems easy enough, but how accurate is the substitution supposed to be?
 
It's supposed to be accurate enough that the reply still makes sense.

Without worrying about substitutions:
Code:
if ($msg =~ /^do you (.*?)$/) {
   print "Yes I $1";
}

User: do you think I should talk to you
Perl: Yes I think I should talk to you

--it should look like this--

User: do you think I should talk to you
Perl: Yes I think you should talk to me

Sometimes "you" should be translated into "me" and sometimes "you" should become "I"

And it should swap these word pairs, replacing all "you are's" with "i am's" and all "i am's" with "you are's", but without (re)replacing ones it just got done replacing (which would contradict itself).

Anyway, I got an idea of how to approach this from eewah. I just have to look into sorting arrays by number of words and create a regexp-friendly scalar and stuff, to dynamically come up with the word lists to substitute.
 
Well, you're working on what is essentially rudimentary AI. How intelligent the AI is supposed to be will determine what algorithims you come up with to base the responses on. I might try and keep the responses more human like. Most people would respond yes or no or ask another question for clarification, like:

Talk about what?

But maybe you are working towards that.
 
One (quick and dirty) way round the self-contradiction problem would be:[ol][li]Don't split the input string, just remove any multiple whitespace to normalise it a bit.[/li][li]Iterate through your list of word pairs, finding and replacing them, but replacing "I am" with "You%are" (if you don't like the %, use some other escape character).[/li][li]Now split the string into an array on whitespace as before. Iterate through your list of single word replacements, replacing as you go. The % will stop it matching anything you've already fixed.[/li][li]Finally, replace the %s with spaces, and output.[/li][/ol]

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top