Voice command controlling VFP buttons

Status
Not open for further replies.

Jan Flikweert

Programmer
Mar 20, 2022
85
NL
Hi all,

I managed to get speech-to-command working in VFP, so you can treat that part of this post as a tip. I did run into an issue when speaking single digits. The workaround was to use a prefix, e.g. "Box one", "Box two", etc.

When I speak a single digit, the program selects the command buttons containing e.g. "1", labels them and asks for my choice. That works, but it is not fast.

Kind regards,

Jan Flikweert

Code:
CLOSE ALL
CLEAR ALL
Public oRecognize, oVFPObj,ogrammar,result_command
oVFPObj = Createobject("RCLSpeechInput")
oRecognize = Createobject("SAPI.SpSharedRecoContext")
*oRecognize = Createobject("SAPI.SpInProcRecognizer")
Eventhandler(oRecognize,oVFPObj)
ogrammar = oRecognize.CreateGrammar(1)
oGrammar.DictationLoad
oGrammar.DictationSetState(1)

READ EVENTS

Define Class RCLSpeechInput As Session OlePublic
	Implements _ISpeechRecoContextEvents In "C:\WINDOWS\SYSTEM32\SPEECH\COMMON\SAPI.DLL"

	Procedure _ISpeechRecoContextEvents_StartStream(StreamNumber As Number, StreamPosition As VARIANT) As VOID;
		HELPSTRING "StartStream"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_EndStream(StreamNumber As Number, StreamPosition As VARIANT, StreamReleased As LOGICAL) As VOID;
		HELPSTRING "EndStream"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_Bookmark(StreamNumber As Number, StreamPosition As VARIANT, BookmarkId As VARIANT, Options As VARIANT) As VOID;
		HELPSTRING "Bookmark"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_SoundStart(StreamNumber As Number, StreamPosition As VARIANT) As VOID;
		HELPSTRING "SoundStart"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_SoundEnd(StreamNumber As Number, StreamPosition As VARIANT) As VOID;
		HELPSTRING "SoundEnd"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_PhraseStart(StreamNumber As Number, StreamPosition As VARIANT) As VOID;
		HELPSTRING "PhraseStart"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_Recognition(StreamNumber As Number, StreamPosition As VARIANT, RecognitionType As VARIANT, Result As VARIANT) As VOID;
		HELPSTRING "Recognition"
	_VFP.AutoYield=.f.
	result_command=Result.PhraseInfo.GetText
	Do Case
	Case LOWER(result_command) == "one"
		Thisformset.Baseform12.Btnctrlmemory1.exe_mmr()
	Case LOWER(result_command) = "two"
		Thisformset.Baseform12.Btnctrlmemory2.exe_mmr()
	Case "three" $ LOWER(result_command)
		Thisformset.Baseform12.Btnctrlmemory3.exe_mmr()
	Case LOWER(result_command) == "four"
		Thisformset.Baseform12.Btnctrlmemory4.exe_mmr()
	Case LOWER(result_command) == "five"
		Thisformset.Baseform12.Btnctrlmemory5.exe_mmr()
	Case LOWER(result_command) == "six"
		Thisformset.Baseform12.Btnctrlmemory6.exe_mmr()
	Case LOWER(result_command) == "seven" 
		Thisformset.Baseform12.Btnctrlmemory7.exe_mmr()
	Case LOWER(result_command) == "eight"
		Thisformset.Baseform12.Btnctrlmemory8.exe_mmr()
	Case LOWER(result_command) == "nine"
		Thisformset.Baseform12.Btnctrlmemory9.exe_mmr()
	Case LOWER(result_command) == "ten"
		Thisformset.Baseform12.Btnctrlmemory10.exe_mmr()
	Case LOWER(result_command) == "eleven"
		Thisformset.Baseform12.Btnctrlmemory11.exe_mmr()
	Case LOWER(result_command) == "twelve"
		Thisformset.Baseform12.Btnctrlmemory12.exe_mmr()
	Case LOWER(result_command) == "thirteen"
		Thisformset.Baseform12.Btnctrlmemory13.exe_mmr()
	Case LOWER(result_command) == "fourteen"
		Thisformset.Baseform12.Btnctrlmemory14.exe_mmr()
	Case result_command == "15"
		Thisformset.Baseform12.Btnctrlmemory15.exe_mmr()
	Case result_command == "16"
		Thisformset.Baseform12.Btnctrlmemory16.exe_mmr()
	Case result_command == "17"
		Thisformset.Baseform12.Btnctrlmemory17.exe_mmr()
	Case result_command == "18"
		Thisformset.Baseform12.Btnctrlmemory18.exe_mmr()
	Case result_command == "19"
		Thisformset.Baseform12.Btnctrlmemory19.exe_mmr()
	Case result_command == "20"
		Thisformset.Baseform12.Btnctrlmemory20.exe_mmr()
	Case result_command == "21"
		Thisformset.Baseform12.Btnctrlmemory21.exe_mmr()
	Case result_command == "22"
		Thisformset.Baseform12.Btnctrlmemory22.exe_mmr()
	Case result_command == "23"
		Thisformset.Baseform12.Btnctrlmemory23.exe_mmr()
	Case result_command == "24"
		Thisformset.Baseform12.Btnctrlmemory24.exe_mmr()
	Case result_command == "25"
		Thisformset.Baseform12.Btnctrlmemory25.exe_mmr()
	Case result_command == "26"
		Thisformset.Baseform12.Btnctrlmemory26.exe_mmr()
	Case result_command == "27"
		Thisformset.Baseform12.Btnctrlmemory27.exe_mmr()
	Case result_command == "28"
		Thisformset.Baseform12.Btnctrlmemory28.exe_mmr()
	Case result_command == "29"
		Thisformset.Baseform12.Btnctrlmemory29.exe_mmr()
	Case result_command == "30"
		Thisformset.Baseform12.Btnctrlmemory30.exe_mmr()
	Endcase
	_VFP.AutoYield=.t.
	Endproc

	Procedure _ISpeechRecoContextEvents_Hypothesis(StreamNumber As Number, StreamPosition As VARIANT, Result As VARIANT) As VOID;
		HELPSTRING "Hypothesis"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_PropertyNumberChange(StreamNumber As Number, StreamPosition As VARIANT, PropertyName As String, NewNumberValue As Number) As VOID;
		HELPSTRING "PropertyNumberChange"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_PropertyStringChange(StreamNumber As Number, StreamPosition As VARIANT, PropertyName As String, NewStringValue As String) As VOID;
		HELPSTRING "PropertyStringChange"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_FalseRecognition(StreamNumber As Number, StreamPosition As VARIANT, Result As VARIANT) As VOID;
		HELPSTRING "FalseRecognition"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_Interference(StreamNumber As Number, StreamPosition As VARIANT, Interference As VARIANT) As VOID;
		HELPSTRING "Interference"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_RequestUI(StreamNumber As Number, StreamPosition As VARIANT, UIType As String) As VOID;
		HELPSTRING "RequestUI"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_RecognizerStateChange(StreamNumber As Number, StreamPosition As VARIANT, NewState As VARIANT) As VOID;
		HELPSTRING "RecognizerStateChange"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_Adaptation(StreamNumber As Number, StreamPosition As VARIANT) As VOID;
		HELPSTRING "Adaptation"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_RecognitionForOtherContext(StreamNumber As Number, StreamPosition As VARIANT) As VOID;
		HELPSTRING "RecognitionForOtherContext"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_AudioLevel(StreamNumber As Number, StreamPosition As VARIANT, AudioLevel As Number) As VOID;
		HELPSTRING "AudioLevel"
* add user code here
	Endproc

	Procedure _ISpeechRecoContextEvents_EnginePrivate(StreamNumber As Number, StreamPosition As VARIANT, EngineData As VARIANT) As VOID;
		HELPSTRING "EnginePrivate"
* add user code here
	Endproc

Enddefine

 
You get error message "OBJECT is not contained in a formset", right?

The solution in short is to change this definition:
Code:
Define Class RCLSpeechInput As Container
   IMPLEMENTS ....
instead of "As Session".
Then don't create it standalone with CREATEOBJECT(), but with the AddObject() method of Thisformset.Baseform12, and adapt the code that uses it. For example, make it a child object of Baseform12 in that form's Init code:
CODE --> Baseform12.Init
This.Addobject("oRCLSpeechInput","RCLSpeechInput")
This.Addproperty("oRecognize", Createobject("SAPI.SpSharedRecoContext"))
Eventhandler(This.oRecognize,This.oRCLSpeechInput)
This.Addproperty("oGrammar", This.oRecognize.CreateGrammar(1))
This.oGrammar.DictationLoad()
This.oGrammar.DictationSetState(1)

Just to explain this: code can only use THIS, THISFORM or THISFORMSET when the object it belongs to is that object or one of its child objects. The rule applies to all three keywords. No matter whether you call CREATEOBJECT() within a method or event of the formset or outside of it, the created object can only refer to itself with THIS. It isn't a child of THISFORM or THISFORMSET, not even when you store it in a property of the formset, form or class. To make it a child you have to use the AddObject() method, or in fact the NewObject() method, but again not the NEWOBJECT() function.

An object can only refer to a root object, no matter how deep it sits in the object hierarchy, when it is part of that hierarchy. That's essential.

The major problem here is merely the choice of base class, because a session can't be added anywhere. I know it's not your choice: you get this class definition generated by dragging the ISpeechRecoContextEvents interface of SAPI.dll from the Object Browser into an empty PRG editor window. But you can't make a session a child object of anything.

There are rules about which classes can be containers of other classes. For example, the only classes that can be children of a formset are forms. Neither custom nor container nor anything else can be a child of a formset. Forms are less restrictive; they can host any control. The only thing a form can't contain is a form; that's the job of a formset, if at all.

Anything you create with CREATEOBJECT() or the NEWOBJECT() function instead of someobject.AddObject() has no parent and can't be made the child of some parent afterwards; it becomes a root object.

So that has nothing to do with the hurdles you have from the IMPLEMENTS clause, nor with the class being OLEPUBLIC. It has to do with being part of the "family" that can use THISFORMSET, in your case.

As you want to address Baseform12 of your formset, the code above suggests making what you called oVFPObj a child object of Baseform12 named "oRCLSpeechInput". The other objects involved are COM objects and not ActiveX controls, so they can't be child objects. They also don't need to be, but it's still best to keep them in the same scope and make them Baseform12 properties that you add at runtime with This.AddProperty(), as I suggest. Less important, but notice that AddProperty() creates properties that become part of the object, whereas an object stored in such a property doesn't become a child of that object: it is only referenced by the property and doesn't become part of the hierarchy. What's useful anyway is that added properties have the lifetime of the object they are added to. No need for public variables.

Notice what I wrote as header title of the code section: CODE --> Baseform12.Init. This code is meant to go in the Init event of Baseform12, not just anywhere in a separate PRG. Also, all of your code now needs to adapt to the new situation. Since the class is instantiated as a child of Baseform12, THIS in the class code means the oRCLSpeechInput object itself, THISFORM means Baseform12, and THISFORMSET means the formset. You should therefore change the references from THISFORMSET.Baseform12 to THISFORM.

I wonder whether or why your buttons have a method exe_mmr(), but taking this method name for granted, your call changes to Thisform.Btnctrlmemory1.exe_mmr(). You won't have IntelliSense support unless you define the container class as a visual class and add it to the form in the designer instead of using AddObject() at runtime.

There are ways of making LOCAL declarations AS class OF file to get IntelliSense, but I'll spare us that kind of fiddling.

To summarize, your misunderstanding is mainly about how to address objects, the limits of which objects can contain which, and how the keywords THIS, THISFORM and THISFORMSET work. In short, the word THIS itself should tell you that you're using it too lightly: THIS can only literally be THIS if the code referring to THIS is part of THIS, and that's even more true for THISFORM and THISFORMSET.

THIS is a keyword that is almost always usable, unless you're outside of the code of a class. So THIS is not available in functions or procedures that are not part of class definitions, i.e. plain procedural functions and procedures or "naked" PRGs. THISFORM is available to most control classes only because they are usually added to forms. If you DEFINE CLASS mycontainer AS container, use THISFORM somewhere in its code and later instantiate the container with CREATEOBJECT(), it is a control without a form it belongs to, and THISFORM won't work. Likewise, THISFORMSET won't work if that container is part of a form but the form isn't part of a formset. So be aware that those keywords only work in the right context.
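
A minimal sketch to see the difference for yourself, assuming it is run as a standalone PRG. DemoContainer and WhereAmI are made-up names for illustration only and are not part of the code in this thread. The same class is instantiated once with AddObject() of a form and once with CREATEOBJECT(), and each instance reports whether THISFORM resolves for it.

Code:
* Demo PRG: the same class, once as child of a form, once as root object
Local loForm, loLoose
loForm = Createobject("Form")
loForm.AddObject("cntDemo", "DemoContainer")	&& becomes a child of the form
loLoose = Createobject("DemoContainer")		&& root object, belongs to no form
? loForm.cntDemo.WhereAmI()
? loLoose.WhereAmI()

Define Class DemoContainer As Container
	Function WhereAmI
		* Type() only returns "O" when Thisform resolves to an object
		If Type("Thisform") = "O"
			Return "part of form " + Thisform.Name
		Endif
		Return "not part of any form"
	Endfunc
Enddefine

The Type() guard is only there so the demo doesn't stop with an error; in real code you would rather make sure the object is created in the right place.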

So, in summary of the summary, pay attention to scopes and to when and where they are valid. That's not only a matter of object scopes but also of variable scopes and record scopes. You can only address things that are in scope, always.

Chriss
 
If that wasn't your problem, then I don't know what else it could be. I asked
myself said:
You get error message "OBJECT is not contained in a formset", right?

I asked so because that is the error I get when a number I say is recognized.

You also said:
Jan Flikweert said:
Using single digit's the program selects command buttons with i.e. "1", highlights them with neon color half transparent labels with numbers and asks your choice.

I didn't see that, but maybe because I don't have your form. I see there are command words that won't arrive as recognized text. If you say "close" for example, that won't appear when you ? Result.PhraseInfo.GetText, instead any close button of any open window or tab of a browser gets highlighted and numbered, then you choose one by number and "OK".

I see how that would be too slow and annoying to use when you want to switch, say, the soundfont while playing. But that problem is merely a matter of which words you turn into actions. You can't use command words that are already reserved for Windows speech recognition actions. Numbers are part of the highlight-and-choose step, but standalone numbers are merely recognized as text, so they are indeed free for you to use as commands. It may turn out that saying a number in the choice step of another command makes that choice and at the same time triggers your own speech recognition. Well, then pick other words. How about using any words you notice are recognized well and labelling your buttons with them? Names of Sesame Street characters, or anything like that.

I haven't used speech recognition so far, but I can see how it helps while you play two-handed and only have speech as a command interface. I read somewhere, though the source has slipped my mind, that you're able to teach speech recognition new command words. I don't think that's necessary as long as you avoid the already known command words and find something that is simply recognized as text and arrives as Result.PhraseInfo.GetText in the _ISpeechRecoContextEvents_Recognition event.

I think you will need to figure out how to turn off listening when focus moves to another application, so your commands don't become global commands but only trigger actions while your app is active. I haven't looked into that, but I guess that if .DictationSetState(1) activates listening, some other state number stops it. That's surely less important for use "in concert", when there is usually nothing else you do.
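
For what it's worth, in SAPI's SpeechRuleState enumeration the value 0 is SGDSInactive and 1 is SGDSActive, so a switch could be sketched as below. SetListening is a hypothetical helper name, and the sketch assumes the grammar object was created with CreateGrammar() and DictationLoad() as in the code above. Which VFP event should call it when another application takes the focus is left open here.

Code:
* Hypothetical helper: switch dictation listening on or off
Procedure SetListening(toGrammar, tlOn)
	* SAPI SpeechRuleState: 0 = SGDSInactive, 1 = SGDSActive
	toGrammar.DictationSetState(Iif(tlOn, 1, 0))
Endproc

With the properties added in Baseform12.Init, a call would look like SetListening(Thisform.oGrammar, .F.) to pause and SetListening(Thisform.oGrammar, .T.) to resume.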

So it should help to learn more about the events and their meaning and also methods available in context of that interface and related classes, like the Grammar object to get at what you need.

Chriss
 
Chriss,

In the meantime things work correctly. Now it is a matter of practicing pronunciation/articulation. I solved the digit problem by using X as a prefix for the digit. SAPI recognises this as EAX + the digit. A digit like "two" can also be understood as "to" or "too", so I defined cases for "to" and "too" as well.
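
A rough sketch of how the prefix and the "to"/"too" variants could be handled in one place instead of in separate cases. SpokenToNumber is a made-up helper name, and the word list only covers the number words used in the code of this thread; it is an illustration, not the code actually in use.

Code:
* Hypothetical helper: turn recognized text into a button number, 0 = no match
Function SpokenToNumber(tcText)
	Local lcText, lcWord, lcNames, lnI
	lcText = Lower(Alltrim(tcText))
	If Empty(lcText)
		Return 0
	Endif
	* Only look at the last word, so a prefix such as "box" or "x" is ignored
	lcWord = Getwordnum(lcText, Getwordcount(lcText))
	* Variants that may be recognized instead of "two"
	If Inlist(lcWord, "to", "too")
		lcWord = "two"
	Endif
	* Numbers that arrive as digits, e.g. "15", convert directly
	If Isdigit(lcWord)
		Return Int(Val(lcWord))
	Endif
	lcNames = "one,two,three,four,five,six,seven,eight,nine,ten,eleven,twelve,thirteen,fourteen"
	For lnI = 1 To Getwordcount(lcNames, ",")
		If lcWord == Getwordnum(lcNames, lnI, ",")
			Return lnI
		Endif
	Endfor
	Return 0
Endfunc

In the Recognition procedure the result of SpokenToNumber(Result.PhraseInfo.GetText) could then pick the button, for example with Evaluate("Thisform.Btnctrlmemory" + Transform(lnButton) + ".Click()") when the number is greater than 0. That call assumes the handler object is a child of the form, so that Thisform is available.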

Indeed, using Thisformset in a class defined in a program file causes an error. I replaced that with the name of the form. The other part I placed in the Load method of the form.

The method exe_mmr was indeed for presets not in use and using the click() method will do.

Kind regards,

Jan Flikweert
 
Jan Flikweert said:
Now it is a matter of practicing pronunciation/articulation.
I wish you great success. I still wonder why this doesn't work for you without a prefix at all. It does for me.

And then, looking for any words or names that are unproblematic, recognized without many attempts - I'd use these words for the buttons. Then there's less training necessary. Short words without similar sounding sibling words should be perfect to use, shouldn't they? The idea to accept "two" or "to" or "too" for the same case of course also is doable. I wonder about the speed of recognition. I think speech recognition knows before the event happens, it just waits for a pause to see the word isn't continued. Like Two vs Twofold vs Tomorrow.

I see there is another event called _ISpeechRecoContextEvents_Hypothesis that should happen before _ISpeechRecoContextEvents_Recognition, with the hypothesis of what the word could be. If you pick words that are unlikely to cause the same hypothesis, this should work faster.

I tried that, and you can use it. There's no big time benefit, but it can come first. It can; there isn't always a hypothesis at all. I would have thought a hypothesis would already happen directly after the word was spoken and before the listening phase ends on detecting a longer pause. When hypotheses happen, they come shortly before the recognition, so that doesn't accelerate things much.

Testing with whole phrases and sentences reveals that the hypothesis events give you the currently detected text while you are speaking. That's where you could get your hands on the start of a text before speech recognition considers it complete. So hypothesis would work well with commands of two or more words.

For example if you say "Box four", then hypothesis will come up with "Box" before recognition finally tells "Box four" is recognized. So hypothesis could work for you to detect a command prefix word, if you'd want that.
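
A small sketch of such a prefix check, assuming the hypothesis text is read the same way as in the recognition event. StartsWithPrefix is a made-up helper name and not part of SAPI or of the code in this thread.

Code:
* Hypothetical helper: does the text heard so far start with the command prefix?
Function StartsWithPrefix(tcHeard, tcPrefix)
	* Compare only the first word, ignoring case and surrounding blanks
	Return Lower(Getwordnum(Alltrim(tcHeard), 1)) == Lower(Alltrim(tcPrefix))
Endfunc

Inside _ISpeechRecoContextEvents_Hypothesis you could then test StartsWithPrefix(Result.PhraseInfo.GetText, "box") and, for example, already prepare for the number that follows.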

I'd still go with words that are recognized fast and are so unique that they can't be confused with other words to control your app. That means least training time necessary and faster response. It might be funny to need to use them, but caption your buttons with them and you know what to say, too.

Chriss
 
Chriss,

I added two links to an example from the Microsoft Speech SDK: an .exe and the source code.
VB Example SAPI SDK
Reco.exe
I think the reason it does not work with a single digit for me is that many captions of stops contain digits.

The popup of Speech Recognition contains an option to modify the dictionary. I think hypothesis is by default called by Speech Recognition.

Kind regards,

Jan Flikweert
 
 https://files.engineering.com/getfile.aspx?folder=9b2c124c-c308-4bcf-b371-641f02a83321&file=Reco.exe
Jan Flikweert said:
I think hypothesis is by default called by Speech Recognition
No. First of all, events are never called, they happen. They are triggered by interrupts, for example.

In the case of interface implementations (the IMPLEMENTS clause of DEFINE CLASS) you handle events by binding the event interface of a COM class to a VFP object with EVENTHANDLER().

All the _ISpeechRecoContextEvents_xxx procedures happen, they are not used or called by the COM object, they happen in the COM object and when they happen that's forwarded to the VFP object. And in any of the procedures you can add code which reacts to these events.

I put ? commands in all of them and can see that starting to speak causes a SoundStart event and then a PhraseStart event. The interface name is ISpeechRecoContextEvents, with the emphasis on Events; that speaks for itself (pun intended).

You can make use of this by only waiting for _ISpeechRecoContextEvents_Recognition events, as that's always the final decision of the speech recognition what it heard. But you're giving away a lot of options. You'll need documentation to know when events happen, what they are meant to deal with and what parameters but also what class properties are available to you to use.

That's not only true for this interface, that's the general case for all classes implementing an interface of a COM class by the implements clause.

Nothing of this is private to speech recognition, all of this is the public interface that you are recommended to make use of. You're not forced to use all events, but they are made available to you, anything that is private, hidden, protected would simply not be available as an interface event you can bind to.

Chriss
 
Chriss,

You are right. The benefit is that we can put code in the _ISpeechRecoContextEvents_xxx procedures, and indeed a lot of options are possible. The only question is: do I need them? When I speak "x one" and preset one is clicked, everything is OK. I would not know what improvement is needed.

By running Reco.exe, which I provided, you can reproduce the issue I faced. Starting recognition and speaking the word "clear" will give you a choice between the two buttons containing the word clear.

Kind regards,

Jan Flikweert
 