Ed. Note: Blog post contributed by [Nick Carter], Maker, retired electrical engineer, who is active in the Robotics Meetup, DIYBio Meetup, Artificial Intelligence Meetup, and pitches in with STEM programs whenever needed.
I had wanted to play with voice recognition for a long time, ever since the group I was in at ITT Research Center, many years ago, did some telephony voice applications: a dial-by-voice system and an automatic switchboard attendant (U.S. patent 4608460; I may be partly to blame for the “Press 1 for...” menus :) ). I have also long been interested in robots and in making machines respond in a somewhat human-like way, so when I saw the EasyVR VR-3 speech recognition shield for the Arduino I decided this was my chance.
The shield was around $50 (sadly, now out of stock?), and it provides reasonable functionality and a very nice graphical programming interface for the Arduino. The VR-3 needed some soldering assembly but it was not too tough, basically soldering the headers and the recognition board to the shield board.
Initially, to test it, I made a “magic 8 ball” toy with only speaker dependent trained commands, including some holiday season fortunes. I decided it would be a wizard looking into his crystal ball, hence Marvo was born. One time when I showed it at Nova Labs, a small girl tried it and asked if she was going to get a puppy for Christmas, and Marvo told her yes, and I think she believed him. I am probably in big trouble. With the voice operated electronics these days, it is going to be difficult for kids who grow up with them to distinguish real responses from toy responses (and true from untrue), especially the more humanlike they act.
Later I refined Marvo for a maker fair in Haymarket, giving him some LEDs and an actual ball (a bouncy superball with embedded stars), backlit with an LED. I also added a second set of commands using the “robot” speaker independent set, and added responses to get kids engaged and trying the various commands. You can see a video here which also shows the robot training the user (me).
You can see the difficulty it sometimes has with recognition and how it has to train the user to speak properly to get a good response. It did not really like my English accent. I also found that it is very sensitive to external noise, and in the Haymarket fair environment it really had a tough time, although once I moved to a quieter spot and tweaked some recognition parameters it did a bit better.
For a practical application, this would work fine in a quiet home environment and for controlling things for people with disabilities who could still talk well.
To start, you create a speaker dependent trigger word that gets its attention and starts it into the program. I used “Hey, Marvo”. The speaker independent trigger is “Robot”. Each speaker dependent word or phrase has to be trained twice and can be tested for recognition accuracy within its group of words.
To make the speaker dependent recognition more robust, you can add additional entries to the word lists that are the same phrase spoken by different voices. The program may flag them as duplicates if the recognition template is the same.
It has a built-in speaker independent, robot-oriented set of command groups, and you can make your own groups of speaker dependent commands that you can train to your own voice or to multiple voices. The key here is to include in each command group only words or phrases that are readily distinguishable from one another. There is one trigger word to get its attention, and then you can use the Arduino program to choose which group of words to listen for and what actions to take.
The additional “robot” command groups are movement directions, “up”, “down” etc. and the numbers zero through nine.
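The trigger-then-group flow described above can be sketched as a small state machine. This is only an illustration in plain C++, not code from the EasyVR library: the group names, the `Result` struct, and `nextGroup` are all hypothetical, and the rule that a timeout falls back to waiting for the trigger is my assumption about how one might want a toy like this to behave.

```cpp
// Hypothetical group numbers, for illustration only.
enum Group { GROUP_TRIGGER = 0, GROUP_FORTUNE = 1, GROUP_MOVES = 2 };

// Stand-in for what the sketch learns after one listening attempt.
struct Result {
    bool recognized;  // false means a timeout or no match
    int  command;     // index of the matched entry within the active group
};

// Decide which group of words to listen for next.
Group nextGroup(Group active, const Result& r) {
    if (!r.recognized) {
        return GROUP_TRIGGER;  // assumed policy: go back to waiting for the trigger
    }
    switch (active) {
        case GROUP_TRIGGER:
            return GROUP_FORTUNE;  // trigger heard, start listening for questions
        case GROUP_FORTUNE:
            // Assume entry 0 is a "move" keyword that switches to the movement group.
            return (r.command == 0) ? GROUP_MOVES : GROUP_FORTUNE;
        case GROUP_MOVES:
            return GROUP_TRIGGER;  // one movement command, then wait for the trigger
    }
    return GROUP_TRIGGER;
}
```

In a real sketch the `Result` would be filled in from whatever the board reports after each listening attempt, and the returned group would be the one passed to the next recognition call.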
It will store up to 32 voice response messages that you can record, but unfortunately you cannot save them off once you have recorded them. The program itself and the recognition templates can be stored off for reuse.
You can make quite a sophisticated system with this; it has 16 word groups that you can train. There is also a tool that you can buy for about $200 that will convert the speaker dependent commands into speaker independent ones. If one was using this commercially it could be worthwhile, but it is too much for me.
Once you have trained the commands you want, the EasyVR Commander program will generate an Arduino program template file with all the setup commands, and with the voice recognition menus you made set up as switch/case statements so that you can add the programmed actions to them. This saves a lot of work and figuring out. If you really want to, the detailed low-level commands for talking to the board are provided as well.
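To give a feel for the switch/case layout, here is a rough sketch in plain C++ of what filling in such a template might look like. It is not the file EasyVR Commander actually generates: the enum names, the `action` function, and the returned strings are all made up for illustration, and a real sketch would play a sound or light an LED instead of returning text. It also shows the earlier trick of training the same phrase in two voices, with both entries falling through to the same action.

```cpp
#include <string>

// Hypothetical names; a generated template would use your own trained labels.
enum Groups { GROUP_0 = 0, GROUP_1 = 1 };
enum Group0 { G0_HEY_MARVO = 0 };
enum Group1 { G1_PUPPY = 0, G1_PUPPY_VOICE2 = 1, G1_SNOW = 2 };

// Map a recognized (group, index) pair to something for the toy to do.
std::string action(int group, int idx) {
    switch (group) {
    case GROUP_0:
        switch (idx) {
        case G0_HEY_MARVO:
            return "listen:group1";   // trigger heard, move on to the questions
        }
        break;
    case GROUP_1:
        switch (idx) {
        case G1_PUPPY:                // same phrase trained by two voices,
        case G1_PUPPY_VOICE2:         // so both entries share one response
            return "say:yes";
        case G1_SNOW:
            return "say:ask again";
        }
        break;
    }
    return "ignore";                  // unknown group or index
}
```

For example, `action(GROUP_1, G1_PUPPY_VOICE2)` gives the same response as `action(GROUP_1, G1_PUPPY)`, which is how the duplicate voice entries end up behaving like one command.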
You can also have it store sounds from wav files. It comes with a “beep”, and there is also a feature for generating “lipsync” parameters from the recordings as they play, for animatronic mouth animation.
All in all, this is fairly easy to use and works well enough to be entertaining – well, I had fun with it.