Voice recognition technology is nothing new. I remember, almost twenty years ago, attempting to sell copies of Dragon NaturallySpeaking for Windows 95 (it was expensive, although not nearly as expensive as early versions, which went for $9,000!) and imagining a future without having to type. It was designed to let the user dictate directly into a word processor, and the time savings could have been substantial. Unfortunately, the hardware available to the average consumer didn’t have enough power to run the programme properly, so it never really took off, and eventually Dragon went under.
Around the same time, the BBC science programme Tomorrow’s World suggested that speech recognition would enable us to control anything, from turning on the lights and increasing the temperature of a room to making phone calls, all without touching a button. We wanted to believe the fantasy, but the reality felt so far away.
Things didn’t improve for quite some time. Try asking a voice-controlled sat-nav from the early noughties to take you to Dorset; after twenty minutes of screaming at it, it would end up taking you to Watford Gap instead. And good luck if you had a thick accent…
Fourteen years later, with many companies having failed to make the dream a reality, Apple released Siri. Powered by technology from Nuance (incidentally, the latest owner of Dragon NaturallySpeaking), Siri felt revolutionary. It changed the perception of voice recognition from a gimmick that more often than not failed into a tool that returned answers that were more often than not correct. Imagine that!
Siri is now six years old. It has improved immensely, but some still see it as something to try a few times to impress the kids (“Siri, beatbox to me” is still my favourite) before going back to tapping the screen as usual. Handset manufacturers entice customers to upgrade repeatedly by making bigger and better screens the focal point of each device. Mobile phones are designed to be picked up, touched and interacted with using hands and eyes, rather than spoken to from a distance, and it is for this reason that, ironically, voice seems unlikely to become the primary interface for phones.
September 2016 was a historic month for voice technology, with the UK release of the game-changing Amazon Echo. Here was a device designed solely to be controlled by your voice, with no screen at all: a brave move by Amazon, but one justified by totally nailing the user experience.
Powered by Alexa, a Star Trek-inspired, voice-controlled “intelligent personal assistant”, the Echo not only recognised your voice commands; it actually did useful things with them.
“Alexa, play me some Rolling Stones.” Cue a playlist of music from The Rolling Stones.
“Alexa, what time is it in Nashville?” Alexa responds with the time in the country music capital of the world.
“Alexa, what’s the weather forecast for today?” If you live in the UK, the answer will almost certainly be rain.
All of a sudden, voice control went from something to impress your friends with to being genuinely useful.
Then the real masterstroke was revealed: Amazon made the Alexa Voice Service (AVS) available to any consumer electronics (CE) manufacturer wishing to build voice-operated devices … for free.
The impact of this move on the consumer electronics market was immense, as anyone who visited CES (the Consumer Electronics Show) in Las Vegas last January can attest. Everything from speakers to TVs, interactive fridges to in-car entertainment, and even an Alexa-powered dancing robot (https://www.engadget.com/2017/01/05/amazons-alexa-now-lives-inside-a-dancing-robot/) was on show, indicating that many manufacturers were keen to harness the power, and the UX, that Alexa made possible.
Google have also entered the fray with Google Home. Boasting compatibility with Chromecast and Nest devices along with many others, Google Home seems to have been marketed more as an introduction to an Internet of Things (IoT) ecosystem than as a smart speaker.
With these new ecosystems come great opportunities for other companies to make an impact. Apple’s App Store has over 2.2m apps and Google Play over 2.8m, while Alexa has only 10,000 skills (although that number is growing incredibly quickly). Less competition means more chance of your skill being discovered.
So how do all of these advances in voice technology affect UX? How can app (or “skills”) designers take advantage of these new ecosystems? What are the barriers to market?
UX/UI designers are used to working with text as an input method. Voice changes everything.
Written commands or searches tend to be short and concise, e.g. “Weather forecast Nashville”. Voice commands are conversational: “Alexa, what is the weather forecast today in Nashville?” Gone are complex navigation systems, drop-downs and pop-ups, replaced with simple call-and-response exchanges that deliver instant gratification.
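To make the call-and-response idea concrete, here is a minimal sketch of how a skill might map a spoken, conversational request to a single answer. The request shape loosely follows the intent-and-slot JSON that Alexa sends to a skill, but the names (“WeatherIntent”, the city slot, get_forecast) are hypothetical examples, and get_forecast is a stand-in for a real weather API.

```python
# Minimal sketch of a call-and-response skill handler.
# Intent and slot names here are illustrative, not Alexa's real schema.

def get_forecast(city):
    # Placeholder for a call to a real weather service.
    return f"The forecast for {city} is rain."

def handle_request(request):
    intent = request["intent"]["name"]
    if intent == "WeatherIntent":
        city = request["intent"]["slots"]["city"]["value"]
        return {"outputSpeech": get_forecast(city)}
    # No menus or pop-ups: an unrecognised request gets a short re-prompt.
    return {"outputSpeech": "Sorry, I didn't catch that. What would you like?"}

# "Alexa, what is the weather forecast today in Nashville?"
request = {"intent": {"name": "WeatherIntent",
                      "slots": {"city": {"value": "Nashville"}}}}
print(handle_request(request)["outputSpeech"])
```

The whole exchange is one question and one answer; there is no navigation state for the user to hold in their head, which is exactly what makes voice interfaces feel instant.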
As with pretty much everything technology-related, clean metadata is critical. Underpinning all of these wonderful voice searches are tons and tons of metadata. Without clean metadata, the call-and-response system throws up all manner of false positives that at best irk the consumer and at worst transform the device into an expensive paperweight.
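A toy illustration of the point: if the same artist appears in a catalogue under several messy spellings, a spoken request may match only some of them, or match the wrong entry entirely. The catalogue entries and the normalise() helper below are invented for the example; the idea is simply that everything is cleaned to one canonical form before matching.

```python
# Toy illustration of why clean metadata matters for voice search.
# The catalogue entries and normalise() helper are hypothetical.

def normalise(text):
    # Lower-case, strip a leading article and drop stray punctuation
    # so spoken requests match catalogue entries consistently.
    text = text.lower().strip()
    if text.startswith("the "):
        text = text[len("the "):]
    return text.replace(".", "").replace(",", "")

# Three messy entries for the same artist...
catalogue = ["The Rolling Stones", "rolling stones.", "ROLLING STONES"]

# ...collapse to a single canonical key, so "Alexa, play me some
# Rolling Stones" finds all of them instead of a random subset.
keys = {normalise(entry) for entry in catalogue}
print(keys)  # {'rolling stones'}
```

Real catalogues need far more than this (featured artists, re-releases, transliterations), but the principle is the same: the voice interface is only as good as the metadata behind it.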
One issue global companies face is that Alexa currently supports only English and German, and Google Home is English only. Additional language support will inevitably be rolled out, but programming the nuances of multiple languages is incredibly tricky, so don’t expect it to happen overnight. Multiple-language support also complicates the call-and-response-led system, so it’s worth considering whether your company really needs that support for Esperanto.
Thirty years ago, Star Trek made us dream of elevator doors that opened to voice commands, and communication devices that, once tapped, could connect you to anyone you asked for. Red Dwarf brought a voice-activated toaster to life, obsessed with making bread-related products regardless of what was asked of it (waffle, anyone?).
The fact is that what seemed destined to remain science fiction has taken the consumer somewhat by surprise; I don’t think anyone really expected that the quality of service that both Siri and Alexa deliver would ever actually become a reality after so many false starts.
To conclude, if you are thinking about creating a skill, or a voice-controlled product, it is no longer something to be terrified of developing.
In the immortal words of Jean-Luc Picard: “Make it so.”
Rabbit and Hare are experts in UX and UI, both for visual interfaces and for voice-controlled systems. If you are thinking about designing a skill for Alexa or another platform, call us today to discuss your project and how we can help.