the future of voice assistants
my experiences and thoughts about Voice User Interfaces

Voice is the simplest and most powerful medium. Everyone has it, and it is the most personal way to convey our thoughts, messages, ideas and questions. Voice assistants have the potential to use our voice to augment our lives and make mundane tasks seamless. Assistants should be an asset, not a burden. The advertisement featuring The Rock using Siri portrays an idealistic relationship between a human and a voice assistant. In reality, however, most voice assistants are considered gimmicks that can answer simple questions about the weather or traffic.
This summer I explored the world of Voice User Interfaces while building my first Amazon Alexa Skill (basically an app). I built a train schedule skill that allows San Francisco/Bay Area users to ask questions about the local train service. This simple use case allowed me to delve into the various facets of VUI development and user experience. In this article I explore the future of voice assistants and my experience in creating an Alexa Skill.
level playing field
There is a learning curve associated with every product we currently use. From our phones, apps and computers to even our fridges, we have to learn how to use them. While this is not an issue for those in the tech sphere, a large segment of the population is either missing out on these innovations or not fully capitalizing on them. This is a two-sided issue: many users are not getting access to technology, and tech companies are not getting full access to these users. Voice assistants have the power to connect with this unheard population. Voice should have no learning curve, and we need to build tools that empower users through their voice alone.
design and development
The crux of a voice assistant program boils down to two components: intents and slots. An intent categorizes a request based on the type of question being asked; this classification broadly determines the type of answer the user will receive. A slot is a piece of information the system must collect from the user to fulfill the parameters of the intent. Some slots are optional and others are required. If a required slot is unfilled, Alexa will prompt the user to fill it with a clarifying question. I have broken down some of the intents and slots I used in my train skill to help clarify this topic.

The favorite stations intent uses two preset stations that the user can configure, making it quick to find when the next train (or a train at a certain time) leaves from a frequently used station. If the user does not fill the time slot, the system finds the next train based on the current time.
The specific stations intent allows users to find the next train between two specific locations. Because this intent does not use any preset stations, users must fill the slots for the departing and arriving stations. If these slots are not filled, Alexa will ask the user “Which station do you want to depart from?” or “Which station would you like to arrive at?”. As with the favorite stations intent, if the user does not fill the time slot, the system finds the next train based on the current time.
Ideally, slots should be so intuitive to fill that users answer them while asking the question.
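To make the flow concrete, here is a minimal sketch of required-slot elicitation. The intent and slot names are made up for illustration, and a plain dict stands in for the real Alexa Skills Kit request objects:

```python
from typing import Dict, List

# Hypothetical required slots per intent (not the real ASK schema).
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "SpecificStationsIntent": ["departure_station", "arrival_station"],
    "FavoriteStationsIntent": [],  # preset stations are already configured
}

# Clarifying questions for each missing slot.
PROMPTS: Dict[str, str] = {
    "departure_station": "Which station do you want to depart from?",
    "arrival_station": "Which station would you like to arrive at?",
}

def handle_request(intent: str, slots: Dict[str, str]) -> str:
    """Return a clarifying prompt for the first missing required slot,
    otherwise fulfill the intent."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if not slots.get(slot):
            return PROMPTS[slot]  # elicit the missing slot
    # The time slot is optional: fall back to the current time when absent.
    time = slots.get("time", "now")
    return f"Looking up the next train for {time}."
```

A request missing a required slot yields the clarifying question, while the optional time slot simply falls back to “now”.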
core integration
One of my biggest qualms with Alexa is the burden placed on users to remember certain phrases to invoke commands. Here is a comparison between an ideal interaction and one with Alexa.

The Amazon Alexa skills store recently passed 10,000 skills and is growing quickly. However, only 3% of users come back to a skill one week after installing it. This poor retention rate limits the potential of Alexa for both developers and users, and it boils down to a core issue: Amazon is trying to teach people what to say.
Do not try to teach people what to say
To truly embrace the power of voice, Amazon must work on integrating apps within the Alexa system. Unlike a phone, where users can interact with apps visually, nothing reminds an Alexa user which apps they have installed. Instead of tapping an app to open it, users must remember a specific phrase to activate the voice application. In the above example, users must memorize the skill’s name, “Bay Area Transit,” to activate it. Even minute differences from the intended phrases can cause errors and unwanted messages.
Skills should be activated through context/key words instead of exact phrases. For example, if a user asks a question about a train, Amazon should query and activate the associated skill on the user’s device.
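As an illustration only (this is not how Amazon routes requests today), context-based activation could be as simple as scoring the overlap between a user’s words and each installed skill’s keywords. The skill names and keyword sets below are made up:

```python
from typing import Dict, Optional, Set

# Hypothetical installed skills and the keywords that hint at them.
SKILL_KEYWORDS: Dict[str, Set[str]] = {
    "Bay Area Transit": {"train", "bart", "caltrain", "station"},
    "Weather Watch": {"weather", "rain", "forecast"},  # made-up skill
}

def route_utterance(utterance: str) -> Optional[str]:
    """Pick the installed skill whose keywords best match the utterance,
    instead of requiring an exact invocation phrase."""
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for skill, keywords in SKILL_KEYWORDS.items():
        score = len(words & keywords)  # count keyword overlaps
        if score > best_score:
            best, best_score = skill, score
    return best  # None when no installed skill matches
```

An utterance like “when is the next train to berkeley” would route to the transit skill without the user ever saying its name.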

By integrating skills within the core Alexa experience, Amazon will truly harness the power of developers. In turn, users will not have to memorize certain phrases and will just have to ask a question.
It should be as simple as just asking a question.
voice based alexa skills store

The Alexa skills store has a big identity issue. It is a place to find apps that disconnect users from their mobile devices, yet it can only be accessed through a mobile device. To truly be a voice assistant, Alexa should suggest and download skills directly through user commands.
Skill Suggestions/Recommendations
For questions Alexa cannot answer, Amazon should suggest multiple skills based on reviews and location. These suggestions could be based on the skills that fulfill the intent and slots Alexa believes the user is trying to fill. This would raise awareness of developers’ applications and, in theory, allow more questions to be answered. Core integration is essential here because, as users download more skills, it would be unreasonable to expect them to remember the name of each one.
As much as Amazon is a site to buy products, it is also the go-to place to check reviews. Even when I buy a product at a local store, I often check the Amazon reviews first. I truly believe that a platform that gives users the best application recommendations will differentiate itself from the competition.
If Amazon can extend this reviewing culture to skills, it will build a skills store that elevates the best skills and empowers all users.
skeuomorphic design

Mobile devices and laptops were originally built with skeuomorphic interfaces so that users could adopt them easily. However, developers soon moved to flat interfaces that enabled far superior and easier user experiences. I believe that voice assistants are following a similar trajectory.
We are limiting Alexa, Siri and Google Home by labeling them as assistants
By labeling these devices as assistants, we are setting a low bar for them and essentially dumbing them down. We need to aim for long-term achievements, not short-term fixes. Unlike an assistant, these devices should be predictive and help us even before we ask a question. In fact, we should be speaking to our devices less and less often. Based on each family member’s schedule, these devices should help guide us through our day without much user input.
This change will not and should not happen immediately. The skeuomorphic framing of these tools as voice assistants is helpful because it makes users feel comfortable and secure asking questions. However, the eventual move away from the skeuomorphic design of voice experiences as assistants will allow creators to build vastly more powerful and helpful tools.
wrap-up
Making my first Alexa skill this summer was a blast. As an entry-level developer, I found the Amazon developer platform helpful and intuitive to use. A big thanks to @PaulCutsinger for creating all the tutorial videos and answering my questions on Twitter. I truly believe that voice assistants are on the path to connecting everyone with technology. In addition, even though I reference Amazon Alexa throughout this article, many of these comments apply to all voice assistant platforms.
The best products simply let people do what they have always done, just better and easier. I believe that our voices are the future.
___________
👋🏻 Hi there, I am Rajiv Sancheti. Thank you so much for reading my article! Currently I am studying Human Computer Interaction and Computer Science at the University of California, San Diego. I love to talk and get suggestions. You can find me on www.rajivsancheti.com, www.linkedin.com/in/rajivsancheti/ and https://twitter.com/sanchetio