Voice UI Best Practices: Learning from Phone Systems
The IVRs we deal with regularly can teach us a great deal about how to approach voice user interfaces.
Think about the last time you picked up the phone to contact a company you dealt with. Why did you call? How did you get the number? What happened when you called? Even if you’ve never built a VUI or purchased an Echo, you’ve been building empathy with your future voice customers, one phone call at a time.

Most major companies today use some form of IVR (short for interactive voice response) for their phone systems. The designers who worked on these IVRs played a key role in the early wave of today’s voice-activated assistants. But as a customer who probably uses these systems in times of duress, you can build your voice design skills by turning those experiences with IVRs into lessons learned.
User goals first, discovery second
Any obstacle you place between your customer and the goal they want to complete will rapidly frustrate them.
A common pattern in the world of IVRs is an advertisement for the company’s website. Sometimes, these advertisements occur on the initial menu, before the customer even hears the menu options.
Remember that last phone call to a company? You probably got the phone number from the company’s website. Worse yet, you might have spent non-trivial time hunting for that number in the first place (yes, I’m looking at you, Amazon). How do you feel, then, when your progress in the call is slowed by an advertisement for the site you just came from?
Try reframing this exchange. Instead of calling your bank, you’ve gone in person. But once you get to a teller, the first thing the teller says to you (even before asking why you’re there!) is “Did you know you can do this on our website? Go to www.ourbankname.com and you can check balances, reorder checks, and more!”
If a real person did this to you, how would you feel? Annoyed? Insulted? Of course you know how to find the company’s website — you probably used it to get the bank’s address. Why is the teller prolonging your interaction instead of just helping you?
(Aside: as much as I love ’90s nostalgia, PLEASE drop the WWW from spoken URLs. It’s astonishing how grating three letters can be.)
When designing a voice user interface, keep in mind that UX in spoken form triggers different psychological reactions in your customers when compared to a traditional graphical interface. Our brains REALLY want to interpret spoken information as coming from a person. And a person who doesn’t acknowledge your customer’s needs first is going to elicit an emotional response.
When a customer engages with your voice app or assistant, assume that they have an explicit goal in mind unless you’re told otherwise — and don’t get in their way.
Save your advertisements, personality, and news for AFTER your customer’s task is completed — or at the very least, save them until you understand what the customer wants.
Once you’ve helped a customer complete a goal, they MAY be more receptive to information about online features or other help. But let them complete their task FIRST.
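For anyone wiring this rule into an actual voice app, here is a minimal sketch in Python of what "task first, extras after" could look like. It is an illustration under my own assumptions, not any platform's API: the `Turn` type, its field names, and `build_speech` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One assistant turn. Every field name here is hypothetical."""
    task_response: str            # the answer to what the customer asked for
    task_complete: bool           # True once the customer's goal has been met
    promo: Optional[str] = None   # optional tip about online features, news, etc.


def build_speech(turn: Turn) -> List[str]:
    """Return the segments to speak, in order: the task first, extras last."""
    segments = [turn.task_response]
    # Extras are only appended after the goal is complete, never ahead of it.
    if turn.task_complete and turn.promo:
        segments.append(turn.promo)
    return segments
```

The point of the design is that the promotion has no code path that lets it run ahead of the customer's task.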
Streamline repeated prompts
The ultimate in unintelligent design is the classic “Please listen carefully to the options as the menus have changed.” When did they change? Why do you tell us this every time we call? Why would we choose to call and NOT listen?
Don’t lead with a ton of information. Instead, think of your prompts as a pyramid. As frequency increases, so too should brevity.
The pyramid principle is particularly true for hierarchical systems like IVRs, where the first prompt is often a blocker to getting information required to advance. If customers tend to use your system frequently, a long introduction may grow significantly more frustrating with each use.
Even if your system uses spoken input (as opposed to the common touchtone input for IVRs), streamlined initial prompts are key. Customers are unlikely to barge in on your assistant even when it’s droning on, so a long prompt will prolong EVERY interaction with the system. This ties back to our point earlier: if customers subconsciously treat spoken UX more like a person, they’re unlikely to interrupt unless they make a habit of interrupting others.
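One way to picture the prompt pyramid is a greeting that shrinks as a customer's usage grows. The sketch below assumes you can count a customer's prior calls; the thresholds, the prompt wording, and the names `GREETINGS` and `greeting_for` are made up for illustration.

```python
# Each entry is (minimum prior calls, prompt), ordered from longest to shortest.
# The thresholds and wording are illustrative assumptions, not recommendations.
GREETINGS = [
    (0, "Welcome to Example Bank. You can say things like 'check my balance' "
        "or 'report a lost card'. How can I help?"),
    (3, "Welcome back. How can I help?"),
    (10, "How can I help?"),
]


def greeting_for(prior_calls: int) -> str:
    """Pick the shortest greeting whose threshold the caller has reached."""
    chosen = GREETINGS[0][1]  # first-time callers get the full introduction
    for threshold, prompt in GREETINGS:
        if prior_calls >= threshold:
            chosen = prompt
    return chosen
```

A frequent caller hears only the question, while a first-time caller still gets the context needed to answer it.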
Don’t ask for what you can’t use
You undermine your system’s credibility when you ask for the same information multiple times, or when you discard information.
I’ve spent a great deal of time working through phone trees while seeking treatment for medical issues, and a really frustrating issue is the use of the social security number as ID verification. This is common for utilities, medical billing and sometimes insurance.
In cases where an interactive voice system is present, I generally prefer keying in that number for privacy. Yet what invariably ends up happening is this:
- System asks for SSN
- System verifies identity (presumably) and puts me on hold
- Operator picks up and immediately asks me to verify my SSN again, only spoken this time.
Why did the system ask for this sensitive information and then fail to pass it on to the operator? If the SSN was passed on and they still have to verify my identity later, could the operator at least ask for a less sensitive piece of information to be spoken aloud? At the very least, simply acknowledging that the operator received that information prevents me from jumping to the conclusion that your systems are malfunctioning.
Any time you seek information from a customer and appear not to use it, trust erodes. Be mindful of this particularly when dealing with identity and security.
If customers don’t trust you to competently manage their information, they are likely less inclined to give your business more money.
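To make the handoff problem concrete, here is a rough sketch of an IVR passing its verification result to the operator's screen, so the operator can acknowledge it instead of asking again. The `Handoff` type, its fields, and the operator wording are hypothetical. Note that the record carries only the fact of verification, not the SSN itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """What the IVR passes to the operator. Hypothetical fields."""
    caller_verified: bool   # did the automated system confirm identity?
    verified_by: str        # how, e.g. "the number you keyed in"


def operator_opening(handoff: Handoff) -> str:
    """Suggest the operator's first line based on what the IVR already did."""
    if handoff.caller_verified:
        # Acknowledge the earlier step so the caller knows it was not wasted.
        return (f"Thanks, I can see you were verified by {handoff.verified_by}. "
                "How can I help?")
    # If verification is still needed, ask for something less sensitive to say aloud.
    return "Before we start, can you confirm the ZIP code on your account?"
```

Whether a ZIP code is an acceptable second factor depends on your security requirements; it stands in here for "something less sensitive than an SSN."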
Case Study: Comcast
Over the years, Comcast’s IVR system has become remarkably streamlined. Upon connecting (assuming your phone number is in their system), the IVR delivers the following prompt:
“Welcome to Comcast. Home of Xfinity. I have the first few numbers of your street address as X, apartment Y. Is this the unit you’re calling about?”
This is a pretty great start. We don’t need the Xfinity part, but I understand why it’s there, and at least they didn’t force us to sit through a slogan. And by ending the prompt with a question, it’s very clear the system is waiting for a Yes/No answer.
Once the customer’s identity is confirmed, a line is crossed: the system plays the sound of keystrokes while loading. This is clearly an attempt to make customers believe the system is a human. This gets into a trust debate bigger than this post, but I believe transparency is in a customer’s best interest. If it’s a synthetic agent, don’t trick me into believing otherwise.
Tricks aside, the next step is a very crisp attempt to start a conversation:
“In a few words, tell me how I can help you today.”
If that prompt is repeated twice with no answer, the caller is redirected to a more formal main menu with a list of verbal commands. A graceful way of providing more context. Well done.
However, this same fallback menu fails to consider why customers would need such a menu. Since failed utterances may be due to speech impediments, accents, or environmental interference, offering physical input as an option (e.g., “For billing, press 1”) seems important at this stage.
Speech is a fantastic input medium, but it’s not a universal solution. Don’t forget to design for failure cases.
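Here is a small sketch of the fallback pattern with the keypad option added. This is not Comcast's implementation. It is my own illustration, and the two-attempt threshold, the menu wording, and the names `next_prompt` and `resolve_intent` are assumptions.

```python
from typing import Optional

MAX_OPEN_ATTEMPTS = 2  # assumed: open-ended tries before falling back to the menu
OPEN_PROMPT = "In a few words, tell me how I can help you today."
FALLBACK_MENU = ("You can say 'billing' or press 1. "
                 "You can say 'technical support' or press 2.")
DTMF_MAP = {"1": "billing", "2": "technical support"}  # keypad digit -> intent


def next_prompt(failed_attempts: int) -> str:
    """Use the open question first, then fall back to the explicit menu."""
    return OPEN_PROMPT if failed_attempts < MAX_OPEN_ATTEMPTS else FALLBACK_MENU


def resolve_intent(speech: Optional[str], keypress: Optional[str]) -> Optional[str]:
    """Accept either a keypress or a spoken phrase. None means 'not understood'."""
    if keypress in DTMF_MAP:
        return DTMF_MAP[keypress]
    if speech:
        text = speech.strip().lower()
        for intent in DTMF_MAP.values():
            if intent in text:
                return intent
    return None
```

Because every menu item has both a spoken and a keypad route, a caller whose speech the system cannot parse still has a way forward.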
Keep an Ear Out
The next time you have to contact a company by phone, pay close attention to the choices they’ve made. How long does it take you to complete your task? Did the interface get in your way? How did you feel after the experience?
If you have particularly good or bad examples, please share them in the comments. I know there is an entire cottage industry of designers working hard on these screenless interfaces, and I also know plenty of companies can’t afford their insight just yet. Either way, we can learn from these experiences as a close cousin of the current wave of voice-enabled assistants and products.
Cheryl Platz has worked on a variety of voice user interfaces including the Echo Look and Echo Show, Amazon’s Alexa platform, Windows Automotive, and Cortana. She is currently Design Lead for the C+E Admin Experience team at Microsoft. As founder of design education company Ideaplatz, Cheryl is also touring worldwide with her acclaimed natural user interface talks and workshops. Join her at Interaction ’18 in Lyon this February.