Voice interface is still overpromising and underdelivering

Around 15 years ago Flash technology was in the ascendancy. One of the odd conventions to emerge at the time was the ‘Flash intro’. Very often, to build your anticipation for the website awaiting you, you would be entertained with what was essentially an opening title sequence. And if you were really unlucky, on the other side of it was a website fully rendered in Flash.

What you wanted was content; what you got was an extended journey through a designer’s ego trip (and yes I should know, I was one of those designers). The basic premise of a Flash-built website was that tricking out the interface would make for a better user experience. That assumption turned out to be wrong.

With Siri, Alexa et al. entering our lives, our interfaces now have personalities. If a digital assistant misunderstands our requests, we are likely to learn about it through a witty quip. TV ads featuring virtual assistants often make a particular show of droll one-liners emanating from the device.

But as Nielsen Norman Group research shows, voice interfaces are falling far short of user expectations. It seems that priorities need to be reassessed.

A little less conversation

As part of a project last year I began designing for a command line interface. For someone with no previous experience, a terminal window or console can be a daunting place. Initially I was puzzled why user prompts and feedback in this world were so clinical and abrupt. Why would command line users not want to be addressed in a more human fashion? The answer lies in task efficiency.

The command line interface evolved from single-line dialogue between two human teleprinter operators. Over time, one end of the human-human dialogue became a computer, and the conventions remained. These interfaces provide users with a more efficient method of performing tasks. In short, command line users are just like the rest of us: trying to perform a lot of tasks in as short a time as possible, without surplus dialogue or clutter getting in the way.

This method of working is totally in keeping with our tendency towards ever more concise communication. Email is on the wane due to the long, unwieldy threads it encourages. The rise of chat apps such as Slack is due in large part to the tendency towards more concise messages. We’re making fewer mobile calls, opting instead for text messages using abbreviations, acronyms and emojis.

Many rivers to cross

As designers we are not always trying to mimic a conversation. We are creating an exchange which delivers for the user as efficiently as possible. To re-cast all human-computer interactions as conversations is to misunderstand our relationship with machines and devices.

The obstacles to success with voice UI are many. Users need to think more than once about the commands they give. They are required to speak in a manner that often isn’t natural for them. Even relatively simple queries may need to be broken down into smaller questions before reaching anything like the right answers.

When barriers are placed between a user and the outcomes they want, the end result is predictable: they will simply opt out. A report from The Information suggests that only 2% of Alexa speakers have been used to make a purchase from Amazon in 2018. Additionally, 90% of the people who try to make a purchase through Alexa don’t try again.

We are still some distance away from the dream that voice UI promised. Perhaps this is voice’s Flash period, where the user needs to work hard to access the content they want. And I’m willing to bet that most frustrated users would be willing to trade every ounce of their virtual assistant’s sassy responses for just a little more efficiency.

The fact is that voice UI is still pretty hard work, no matter how hard Siri or Alexa try to entertain us.

One size fits none

Who doesn’t love easy answers? Don’t we all feel good whenever the way forward comes quickly to hand?

With less effort, pain and outlay involved, the lure of easy answers is such that we tend to place undue emphasis on initial conclusions. This cognitive heuristic is called ‘anchoring’ or focalism, where we rely too heavily on the first piece of data we’re exposed to. Where the information is in line with an existing belief or preconception, confirmation bias simply compounds the issue.

Psychology aside, often the easy answers we seek are expediently classified as ‘best practice’. Indeed, when we ask “what is best practice?”, it can often be a thinly veiled substitute for “what’s the easiest answer?”. The same question can similarly mask sentiments such as “if it’s good enough for company X, it’s good enough for us.”

Ten or twelve years ago, whenever a tricky decision point was reached on a website project, it wasn’t uncommon to hear someone chip in with: “Well, what do Microsoft do?”. The implication was that a) Microsoft had all the right answers (it didn’t) and b) the context was the same (it almost certainly wasn’t).

‘Best practice’ can be an excuse for all manner of evils. Less than two years ago, the use of infinite scrolling – where a webpage would continuously load new content every time you reached the bottom of the page – was running rampant. It was a trend, and one that became de rigueur for any new website that wanted to appear ‘with it’. No doubt somewhere at some point it was referred to as best practice. But like any trend it is increasingly being left behind as we discover lots of sound reasons why it is bad practice. In the case of infinite scrolling, it meant that the website footer – use of which is definitely best practice – was forever just out of reach of the user.

No sooner has one best practice established itself than extrinsic factors change, and something else (albeit not necessarily better) has come along to take its place. What we call ‘best practice’ evolves in parallel with technology, user behaviour and established norms across industries. It never follows trends blindly. Sometimes trends are completely in harmony with firmly established principles. Sometimes they are not. And so it is with best practice.

There’s an old meme in the design industry that goes something like this: Good. Quick. Cheap. Pick any two. Meaning: good, quick design won’t be cheap, and so on. To paraphrase this for user research and customer insights: “Easy. Quick. Right. Pick any two.”

Whenever best practice is referenced as rationale for decision-making, the immediate follow-up should always be: “Okay. But is it best practice for our customers?”. Context is everything. Easy answers might be cheap, but they won’t necessarily be right.

Learning from others through benchmarking, both within and outside your sector, is an important element in establishing a direction of travel. However, research and validation with customers are the true path to success.

Don’t let a quick and easy path to answers under the guise of ‘best practice’ either stifle your organisation’s chance to innovate and excel, or inspire outcomes which simply mirror your organisation’s in-house desires. The path to the dark side this is.