
All Talk and No Buttons: The Conversational UI

We’re witnessing an explosion of applications that no longer have a graphical user interface (GUI). They’ve actually been around for a while, but they’ve only recently started spreading into the mainstream. They are called bots, virtual assistants, invisible apps. They can run on Slack, WeChat, Facebook Messenger, plain SMS, or Amazon Echo. They can be entirely driven by artificial intelligence, or there can be a human behind the curtain.

Still from the movie WarGames
WarGames: David Lightman talking with Joshua.

My own first encounter with a conversational interface was back in 1983. I was just a kid, and I went with some friends to see WarGames. Young hacker David Lightman (played by Matthew Broderick) dials every phone number in Sunnyvale, California, until he accidentally bumps into a military supercomputer designed to simulate World War III.

We immediately realize that this computer is operating at a different level: it engages in conversation with Lightman, asks him how he feels, and offers to play some games. No specific commands to type—you just talk to this computer, and it gets you, and responds to you.

Fast-forward 30 years. My teammates and I at Meekan set out to build a new tool for scheduling meetings. We thought, “It’s 2014! Why aren’t calendars working for us?” We wanted simply to be able to tell our calendar, “I need to meet Jan for coffee sometime next week,” and let the calendar worry about finding and booking the best possible time and place.

First we sketched out a web page; then we built an Android app, then an iOS app, and finally an Outlook add-in. Each one was different from the next; each attacked the problem from a different angle. And, well, none of them was really very good.

Time-of-day options in our iOS app.

After building user interfaces for more than 15 years, for the first time I felt that the interface was seriously limiting what I was trying to do. Almost no one understood what we were attempting, and when they did, it seemed to be more difficult to do it our way than the old-school way. We could go on and crank out more and more versions, but it was time for a different approach. The range of possible actions, the innumerable ways users can describe what they need—it was just too big to depict with a set of buttons and controls. The interface was limiting us. We needed something with no interface. You could tell it about your meeting with Jan, and it would make it happen.

And then it dawned on us: we’re going to build a robot!

I’m going to tell you all about it, but before I do, know this. If you’re a designer or developer, you’ll need to adjust your thinking a bit. Some of the most common GUI patterns and flows will not work anymore; others will appear slightly different. According to Oxford University, robots will replace almost half of the jobs in the US over the next 20 years, so someone is going to have to build these machines (I’m looking at you) and make sure we can communicate properly with them. I hope that sharing some of the hurdles we already jumped over will help create a smoother transition for other designers. After all, a lot about design is telling a good story, and building a robot is an even purer version of that.

Photoshop? Where we’re going, we don’t need Photoshop

Think about it. You now have almost no control over the appearance of your application. You can’t pick a layout or style, can’t change the typography. You’re usually hitching a ride on someone else’s platform, so you have to respect their rules.

Screenshot showing how the same message appears across Slack, HipChat, and WhatsApp
The same message in Slack, HipChat, and WhatsApp.

And it gets worse! What if your platform is voice-controlled? It doesn’t even have a visual side; your entire interface has to be perceived with the ears, not the eyes. On top of that, you could be competing for the same space with other conversations happening around you on the same channel.

It’s not an easy situation, and you’re going to have to talk your way out of it: all of your features need to be reachable solely through words—so picking the right thing to say, and the tone of your dialogue with the user, is crucial. It’s now your only way to convey what your application does, and how it does it. Web standards mandate a separation of content and style. But here, the whole style side gets thrown out the window. Your content is your style now. Stripped of your Photoshop skills, you’ll need to reach down to the essence of the story you’re telling.

And developers? Rejoice! Your work is going to be pure logic. If you’re the type of developer who hates fiddling with CSS, this might be the happiest day of your life.

The first tool in your new toolbox is a text editor for writing the robot’s script and behavior. When things get more complicated, you can use tools like Twine to figure out the twists and turns. Tools and libraries for coding and scaling bots are cropping up by the dozens as we speak—things like Wit.ai for handling language understanding, Beep Boop for hosting, and Botkit for integrating with the popular Slack platform. (As I write this, there is still no all-encompassing tool to handle the entire process from beginning to end. Sounds like the voice of opportunity to me.)
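These toolkits don’t share one API, but most of them expose some version of the same “hear a pattern, reply” loop. Here’s a minimal sketch of that pattern in Python; the `Bot` class and its methods are illustrative stand-ins, not any real library’s API:

```python
import re

class Bot:
    def __init__(self):
        self.handlers = []  # (compiled pattern, callback) pairs

    def hears(self, pattern):
        """Register a callback for messages matching a regex."""
        def register(callback):
            self.handlers.append((re.compile(pattern, re.IGNORECASE), callback))
            return callback
        return register

    def receive(self, message):
        """Dispatch an incoming message to the first matching handler."""
        for pattern, callback in self.handlers:
            match = pattern.search(message)
            if match:
                return callback(match)
        # Fallback: admit confusion and suggest a concrete next step
        return "Sorry, I didn't get that. Try: we want to meet for lunch next week."

bot = Bot()

@bot.hears(r"meet .* for (\w+)")
def schedule(match):
    return f"On it! Looking for the best time for {match.group(1)}."
```

Calling `bot.receive("I need to meet Jan for coffee next week")` routes to the handler; anything unrecognized falls through to the fallback, which (as we’ll see below) should always point the user somewhere useful.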

But, let me say it again. The entire field of visual interface design—everything we know about placing controls, handling mouse and touch interaction, even picking colors—will be affected by the switch to conversational form, or will go away altogether. Store that in your brain’s temp folder for a little while, then take a deep breath. Let’s move on.

First impression: introduce yourself, and suggest a next step

Imagine a new user just installed your iOS app and has launched it for the first time. The home screen appears. It’s probably rather empty, but it already has some familiar controls on it: an options menu, a settings button, a big button for starting something new. It’s like a fruit stand. Everything is laid out in front of you: we got melons, we got some nice apples, take your pick.

Compared to that, your first encounter with a robot is more like a confession booth. You depend on the voice from the other side of the door to confirm that you’re not alone, and guide you toward what to do next.

Your first contact with the user should be to introduce yourself. Remember, you’re in a chat. You only get one or two lines, so keep it short and to the point. We’ll talk more about this in a second, but remember that having no visible interface means one of two things to users:

  • This thing can do whatever I ask him, so I’m going to ask him to make me a sandwich.
  • I have no idea what I’m supposed to do now, so I’m just going to freeze and stare at the screen.

When we did our first tests, our users did just that. They would either just stare, or type something like “Take me to the moon, Meekan.”

We were upset. “Why aren’t you asking him to schedule stuff for you, user?”

“Really? He can do that?”

It’s not obvious. So use introductions to define some expectations about the new robot’s role on the team. Don’t be afraid to glorify his mission, either. This robot handles your calendar! That way, users will be less disappointed when they find out he doesn’t make sandwiches.

Immediately follow this intro with a call to action. Avoid the deer-in-headlights part by suggesting something the user can try right now.

Hi Matty! I’m Meekan, your team’s new scheduling assistant. I can schedule meetings in seconds, check your schedule, and even find flights!

Try it now, say: Meekan, we want to meet for lunch next week.

Try to find something with a short path to victory. Your users just type this one thing, and they immediately get a magical treasure in return. After this, they will never want to return to their old life, where they had to do things without a robot, and they’ll surely want to use the robot again and again! And tell all their friends about it! (And…there you go, you just covered retention and virality in one go. It’s probably not going to be that easy, but I hope you get my point about first impressions.)
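The intro-plus-call-to-action above can be sketched as a tiny first-contact handler. The function name and wording here are illustrative; the shape is what matters: introduce yourself once, suggest one concrete thing to try, then stay quiet.

```python
def welcome(user_name, seen_users):
    """Return an intro message the first time we meet a user; stay quiet after."""
    if user_name in seen_users:
        return None  # already introduced; don't repeat yourself
    seen_users.add(user_name)
    return (
        f"Hi {user_name}! I'm Meekan, your team's new scheduling assistant. "
        "I can schedule meetings in seconds, check your schedule, "
        "and even find flights!\n"
        "Try it now, say: Meekan, we want to meet for lunch next week."
    )
```

The `seen_users` set is the simplest possible memory; in practice you’d persist it, but the rule is the same: one introduction, one suggested first task, never again.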

Revealing more features

When designing GUIs, we often talk about discoverability. If you want the user to know your app is capable of doing something, you just slap it on the screen somewhere. So if I’m new to Twitter, and I see a tweet for the first time, my options are set in front of me like so:


Easy. I’ll just hover my mouse over these little icons. Some of them (like stars or hearts) are pretty obvious, others might require some more investigation, but I know they’re there. I look around the screen, I see my Notifications link, and it has a little red number there. I guess I received some notifications while I was away!

Screenshot showing Twitter UI elements: the Home, Notifications, and Messages icons

But when talking to a robot, you’re just staring into a void. It’s the robot’s job to seize every opportunity to suggest the next step and highlight less-familiar features.

  • Upon introduction: as we mentioned earlier, use your first contact with users to suggest a task they could ask the robot to perform.
  • Upon receiving your first command: start with a verbose description of what’s happening and what the robot is doing to accomplish his mission. Suggest the next possible steps and/or explain how to get help (e.g., link to a FAQ page or a whole manual).
  • Now gradually remove the training wheels. Once the first interactions are successful, the robot can be less verbose and more efficient.
  • Unlock more achievements: as the relationship progresses, keep revealing more options and advanced tips. Try to base them on the user’s action history. There’s no point explaining something they just did a few moments ago.

Meeting synced! Did you know I can also find and book a conference room?

  • Proactively suggest things to do. For example, users know the robot reminds them about meetings, but don’t know the robot can also order food:

Ping! There is a meeting coming up in one hour. Would you like me to order lunch for 3 people?

If the robot is initiating conversation, make sure he gives relevant, useful suggestions. Otherwise, you’re just spamming. And of course, always make it easy for users to opt out.
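One way to sketch this gradual reveal: keep a record of what the user has already done, and attach the first untried tip to a confirmation message. The feature names and tip texts below are illustrative, not Meekan’s actual feature list.

```python
# Tips ordered from basic to advanced; each is keyed by the feature it reveals.
TIPS = [
    ("book_room", "Did you know I can also find and book a conference room?"),
    ("order_food", "I can order lunch before your next meeting, just ask."),
]

def next_tip(action_history):
    """Suggest the first feature the user hasn't tried yet, or nothing."""
    for feature, tip in TIPS:
        if feature not in action_history:
            return tip
    return None  # experienced user: be efficient, not verbose

def confirm_meeting(action_history):
    """Confirm success, and piggyback one unseen tip onto the confirmation."""
    message = "Meeting synced!"
    tip = next_tip(action_history)
    return f"{message} {tip}" if tip else message
```

A brand-new user gets the confirmation plus the conference-room tip; once the history covers every feature, the robot drops the training wheels and just says “Meeting synced!”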

Cheat whenever you can

It’s easy to assume our robot is operating inside a pure messaging or voice platform, but increasingly this is not the case: Amazon Echo is controlled by voice, but has a companion app. WeChat and Kik have built-in browsers. HipChat allows custom cards and a sidebar iframe. Facebook and Telegram have selection menus. Slackbot inserts deep links into messages (and I suspect this technology will soon be more widely available).

Screenshot showing how Slack uses deep links
Slackbot uses deep links to facilitate actions.

With all the advantages of a conversational interface, some tasks (like multiple selections, document browsing, and map search) are better performed with a pointing device and buttons to click. There’s no need to insist on a purely conversational interface if your platform gives you a more diverse toolbox. When the flow you present to your user gets narrowed down to a specific action, a simple button can work better than typing a whole line of text.

Screenshot showing Telegram’s interface, which uses pop-up buttons
Telegram uses pop-up buttons for discovery and for shortcuts.

These capabilities are changing rapidly, so be prepared to adapt quickly.
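That adaptation can be sketched as one renderer that emits buttons where the platform supports them and falls back to a numbered text menu where it doesn’t. The capability flag here is an assumption for illustration; each real platform exposes its capabilities differently.

```python
def render_choices(platform_supports_buttons, prompt, options):
    """Render a set of choices as buttons if possible, else as numbered text."""
    if platform_supports_buttons:
        return {"text": prompt, "buttons": options}
    # Plain-text fallback: number the options and ask for a reply by number
    lines = [prompt] + [f"{i}. {opt}" for i, opt in enumerate(options, 1)]
    lines.append("Reply with a number to pick one.")
    return {"text": "\n".join(lines), "buttons": []}
```

On Telegram this would become a row of pop-up buttons; over plain SMS the same call degrades to a typed menu, so the conversation flow stays identical while the presentation cheats wherever it can.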

And now, we ride

As users become more familiar with chat robots, they will form expectations about how these things should work and behave. (By the way, you may have noticed that I’m referring to my robot as a “he”. We deliberately assigned a gender to our robot to make it seem more human, easier to relate to. But making our assistant robot male also allowed our team to subvert the common stereotype of giving female names to robots in support roles.)

The definitive book about conversational design has yet to be written. We’ll see best practices for designing conversations form and break and form again. This is our chance as designers to influence what our relationship to these machines will look like. We shape our tools and thereafter they shape us.

In the next part of this article, we’ll dive deeper into basic GUI patterns and discuss the best way to replicate them in conversational form.

About the Author

Matty Mariansky

Matty Mariansky cofounded and is the product designer at Meekan in Tel Aviv. He has designed everything from slot machines to financial news sites. Matty invented the “traffic light” system for smart, effortless casual wear. He, for one, welcomes our robot overlords.

21 Reader Comments

  1. Thanks!
    We mostly use our own in-house custom Python code, but we started very early. These days you can whip up a quick MVP with a lot of off-the-shelf and open source stuff.

  2. Hi Matty!

    I would like to ask you some questions.
    We’re developing our own educational web platform (you could call it an app, a messenger, or a social network). It’s still in development, so I can’t tell you exactly what it is yet. But my first question is: how can we integrate your bot into our product? Is it possible at all? Second: can you share any details about how your snippets or Open Graph tags work? How do you get the information from a shared link?

    I actually have a lot of other questions, but this isn’t the best place to ask them. Could you share an email address or some other way to contact you? Of course, only if you’re available to talk.

    Thank you in advance. I really like your article.

  3. I guess that Google, Apple and Microsoft spotted that this is going to represent a big part of our future, hence Google Now, Siri and Cortana.

    The biggest challenge to this is still the voice recognition, especially when multiple apps are “listening”, as you have stated in the article.

    I see this issue being overcome little by little, as the technology advances and as more data is gathered.

    For example, a shopping app won’t react when you are talking about scheduling a meeting, or booking a flight.

  4. @William (SBP) I have to say I actually much prefer text messaging over voice operation – talking takes almost 100% of your attention both ways (when talking and when listening), while messaging can be handled in fragments and with less cognitive load.

  5. Nicely put together, Matty! At Wit.ai, we couldn’t agree more with you. I would just add that sometimes, depending on your app or device, the best conversational interface is no interface… This is especially true when your app is self-contained and self-explanatory. In front of a smart thermostat, one will know what commands/questions to ask.

  6. Coincidentally, a video game I played very recently left me fascinated with the idea of conversations and dialog trees so I’ve been thinking about how to incorporate a similar conversation structure within a non-game UI. I’m clueless about the technologies behind these bots, though. I’m looking forward to learning more about the conversational UI.

  7. Interesting and thought-provoking article. “What if your platform is voice-controlled? It doesn’t even have a visual side; your entire interface has to be perceived with the ears, not the eyes.”

    A platform without a visual side is a phone. =)

    PS Tell us more about your casual wear invention.

  8. @Beth Edwards – well, Amazon Echo doesn’t have a visual side (unless you count the light ring on top), and it can’t make phone calls.
    [About the casual wear – you’ll have to sign an NDA]

  9. Many of the recommendations are also good practices for visual interfaces with a rich set of functions: on-boarding (first task), proactive suggestions for next steps, and moving the user from training wheels to deeper engagement. Just because the options *can* be visually discovered doesn’t mean they *will* be discovered and used. Where visual interfaces risk causing analysis paralysis, conversational ones risk the opposite (nothing to analyze/blank slate).

  10. Regarding “The entire field of visual interface design—everything we know about placing controls, handling mouse and touch interaction, even picking colors—will be affected by the switch to conversational form, or will go away altogether.”

    What literature, do you think, can be useful for getting smarter about conversational interfaces? Broadly speaking, building erudition, not how-to guides.

  11. Hi Matty!
    I absolutely love the logic behind every little decision that made it more conversational and useful.
    Thanks

  12. I’ve been working for the last couple months on a project which used to have a conversational UI. I’m exploring the potential of gesture-control of everyday objects and I used to have an assistant for quite a while in my project. Eventually I killed it, because I found it too difficult to find the right tone. But still, it is a very interesting topic.
    If anybody is interested in checking out my project: here you go:
    http://j.mp/ba-gestio

  13. “It doesn’t even have a visual side; your entire interface has to be perceived with the ears, not the eyes.”

    How does that work for deaf people?

  14. Thank you for taking the time to share this article Matty.

    I hope visual interfaces don’t disappear entirely. When interacting with a great app or website, often, the experience is enhanced because of the integration of great visuals or animation. Removing this from the experience could result in less enjoyable experiences.

    I agree, conversational interfaces will be useful for many task-oriented applications; however, a visual interface will continue to be the best solution in many cases, some of which you touch on. For example, visual interfaces afford excellent scannability. When entering a search term, users can quickly scan results on a visual interface. Scannability is enhanced by visual cues, larger font sizes, etc. I wonder how this would translate to voice. Maybe interfaces of the future will be a cohesive integration of both voice and visual.

    Usability issues may hold conversational interfaces back from becoming widely adopted by users. For example, interacting with Siri is still frustrating. Simply asking Siri to call someone often results in the wrong person being called, although this may be because of my Australian accent.

    The concept of “proactively suggest[ing] things to do” would work well with anticipatory design. As data is collected about user behavior, the system could anticipate and suggest what the user will want next.

    From your article, it appears conversational interfaces have their place; however, implementation will be dictated by user needs and usability. I hope conversational interfaces evolve following human-centered design principles.

  15. Hi Matty,

    Thanks for the great article. I’m looking forward to the next in your series.

    I work at a company called PullString as a Creative Director and Writer, where for several years, we’ve been developing our conversational authoring platform. (We’ve also released a number of chatbots and conversational experiences across various platforms like Skype, Facebook Messenger, etc.)

    I spend a lot of time thinking about developing the relationship between the bot character and the user, and how to make that relationship feel as “human” as possible. Fortunately, the platform we’ve developed has been uniquely designed by both engineers and creatives together, so we have a lot of capabilities and features that have contributed to our creating some really successful bots.

    The PullString Authoring platform is available for free download. I’d love to know what you think of it. https://www.pullstring.com/

    Sarah Wulfeck
