Brew Coffee

*Note: This project was designed during the Voice User Interface Design course I took through CareerFoundry.

Overview

In 2019 I took the Voice User Interface Design course through CareerFoundry and created Brew Coffee. The result is a hypothetical coffee company's website and an Amazon Alexa skill.

Problem

Many people drink coffee but do not know how to brew their own. Many resources teach brewing techniques, but there is no market leader in the VUI and multimodal format.

Goals

Design and develop a VUI and multimodal experience that educates users and assists them in brewing different types of coffee.

Role

VUI/UX Designer (Student)

Scope

Voice User Interface Design, User Experience Design, and User Interface Design

Timeframe

September 2019 - November 2019

WHY

Voice User Interfaces (VUI) and multimodal interactions are becoming more common in our daily lives. From checking the weather to making a reservation, voice assistants like Amazon Alexa are changing our interactions with technology. As users become more comfortable, companies are looking to further incorporate VUI. With any emerging technology, many products face challenges, from technical limitations to design challenges. As designers, we ensure these multimodal interaction experiences are intuitive, enjoyable, and safe for use. With this in mind and my personal goal of continued education, I decided to explore VUI design. A challenge I have encountered is exploring various coffee brewing techniques at home. This seemed like an excellent opportunity to implement VUI.

Considerations for Voice

VUI presents its own set of advantages and disadvantages and it is vital to consider the following:

Flattening navigation: menu options are present at the same level in VUI, allowing for simplified navigation and complex tasks such as setting a reminder. In graphical UI, a user would need to complete several steps before setting a reminder:

Open device
Navigate to calendar
Navigate to the appropriate day
Click on the day and time to create a reminder
Name reminder
Click "Create"

In contrast, setting a reminder using a Voice Assistant involves only one step:

Alexa, add a 2:30 p.m. reminder to my calendar tomorrow called "Call boss."

Voice allows us to skip hierarchy by asking for specifics (requesting a reminder to be set while defining date, time, and name). This simplifies navigation greatly compared to graphical UI.
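To make the idea of "asking for specifics" concrete, here is a minimal Python sketch of how a single utterance can carry every value the graphical flow collects step by step. This is an illustration only: real voice platforms resolve these values through their own language models, not a regular expression, and the names ReminderRequest, REMINDER_PATTERN, and parse_reminder are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReminderRequest:
    """The three values the graphical flow gathers over several screens."""
    time: str
    day: str
    name: str


# Hypothetical pattern covering only the one phrasing used in the example above.
REMINDER_PATTERN = re.compile(
    r'add an? (?P<time>\d{1,2}:\d{2} [ap]\.m\.) reminder to my calendar '
    r'(?P<day>\w+) called "(?P<name>[^"]+)"',
    re.IGNORECASE,
)


def parse_reminder(utterance: str) -> Optional[ReminderRequest]:
    """Pull time, day, and name out of one spoken request, or return None."""
    match = REMINDER_PATTERN.search(utterance)
    if match is None:
        return None
    # Drop the sentence-ending period that sits inside the quoted name.
    name = match["name"].rstrip(".")
    return ReminderRequest(time=match["time"], day=match["day"], name=name)


if __name__ == "__main__":
    print(parse_reminder(
        'Alexa, add a 2:30 p.m. reminder to my calendar tomorrow called "Call boss."'
    ))
```

The six graphical steps collapse into one request because the time, day, and name all arrive together.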

Task Type: voice is useful when tasks aim to do a job instantly, like setting a timer, playing songs from a library, or operating a Smart Home/Internet of Things (IoT) device. Voice interactions become increasingly challenging when presenting listed or visually organized information. Lists and maps are more easily understood on a graphical interface. Designers need to know the strengths and weaknesses of voice versus graphical interfaces and when a combination of both might be best.

Location: where the user will interact with the VUI must be considered early in design. Some users may be in crowded, loud locations where hearing or speaking to a device is challenging. Some users may not be able to use their hands or eyes; while driving a car, for example, the user's attention is on driving. Sensitive financial or medical information may be involved, in which case speaking into a device is not appropriate. Placeonas emerge when considering the locations where a VUI may be used. Placeonas are the various environments in which a user may be interacting with your service.

Cognitive load: the amount of mental power/resources needed to process information (includes visual and audio information). The flattening of UI can be a double-edged sword, as all menu options occur at the top level. Because VUI is based on audio information rather than visual, the number of options that can be presented to the user at one time is limited. As discussed in Voice User Interface Design by Michael Cohen et al., the ‘rule of thumb’ for visual interfaces is that people can only store seven items in their short-term memories. For voice, however, Cohen cites research indicating that the number of options that can be reliably remembered is reduced to three. There are different opinions regarding the best number of options for a voice interface to give at once, but the point is not to overwhelm your users.
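One way to respect that limit is to read options out in small groups. The sketch below is an assumption-laden illustration, not the logic used in Brew Coffee: the limit of three comes from the guideline cited above, and chunk_options, build_prompt, and the sample drink names are hypothetical.

```python
from typing import List

# Assumption: three options per prompt, following the guideline cited above.
MAX_OPTIONS_PER_PROMPT = 3


def chunk_options(options: List[str], size: int = MAX_OPTIONS_PER_PROMPT) -> List[List[str]]:
    """Split a long list of options into groups small enough to say aloud."""
    return [options[i:i + size] for i in range(0, len(options), size)]


def build_prompt(chunk: List[str], has_more: bool) -> str:
    """Turn one group of options into a spoken prompt."""
    if len(chunk) == 1:
        listed = chunk[0]
    else:
        listed = ", ".join(chunk[:-1]) + " or " + chunk[-1]
    prompt = f"You can choose {listed}."
    if has_more:
        prompt += " Or say 'more' to hear other options."
    return prompt


if __name__ == "__main__":
    drinks = ["espresso", "latte", "cold brew", "pour over", "french press"]
    groups = chunk_options(drinks)
    for position, group in enumerate(groups):
        print(build_prompt(group, has_more=position < len(groups) - 1))
```

The user hears at most three choices at a time and decides whether to ask for more.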

Device Type: today's devices range from phones, cars, and desktop computers to VUI assistants (Alexa). Each device type presents a unique way of interacting with a product. Some provide a multimodal experience, while others are purely voice-driven.

Having considered the advantages/disadvantages of VUI, I began designing and developing an Amazon Alexa skill and website as discussed below.

How

I followed the User-Centered Design process to address the problem I wanted to solve using VUI. The process involves three phases: Discovery, Concepting, and Prototyping & User Testing. These phases can be further broken down into actionable exercises.

Identify Users
User interviews were conducted following best practices and resulted in the below persona. A slight difference from standard personas is to consider the user's typical speech style and the person the user would expect to interact with.

Alongside the user persona is a system persona, or the identity of the Brew Coffee voice interface. The system persona is arguably the most important consideration during the design process. Brew Coffee leverages existing text-to-speech (TTS) software, Amazon Alexa. It was essential to think of Brew Coffee as an employee representing the company. Brew Coffee, the VUI, is a critical touchpoint and may be the only touchpoint, depending on which product the user uses. Humans often anthropomorphize objects, animals, and other things in our daily lives. This behavior is particularly strong with voice. Extensive research into the interactions between humans and computers points to humans' tendency to treat computers as people. Humans can infer characteristics such as gender, age, and likability from voice alone. Gender and accents are essential to consider when designing the system persona.

Identify VUI features
With a clear understanding of Brew Coffee's target user, system persona, and the problem I wanted to address, I began defining user needs and user stories. During interviews, I developed the following requirements and user stories.

Requirements:

Provide recipes for hot coffee, cold coffee, iced coffee, and seasonal coffee.
Provide detailed descriptions of ingredients and equipment.
Provide instructions that are easy to understand and follow.
Provide an adequate number of coffee choices.
Provide global commands giving users flexibility throughout the experience.

User Stories:

As a coffee lover, I want recipes I can make at home.
As someone new to coffee, I am not sure where to start. Accessing a list of various recipes would help me explore new kinds of coffee.
I’m always in a rush and want recipes that are fast and easy.
I’m new to coffee, so the ability to repeat steps would be helpful.
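The global-commands requirement and the repeat-steps user story can be sketched together as a small session object that tracks the current recipe step. This is a hedged illustration in plain Python, not the Brew Coffee skill code: RecipeSession, handle_command, the command words, and the sample steps are all hypothetical, and a real skill would receive these commands from the voice platform instead of matching strings.

```python
from dataclasses import dataclass
from typing import List

HELP_TEXT = "You can say repeat, next, back, or start over."


@dataclass
class RecipeSession:
    """Tracks where the user is in a recipe."""
    steps: List[str]
    index: int = 0

    def current(self) -> str:
        return self.steps[self.index]


def handle_command(session: RecipeSession, command: str) -> str:
    """Respond to a global command that works at any point in the recipe."""
    command = command.strip().lower()
    if command == "repeat":
        return session.current()
    if command == "next":
        if session.index < len(session.steps) - 1:
            session.index += 1
            return session.current()
        return "That was the last step. Enjoy your coffee!"
    if command == "back":
        # Stay on the first step if there is nothing earlier to go back to.
        if session.index > 0:
            session.index -= 1
        return session.current()
    if command == "start over":
        session.index = 0
        return session.current()
    if command == "help":
        return HELP_TEXT
    return "Sorry, I didn't catch that. " + HELP_TEXT


if __name__ == "__main__":
    session = RecipeSession(["Boil water.", "Grind beans.", "Pour and steep."])
    for spoken in ["repeat", "next", "repeat", "back", "help"]:
        print(f"{spoken!r} -> {handle_command(session, spoken)}")
```

Because every command is available at every step, a user who is new to coffee can ask to hear a step again without losing their place.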

Market Research
I conducted market research to understand what products currently exist. Many products exist, ranging from video tutorials and blogs to a few VUI products. I conducted a competitive analysis of the top products across all categories, aiming to learn what each did well and not so well. I identified key features to include in Brew Coffee and formed the product's Minimum Viable Product (MVP) outline.

Wireframes and Prototypes
Wireframing and prototyping Brew Coffee covered two products: the website's design and the VUI's design. The website design followed standard design practice (if interested in learning more about my design process, see my other portfolio pieces, which cover this topic in more depth). The VUI's wireframing and prototyping consisted of user flows and voice scripts in Excel. I iteratively designed both products by testing them frequently and addressing issues.


Usability Testing for Voice Interactions: Early tests can be handled without programming. I met with potential users and ran through scripts to uncover pain points before investing time into code. Brew Coffee was added as an Amazon Alexa skill, and user testing was conducted using the skill.

Multimodal Interactions: I considered the different ways users may interact with the skill. For multimodal interactions, the combination of a graphical interface and a voice interface came into play. The graphical interfaces were designed based on the website branding. Multimodal interactions with Brew Coffee added another layer of interaction and functionality.

Accessibility: Refers to how a person who experiences disabilities can access or benefit from the design of products, services, or environments. Ensuring your design is accessible is critical. Speech design inherently helps those with visual and mobility impairments. Voice systems as a whole are designed to work for the majority of users. This is accomplished by training the system on a large corpus of data that acts as a representative of the overall population. The main thing to consider when designing is that users should have a choice. This means building the experience to work well with defaults and with the ability to customize further, making sure multimodal experiences use multimodality as an asset rather than as a requirement, and allowing a user to interact via the mode in which they feel comfortable.
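As a rough sketch of treating multimodality as an asset rather than a requirement, the response below always carries the complete spoken content and only adds a visual payload when a screen is available. The function build_response, the has_screen flag, and the dictionary layout are hypothetical and do not represent a real Alexa response format.

```python
from typing import Dict, List


def build_response(steps: List[str], has_screen: bool) -> Dict[str, object]:
    """Build a reply that is complete by voice alone, with an optional visual."""
    # The spoken output never depends on the screen being present.
    speech = " ".join(f"Step {number}: {step}" for number, step in enumerate(steps, start=1))
    # The visual list is an extra for devices that can show it.
    display = {"title": "Recipe steps", "items": list(steps)} if has_screen else None
    return {"speech": speech, "display": display}


if __name__ == "__main__":
    recipe = ["Boil water.", "Grind beans."]
    print(build_response(recipe, has_screen=False))
    print(build_response(recipe, has_screen=True))
```

A user on a voice-only device and a user on a device with a screen both get the full recipe; the screen only adds a second way to follow along.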

Safety and Privacy
Every time a voice assistant arrives on a consumer device, news articles about security and privacy appear. With real examples (phone wiretapping) and fictional examples (the dystopian world of Big Brother from George Orwell's 1984), journalists and consumers are concerned about who might be listening on the other side and what they're doing with that data. Some users object to being recorded or having audio recordings of their voices stored on the cloud. They're concerned that the devices are recording everything in the environment, rather than just specific requests made to the voice assistants, which would compromise user privacy. One of the first objections to in-home devices is the fact that they're always listening, which would, therefore, create breaches of both security and privacy. According to all major players, however, this is not true. Amazon describes it like this in their FAQ:

"Amazon Echo and Echo Dot use on-device keyword spotting to detect the wake word. When these devices detect the wake word, they stream audio to the cloud, including a fraction of a second of audio before the wake word."

To summarize, with in-home and hand-held devices, if you have the wake word turned on, they are always listening, but they're not streaming. Only once the wake word has been recognized do the systems send information to the cloud. Devices simply can't process streams of audio beyond their wake words and a few key phrases without the cloud connection. Therefore, devices can't actively eavesdrop on your conversation.
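The listening-versus-streaming distinction can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not how any real device is implemented: text strings stand in for audio frames, PRE_ROLL_FRAMES is a hypothetical stand-in for the "fraction of a second" of audio kept before the wake word, and the sketch never stops streaming once it starts, whereas a real device stops when the request ends.

```python
from collections import deque
from typing import Deque, Iterable, List

WAKE_WORD = "alexa"
# Assumption: keep one frame of audio from just before the wake word.
PRE_ROLL_FRAMES = 1


def frames_sent_to_cloud(frames: Iterable[str]) -> List[str]:
    """Return only the frames that would leave the device."""
    pre_roll: Deque[str] = deque(maxlen=PRE_ROLL_FRAMES)
    sent: List[str] = []
    streaming = False
    for frame in frames:
        if streaming:
            # After the wake word, audio is streamed to the cloud.
            sent.append(frame)
        elif frame.lower() == WAKE_WORD:
            # Wake word spotted on-device: send the short pre-roll plus the wake word.
            sent.extend(pre_roll)
            sent.append(frame)
            streaming = True
        else:
            # Before the wake word, frames only pass through a tiny local buffer
            # and older frames are discarded as new ones arrive.
            pre_roll.append(frame)
    return sent


if __name__ == "__main__":
    heard = ["private", "chat", "alexa", "brew", "coffee"]
    print(frames_sent_to_cloud(heard))
```

Everything said before the wake word, apart from the short pre-roll, is dropped on the device and never sent.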

User Profiles are another layer of protection. Amazon's Echo devices and Google Home devices both allow multiple users to set up independent profiles. This means that multiple adults in the same house can create their own accounts. The benefit is that personal information can be associated with the individual user. Individuals' data, such as digital calendars, map data, financial information, and personal data, are all secured by profiles. Most in-home devices support PINs, which give access to profiles and prevent unauthorized purchases.

Voice biometrics, or voiceprints, are another layer of safety VUI devices provide their users. Much like a fingerprint, voiceprints are different for each person. Voiceprints allow the system to detect who is speaking and load the proper profile automatically.

Designers can protect users' privacy even without sophisticated biometrics or authentication PINs. Standard solutions in non-voice experiences also work well in voice experiences. One such solution is double authentication with user-provided information such as birthday, zip code, or account number.
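A minimal sketch of that kind of check, assuming the user must correctly answer at least two questions before sensitive information is spoken aloud. The function verify_user, the field names, and the sample values are hypothetical, and a production system would not keep answers in plain text as this illustration does.

```python
import hmac
from typing import Dict


def verify_user(on_file: Dict[str, str], answers: Dict[str, str], required: int = 2) -> bool:
    """Return True when enough user-provided answers match the values on file."""
    matches = 0
    for field, expected in on_file.items():
        given = answers.get(field)
        if given is None:
            continue
        # Compare normalized values in constant time.
        same = hmac.compare_digest(
            given.strip().lower().encode("utf-8"),
            expected.strip().lower().encode("utf-8"),
        )
        if same:
            matches += 1
    return matches >= required


if __name__ == "__main__":
    record = {"birthday": "1990-04-12", "zip_code": "30301"}
    print(verify_user(record, {"birthday": "1990-04-12", "zip_code": "30301"}))
    print(verify_user(record, {"birthday": "1990-04-12", "zip_code": "99999"}))
```

Requiring two matching answers means a single overheard detail is not enough to unlock personal data.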

User privacy and data ownership relate to consumer protections such as HIPAA (Health Insurance Portability and Accountability Act of 1996) and COPPA (Children's Online Privacy Protection Act) in the U.S. Both are complex pieces of legislation aimed at protecting consumers.

What was gained

This project resulted in a fully functioning Amazon Alexa skill by the name of Brew Coffee and a hypothetical company and website. I learned the history of VUI; created voice user flows, scripts, and personas; worked with AWS Lambda; conducted usability and accessibility testing; and built three Alexa skills from scratch. The skills acquired throughout the course directly apply to voice user interface design as a whole and are not specific to just Amazon Alexa. VUI is exciting and offers countless opportunities for companies and users. I am enthusiastic to continue exploring this field and following its steady integration into society.