What Is User Acceptance Testing? A Guide to UAT
Before footwear company Hilos launched its 3D-printed shoe line and released it into the wild, its top brass wanted to answer a fundamental question: Would users accept it?
This is the sort of question user acceptance testing (UAT) seeks to answer.
What Is UAT?
User acceptance testing is the final phase of the software development cycle and involves having target audiences test a product or feature in real-world scenarios before it is officially released. Companies often gather test subjects by recruiting volunteers, paying users or releasing a free trial version of a product.
UAT is a way to determine whether people will use a feature the way the product team intended.
What Is User Acceptance Testing?
User acceptance testing (UAT) is when subjects selected from a target audience interact with a product or feature under real-world circumstances. This stage is often last in the software development process and reveals how users are likely to engage with a product or feature. Teams then make final adjustments before officially releasing a new or updated product to the public.
“If you’re drilling down on something that is novel and innovative, it’s a cut-and-dry way to determine if this new or novel feature will be accepted by the majority of clients. If it is, how? If not, who is not accepting it and why?” Elias Stahl, Hilos’ CEO, said.
While it is often conflated with other forms of user testing, most experts agree that user acceptance testing is a desirability check on a narrowly defined feature or piece of functionality.
“UAT is the process of finding out if a user actually wants or needs a feature,” explained Andrew Wachholz, a user experience design consultant at Designing4UX. “So, while usability tests may go well (people can use [the product] according to how we designed it) and functional tests may go well (we tried to break it, and it didn’t break), if the user rejects the feature when it is available to them, UAT has failed.”
Besides evaluating specific features, UAT testing also serves as a supplement to quality assurance and other previous stages of testing. As a result, this phase is a final chance for teams to catch bugs, incorporate user feedback, tweak inconvenient features and ensure the best version of a product is ready upon its official release to audiences.
Devising an effective approach to user acceptance testing depends on the maturity and resources of your company, the scope and type of release, your intended audience and your risk tolerance.
User Acceptance Testing Steps
How to Conduct User Acceptance Testing
1. Determine whether UAT is necessary.
2. Create a scope and plan for user acceptance testing.
3. Inform user acceptance testing with clear acceptance criteria.
4. Identify test methods and use cases during UAT testing.
5. Select the right target audiences for user acceptance testing.
6. Design open-ended UAT that focuses on product goals.
7. Evaluate user acceptance testing results and develop next steps.
1. Determine Whether UAT Is Necessary
UAT can be a useful way to gauge user affinity for a feature, but it is not for everyone, said Drew Falkman, director of product strategy at digital transformation consultancy Modus Create. One drawback is that it can be time-consuming and costly to recruit sample users or marshal the resources to conduct testing internally.
Third-party testing platforms, such as Respondent or UserTesting, can run anywhere from $25 to $150 per participant for a session. And because user acceptance testing is often conducted with groups of only five to 10 participants, it yields qualitative results that rarely reach statistical significance.
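To put those figures in perspective, here is a rough back-of-the-envelope sketch of what one round of recruiting might cost. The per-participant prices and group sizes come from the ranges above; the `rounds` parameter is a hypothetical addition for teams that test more than one user path.

```python
# Figures from the article: $25-$150 per participant, 5-10 participants per round.
COST_PER_PARTICIPANT = (25, 150)  # USD, low and high
PARTICIPANTS_PER_ROUND = (5, 10)  # low and high


def session_cost_range(cost=COST_PER_PARTICIPANT,
                       participants=PARTICIPANTS_PER_ROUND,
                       rounds=1):
    """Return the (lowest, highest) total recruiting cost in USD.

    `rounds` is a hypothetical knob, e.g. one round per user path.
    """
    low = cost[0] * participants[0] * rounds
    high = cost[1] * participants[1] * rounds
    return low, high


print(session_cost_range())          # (125, 1500)
print(session_cost_range(rounds=3))  # (375, 4500)
```

The estimate covers participant fees only; staff time and tooling would come on top.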
As an alternative to user acceptance testing, or as an added validation measure, many larger firms are starting to use “feature flagging” to gauge user behavior. With tools like LaunchDarkly and Optimizely, a product owner can launch a feature to a small percentage of users and measure engagement.
“So, for example, if you’re Amazon, you can flip the switch on something, I believe, for two to three minutes,” Wachholz said. “And you will have anywhere between 10,000 and 15,000 individuals that have interacted with it. You’re going to reach statistical significance almost immediately.”
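For readers curious about the mechanics, a percentage rollout can be sketched in a few lines. This is a minimal illustration of the general idea, not how LaunchDarkly, Optimizely or any other product implements it; the function name, flag name and user IDs are all hypothetical.

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hashing the flag name plus the user ID gives each user a stable
    bucket from 0 to 9999, so the same user gets the same answer
    every time the flag is checked.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    return bucket < percent * 100


# Hypothetical usage: expose a new checkout flow to roughly 5 percent of users.
users = [f"user-{i}" for i in range(20_000)]
exposed = [u for u in users if in_rollout(u, "new-checkout", 5)]
print(f"{len(exposed) / len(users):.1%} of users see the feature")
```

Because the bucket is derived from a hash rather than a random draw, raising the percentage later only adds users; nobody who already has the feature loses it.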
Ana Grouverman, a product lead at Spotify, said product teams occasionally conduct user acceptance testing as a preemptive measure to save face in the event of a slight dip in metrics after a release.
“That’s probably not a good use of resources, because change aversion is a real thing, and you can accept that, maybe, you’ll take a one-week blip,” she said.
For young companies such as Hilos, however, UAT testing can prove invaluable. The company offers shoppers an on-demand, customized experience much like that of visiting a tailor, where choices about material selection, styling and sizing are highly refined. UAT led to insights about choice optimization.
UAT also clarified Hilos’ true value to customers, which was not bespoke customization, but the comfort and style of their shoes.
“If we hadn’t done this in a rigorous way, we would have been too much to too many people, and nothing to one,” Stahl said.
2. Create a Scope and Plan for User Acceptance Testing
The scope of UAT testing should be defined by the user story and feature specifications of what you’ve built, Wachholz said. Say you’re building a feature for a mobile app to allow users to order pizza and have it delivered at a specified time. That’s the user story, and that’s the end result you’re testing against.
Developing meaningful acceptance criteria will help you create and test all aspects of the build contained within that story. Results of acceptance testing are often binary: A process either passed or failed.
Typically, a product manager or user experience designer will develop the testing plan, beginning with a set of criteria aligned with feature specifications laid out at the start of the development cycle. In the case of a timed mobile order for pizza, acceptance criteria might look like this:
- Can a user order on iOS? On Android?
- Can they order it for specified times?
- Can they order a small pizza?
- Can they order a pizza with a thin crust, thick crust or cheesy crust?
- Can they order it from 10 miles away?
These determinations are intended to flush out overt deficiencies or limitations in the design.
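Because each criterion is a yes-or-no question, the checklist above maps naturally onto a simple pass/fail record. The sketch below is one hypothetical way to tally it; the `Criterion` class and the results are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str
    passed: bool


def feature_accepted(criteria: list[Criterion]) -> bool:
    """Binary outcome: the feature passes only if every criterion passes."""
    return all(c.passed for c in criteria)


# Results below are invented for illustration.
results = [
    Criterion("User can order on iOS", True),
    Criterion("User can order on Android", True),
    Criterion("User can order for a specified time", True),
    Criterion("User can order a small pizza", True),
    Criterion("User can choose thin, thick or cheesy crust", True),
    Criterion("User can order from 10 miles away", False),
]

failed = [c.description for c in results if not c.passed]
print("accepted" if feature_accepted(results) else f"rejected: {failed}")
```

Keeping the failed descriptions alongside the verdict gives the team a starting point for what to fix before retesting.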
According to Wachholz, planning for user acceptance testing involves vetting users, setting up testing environments, defining if tests are moderated or unmoderated and defining how testers will record the results.
Many smaller companies that don’t have access to a robust set of users conduct UAT internally with their own staffs or teams, he explained. Participants are assigned a list of use case scenarios that, with minimal guidance, they can complete. To ensure the user interacts with the feature in question — and that the interaction generates usable data — some tasks will be more scripted than others.
Design consultancies also provide guidance and technical assistance for user acceptance testing, Falkman said. Modus Create recently put together a four-week validation plan for AARP in preparation for the launch of a new feature in the company’s Money Map app for financial planning and debt management; it was part of an ongoing user research agreement between the companies.
During the first week, Modus Create worked closely with AARP’s product team to plot different paths users might take on the app.
- How would someone who doesn’t have enough money to pay their debts experience the app?
- How about someone who has just enough money?
- How about someone who has plenty of money and just needs to decide which debt to pay down first?
That same week, they outlined recruitment strategies to solicit input from five to 10 users for each path. They also drafted test scripts to provide instructions for participants.
Falkman pointed out that UAT testing can take weeks to months, depending on the size of the project, but ultimately the scope of the user story is what drives decision-making.
“The bottom line is that everything starts at the mapping of user stories. Stories should be well-formed and not technical. Size matters,” he said.
3. Inform User Acceptance Testing With Clear Acceptance Criteria
Falkman explained acceptance criteria like this:
“Acceptance criteria should be born out of thinking through the product, and, if done well, the acceptance criteria should be the base for the QA team to conceive of and write tests,” he said. “We recommend having a ‘definition of ready’ worked out with the team so that it can specify what it needs in order to just grab a story and go. This also ensures that whoever is writing the user stories has time for acceptance testing.”
If the user story for a Money Map customer hinges on their ability to check whether they’ve paid a debt for the current month, acceptance criteria for UAT would ensure the functional requirements are met and assess basic optimization and design considerations. If these elements pass muster, the feature is ready for prime time. For example:
- Does the page output data from the current month?
- Does it report if users have or have not paid?
- Is there a way to go back?
- Does it only accept numerical entries?
- Is there a maximum numerical entry?
- Are the font and spacing consistent with the rest of the site?
Most importantly, the UAT testing acceptance criteria should reflect the user’s point of view.
“Presumably, there’s going to be QA testing as well. So, as a product owner, this is really to make sure that the [feature] works and it’s everything it needs to be, so that I can say, ‘Yes, ship it and move on to the next piece,’” Falkman said.
4. Identify Test Methods and Use Cases During UAT Testing
Lauren Chan Lee, a product professional who has led teams at Care.com and StubHub, groups users into three main buckets: consumers, B2B clients and users internal to a company. Each requires a somewhat different approach.
When Care.com was developing a new user flow allowing internal operations teams to create care-center records, the user acceptance test was relatively straightforward: a day-long checklist Lee put together that a member of the operations team tested independently. Did the website update childcare, senior care and other service records as intended?
B2B cases are trickier. Clients can be very invested in feature changes and vocal in expressing their viewpoints. Care.com has a customer advisory council comprising key clients that Lee turns to for feedback on new releases and reports. She allocates a week ahead of a release for members of the council to conduct user acceptance testing.
“Some subset of [users] has to demonstrate a likelihood to use it in the way that you intended it to be used.”
For a large-scale consumer release, Lee might convene product, engineering, development and design teams for a “bug bash,” in which team members assigned to various features of a website overhaul — the buyer flow or seller flow, for instance — work through UAT testing together.
There is an important difference between testing a consumer product and testing a piece of third-party software for a large organizational rollout, Grouverman said. With the latter UAT scenario, you often have a captive audience, so the bar for acceptance is lower. To introduce a human resources system to record employee vacation days, for example, you might roll it out to a portion of employees and ask them if they’re willing to use it and if they encountered any glaring problems.
In other words, users are likely to accept a less-than-perfect payroll feature because, at some level, the decision has already been made. However, for a consumer product like Spotify’s, users must have an affinity for the change and be willing to embrace it quickly.
5. Select the Right Target Audiences for User Acceptance Testing
The audience you select for user acceptance testing depends on what you’ve built and what you’re seeking to learn, according to Wachholz.
“If it’s a new minimum viable product feature, and you want to record engagement rates to indicate whether your team should continue to build the feature further, you might open testing to all users,” he said. “If it is an enhancement to an existing feature, it’s best to narrow your test subjects by analyzing usage and frequency. Your goal is to get the most qualified people’s eyes on the new enhancement.”
Ideally, you’re hoping to capture unbiased participants who accurately reflect the ages, demographics, use locations and behaviors of your target users — and who understand the software.
“For example, if it’s some sort of workflow that has to do with managing a bunch of expenses within QuickBooks, we would make sure we recruited people that knew QuickBooks so that we didn’t have to give them an overview on how the software works, because that would totally throw everything off,” Falkman said. “If there were certain steps involved to get the app to the state where they’re using it, we would make sure users understood that, and we would give them an opportunity to ask questions.”
If UAT is conducted in-house in a publicly accessible environment, the cost is fairly negligible — just the time of the team members involved. In this case, the UX team and product manager will distribute a script to internal volunteers, who go through a task list while being observed over video or in person. However, this approach is “the least desirable solution and can lead to biased results, as the team wants the build to succeed, not fail,” Wachholz added.
Alternatively, there are several third-party services that allow you to invite and schedule phone interviews, in-person interviews or online research sessions with paid participants for UAT testing. Typically, participants can be pre-vetted to ensure they resemble your target audience. However, there are risks to this approach as well, because paid participants can become vested in giving the type of measured feedback that keeps them desirable as testers. It’s an approach Wachholz advocates for firms that do not have ready access to a large sampling of users.
6. Design Open-Ended UAT Focusing on Product Goals
The best time to conduct user acceptance testing tends to be late in the software development cycle, when you have a prototype but haven’t yet sunk resources into making it functional and scalable, Grouverman said.
While, in practice, testing tends to be a binary process, Grouverman said a more effective approach is to ensure all acceptance criteria lead to feasible possibilities. Unlike Wachholz, she doesn’t consider user acceptance testing a purely binary exercise.
“I would say the first principle is you have to have results that are going to be usable and, in an ideal scenario, you should not be doing acceptance testing to answer a binary ‘yes’ or ‘no’ question,” she said. “Instead, you should be doing it in order to give direction one way or another.”
Whether or not UAT is “binary,” however, may be a matter of semantics. Wachholz points out that user acceptance testing often is applied to evaluate whether a new feature “moves the needle.” For instance, is the experience improved because it takes a user less time to complete a task?
This is what happened when Designing4UX was hired to conduct UAT testing on an app for real estate agents to record details of their client interactions.
“It sounds counterintuitive, but we wanted to lower time in app and see an increase in usage,” Wachholz said. “Up-front investment in the app on the part of a new user was high (i.e. a barrier to entry). Our goal was to make it super efficient on first sign-in, so [users] spent less time in the app, but the value was significantly higher.”
Results confirmed what Designing4UX hypothesized — users spent more time on the app when log-in time was reduced. It wasn’t a binary “yes” or “no,” more like an assessment of “good” versus “better.”
This kind of outcome can be generated from an open-ended, Socratic questioning approach. During user acceptance testing, users are typically given a script and asked to perform certain tasks. Observers might assess where they go in a flow, how much time they spend there and whether they select certain buttons or tabs.
“The idea is to not create bias or direct the subject in a particular way,” Wachholz said. “I’ll ask participants to ‘think aloud’ or ‘vocalize their trains of thought’ when they are performing a task. This way, I get more information about what they are thinking, versus silence and movement on a screen.”
Or, as Falkman put it: “Just saying, ‘Here’s where you are, and here’s where you want to finish,’ but not giving any instruction in terms of how to get there is ideal, because you want users to have their own struggles. Sometimes they’ll be like, ‘Well, should I click this button?’ And my answer is, ‘What do you think will happen when you click that button?’”
7. Evaluate User Acceptance Testing Results and Develop Next Steps
At the most basic level, UAT should measure the success or failure of key criteria. Hilos’ six-month testing process involved more than 100 test subjects and included surveys, interviews and social and site engagement metrics. It tallied results in a master spreadsheet. A clear timetable kept the project on track.
“We had very clear deadlines for when we wanted to release the product and why. What will it take to have 95 percent confidence in a positive reaction from these core demographics?” Stahl said. “Setting those goals initially is important, both from a timeline perspective, as well as ‘What is the acceptance threshold you want among key groups?’”
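One way to make a goal like that concrete is to compute a 95 percent confidence interval around the share of testers who reacted positively and check whether its lower bound clears the acceptance threshold. The sketch below uses the adjusted-Wald (Agresti-Coull) interval. The counts and the 70 percent threshold are hypothetical, and this is one reading of “95 percent confidence,” not a description of Hilos’ actual method.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald_interval(successes: int, n: int, confidence: float = 0.95):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


# Hypothetical numbers: 82 of 100 testers reacted positively,
# and the team wants at least 70 percent acceptance.
low, high = adjusted_wald_interval(82, 100)
threshold = 0.70
print(f"95% interval: {low:.3f} to {high:.3f}")
print("meets threshold" if low >= threshold else "not yet conclusive")
```

If the lower bound falls short, the team can either test more users to narrow the interval or treat the result as a signal that the feature needs work.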
Features are typically built around key performance indicators and business goals. If a feature fails to meet user acceptance criteria during UAT testing, the team needs to review the results and decide what needs to change. Sometimes, it’s a simple UI design tweak, while, in other cases, users flat out reject an entire feature. In the latter case, it’s important to set up a subsequent test to root out deeper issues.
If a feature passes a user acceptance test, it can be put on a release schedule.
“You typically don’t release a feature to 100 percent of the user base immediately. There are too many things that can go wrong — even with all your testing,” Wachholz said. “You might begin with 20 percent one week, 40 percent the next week, and so on.”
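A staged release like the one Wachholz describes can be written down as a simple schedule. The sketch below assumes the percentage keeps climbing by the same step each week until it reaches 100, which is only one interpretation of “and so on”; the function name and defaults are hypothetical.

```python
def rollout_schedule(start: int = 20, step: int = 20) -> list[tuple[int, int]]:
    """Return (week, percent_of_users) pairs, capped at 100 percent."""
    if step <= 0:
        raise ValueError("step must be positive")
    schedule = []
    week, percent = 1, start
    while percent < 100:
        schedule.append((week, percent))
        week += 1
        percent += step
    schedule.append((week, 100))  # final week: everyone has the feature
    return schedule


for week, percent in rollout_schedule():
    print(f"week {week}: {percent}% of users")
```

In practice a team would pause or roll back between steps if metrics dipped, rather than following the calendar blindly.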
It’s important to keep in mind that results of UAT tend to be qualitative and often need to be substantiated with quantitative testing during a beta release, Grouverman said.
Tools like Hotjar or FullStory record user interactions in a beta environment and validate the findings of user acceptance testing. As usability consultant Jeff Sauro, founding principal of MeasuringU, wrote on a company blog: “You do not need a sample size in the hundreds or thousands, or even above 30, to use statistics. We regularly compute statistics on small sample sizes (less than 15) and find statistical differences.”
Sauro points out that although peer-reviewed journals usually deem a result statistically significant when its p-value is less than .05 (meaning a result at least that extreme would turn up less than 5 percent of the time by chance alone), that level of confidence is rarely needed to launch a release after UAT.
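To see how a small sample can still clear the conventional 5 percent bar, consider a one-sided exact binomial test. The sketch below is an illustration only: the participant counts and the 50 percent benchmark are hypothetical, and it is not taken from Sauro’s own calculations.

```python
from math import comb


def binomial_p_value(successes: int, n: int, p_null: float) -> float:
    """One-sided exact binomial test.

    Returns P(X >= successes) if the true success rate were p_null.
    """
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical: 11 of 12 testers completed the task, and the benchmark
# (null hypothesis) is a 50 percent completion rate.
p = binomial_p_value(11, 12, 0.5)
print(f"p = {p:.4f}")  # p = 0.0032
print("significant at .05" if p < 0.05 else "not significant at .05")
```

With only 12 participants, a lopsided result like 11 of 12 is unlikely enough under a 50 percent benchmark to count as significant; a closer split, such as 8 of 12, would not be.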
The bottom line is: The greater the sample size, the higher the probability the results will be reliable. But even if you do not have the time or budget to pull together an ideal sample of users, user acceptance testing can be valuable as a barometer of a feature’s appeal.
“Software companies are always launching new features. It requires ongoing acceptance testing — which, I believe, should come earlier on,” Stahl said. “A lot of companies get calls from customer support. The more they get asked for something, the higher it goes on the list. And then they pump them out. But they don’t actually do a lot of testing; it’s not baked into the R&D process. It’s a very rudimentary kind of upvote or downvote.”
That, according to Stahl, is a mistake.