03/29/2021
By November 5, 2020, as election results continued to be tallied, it was becoming certain that Joe Biden would win the 2020 US presidential election. But the outcome was far more nail-biting than expected. After all, surveys had projected him to win by 8 percentage points nationally, with single-digit margins in the swing states, including 8.4% in Wisconsin, 7.9% in Michigan, and 2.5% in Florida.
Biden’s presumed strength was such that even when gold-rated pollster Ann Selzer came out with a Trump +7 result in Iowa, she was loudly panned — the NYTimes’ Nate Cohn commented, “Selzer can be wrong, and has been before,” and other commentators piled on as well.
What happened, of course, was different. Trump outperformed polling by 6% in Florida, 7.7% in Wisconsin, and 7.3% in Ohio. Selzer of Iowa ended up being right: Trump beat the Iowa consensus polling by 6.9%.
What’s shocking in these results isn’t that polls can be off — that happens. What’s shocking is the breadth of the error: across presidential, Senate, and House races; across geographies; across pollsters. The miss was in the same direction, and of roughly the same size, as the miss in 2016, despite all the hand-wringing, adjustments, and model updates that happened in the interim.
How did the most generously-funded, highest-quality, and highest-frequency election polling in history get it wrong? While the eulogy is still being written, many theories point to a fundamental problem — a response sample that turned out to be non-random.
For anyone who has taken a beginner’s statistics course, the foundation of every calculation is random sampling. Confidence intervals and margins of error are meaningless if your sample isn’t random.
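As a quick illustration, the textbook margin of error for a poll follows directly from the assumption of a simple random sample; a minimal sketch (the numbers are illustrative, not from any real poll):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a proportion p estimated from a
    simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-person poll of a 50/50 race carries roughly a +/-3.1 point margin.
moe = margin_of_error(0.5, 1000)
print(f"{moe:.3f}")  # ~0.031, i.e. about 3.1 percentage points
```

The formula only describes sampling noise; it says nothing about bias, which is exactly the failure mode discussed below.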
While we’d like to believe polls are random samples, they are not. Non-response bias and self-selection are two of the many ways samples can be improperly collected; a leading theory is that social disaffection among Trump voters led to the polling misses.
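A back-of-the-envelope sketch of why non-response bias is so damaging (the response rates here are invented for illustration): if one candidate’s voters answer the phone less often, the expected poll result is skewed no matter how large the sample gets.

```python
def expected_poll_share(true_share: float, resp_a: float, resp_b: float) -> float:
    """Expected observed share for candidate A when A's voters respond at
    rate resp_a and B's voters at rate resp_b. With unequal response rates,
    no amount of additional sample size fixes the skew."""
    responders_a = true_share * resp_a
    responders_b = (1 - true_share) * resp_b
    return responders_a / (responders_a + responders_b)

# A dead-even race, but candidate B's voters respond 20% less often:
observed = expected_poll_share(0.50, resp_a=0.50, resp_b=0.40)
print(f"{observed:.3f}")  # ~0.556: a skew larger than a typical margin of error
```

The skew here (about 5.6 points) dwarfs the ~3-point margin of error a 1,000-person sample would advertise, which is roughly the shape of the 2016 and 2020 misses.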
We, the voting public, believed that polling firms that were well-funded and staffed to the brim with PhDs would figure out techniques to work around this. As is now obvious, that assumption was dead wrong, as wrong in 2020 as it was in 2016.
If the best polling firms in the world can come to the wrong conclusions, what does this mean for user research?
User research often manifests as an independent function within larger corporations. Or it refers to the products of companies like User Leap, User Testing, and others that promote the idea of “continuous” research, promising web app developers the data they need to understand how users use their product.
Regardless of the tool, these approaches have the same fault at their core — the lack of a random sample. Most users dismiss survey prompts on web sites. Most users will never fork over their time for an in-person research session. The people who do respond are self-selecting by definition: maybe they’re overly pissed off about something, or maybe they’re bored, but they’re definitely not random.
If the data you’re using isn’t randomly sampled, your conclusions are going to be invalid. And that can be costly — for your time, your team, and your business.
User research purports to answer “why” people behave in the ways that you see them behave in your analytics. But as we know, the whys are based on non-random samples and therefore are unsound.
The good news is, analytics tools like Google Analytics (or Fathom for the privacy-minded) track nearly all users, so there’s no need to sample (and when these tools do sample, it’s a true random sample). They also go a long way toward answering the “whys.” Sometimes metrics differ between desktop and mobile, which points to device type as influential. Sometimes metrics differ by browser, which could point to specific browser bugs. With the right cut of the data, an analytics tool can often answer the why through data alone. In our case, to develop our product roadmap, we used search trends and our internal analytics to learn that solitaire players also played Mahjong, Hearts, FreeCell, and Spider Solitaire.
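Cutting the data this way is simple to sketch. Assuming a flat event log with a device field (the field names and numbers below are made up; in practice they would come from your analytics export), per-segment rates fall out of a single pass:

```python
from collections import defaultdict

# Hypothetical analytics events; real ones would come from your analytics export.
events = [
    {"device": "desktop", "converted": True},
    {"device": "desktop", "converted": False},
    {"device": "mobile",  "converted": False},
    {"device": "mobile",  "converted": False},
    {"device": "mobile",  "converted": True},
    {"device": "desktop", "converted": True},
]

totals = defaultdict(lambda: [0, 0])  # device -> [conversions, visits]
for e in events:
    totals[e["device"]][0] += e["converted"]
    totals[e["device"]][1] += 1

rates = {device: conv / visits for device, (conv, visits) in totals.items()}
print(rates)  # a large desktop/mobile gap points at device type as a factor
```

Because the log covers (nearly) all users rather than volunteers, a gap between segments is evidence about behavior, not about who chose to answer a survey.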
Going further, tools like HotJar record randomly-sampled sessions and create heatmaps, so you can actually see what a random sample of users does on your site — which is arguably many times more valuable than a response from a self-selected user or someone in a user research lab. (Note: the privacy-minded out there may not like these sorts of recording tools, which is a worthy debate to have.)
Ultimately, what someone may say in a survey or explain in a lab will differ from how they’ll react in real life. The only way to know if something is working is to try it. There are many low-cost ways to try new things, including a/b testing and painted door testing (link to other post), that will minimize the time it takes for you to learn about user behavior.
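For the a/b tests mentioned above, a quick significance check can be done with a standard two-proportion z-test; a minimal sketch with made-up conversion numbers:

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference between two conversion rates,
    using the pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Control: 100 of 1,000 users converted; variant: 130 of 1,000.
z = two_proportion_z(100, 1000, 130, 1000)
print(f"z = {z:.2f}")  # |z| > 1.96 means significant at the 95% level
```

Because every visitor is randomly assigned to control or variant, the randomness the survey approach lacks is built in by design.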
The benefits of experimentation go beyond user testing. Building an organization that can execute tests quickly means you can iterate faster. Waiting on user research feedback can have the opposite effect, adding another bottleneck to an often already complex software development process. Innovating early and often is good.
Some sites, like Stitchfix, don’t guess what their users want — they just ask. Integrating a survey into the actual user experience, so that responding becomes part of the flow rather than an opt-in interruption, reduces self-selection bias and is another way of learning a user’s intent.
One of the few times when user research is beneficial is when you don’t need a random sample — aka, you have big, enterprise customers that account for a majority of your revenue.
Another area where user research can be important is when you have zero product and are only exploring a market. Here, however, the most important part is building relationships with potential customers — user research here is less about actual product workflows and more about building the customer relationships you need to get started.
Most importantly, user research is not just a standard stage in the product development process, where you are interviewing individual customers and relying on non-random feedback. User research should be thought about more holistically, encompassing all the tools that you have, and relying on the ones that can give you a more complete picture of user behavior — that includes analytics, experimentation, and data-yielding product workflows. Perhaps Google’s embrace of quantitative user research will move product development further in this direction.