A Primer on A/B Testing

by Lara Swanson

12 Reader Comments

  1. Although A/B testing is arguably simpler and easier to grasp than multivariate testing (MVT), I’m curious why you would advocate such a home-grown solution when Google Optimizer is better equipped, will handle the statistical analysis for you, automatically hooks into Google Analytics for your outcome tracking, and would allow you to graduate from simple A/B testing to simultaneously handling multiple hypotheses at almost negligible cost in sample size. (The advantage of MVT is that if you can think of 4 or 5 test ideas, the chances that one of them will be significant are much higher, whereas a single A/B test may be disappointing and fail to “move the needle”.)
  2. @murraytodd - excellent question. I tested Google Optimizer about eight months ago, and found that the page load time it added for the test outweighed the pros of the tool for me. It may be a great tool to help people get started, but if you use it, be sure to check how much page load time it’s adding. MVT is great for developers who are comfortable with A/B testing. It can add more unknowns, though, depending on the test. If you’re comfortable finding statistical significance in an MVT, that’s great! If someone is just getting started with MVT, one really stellar tool for finding winners of a multivariate test is the Ad Comparator (http://adcomparator.com/). You can use it for any type of conversion, not just ads.
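For readers wondering what "finding statistical significance" can look like for a plain two-version test, here is a minimal sketch using a pooled two-proportion z-test. This is a common textbook approach, not necessarily what the article's author or any of the tools mentioned above use. The function name and the visitor and conversion counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts; n_a / n_b are visitor counts.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; no variation in the data")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200 of 5000 visitors converted on A, 250 of 5000 on B.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (0.05 is a conventional threshold) suggests the difference is unlikely to be chance alone. The normal approximation used here is only reasonable when each version has a fair number of conversions.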
  3. I see you used the Taguchi method to test this. Did you hardcode all these changes? I think the best part of this article is that you are clear about not waiting forever on results and killing the test instead. At the same time, realise that the best results come from trying some aggressive changes, like your text change. Two excellent lessons for anyone trying to understand A/B testing.
  4. @Reedge yes, I coded these changes myself instead of using a third-party tool to serve the versions. Hand-coding the tests saved quite a bit of page load time for me, but it may be easier for others to get started with A/B testing using a third-party tool.
  5. Hi Lara, indeed, if you know how, it’s always good to save a couple of ms on load time. Nice link to the Taguchi method; I’ll bookmark it. It would be interesting to add that to Reedge as an alternative to what we’ve got now. Regards, Dennis
  6. I’ve been thinking of trying A/B testing, but struggle with defining goals to test and ways to measure success. Goals are the first problem. The examples given in the article revolve around persuading the user to take some concrete action: click this button, sign up for that newsletter, buy that thingy. I work in an academic library. At base, we want our users to find reputable academic information that supports their research. The problem is that it’s very difficult to reduce that to simple, concrete actions.  Although there are definite actions that go into the research process, deciding which of them to try (and in what order) is heavily contingent on the topic and purpose of the research. The approach that works well for an undergraduate writing a five page paper will be inadequate for a graduate student assembling a hundred-page annotated bibliography. But in most cases, the site traffic is anonymous - we have no way to distinguish freshmen from faculty, which makes it hard to come up with test goals that make sense. The other problem is measuring success.  Ideally, we could tell whether users found useful information based on whether they make use of it.  Did they check out that book?  Did they cite that article?  But we mostly have no way of tracing the user in such detail.  If Susy Q. Student looks up books in the catalog and then borrows one, we have no way of connecting the search she did with the checkout.  We can reproduce her search easily enough, but without knowing the research question that brought her to the site, it’s hard to assess whether the results met her needs or not. How would you go about designing an A/B test in support of a more abstract goal like this? Or would you use some technique other than A/B testing for approaching this problem? I’d be interested to hear any comments.
  7. @Will - really thoughtful question. It sounds like the first thing you may need to do is set up a better analytics solution. Are you able to figure out the total books checked out in a certain time period, number of users on your system, and number of users who have books checked out? If you’re currently not able to track basic user workflows and these metrics, then it’ll be really difficult to do any A/B testing, since you won’t have a baseline or a way to measure success. Once you do have a better data collection solution, then you can start looking at the problems you’re looking to solve. Are there students that do tons of searches but never check anything out? You could test different tweaks to the search results pages to see what helps people find what they’re looking for. Or, do lots of students log on but never visit helpful parts of your system outside of search? You can A/B test ways to better highlight the different useful tools you offer. Note that most basic solutions, like Google Analytics, will tell you what brought your users to your site (search, referring link, search terms, etc.). You can also set cookies or another method of tracking your users between search and checkout. The key here is that you need more data - I hope that helps!
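As a rough illustration of the baseline idea above, the sketch below assumes each visitor carries a session ID (for example, from a cookie) and that searches and checkouts are logged against it. The event format, the session IDs, and the function name are all hypothetical; a real library system would need its own way of collecting these events.

```python
from collections import defaultdict

# Hypothetical event log: (session_id, action) pairs.
events = [
    ("s1", "search"), ("s1", "checkout"),
    ("s2", "search"),
    ("s3", "search"), ("s3", "search"), ("s3", "checkout"),
    ("s4", "checkout"),  # checkout with no logged search
]


def search_to_checkout_rate(events):
    """Fraction of sessions with at least one search that also had a checkout.

    Note: this ignores the order of events within a session.
    """
    actions = defaultdict(set)
    for session_id, action in events:
        actions[session_id].add(action)
    searched = [s for s, seen in actions.items() if "search" in seen]
    if not searched:
        return 0.0
    converted = [s for s in searched if "checkout" in actions[s]]
    return len(converted) / len(searched)


rate = search_to_checkout_rate(events)
print(f"search-to-checkout rate: {rate:.0%}")
```

A number like this, measured before any change, gives the baseline that a later A/B test on the search results page could be compared against.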
  8. The other problem is measuring success. Ideally, we could tell whether users found useful information based on whether they make use of it. Did they check out that book? Did they cite that article? But we mostly have no way of tracing the user in such detail. If Susy Q. Student looks up books in the catalog and then borrows one, we have no way of connecting the search she did with the checkout. We can reproduce her search easily enough, but without knowing the research question that brought her to the site, it’s hard to assess whether the results met her needs or not.
  9. Tracking user workflows is fiendishly difficult. Like most academic libraries, most of our site is actually just a connection point with third party services. The following are run by third parties:
    * Our catalog (which we share with fifty or so other libraries)
    * Our databases of articles (about 300 of these, from a few dozen vendors)
    * The link resolver (which checks whether a given article is available in the databases)
    In all of these cases, we have little or no control over the UI that is presented to the user. Once the user has initiated a search in our holdings, they are to all intents and purposes no longer on “our” site even though it’s our data they’re searching. And one user on a moderately intense research session could very easily hit the catalog, three different article databases, and the link resolver, resulting in usage data which is split across multiple silos.
    The catalog is particularly vexing, because even if we did manage to get analytics out of it, our traffic would be all mixed up with the traffic from every other library in the consortium. Most of these third party vendors (Ebsco, ProQuest, Elsevier to name a few big ones) can provide usage statistics; but these are mostly pre-made reports rather than raw data, they all report slightly different things, and it’s hard to tell whether the stats from vendor A are comparable with those from vendor B.
    We’ve had Google Analytics installed and running for years. Some of the data it provides is very useful. But that data has distinct limits. 68% of our visitors hit the home page and immediately depart for a third party site. I’ve put in some code to track *where* they go, but I cannot track what they do or where they go on a third-party site.
    The more I think about it, the more I think that I really need to do some traditional usability testing. Under those kinds of controlled circumstances I can at least sit at their elbow and watch what they do. Maybe I could use A/B testing for some more fine-grained stuff which has to do with our own site, for example labeling choices. Hmm. Have to put some more thought into that.
  10. I’ve noticed a difference in the way Hubspot and Google Optimizer run A/B tests, which leads me to a question about A/B testing in general. It doesn’t look like Hubspot plants a cookie in the user’s browser, and so over the course of many visits the user will see both A and B served up randomly. On the other hand, Google Optimizer plants a cookie, and either A *or* B is served up persistently. In other words, if a user sees A once, it’s A for the length of the experiment, no matter how many times he/she visits the page(s) where the test is. My hunch is that Google Optimizer does it the better way. Users should be given one variable—one chance to vote with their click over time— and that’s it. Are both approaches valid? Is one preferred over another?
  11. I prefer persistent A/B versions (the way Google Optimizer runs). Both ways are valid, but when you look at the results of your test, be sure to note which way (persistent or not) the test was run. For example, if I’m testing an account setting, I want it to look the same for a user each time they log in. This will help me measure the effects of the setting and its success. If it changes each time, it may confuse the user, which will add a new variable to your test results (and may invalidate the results, depending upon how you’re measuring success of each version). Hubspot’s way could still be helpful in some cases, but I wouldn’t use it for any A/B test that examines user workflows or other actions that may be repeated by the same user. Hubspot’s way could work for things like sidebar ad text or other content the user may see once - but it really depends on the test.
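One way to get persistent versions is to store the assigned version in a cookie, as described above. Another way, sketched here, is to derive the version deterministically from a stable visitor ID so the same visitor always lands in the same bucket. This is a swapped-in technique for illustration, not how Google Optimizer or Hubspot are known to work; the function name, visitor ID, and experiment name are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a visitor to a variant for a given experiment."""
    # Including the experiment name keeps buckets independent across tests.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer, then pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same visitor gets the same version on every visit.
first = assign_variant("visitor-123", "signup-button-text")
again = assign_variant("visitor-123", "signup-button-text")
print(first, again, first == again)
```

The visitor ID still has to come from somewhere stable (a login, or an ID kept in a cookie), so this complements cookie-based tracking rather than replacing it.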
  12. Hi Lara, this is a new area to me. I have used AdWords for a while, but for some reason the concept of testing different scenarios in a systematic way never really clicked with me until the last couple of weeks. Now I am on a mission to learn as quickly as possible. Thanks for a great article. Simon