Comments on A Primer on A/B Testing

12 Reader Comments

  1. Although A/B testing is arguably simpler and easier to grasp than multivariate testing (MVT), I’m curious why you would advocate such a home-grown solution when Google Optimizer is better equipped: it handles the statistical analysis for you, hooks automatically into Google Analytics for your outcome tracking, and lets you graduate from simple A/B testing to simultaneously handling multiple hypotheses at almost negligible extra sample-size cost.

    (The advantage of MVT is that if you can think of four or five test ideas, where a single A/B test may be disappointing and fail to “move the needle,” the chance that at least one of your ideas turns out to be significant is much higher.)
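
    (For anyone who does roll their own anyway, the significance check itself is not much code. A minimal TypeScript sketch of the standard two-proportion z-test, with made-up counts:)

      // Two-proportion z-test: is variant B's conversion rate
      // significantly different from variant A's?
      function abTestZScore(
        visitorsA: number, conversionsA: number,
        visitorsB: number, conversionsB: number,
      ): number {
        const pA = conversionsA / visitorsA; // conversion rate, variant A
        const pB = conversionsB / visitorsB; // conversion rate, variant B
        // Pooled rate under the null hypothesis that A and B convert equally.
        const pooled = (conversionsA + conversionsB) / (visitorsA + visitorsB);
        const se = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
        return (pB - pA) / se;
      }

      // |z| > 1.96 is significant at the usual 95% level (two-sided).
      const z = abTestZScore(5000, 250, 5000, 300);
      console.log(z.toFixed(2), Math.abs(z) > 1.96 ? "significant" : "not significant");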

  2. I see you used the Taguchi method to test this; did you hardcode all these changes?

    I think the best part of this article is that you are clear about not waiting forever on results and just killing the test. At the same time, you realise the best results come from trying some aggressive changes, like your text change.

    Two excellent lessons for anyone trying to understand A/B testing.
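
    One way to avoid waiting forever is to work out up front roughly how many visitors a test needs. A rough TypeScript sketch of the textbook sample-size formula, assuming 95% confidence, 80% power, and made-up numbers:

      // Approximate visitors needed per variant to detect a given relative lift.
      function sampleSizePerVariant(baseline: number, relativeLift: number): number {
        const zAlpha = 1.96; // 95% confidence, two-sided
        const zBeta = 0.84;  // 80% power
        const p2 = baseline * (1 + relativeLift);
        const variance = baseline * (1 - baseline) + p2 * (1 - p2);
        return Math.ceil((Math.pow(zAlpha + zBeta, 2) * variance) / Math.pow(p2 - baseline, 2));
      }

      // A 2% baseline conversion rate and a hoped-for 20% relative lift:
      console.log(sampleSizePerVariant(0.02, 0.2)); // roughly 21,000 visitors per variant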

  3. Hi Lara,

    Indeed, if you know how, it’s always good to save a couple of ms on load time. Nice link on the Taguchi method; I’ll bookmark it. It would be interesting to add that to Reedge as an alternative to what we have now.

    Regards,

    Dennis

  4. I’ve been thinking of trying A/B testing, but struggle with defining goals to test and ways to measure success.

    Goals are the first problem. The examples given in the article revolve around persuading the user to take some concrete action: click this button, sign up for that newsletter, buy that thingy.

    I work in an academic library. At base, we want our users to find reputable academic information that supports their research. The problem is that it’s very difficult to reduce that to simple, concrete actions. Although there are definite actions that go into the research process, deciding which of them to try (and in what order) is heavily contingent on the topic and purpose of the research. The approach that works well for an undergraduate writing a five-page paper will be inadequate for a graduate student assembling a hundred-page annotated bibliography. But in most cases the site traffic is anonymous; we have no way to distinguish freshmen from faculty, which makes it hard to come up with test goals that make sense.

    The other problem is measuring success.  Ideally, we could tell whether users found useful information based on whether they make use of it.  Did they check out that book?  Did they cite that article?  But we mostly have no way of tracing the user in such detail.  If Susy Q. Student looks up books in the catalog and then borrows one, we have no way of connecting the search she did with the checkout.  We can reproduce her search easily enough, but without knowing the research question that brought her to the site, it’s hard to assess whether the results met her needs or not.

    How would you go about designing an A/B test in support of a more abstract goal like this? Or would you use some technique other than A/B testing for approaching this problem? I’d be interested to hear any comments.

  5. Tracking user workflows is fiendishly difficult. Like most academic libraries, most of our site is actually just a connection point with third-party services. The following are run by third parties:

    * Our catalog (which we share with fifty or so other libraries)
    * Our databases of articles (about 300 of these, from a few dozen vendors)
    * The link resolver (which checks whether a given article is available in the databases)

    In all of these cases, we have little or no control over the UI that is presented to the user. Once the user has initiated a search in our holdings, they are to all intents and purposes no longer on “our” site even though it’s our data they’re searching.  And one user on a moderately intense research session could very easily hit the catalog, three different article databases, and the link resolver, resulting in usage data which is split across multiple silos.

    The catalog is particularly vexing, because even if we did manage to get analytics out of it, our traffic would be all mixed up with the traffic from every other library in the consortium. Most of these third party vendors (Ebsco, ProQuest, Elsevier to name a few big ones) can provide usage statistics; but these are mostly pre-made reports rather than raw data, they all report slightly different things, and it’s hard to tell whether the stats from vendor A are comparable with those from vendor B.

    We’ve had Google Analytics installed and running for years. Some of the data it provides is very useful, but that data has distinct limits. 68% of our visitors hit the home page and immediately depart for a third-party site. I’ve put in some code to track *where* they go, but I cannot track what they do once they’re on a third-party site.
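
    (That outbound tracking is just a click listener. A minimal TypeScript sketch of the idea, with a made-up /log-outbound endpoint standing in for my actual Analytics call:)

      // Record which third-party link a visitor clicks before the browser
      // navigates away. The /log-outbound endpoint is a hypothetical placeholder.
      document.addEventListener("click", (event) => {
        const link = (event.target as HTMLElement | null)?.closest("a");
        if (!link || link.hostname === window.location.hostname) return;
        // sendBeacon queues the request so it survives the page unload.
        navigator.sendBeacon("/log-outbound", JSON.stringify({
          href: link.href,
          page: window.location.pathname,
        }));
      });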

    The more I think about it, the more I think that I really need to do some traditional usability testing. Under those kinds of controlled circumstances I can at least sit at the user’s elbow and watch what they do.

    Maybe I could use A/B testing for some more fine-grained things that involve only our own site, for example labeling choices. Hmm. Have to put some more thought into that.

  6. I’ve noticed a difference in the way Hubspot and Google Optimizer run A/B tests, which leads me to a question about A/B testing in general. It doesn’t look like Hubspot plants a cookie in the user’s browser, and so over the course of many visits the user will see both A and B served up randomly. On the other hand, Google Optimizer plants a cookie, and either A *or* B is served up persistently. In other words, if a user sees A once, it’s A for the length of the experiment, no matter how many times he/she visits the page(s) where the test is.

    My hunch is that Google Optimizer does it the better way. Users should be given one variant, one chance to vote with their click over time, and that’s it. Are both approaches valid? Is one preferred over the other?
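
    To make the distinction concrete, here’s a minimal TypeScript sketch of the cookie-persistent approach; the cookie name, lifetime, and coin-flip are illustrative, not what either tool actually does:

      // Assign a visitor to a variant once, then persist the choice in a
      // cookie so every later visit shows the same variant.
      function getVariant(experiment: string): "A" | "B" {
        const cookieName = `ab_${experiment}`;
        const match = document.cookie
          .split("; ")
          .find((c) => c.startsWith(cookieName + "="));
        if (match) return match.split("=")[1] as "A" | "B";

        // First visit: flip a coin, then remember the result for 90 days.
        const variant: "A" | "B" = Math.random() < 0.5 ? "A" : "B";
        document.cookie = `${cookieName}=${variant}; path=/; max-age=${60 * 60 * 24 * 90}`;
        return variant;
      }

      // Every page load resolves to the same variant for this visitor.
      const variant = getVariant("signup-headline");
      document.body.dataset.variant = variant; // style the page per variant via CSS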

  7. Hi Lara,

    This is a new area to me. I have used AdWords for a while, but for some reason the concept of testing different scenarios in a systematic way never really clicked with me until the last couple of weeks.

    Now I am on a mission to learn as quickly as possible.

    Thanks for a great article.

    Simon
