Surprising Split-Testing Results…

I’m going to betray the title of this post right away: to me, these results aren’t surprising at all. However, if you read some of the info put out by the “experts”, stuff like this isn’t supposed to ever happen.

The reality is that I’ve tested thousands of actions over millions of visitors and the only constant is that sometimes really strange things happen when you split test. I suppose in a perfect world we could look at the effect of everything from what’s on TV to the tidal forces of the moon when analyzing data…but for now, all we get is what our split testing software tells us.

One thing is for sure: most people don’t let their tests run long enough to reach any kind of meaningful conclusion.

I recently went through a product on split-testing in which the author declared one version a “decided winner” after it completed only 7 actions, because the next closest result was only 5. Wow. Really? (Sadly, this product cost hundreds of dollars…this is what the “pros” are selling to unknowing beginners.)
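To put a number on that 7-versus-5 call, here is a minimal sketch in Python. It assumes both versions got roughly equal traffic (I don’t know the real split in that product), so that with no real difference each action is a coin flip between the two versions. The function name is my own.

```python
from math import comb


def two_sided_binomial_p(a: int, b: int) -> float:
    """Exact two-sided p-value for 'a vs b actions' under a 50/50 split.

    Assumes equal traffic to both versions, so that with no real
    difference each action is a fair coin flip between them.
    """
    n = a + b
    k = max(a, b)
    # Probability that one given side collects k or more of the n actions.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double it for a two-sided test and cap the result at 1.0.
    return min(1.0, 2 * upper_tail)


print(f"7 vs 5: p = {two_sided_binomial_p(7, 5):.2f}")  # prints 0.77
```

A p-value around 0.77 means a 7-to-5 split is exactly what you would expect to see most of the time from two identical versions.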

In the example I’m about to show you it will become clear that even after 100 actions things can change.

I recently ran a new split-test campaign. Below are some “snapshots” taken throughout its run. The columns represent three different landing pages, A, B and C, left to right. The numbers are cumulative sales, or “actions”, and each row is a later snapshot.

A    B    C

03   06   07
20   27   29

At this point nothing exceptional has happened. Page C is in the lead, but not by a huge margin. Compared to A, though, C is looking strong. Carrying on…

A    B    C

50   62   67

I particularly like the above numbers. They show a clear leap by page C, especially over A. 67 compared to 50 looks like a significant gap, and it would make me feel safe choosing C as the winner. At this point 90% of people would remove A and either start a new test between B and C or declare C the winner and move on to another test. This would not be bad or wrong…but let’s see what happens.

A    B    C

84   89   95
100  105  110
124  128  134

What do we have here? All the pages are keeping their respective positions, but page A is making a comeback. Those last numbers still favor page C, but nowhere near as strongly as before. The final numbers are even more surprising…

A    B    C

150  152  154
162  162  162

Wow! What the hell happened?

What happened was that after about 10,000 views each, every page is converting at EXACTLY the same rate: 1.62%.

I see this kind of thing all the time, yet the “experts” are calling tests after only 7 actions. What does this mean? Well, it means that statistically there probably isn’t much difference between any of these pages. Strange, huh? If we run tests long enough it can be interesting to see the gaps close.
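To see how much wiggle room there still is around a figure like 1.62%, here is a rough sketch that computes a 95% Wilson score interval for one page. It takes the rounded “about 10,000 views” at face value, which is an approximation, and the function name is just mine.

```python
from math import sqrt


def wilson_interval(actions: int, views: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a conversion rate."""
    p = actions / views
    denom = 1 + z * z / views
    centre = (p + z * z / (2 * views)) / denom
    half = z * sqrt(p * (1 - p) / views + z * z / (4 * views * views)) / denom
    return centre - half, centre + half


# 162 actions on roughly 10,000 views (the rounded figure from the post).
low, high = wilson_interval(162, 10_000)
print(f"1.62% observed, plausible range {low:.2%} to {high:.2%}")
```

That works out to roughly 1.4% to 1.9%, so even after 10,000 views per page, differences of a few hundredths of a percent between pages are well inside the noise.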

Based on the above, if I had to, I would still choose page C over the others. It enjoyed a steady lead over the other pages throughout the test, even though it was caught in the end. I would bet that if I ran the test longer, C might regain its lead. Then again, I’ve seen results like this where page A would take off and become the leader. In short, this test really just shows how strange results can be.
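One way to get a feel for how strange results can be is to simulate a test where the pages are truly identical and see how often an early gap like 50 / 62 / 67 turns up anyway. The sketch below uses only the Python standard library. It assumes equal traffic and equal true conversion rates, so every action is equally likely to land on A, B or C. The function name, trial count and seed are arbitrary choices of mine.

```python
import random


def chance_of_gap(total_actions: int, gap: int, trials: int = 5_000,
                  seed: int = 1) -> float:
    """Share of simulated tests in which three IDENTICAL pages show a
    leader-to-last gap of at least `gap` after `total_actions` actions.

    Assumes equal traffic and equal true conversion rates, so each
    action is equally likely to belong to page A, B or C.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = [0, 0, 0]
        for _ in range(total_actions):
            counts[rng.randrange(3)] += 1
        if max(counts) - min(counts) >= gap:
            hits += 1
    return hits / trials


# Snapshot from the post: 50 + 62 + 67 = 179 actions, leader ahead by 17.
share = chance_of_gap(total_actions=179, gap=17)
print(f"Identical pages show a gap of 17 or more in {share:.0%} of simulated tests")
```

If that share comes out far from zero, a 17-action lead at that stage is something pure chance produces on a regular basis.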

If in doubt, test more, but also test longer. Don’t be anxious to close a test quickly just because it’s giving you a result you like. True split-testing requires a scientific approach.
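How long is “longer”? A rough way to estimate it is the standard normal-approximation sample size formula for comparing two conversion rates. The sketch below is only an illustration: the 1.62% baseline comes from this test, but the 20% relative lift, the 5% significance level and the 80% power are assumptions I picked, not numbers from the campaign.

```python
from math import ceil
from statistics import NormalDist


def views_per_page(base_rate: float, relative_lift: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Rough views needed PER PAGE to detect a lift between two pages.

    Uses the normal-approximation formula for comparing two proportions
    with a two-sided test.
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Baseline 1.62% (from the post), hoping to detect a 20% relative lift (assumed).
print(views_per_page(0.0162, 0.20))
```

Under those assumptions the answer is roughly 26,000 views per page, and smaller lifts need far more than that.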

What do you think?

Update: Page A is now the winner with 176 actions. Pages B and C sit at 173 and 174 respectively.

Update 2: This is my last update as I will now be moving on to testing other things. As I mentioned, you can never be too sure…final numbers below:

A    B    C

188  186  184
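For anyone who wants to check snapshots like these, here is a small sketch of a chi-square goodness-of-fit test across all three pages. It assumes each page got the same traffic, so with no real difference the actions should split evenly three ways. The function name is my own.

```python
from math import exp


def equal_pages_p_value(actions_a: int, actions_b: int, actions_c: int) -> float:
    """Chi-square goodness-of-fit p-value for three pages with equal traffic.

    Three pages give 2 degrees of freedom, and for 2 degrees of freedom
    the chi-square tail probability is exactly exp(-statistic / 2).
    """
    observed = [actions_a, actions_b, actions_c]
    expected = sum(observed) / 3
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return exp(-statistic / 2)


for label, counts in [("early snapshot", (50, 62, 67)),
                      ("final numbers", (188, 186, 184))]:
    print(f"{label}: p = {equal_pages_p_value(*counts):.2f}")
# early snapshot: p = 0.28
# final numbers: p = 0.98
```

By this measure, neither the early snapshot nor the final numbers give real evidence that the three pages convert differently.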

4 Responses to Surprising Split-Testing Results…
  1. Andrew
    July 22, 2009 | 12:19 am

    What software do you use to perform your split tests?

  2. Luke
    July 23, 2009 | 10:58 am

    I’ve used a bunch, my favorite right now is http://powersplittester.com/

    It’s cheap, simple and effective.

  3. Leo
    July 25, 2009 | 1:22 pm

    Interesting numbers….I think that it really depends on what you are testing. For instance, if you are testing fonts, color schemes or some other MINOR design aspect, I imagine that you will find that numbers will be very close (like what you show above).

    However, I have found huge differences in conversions or actionable results if you test other things like headlines, adding video vs. straight text, proof placement, or even where you place the action link (above the fold, at the end…in the middle)…

    I would consider testing that is that close to be almost negligible though…there really is no clear cut winner if you factor in a buffer…..

    .02% difference with only 10k page views isn’t a lot.

  4. Luke
    July 28, 2009 | 12:07 pm

    Hi Leo,

    You are correct that major features like the headline or the offer typically have the highest potential of affecting conversion rates when changed.

    If you look at the second test listed above Page A was at 50 and Page C was at 67. That’s a HUGE difference. Whether the element being tested was a headline or a font style…that’s significant. I would feel good about choosing Page C at this point.

    However, we can see that hundreds of actions later…Page A was actually in the lead.

    That was really what I wanted to emphasize with this post…most people simply do not run tests long enough to make certain their results are reliable.

    I see “gurus” (I hate that word) declaring winners with just a handful of conversions…and that’s just stupid.

    I just wanted to illustrate that sometimes it’s very interesting to see what happens when we let split tests run longer than normal.

    Thanks for the input!

    -Luke
