Surprising Split-Testing Results…

July 12, 2009

Marketing, PPC


Split Test for Maximum Profits

Let me betray the title of this post right away: to me, these results aren’t surprising at all. However, if you read some of the info put out by the “experts,” stuff like this is never supposed to happen.

The reality is that I’ve tested thousands of actions over millions of visitors, and the only constant is that really strange things sometimes happen when you split test. I suppose in a perfect world we could account for everything from what’s on TV to the tidal forces of the moon when analyzing data… but for now, all we get is what our split-testing software tells us.

One thing is for sure: most people don’t let their tests run long enough to reach any kind of meaningful conclusion.

I recently read a product on split testing in which the author declared one version a “decided winner” after only 7 actions, because the next closest result was only 5. Wow. Really? (Sadly, this product cost hundreds of dollars… this is what the “pros” are selling to unsuspecting beginners.)
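How little does 7 versus 5 actually tell you? Here is a minimal sketch. It assumes both versions received the same number of visitors (the product’s traffic figures aren’t given). Under the hypothesis that the versions are identical, each conversion is then equally likely to land on either one, so the 12 conversions split according to a Binomial(12, 0.5) distribution. The helper name `exact_split_p_value` is illustrative and does not come from any split-testing tool.

```python
from math import comb


def exact_split_p_value(a: int, b: int) -> float:
    """Two-sided exact binomial (sign) test of H0: conversions split 50/50.

    Assumes both versions received the same number of visitors, so under H0
    each of the a + b conversions is equally likely to land on either version.
    """
    n = a + b
    observed_gap = abs(a - n / 2)
    # Count every outcome at least as far from an even split as the observed one.
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= observed_gap)
    return extreme / 2 ** n


print(f"{exact_split_p_value(7, 5):.3f}")
```

For 7 versus 5, the two-sided p-value is 3172/4096, or about 0.77. A split at least that uneven would happen roughly three times out of four even if the two versions were exactly the same.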

The example I’m about to show you makes it clear that even after 100 actions, things can change.

I recently ran a new split-test campaign, and below are some “snapshots” taken throughout its run. The columns represent three different landing pages (A, B, and C, left to right). Each row was taken later than the one above it, and the numbers are cumulative sales, or “actions.”

A    B    C

03  06  07
20  27  29

At this point nothing exceptional has happened. Page C is in the lead, but not by a huge margin. Compared to A, though, C is looking strong. Carrying on…

A    B    C

50  62  67

I particularly like the above numbers. They show a clear leap by page C, especially over A. 67 compared to 50 looks like a statistically significant gap, and it would make me feel safe choosing C as the winner. At this point 90% of people would remove A and either start a new test between B and C or declare C the winner and move on to another test. This would not be bad or wrong… but let’s see what happens.
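Is 67 versus 50 actually significant? Here is a quick check. It assumes pages A and C had received the same number of visitors at that snapshot (the post doesn’t report per-snapshot traffic), and it ignores page B. Under the hypothesis of no difference, A’s share of the 117 combined actions is Binomial(117, 0.5). At this size, a normal approximation handles that well. `split_z_test` is a hypothetical helper name.

```python
from math import erfc, sqrt


def split_z_test(a: int, b: int) -> tuple[float, float]:
    """Normal-approximation test of H0: conversions split 50/50 between two pages.

    Assumes both pages received the same number of visitors; returns (z, p),
    where p is the two-sided p-value.
    """
    n = a + b
    # Under H0 the count for the first page is Binomial(n, 0.5): mean n/2, variance n/4.
    z = (a - n / 2) / sqrt(n / 4)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = split_z_test(67, 50)  # page C vs page A at the 50/62/67 snapshot
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With these inputs, z comes out at about 1.57 and the two-sided p-value at roughly 0.12. That is above the conventional 0.05 cutoff. In other words, a gap this size between two identical pages is not that unusual, which fits with how the lead evaporated later.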

A    B    C

84  89  95
100 105 110
124 128 134

What do we have here? All the pages are keeping their respective positions, but page A is making a comeback. Those last numbers still favor page C, but nowhere near as strongly as before. The final numbers are even more surprising…

A    B    C

150 152 154
162 162 162

Wow! What the hell happened?

What happened was that after about 10,000 views each, every page was converting at EXACTLY the same rate: 1.62%.
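Even “exactly the same” rates carry uncertainty. To show how much, here is a sketch of a normal-approximation (Wald) 95% confidence interval for a conversion rate. It assumes exactly 10,000 visitors per page (the post says “about 10,000”). `wald_interval` is an illustrative name.

```python
from math import sqrt


def wald_interval(conversions: int, visitors: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a conversion rate.

    z = 1.96 gives an approximate 95% interval.
    """
    rate = conversions / visitors
    half_width = z * sqrt(rate * (1 - rate) / visitors)
    return rate - half_width, rate + half_width


low, high = wald_interval(162, 10_000)
print(f"{162 / 10_000:.2%} (95% CI {low:.2%} to {high:.2%})")
```

For 162 actions out of 10,000, this gives roughly 1.37% to 1.87%. Any true rate inside that band is consistent with what each page showed. Differences of a few hundredths of a percentage point simply can’t be resolved at this sample size.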

I see this kind of thing all the time, yet the “experts” are calling tests after only 7 actions. What does this mean? Well, it means that statistically there probably isn’t much difference between any of these pages. Strange, huh? If we run tests long enough, it can be interesting to watch the gaps close.

Based on the above, if I had to, I would still choose page C over the others. It held a steady lead over the other pages throughout the test, even though it was caught in the end. I would bet that if I ran the test longer, C might regain its lead. Then again, I’ve seen results like this where page A would take off and become the leader. In short, this test really just shows how strange results can be.

If in doubt, test more, but also test longer. Don’t be anxious to close a test quickly just because it’s giving you a result you like. True split testing requires a scientific approach.
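How long is “long enough”? One standard way to decide before a test starts is the approximate sample-size formula for comparing two conversion rates with a two-sided z-test:

n per page = (z₁₋α/₂ + z₁₋β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)²

The sketch below assumes α = 0.05, 80% power, and a hypothetical goal of detecting a 20% relative lift over the 1.62% rate seen here. Those targets are illustrative and don’t come from this test. `visitors_per_page` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_page(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per page to distinguish conversion rates p1 and p2.

    Standard normal-approximation formula for a two-sided two-proportion z-test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / (p1 - p2) ** 2)


baseline = 0.0162          # rate each page converged to in this test
target = baseline * 1.20   # hypothetical 20% relative lift
n = visitors_per_page(baseline, target)
print(n)
```

With those inputs, the formula asks for roughly 26,000 visitors per page. That is more than twice the ~10,000 each page had when the rates converged. The difference enters the formula squared, so halving the lift you want to detect roughly quadruples the traffic you need.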

What do you think?

Update: Page A is now the winner with 176 actions. Pages B and C sit at 173 and 174 respectively.

Update 2: This is my last update, as I will now be moving on to testing other things. As I mentioned, you can never be too sure… final numbers below:

A    B    C

188 186 184
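Here is a sketch that looks at every snapshot at once. It runs a chi-square goodness-of-fit test of each row against an equal three-way split. It assumes all three pages received equal traffic throughout (the post only confirms about 10,000 views each at the 162/162/162 point). With three pages there are 2 degrees of freedom. For exactly 2 degrees of freedom, the chi-square tail probability has the closed form exp(−x/2), so no stats library is needed. The function names are illustrative.

```python
from math import exp

# Cumulative actions for pages (A, B, C) at each snapshot reported in this post.
SNAPSHOTS = [
    (3, 6, 7), (20, 27, 29), (50, 62, 67), (84, 89, 95), (100, 105, 110),
    (124, 128, 134), (150, 152, 154), (162, 162, 162), (176, 173, 174), (188, 186, 184),
]


def chi_square_equal_split(counts: tuple[int, ...]) -> float:
    """Goodness-of-fit statistic against an equal split of actions across pages."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def p_value_df2(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with exactly 2 degrees of freedom.

    Only valid for three pages (3 - 1 = 2 degrees of freedom).
    """
    return exp(-stat / 2)


for counts in SNAPSHOTS:
    stat = chi_square_equal_split(counts)
    print(counts, f"chi2 = {stat:.2f}", f"p = {p_value_df2(stat):.2f}")
```

In this sketch, none of the snapshots gets below p = 0.05. The smallest p-value, about 0.28, belongs to the 50/62/67 row that looked like a clear win for C.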

Comments:

Did all visitors come from the same traffic source or did they go through the same funnel before splitting the traffic to different pages?
I had results get screwed up once when one of the pages got indexed by Google and attracted more views than the others, and therefore also more actions.

I agree with Leo from above, who said: "it really depends on what you are testing". When you make subtle changes, you will probably find no big differences in the long run (as you experienced), but drastic changes will show significant results and point you in the right direction more easily.

"declaring winners with just a handful of conversions"
Somewhere you have to start, and not everyone gets 10,000 visitors over a short period of time.

Great split test.

You're right about letting split tests run longer than what is usually advised by "gurus". It's just statistics. Everything has its swings, even your worst sales page. You learn a lot about what makes the best page by paying attention to your test results.

It's been the stupidest things, like the color of my buy now button, that caused the biggest leaps in conversions.

Hi Leo,

You are correct that major features like the headline or the offer typically have the highest potential of affecting conversion rates when changed.

If you look at the second set of numbers listed above, Page A was at 50 and Page C was at 67. That's a HUGE difference. Whether the element being tested was a headline or a font style... that's significant. I would feel good about choosing Page C at this point.

However, we can see that hundreds of actions later...Page A was actually in the lead.

That was really what I wanted to emphasize with this post...most people simply do not run tests long enough to make certain their results are reliable.

I see "gurus" (I hate that word) declaring winners with just a handful of conversions...and that's just stupid.

I just wanted to illustrate that sometimes it's very interesting to see what happens when we let split tests run longer than normal.

Thanks for the input!

-Luke

Interesting numbers... I think it really depends on what you are testing. For instance, if you are testing fonts, color schemes, or some other MINOR design aspect, I imagine you will find the numbers very close (like what you show above).

However, I have found huge differences in conversions or actionable results when testing other things like headlines, video vs. straight text, proof placement, or even where you place the action link (above the fold, at the end... in the middle).

I would consider results that close to be almost negligible, though... there really is no clear-cut winner if you factor in a buffer.

.02% difference with only 10k page views isn't a lot.

I've used a bunch; my favorite right now is http://powersplittester.com/

It's cheap, simple and effective.

What software do you use to perform your split tests?