In performance marketing, progress is built on iteration. It’s not always about reinventing the wheel; it’s about tweaking the right bolt. And much of that comes down to answering the same kinds of questions, over and over again.

  • “How can we improve our creative?”
  • “What can we do to drive more traffic to our ads?”
  • “Why are people clicking, but not converting once they reach the site?”

Sometimes, the answer is obvious. Most of the time, it’s not. That’s where A/B testing comes in. Whether it’s two pieces of ad copy going head-to-head or a broader test around campaign structure, bid strategies, or even language targeting, A/B testing is the difference between acting on instincts and acting on insights.

Many marketers talk about testing, but not nearly enough of them talk about testing well. A/B tests have the potential to unlock insights and drive real, iterative growth—but only if they’re built with structure, patience, and the full picture in mind. 

Bad testing can be worse than no testing at all. If your method isn’t clear, your results won’t be clear. You might stumble into decent performance, but without structure, you’re wasting time, budget, and resources on a process that won’t efficiently scale or drive repeat success.

A/B Testing: Tips for Success

Here are a few simple steps to A/B testing that can help you keep things locked in every time:

1. Decide What To Test

There are many A/B tests you can run, each of them valuable in different ways. Some common examples include: 

  • Creative A vs. creative B
  • Catch-all vs. segmented
  • English vs. Spanish
  • Local vs. national
  • Landing page 1 vs. landing page 2

In our use case, we wanted to test the performance of a generic Performance Max campaign against product category segmented campaigns.

2. Set Your Expectations

Before beginning A/B testing, it helps to have an idea of your expected results. This could be specific, like “CPC in the experiment group will improve versus the control group.” You can also take a more open-ended approach, testing refreshed creative against old creative with the expectation simply being that performance will differ.
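One way to make an expectation concrete is to ask how much traffic you would need before a lift of that size is even detectable. The sketch below is a rough, standard two-proportion sample-size estimate in Python; the 0.47% baseline and 0.70% target conversion rates are illustrative assumptions, not a prescription.

```python
from statistics import NormalDist

def clicks_needed_per_arm(baseline_cr: float, target_cr: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Rough per-arm sample size for detecting a conversion-rate difference."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # desired statistical power
    p_bar = (baseline_cr + target_cr) / 2
    top = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_power * (baseline_cr * (1 - baseline_cr)
                        + target_cr * (1 - target_cr)) ** 0.5) ** 2
    return int(top / (target_cr - baseline_cr) ** 2) + 1

# Illustrative: detecting a lift from a 0.47% to a 0.70% conversion rate
# takes roughly 17,000 clicks in each arm.
print(clicks_needed_per_arm(0.0047, 0.0070))
```

If the required volume is more than your budget can realistically buy, that is worth knowing before the test launches, not after.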

3. Build Test Campaigns

For our test, we kept everything in the asset groups the same and did not change our headlines, descriptions, or structure. The only difference was which products were featured in the shopping feed.

Here’s the setup:

Performance Max Campaign | Included Listing Groups | Excluded Listing Groups
All Products | All Products | School Supplies, Printers, Office Supplies
School Supplies | School Supplies | All Other Products
Printers | Printers | All Other Products
Office Supplies | Office Supplies | All Other Products
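For illustration, here is the same structure as plain data with a quick overlap check, simplified to just the three test categories. This is a sketch, not anything that talks to Google Ads; the point is that each arm's inclusions and exclusions should leave no product category reachable by two campaigns at once.

```python
# The structure from the table above as plain data -- illustrative only,
# nothing here touches the Google Ads API.
TEST_CATEGORIES = {"School Supplies", "Printers", "Office Supplies"}

campaigns = {
    "All Products":    {"included": "All Products",    "excluded": TEST_CATEGORIES},
    "School Supplies": {"included": "School Supplies", "excluded": TEST_CATEGORIES - {"School Supplies"}},
    "Printers":        {"included": "Printers",        "excluded": TEST_CATEGORIES - {"Printers"}},
    "Office Supplies": {"included": "Office Supplies", "excluded": TEST_CATEGORIES - {"Office Supplies"}},
}

# Sanity check: no campaign should still be able to serve a test category
# it doesn't own, otherwise two arms compete for the same products.
for name, cfg in campaigns.items():
    reachable = TEST_CATEGORIES - cfg["excluded"] - {cfg["included"]}
    assert not reachable, f"{name} can still serve: {reachable}"
```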

Before launching, we also accounted for conversion lag. Our All Products campaign had been live for a while, and based on our performance history, we knew traffic typically took up to three weeks to mature. That meant generating enough spend across that time window before drawing any conclusions.
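As a sketch of that rule, you can gate any readout on how much spend has fully matured past the lag window. The 21-day lag reflects our own history; the spend threshold below is a hypothetical placeholder you would set from your own account.

```python
from datetime import date, timedelta

CONVERSION_LAG_DAYS = 21      # from our history: traffic can take ~3 weeks to mature
MIN_MATURE_SPEND = 5_000.00   # hypothetical threshold -- set this from your own data

def ready_to_evaluate(daily_spend: dict, today: date) -> bool:
    """True once enough spend is older than the conversion-lag window."""
    cutoff = today - timedelta(days=CONVERSION_LAG_DAYS)
    mature_spend = sum(spend for day, spend in daily_spend.items() if day < cutoff)
    return mature_spend >= MIN_MATURE_SPEND
```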

On the topic of timelines: It’s tempting to compare “before and after,” but too many external factors—seasonality, month-to-month budget changes, and market factors, among others—can skew those results. For a retail client coming off the holiday season, for example, comparing January and February against November and December isn’t a clean test.

4. Know What To Compare

A/B tests are efficient because you can launch the experiment and control groups simultaneously to keep comparisons apples-to-apples. Sloppy A/B tests might give you data, but that data can lead you to make the wrong call. One of the easiest ways to get tripped up is by “declaring” a winner before the numbers are statistically significant. You scale too soon, pour more budget into what looks like a high performer—and find out later it was a fluke.
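A quick way to avoid declaring early is to put a p-value on the conversion-rate gap before scaling anything. Here is a minimal two-proportion z-test sketch with made-up numbers:

```python
from math import erfc, sqrt

def conversion_rate_p_value(conv_a: int, clicks_a: int, conv_b: int, clicks_b: int) -> float:
    """Two-sided p-value for the difference in conversion rate between two arms."""
    p_a, p_b = conv_a / clicks_a, conv_b / clicks_b
    p_pool = (conv_a + conv_b) / (clicks_a + clicks_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / clicks_a + 1 / clicks_b))
    z = (p_a - p_b) / std_err
    return erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal

# Hypothetical arms: 38 conversions from 8,000 clicks vs. 52 from 8,000 clicks.
print(round(conversion_rate_p_value(38, 8_000, 52, 8_000), 3))  # ~0.139: not a winner yet
```

In that hypothetical, the variant shows a 37% relative lift in conversion rate and still is not significant at the usual 0.05 bar, which is exactly the kind of "winner" that later turns out to be a fluke.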

Let’s Look at the Results

We’ve all been there: a seminar about a new digital marketing “discovery,” or more likely, a sales pitch from a platform claiming to “boost your market share” with AI-enhanced data-driven insights. Sometimes they show flashy stats, but when you dig in, there’s not much substance behind them.

Let’s look at our test results the way I’ve seen them presented in some of these pitches:

Test Results That Tell You Nothing
Campaign | CPC | CPA | CTR | Conv. Rate | ROAS
Office Supplies | $1.05 | $18 | 1.07% | 2.15% | $3.08
School Supplies | $1.32 | $23 | 4.39% | 3.09% | $2.06
All Products | $0.60 | $23 | 1.26% | 0.47% | $2.02
Printers | $1.67 | $25 | 0.73% | 0.7% | $1.65

Here’s the pitch: “Office Supplies outperformed our control group ROAS by +52%! CPA was 23% cheaper!”

Here’s what’s missing: These results didn’t happen in a vacuum. It’s great that we saw growth from breaking out campaigns, but we still haven’t answered how much growth we really saw.

Let’s add the context: How much spend did each of these campaigns actually get?

Results That Show the Bigger Picture
Campaign | Spend % | Impression Share | CPC | CPA | CTR | Conv. Rate | ROAS
All Products | 50% | 21% | $0.60 | $23 | 1.26% | 0.47% | $2.02
School Supplies | 27% | 19% | $1.32 | $23 | 4.39% | 3.09% | $2.06
Printers | 20% | <10% | $1.67 | $25 | 0.73% | 0.7% | $1.65
Office Supplies | 2% | <10% | $1.05 | $18 | 1.07% | 2.15% | $3.08


Yes, Office Supplies performed well when broken out, and that’s good to know. But now that we can see it made up only 2% of total spend, it’s clear this test doesn’t prove Office Supplies–only campaigns are a viable replacement for the broader, best-practice effort. It isn’t honest to present them as the be-all and end-all winner of the test.
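One way to keep yourself honest here is to weight each campaign's ROAS by its share of spend. A minimal sketch, using the rounded figures from the table above ("blended" just means spend-weighted across the four campaigns):

```python
# Spend share and ROAS from the table above (ROAS shown as revenue per $1 spent).
results = {
    "All Products":    {"spend_share": 0.50, "roas": 2.02},
    "School Supplies": {"spend_share": 0.27, "roas": 2.06},
    "Printers":        {"spend_share": 0.20, "roas": 1.65},
    "Office Supplies": {"spend_share": 0.02, "roas": 3.08},
}

blended = sum(r["spend_share"] * r["roas"] for r in results.values())
print(f"Blended ROAS: ${blended:.2f}")  # ~$1.96 on these rounded inputs

# How much each campaign actually contributes to that blended figure:
for name, r in results.items():
    print(f"{name:<16} {r['spend_share'] * r['roas'] / blended:.0%} of revenue")
```

On those numbers, the headline "winner" drives roughly 3% of revenue at its current spend share, while the catch-all campaign still carries about half. That is the context a flashy ROAS comparison leaves out.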

This kind of flawed interpretation is common. Take an ad copy test where Title Case beat Sentence case by 85% on ROAS, but the Title Case sample was one-eighth the size. That result says more about scale than about some undiscovered strategy.

What Can This Teach Us?

Our test showed that breaking out product-specific campaigns can provide helpful insights and boost performance, but only when those segments are big enough to move the needle.

The real winner? School Supplies. It beat our All Products ROAS and pulled in a strong 19% impression share, something the other segmented campaigns couldn’t match.

At the end of the day, a test is only as strong as the context you give it.

If you’re not careful, you’ll end up optimizing for the wrong outcome, or worse, cutting your most important campaign because a tiny test “looked better.”

The lesson: insight beats instinct. Context beats clever stats.

Things To Keep Consistent in Your A/B Test

  • Budget pacing across variants
  • Length of time each test runs
  • Exclusion logic or negative keywords

And if you are changing any of these elements as part of your test, make sure everything else stays consistent. A quick parity check before launch, like the sketch below, can catch accidental drift between the arms.
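The field names here are hypothetical placeholders, not Google Ads setting names; the idea is simply to diff the two arms and flag anything that differs besides the element under test.

```python
def unintended_differences(control: dict, variant: dict, testing: str) -> dict:
    """Settings that differ between the arms besides the one under test."""
    return {key: (control[key], variant[key])
            for key in control
            if key != testing and control[key] != variant[key]}

# Hypothetical settings; use whatever fields matter in your own account.
control = {"daily_budget": 500, "run_days": 42, "negative_list": "shared_negatives", "creative": "A"}
variant = {"daily_budget": 500, "run_days": 42, "negative_list": "shared_negatives", "creative": "B"}

print(unintended_differences(control, variant, testing="creative"))  # {} means a clean test
```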

Using Google Ads Experiments for A/B Testing

Google Ads offers a Custom Experiments tool that enables A/B testing for campaigns, allowing you to:

  • Select which campaigns to test
  • Define success metrics
  • Allocate budget between test and control groups

While the tool is user-friendly and efficient, it comes with limitations. At the time of writing, this tool is only available for Search and Display campaigns, restricting broader testing capabilities across other campaign types.

Other Campaign Type Limitations

  • Performance Max: Testing is limited to comparing Google’s auto-generated assets against your current setup, or testing a Performance Max campaign against a non-Performance Max equivalent. These tests feel more promotional than practical, with limited value for nuanced experimentation.
  • Demand Gen: Supports basic A/B testing between two campaigns with budget splits.
  • Video Campaigns: Similar to Demand Gen with additional customization options, like varying more elements across campaigns. Still limited in scope.

Search Campaign Options & Limitations

For Search campaigns using Responsive Search Ads (RSAs), the Experiments tool does offer some real utility:

  • Custom Experiments: You can define the test duration, success KPIs, and allocate spend percentages. It also allows for pre-launch changes like keyword adjustments or location targeting.
  • Ad Variations: Ideal for quick headline tests using a “find and replace” interface. While this could be done manually, the cleaner UI speeds things up.
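As a rough offline illustration of what a find-and-replace variation does (this mimics the idea in plain Python; it is not the Ads UI or API):

```python
# Hypothetical headline set; the variant swaps one phrase across every asset.
headlines = [
    "Shop Office Supplies Today",
    "Office Supplies Shipped Fast",
    "Save Big on Office Supplies",
]

find, replace = "Shop", "Buy"
variant_headlines = [h.replace(find, replace) for h in headlines]
print(variant_headlines)  # the variant set you would test against the originals
```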

Here are some other factors to keep in mind:

  • Experiments are capped at 85 days in duration.
  • Manual setups offer more customization but require more time and effort.
  • Using the Experiments Dashboard trades flexibility for speed and simplicity.

What A/B Testing Can Really Teach You

Sometimes, tests reveal surprise winners. Other times, they confirm that your best-practice setup is working for a reason. In our case, we learned two things:

  1. Segmentation is helpful, but only when the segments are large enough to scale. 
  2. Context around the experiment, like percent of total spend, matters even more than flashy metrics.

This is what good testing looks like. It’s patient, structured, and grounded in reality.

The next time someone says “just test it,” make sure you’re also asking, “What are we actually trying to learn and how will we know when we’ve learned it?”

Because that’s where the real growth lives.

Want to see how testing can strengthen your SEM campaigns? We’d be happy to dig in and discuss strategies. Reach out to us or connect with us on LinkedIn.