Understanding Simpson's Paradox

Earlier this year, I was working with a NetSuite client, helping them do some basic AI-based analysis on their sales. One of their prompts generates a sales dashboard and presents the data at a management level.

The dashboard said their gross margins were improving. Their margin had climbed from 35% to 40% quarter over quarter throughout 2025.

But the dashboard was wrong.

When we summarized their sales by product category, every single category's margin had declined. Hardware dropped, software dropped, and services dropped.

After digging into the numbers, we found that the overall margin went up only because the revenue mix had shifted toward the highest-margin category. As a result, the aggregate was masking what was really happening.

At the time, I didn't have a name for what we were seeing. I just knew the data was telling us two contradictory things, and only one of them was useful.

Asking the Data Scientist

I had been working on a series of prompts designed to improve the quality of data in an ERP system, and my standard techniques for building those prompts were disappointing. I realized that what I really needed was "someone" (it's still a little odd for me to refer to these AI agents as if they're people!) who could provide and apply statistical methodology, ERP domain knowledge, and the ability to translate technical findings into business impact. That's a rare combination of skills.

But using AI, and applying Cognitive Systems Design (CSD), I was able to "create" that person. It's an expert system that essentially acts as a Senior Data Scientist, with 15 years of experience, a Ph.D. in Statistics, and strong opinions about methodology.

At some point, it occurred to me that I should ask my new Data Scientist about what I was seeing with my client's data. It knew right away what it was.

This is Simpson's Paradox - and it means the mix of your segments shifted, not that every segment actually improved. The aggregate trend is misleading; the segment-level trends are telling you the truth.

Simpson's Paradox occurs when a trend that appears in the overall data reverses when you break it down by a confounding variable (here, your segments). The mechanism is always the same: the relative sizes of the groups changed between periods, and the groups have very different baseline rates (margins, conversion rates, or averages).

What Is Simpson's Paradox?

The Data Scientist's explanation was clear, but I wanted to understand it better. Here's a simple example. Imagine you have two sales channels, Online and In-Store:

Channel    Q1 Orders   Q1 Revenue   Q1 AOV   Q2 Orders   Q2 Revenue   Q2 AOV
Online         8,000        $400K      $50       2,000         $90K      $45
In-Store       2,000        $600K     $300       8,000        $2.2M     $275
Overall       10,000          $1M     $100      10,000       $2.29M     $229

Overall average order value (AOV) jumped from $100 to $229, which makes it look like the company is experiencing strong growth.

However, if you look at the channels, you'll see that Online AOV dropped from $50 to $45, and In-Store AOV dropped from $300 to $275. The AOV for both channels decreased. Confusing, eh?

The reason that the overall AOV increased is that the mix of channels shifted, and it shifted significantly. In-Store sales (the high-AOV channel) went from 20% of orders to 80%. The improved AOV is really only reflecting the shift in the volume of orders being made through the channels.
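The arithmetic is easy to verify yourself. Here's a minimal Python sketch using the numbers from the table above:

```python
# (orders, revenue) per channel, from the example table above.
q1 = {"Online": (8_000, 400_000), "In-Store": (2_000, 600_000)}
q2 = {"Online": (2_000, 90_000), "In-Store": (8_000, 2_200_000)}

def aov(orders, revenue):
    """Average order value: revenue divided by order count."""
    return revenue / orders

# Each channel's AOV declines from Q1 to Q2.
for channel in q1:
    print(channel, aov(*q1[channel]), "->", aov(*q2[channel]))

# Yet the aggregate AOV more than doubles, because the order mix
# shifted toward the high-AOV In-Store channel.
agg_q1 = aov(sum(o for o, _ in q1.values()), sum(r for _, r in q1.values()))
agg_q2 = aov(sum(o for o, _ in q2.values()), sum(r for _, r in q2.values()))
print("Overall:", agg_q1, "->", agg_q2)  # 100.0 -> 229.0
```

Nothing in the segment data improved; only the weighting between the segments changed.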

Here's what that looks like visually: plotted over time, the aggregate AOV line trends up while both channel lines trend down.

The aggregate line says things are improving. The channel lines say they're getting worse, and only the channel lines are useful for making decisions.

If you reported to your CFO that "average order value more than doubled," you'd be technically correct, and at the same time completely misleading.

Where to Look for Simpson's Paradox in NetSuite

Once I understood the pattern, I started seeing it in some of the other NetSuite data analyses I've been doing. Three scenarios in particular stand out.

1. Sales Performance by Rep

I was working on a prompt that analyzed quarterly sales performance for a client. The aggregate numbers looked good, with the overall close rate having improved from 22% to 28%. The kind of thing you'd happily present at an all-hands meeting.

But when I broke it down by rep, every single rep's close rate had declined.

After digging in, I found that the top-performing reps (the ones with the highest close rates) had taken on a larger share of the total opportunities. Some of them had inherited accounts from a rep who left. Territory assignments had shifted. Lead routing had changed and was funneling more qualified leads to senior reps.

The mix of who handled how many deals had changed. The aggregate improved because the best closers now represented a larger share of total activity, not because anyone really got better at closing.

If you reward the team based on the aggregate number, you're celebrating a staffing shift, not a performance improvement. And if one of those top reps leaves, the "improved" close rate will collapse overnight with no change in anyone else's behavior.

2. Item Profitability by Product Category

Another client asked me to build a profitability analysis across their product catalog. The overall gross margin had improved from 35% to 40%. The product team was pleased.

Then I sliced by product category. Hardware margins had dropped from 15% to 12%. Software margins had dropped from 72% to 68%. Services margins had dropped from 45% to 41%. Every single category had declined.

The overall improvement? They had sold proportionally more software that quarter. Software's high margin pulled the aggregate up, even though its own margin got worse.

I've found that this is common in NetSuite environments where companies sell across categories with very different cost structures. A shift in product mix between quarters can mask real margin erosion in every category.

One way to guard against this is to report margins at the category level, not just the aggregate. And when you do report an aggregate, hold the product mix constant. Compare this quarter's category-level margins to last quarter's, using the same revenue weights. In statistics, this is called a standardized rate. In plain language, it answers the question: "If our product mix hadn't changed, would margins still have improved?"
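Here's a sketch of that standardized-rate calculation in Python. The category margins come from the example above; the Q1 revenue weights are hypothetical, since the article doesn't give the actual mix:

```python
# Category margins (percent) from the example above.
q1_margin = {"Hardware": 15, "Software": 72, "Services": 45}
q2_margin = {"Hardware": 12, "Software": 68, "Services": 41}

# Hypothetical Q1 revenue weights -- illustrative only.
q1_weights = {"Hardware": 0.5, "Software": 0.2, "Services": 0.3}

def standardized(margins, weights):
    """Weighted-average margin, holding the revenue mix fixed."""
    return sum(margins[c] * weights[c] for c in margins)

# Using the SAME weights for both periods removes the mix effect:
# any remaining change is real margin movement.
print(round(standardized(q1_margin, q1_weights), 1))  # 35.4
print(round(standardized(q2_margin, q1_weights), 1))  # 31.9 -- margins fell
```

With the mix held constant, the "improvement" disappears: at Q1 weights, the Q2 margins would have produced a lower overall number, which is exactly what the category-level erosion predicts.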

3. Revenue by Subsidiary or Location

I was doing some analysis for a multi-subsidiary NetSuite client. Their consolidated revenue per customer had grown 18% year-over-year. It looked strong in the board deck.

But when I broke it down by subsidiary, each one's revenue per customer had actually declined.

What had happened? Their fastest-growing subsidiary served enterprise customers with much higher average spend. That subsidiary had gone from 10% of their customer base to 35%. The high-value subsidiary was now dominating the average, pulling the consolidated number up while every individual subsidiary was trending down.

I've found that this happens frequently in NetSuite environments with subsidiaries in different markets or geographies. A US subsidiary with $50K average deal size and an APAC subsidiary with $8K average deal size will produce misleading consolidated metrics whenever their relative customer counts shift.

Imagine presenting an 18% improvement to investors. Six months later, the fast-growing subsidiary's growth slows, the mix rebalances, and the consolidated metric drops. Nothing changed operationally. The paradox just unwound itself.

Identifying Simpson's Paradox in NetSuite

It is possible to use saved searches to identify Simpson's Paradox in your NetSuite reports. Here's how you might do it using the product profitability example.

Search 1: The Aggregate View

1. Create a Transaction saved search with a date filter for the two periods you're comparing (e.g., Q1 vs. Q2).
2. Add summary columns: SUM(Amount), SUM(Cost), and a formula column for margin:
   ROUND((SUM({amount}) - SUM({costestimate})) / NULLIF(SUM({amount}), 0) * 100, 1)
3. Group by period (use the Period field or a formula to label Q1 vs. Q2).
4. This gives you the aggregate margin per period. Note the trend.

Search 2: The Segment View

5. Duplicate the search. Add Item : Category (or Item Group, or Class, depending on your setup) as an additional group-by column.
6. Now you see the margin per category, per period.
7. Compare. Does every category's margin move in the same direction as the aggregate? If yes, then the trend is real. If the category margins all declined while the aggregate improved, then you've found a composition effect.
8. To confirm, add a SUM(Quantity) or COUNT column and check whether the revenue share between categories shifted between periods. That's your composition variable.
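If you export both searches to CSV, the comparison in step 7 can be automated. Here's a sketch of the check; the function and argument names are my own, and the inputs are (period 1, period 2) margin pairs for the aggregate and for each segment:

```python
def trend(before, after):
    """Direction of change: +1 up, -1 down, 0 flat."""
    return 1 if after > before else -1 if after < before else 0

def simpsons_flag(aggregate, segments):
    """True when every segment moves one way and the aggregate the other.

    aggregate: (q1, q2) margin pair for the overall number.
    segments:  {name: (q1, q2)} margin pairs per category.
    """
    agg = trend(*aggregate)
    directions = {trend(*pair) for pair in segments.values()}
    # Paradox signature: segments unanimous, aggregate reversed, neither flat.
    return len(directions) == 1 and directions.pop() == -agg != 0

# Numbers from the product-category example above.
print(simpsons_flag((35, 40), {
    "Hardware": (15, 12), "Software": (72, 68), "Services": (45, 41),
}))  # True: all categories down while the aggregate is up
```

When the flag is True, check the revenue shares from step 8: a significant mix shift between periods is the composition variable driving the reversal.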

The same logic applies to any metric you want to validate.

Why AI Makes This Harder to Catch

We're moving fast toward AI-generated analysis of NetSuite data. Automated reports. AI-powered dashboards. Prompts that ask Claude or GPT to "analyze our sales performance" and return polished summaries. That's exactly the kind of setup my client was using when we found the paradox in their data.

And AI, by default, will summarize at the aggregate level. It will tell you overall revenue is up, overall margins improved, overall close rates are trending well. It will do this confidently and clearly.

Unless you explicitly instruct it to check for composition effects, it won't. It doesn't know that your product mix shifted or that your subsidiary growth rates are uneven. It sees the numbers, reports the trend, and moves on.

This is one of the reasons I'm glad I built the Data Scientist prompt the way I did. It has an explicit reasoning step: Think causally, not just correlationally. And it has a hard constraint: Always question the data-generating process. Without those instructions, it would have reported the same misleading aggregate trend that my client's dashboard did. Most AI prompts will.

A Prompt You Can Use

If you're using AI to analyze NetSuite data, add this to your prompt. One paragraph. It won't catch everything, but it will catch the most common cases.

Copy-paste prompt addition:

"Before reporting any aggregate trend (revenue growth, margin change, close rate improvement, average deal size shift, etc.), verify whether the trend holds at the segment level. Break the data down by the most relevant dimension: product category, sales rep, subsidiary, customer segment, or location. If the segment-level trends move in the opposite direction of the aggregate, flag it clearly. Report both views: the aggregate number AND the segment-level breakdown. Explain that the discrepancy is caused by a shift in the composition of the underlying population (which segments grew or shrank as a share of the total). Do not present the aggregate trend alone when the segments contradict it."

Adding this paragraph to a prompt changes the way the AI reads every dataset you give it.

How to Protect Yourself

Guarding against Simpson's Paradox comes down to building one habit: before you accept any aggregate trend as real, break it down by the most obvious dimension. Product category. Location. Sales rep. Customer segment. If the trend holds at the segment level, then it's real. If it reverses, then you've found a composition effect.

It also helps to report the composition of the population alongside the KPI itself. Did the share of enterprise vs. SMB customers change? Did one product line grow faster than another? The mix is often more informative than the metric.

When comparing periods, hold the subgroup weights constant. NetSuite saved searches and SuiteAnalytics workbooks can support this. The question to ask: "If the mix hadn't changed, would the number still have improved?"

And if you're using AI for analysis, then use the prompt addition above, or build the habit into your AI workflows by always requesting both the aggregate and the segment-level view in the same output.

When It's Not Simpson's Paradox

Not every discrepancy between aggregate and segment-level data is Simpson's Paradox. Sometimes segments just move in different directions, and that's normal. One product line grows while another shrinks. One subsidiary outperforms while another lags. That's just business.

Simpson's Paradox has a specific signature:

  • Every segment moves in one direction (all up or all down)
  • The aggregate moves in the opposite direction
  • The cause is a composition shift in the relative sizes of the segments

If your segments are mixed (some up, some down), and the aggregate reflects a weighted average, then that's just aggregation working as expected. No paradox, and no alarm.

The paradox matters when the unanimous signal from every segment is being overruled by the aggregate. That's when the aggregate is actively misleading you.

Did the relative sizes of your segments change significantly between periods? If not (same product mix, same rep allocation, same subsidiary proportions), then the aggregate trend is probably trustworthy. Simpson's Paradox needs a composition shift to operate. Stable proportions make it impossible.

Wrapping Up

Simpson's Paradox is a reminder that aggregate data can invert reality. The overall number can point in the exact opposite direction of every underlying trend, and it does this without any error message or warning flag.

The only defense is to get into the habit of looking one level deeper. In NetSuite terms: don't stop at the saved search summary. Click into the segments and check the mix. Ask whether the story changes when you slice it differently.

It usually doesn't. But when it does, you'll be glad you checked.