MasterDexter

88% of AI pilots never make it to production.

That number comes from McKinsey's 2025 State of AI report. It is not a bug in the technology. It is a leadership failure.

The graveyard is full of projects that had executive sponsors, engineering talent, vendor relationships, and quarterly budget. They had everything except one thing: a clear answer to the question "what business metric does this move, and by how much?"

I spent the last 18 months working with enterprise teams across insurance, FMCG, and financial services. The failure pattern is identical in every organization, and it is not the technology. It is the decision-making process.

Here is what I have learned about how the 12% think differently.

The hidden cost most leaders never calculate

Before you can fix your pilot portfolio, you need to see what it actually costs.

Most finance teams track compute costs, vendor licenses, and contractor fees. They do not track the most expensive line item: engineering time.

When I worked with a mid-size insurance company to audit their AI portfolio, the finance team believed they were spending around $800,000 annually on AI. The actual loaded cost was $2.54 million, roughly 3.2x the reported figure. The gap came entirely from engineering hours that were spread across IT, product, and department budgets with no single owner.

The formula is simple:

True Annual AI Spend =
  Compute Costs
  + (Engineering Hours x Fully Loaded Rate)
  + Vendor Fees
  + Contractor Fees

Industry data suggests the multiplier is 3x to 5x in most enterprises. If your finance team believes you are spending $500,000 on AI, your actual cost is likely between $1.5 million and $2.5 million.

Run that calculation before your next budget conversation. The number tends to focus minds.
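
If you want to run the numbers in a script rather than a spreadsheet, here is a minimal Python sketch of the formula above. The class name, field names, and every input figure are hypothetical placeholders for illustration, not numbers from any audit.

```python
from dataclasses import dataclass


@dataclass
class AISpend:
    """Annual AI cost inputs, all in the same currency (hypothetical structure)."""
    compute: float
    engineering_hours: float
    loaded_hourly_rate: float  # fully loaded rate: salary, benefits, overhead
    vendor_fees: float
    contractor_fees: float

    def visible(self) -> float:
        """The line items finance usually tracks: everything except engineering time."""
        return self.compute + self.vendor_fees + self.contractor_fees

    def true_annual(self) -> float:
        """True Annual AI Spend: visible spend plus engineering hours at loaded rate."""
        return self.visible() + self.engineering_hours * self.loaded_hourly_rate

    def multiplier(self) -> float:
        """Ratio of true spend to visible spend (undefined if visible spend is zero)."""
        return self.true_annual() / self.visible()


# Hypothetical inputs chosen only to illustrate the arithmetic.
spend = AISpend(
    compute=200_000,
    engineering_hours=10_000,
    loaded_hourly_rate=100,
    vendor_fees=200_000,
    contractor_fees=100_000,
)

print(f"Visible spend: ${spend.visible():,.0f}")
print(f"True spend:    ${spend.true_annual():,.0f}")
print(f"Multiplier:    {spend.multiplier():.1f}x")
```

The hard part is not the arithmetic. It is collecting the engineering hours from every budget they are hidden in.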

The Value-Complexity Matrix

Once you know what you are actually spending, the next step is brutal honesty about what you are getting.

Forget scoring systems with 47 criteria. The leaders who actually kill bad projects use a two-dimensional matrix: Business Value versus Implementation Complexity. Each pilot gets a score of 1 to 5 on both dimensions. The weighted formula is:

Priority Score = (Business Value Score x 2) + (Complexity Score x 1)

The 2x weight on Business Value is intentional. It stops technically elegant projects with no measurable business outcome from surviving the cut.

Business Value scores a 5 when the pilot directly and verifiably moves one of your top five business metrics. It scores a 1 when the team cannot articulate the dollar impact in one sentence.

That last criterion is the most important gate. In the insurance company audit, eight project owners could not answer "what specific business metric does this move, and what is the current baseline?" Those projects scored 1 or 2 on Business Value regardless of their technical merit, and they were killed.

What the matrix produces:

Score 8 to 10: Scale immediately. These are in production or should be.
Score 5 to 7: Pivot or delay. Real value, blocked by a specific fixable problem.
Score 1 to 4: Kill. Stop work today and redeploy the engineering hours.
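
The scoring can be sketched in a few lines of Python. Two assumptions are mine, not part of the framework as stated: the Complexity Score is oriented so that 5 means easiest to implement (otherwise adding it would reward harder projects), and because the formula with 1 to 5 inputs can produce values up to 15, anything at 8 or above is treated as the top band. The pilot names and scores are hypothetical.

```python
def priority_score(business_value: int, complexity: int) -> int:
    """Priority Score = (Business Value Score x 2) + (Complexity Score x 1).

    Assumption: complexity is scored so that 5 = easiest to implement.
    """
    for name, score in (("business_value", business_value), ("complexity", complexity)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {score}")
    return business_value * 2 + complexity


def decision(score: int) -> str:
    """Map a priority score to one of the three bands."""
    # Assumption: scores above 10 fall into the same band as 8 to 10.
    if score >= 8:
        return "scale"
    if score >= 5:
        return "pivot or delay"
    return "kill"


# Hypothetical pilots: (name, business value, complexity).
pilots = [
    ("claims triage", 5, 4),
    ("document search", 2, 3),
    ("internal demo", 1, 2),
]

for name, value, complexity in pilots:
    score = priority_score(value, complexity)
    print(f"{name}: score {score}, {decision(score)}")
```

Whatever thresholds you settle on, fix them before scoring. Adjusting the bands after you see which projects land where defeats the purpose.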

The North Star test

Before any pilot survives the matrix, it must pass one more test. The project owner must complete this sentence in one breath:

"This system will move [specific metric] from [current baseline] to [target] by [specific date], verified by [non-engineering owner]."

If they cannot complete that sentence, the pilot is not funded. Not because it is a bad idea. Because it has not been designed to produce a measurable outcome.
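
One way to enforce the test is to treat the sentence as a form with five required fields. Below is a minimal Python sketch under that assumption. The `NorthStar` class and its field names are hypothetical, and the deadline and verifier in the example are illustrative values.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class NorthStar:
    """The five blanks in the North Star sentence (hypothetical structure)."""
    metric: Optional[str] = None
    baseline: Optional[str] = None
    target: Optional[str] = None
    deadline: Optional[date] = None
    verified_by: Optional[str] = None  # must be a non-engineering owner

    def missing(self) -> list[str]:
        """Names of the blanks that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_fundable(self) -> bool:
        """A pilot passes only when every blank is filled in."""
        return not self.missing()

    def sentence(self) -> str:
        """Render the completed sentence, or refuse if anything is missing."""
        if not self.is_fundable():
            raise ValueError(f"North Star incomplete, missing: {self.missing()}")
        return (
            f"This system will move {self.metric} from {self.baseline} "
            f"to {self.target} by {self.deadline.isoformat()}, "
            f"verified by {self.verified_by}."
        )


# Illustrative values; the date and verifier are hypothetical.
complete = NorthStar(
    metric="claims cycle time",
    baseline="11.2 days",
    target="7.4 days",
    deadline=date(2026, 6, 30),
    verified_by="the CFO's office",
)
print(complete.sentence())

incomplete = NorthStar(metric="efficiency")
print(incomplete.is_fundable(), incomplete.missing())
```

A check like this only confirms the blanks are filled. Whether the baseline is real and the verifier is genuinely outside engineering still needs a human review.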

The "verified by non-engineering owner" clause is the part most companies skip. It should be the CFO's office, or at minimum a business leader who owns the P&L associated with the metric. When the board asks whether the AI investment is paying off, CFO-verified metrics are the ones that end the conversation in your favor rather than starting an awkward 20-minute discussion.

The political problem nobody talks about

Here is what the standard frameworks do not cover: the hardest kills are not the technically weak projects. They are the projects with the most senior sponsors.

In the insurance company case, the chatbot product ("Meri") had consumed $340,000 and never had a single live user interaction. It was built on a language model two generations behind current capability. The clear call was to kill it.

The problem was that the Head of Customer Experience had pitched Meri to the board two years earlier. His name was on the original business case.

The conversation that worked was not "your project failed." It was: "Meri has consumed $340,000 and has never had a live user interaction. The technology it runs on is now two generations behind. If we want a customer chatbot, we should build it fresh on a modern architecture in Q3. That is not leaving the investment on the table. That is admitting it did not work and starting over smarter."

He was given a seat on the Q3 rebuild team. The pilot was killed. The 12% do not avoid hard conversations. They prepare for them.

The most dangerous project in your portfolio

It is not the project burning the most budget. It is the one nobody is watching.

In the insurance audit, the riskiest item was an HR AI screening tool purchased nine months earlier that had never been deployed. It was under contract at $84,000 per year, and it fell under the EU AI Act's high-risk employment category, which means it requires a Fundamental Rights Impact Assessment before deployment.

Legal had never reviewed it. HR had never deployed it. Nobody was watching it.

The outcome: the tool was killed, and a new procurement rule was established requiring Legal and Security review before any AI tool contract is signed. That single process change prevented more future damage than any individual kill decision.

Before your next portfolio audit, look for the projects with no active owner, no deployment date, and no legal review. Those are the ones with liability accumulating silently.

The portfolio health formula

Track this quarterly, not annually:

Portfolio Health Score =
  (% of pilots with verified North Star metrics)
  + (% of pilots in production or active shadow mode)
  - (% of pilots with no measurable progress in 90 days)

Target: above 70%
Below 50%: run the audit immediately
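
For a quarterly check, the score can be computed from a simple list of pilots. This is a minimal Python sketch. The `Pilot` fields and the sample data are hypothetical, and the "watch" label for scores between 50 and 70 is my assumption, since the thresholds above only name the two ends.

```python
from dataclasses import dataclass


@dataclass
class Pilot:
    """One row in the portfolio review (hypothetical structure)."""
    name: str
    has_verified_north_star: bool
    in_production_or_shadow: bool
    stalled_90_days: bool  # no measurable progress in the last 90 days


def portfolio_health(pilots: list[Pilot]) -> float:
    """Sum of the two positive percentages minus the stalled percentage."""
    if not pilots:
        raise ValueError("portfolio is empty")
    total = len(pilots)

    def pct(count: int) -> float:
        return 100.0 * count / total

    verified = sum(p.has_verified_north_star for p in pilots)
    live = sum(p.in_production_or_shadow for p in pilots)
    stalled = sum(p.stalled_90_days for p in pilots)
    return pct(verified) + pct(live) - pct(stalled)


def health_status(score: float) -> str:
    """Apply the thresholds: above 70 is on target, below 50 triggers the audit."""
    if score > 70:
        return "on target"
    if score < 50:
        return "run the audit"
    return "watch"  # assumption: the middle band is not named in the formula


# Hypothetical portfolio of four pilots.
pilots = [
    Pilot("claims triage", True, True, False),
    Pilot("fraud scoring", True, False, False),
    Pilot("customer chatbot", False, False, True),
    Pilot("HR screening", False, False, True),
]

score = portfolio_health(pilots)
print(f"Portfolio Health Score: {score:.0f} ({health_status(score)})")
```

Because two percentages are added and one subtracted, the raw score can fall below zero or rise above 100. Read it against the thresholds rather than as a share of the portfolio.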

If you answer no to three or more of these five questions, you have a pilot graveyard:

Can you name every active AI pilot in your organization right now?
For each pilot, does a business owner (not engineering) own a metric with a baseline?
Has any pilot been officially killed in the last 12 months based on performance?
Is your actual AI spend known to your CFO at loaded cost level?
Do you have a single named owner for each pilot who can be held accountable?

What the 12% do differently

The companies that make it to production share one trait that has nothing to do with technology: they define what winning looks like before they write the first line of code.

Not "we want to improve efficiency." Not "we want to be AI-first." Specific. Measurable. Owned by someone whose compensation is connected to the outcome.

The insurance company I worked with went from 24 pilots with no production AI systems to three production systems moving real business metrics in six months. Claims cycle time dropped from 11.2 days to 7.4 days. Fraud detection rate on the motor line improved from 61% to 74%. Underwriting turnaround time fell from 3.2 days to 0.9 days.

Not because the technology changed. Because the decision-making process changed.

Kill the pilots that should die. Concentrate resources on the three things that can actually win. Measure in business language, not model accuracy.

That is the 12%.

Running your own pilot audit?

In The Elite AI Leadership Accelerator, we work through the full portfolio audit framework with Heads of AI, Directors, and VPs who are done with pilot graveyards and ready to ship. The first session covers the Value-Complexity Matrix and the North Star test with your actual portfolio.

What I build and how I can help

MasterDexter live cohorts
- AI Engineer HQ (8 weeks, 4 production systems)
- AI Leadership Accelerator (8 weeks)
MasterDexter Teams - private cohorts to train your AI team on production systems
AITalentStudio - vetted, production-ready AI talent for your company
Dextar - AI engineering development and consulting for enterprises and startups
Buildership - ideas to ship real AI

Why 88% of AI Pilots Never Reach Production (And How to Be in the 12%)
