This article was originally published on LinkedIn.

Over the past few months, one request has come up more than any other from my NetSuite AI clients: they want a single, trustworthy report that summarizes their company's overall financial health. They want it to be professional and consistent - something that looks and feels like it came straight from a CFO's office.

So I set out to build one.

I thought this was going to involve designing another sophisticated prompt - a single, big, complex one. That assumption didn't last long.

The First Attempt

The first version of my prompt seemed promising, especially when I ran it against a simple NetSuite test instance. But as soon as I tried to scale it across multiple clients and run it against more complex instances, things started breaking down.

Outputs became inconsistent. Context drifted.

And then I started seeing Claude's dreaded "maximum length" message:

"Claude hit the maximum length for this conversation. Please start a new conversation to continue."

Even when the prompt appeared to work, it couldn't handle complex NetSuite instances. The bigger the dataset, the more chaotic the results.

The underlying issue was that, while AI models are capable of remarkable reasoning, they're not very good at managing complexity. And I was feeding the model a very complex prompt.

In financial analysis, every step requires a different kind of thinking: data validation, ratio building, variance analysis, narrative synthesis. Trying to make a single prompt do all of this is like asking one person to serve as a company's accountant, strategist, and copywriter - and to perform all of those roles at the same time.

The Breakthrough

So instead of one massive prompt, I began designing a system of smaller, specialized prompts, each with a clear purpose, validation step, and format. The prompts would be run sequentially, forming a prompt chain.

The original 30,000-character prompt evolved into a five-stage pipeline, where every stage produced a structured deliverable.

Here's an overview of the workflow - the five stages and the deliverable each one produces (sketched in code after the list):

  1. Data Discovery → JSON
  2. Metrics Calculation → Excel
  3. Strategic Analysis → Markdown
  4. Board Report Generation → HTML
  5. QA + Action Plan → Markdown
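
To make that structure concrete, here's a minimal sketch of the pipeline expressed as data in Python. The stage names and output formats come straight from the list above; the prompt files and output filenames are hypothetical placeholders, not the protocol's actual artifacts.

    # Minimal sketch: the pipeline as an ordered list of stage definitions.
    # Stage names and formats match the article; file names are hypothetical.
    PIPELINE = [
        {"stage": "Data Discovery",          "prompt": "prompts/01_discovery.txt", "output": "01_data.json"},
        {"stage": "Metrics Calculation",     "prompt": "prompts/02_metrics.txt",   "output": "02_metrics.xlsx"},
        {"stage": "Strategic Analysis",      "prompt": "prompts/03_analysis.txt",  "output": "03_analysis.md"},
        {"stage": "Board Report Generation", "prompt": "prompts/04_report.txt",    "output": "04_report.html"},
        {"stage": "QA + Action Plan",        "prompt": "prompts/05_qa_plan.txt",   "output": "05_action_plan.md"},
    ]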

Each stage runs independently, which resets context, frees up token budget, and prevents "drift." Every prompt's output becomes the input to the next step, and those outputs are serialized and type-checked. The goal is to prevent "garbage in / garbage out" between steps.

If something breaks, you can fix that one link without rebuilding the rest of the chain. In other words, this is prompt chaining, but engineered as if it were software.

Each link behaves like a stateless function: well-defined inputs, well-defined outputs, predictable behavior.
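
Here's what "stateless" could look like in practice - a minimal sketch, not the protocol's actual code. The call_model() helper is a placeholder for whatever API starts a fresh model conversation per stage:

    from pathlib import Path

    def call_model(prompt: str, data: str) -> str:
        """Placeholder: in a real pipeline this would open a brand-new
        model conversation, so no context carries over between stages."""
        raise NotImplementedError("wire this to your model API of choice")

    def run_stage(prompt_path: str, input_path: str, output_path: str) -> str:
        """One link in the chain: read the previous stage's file, run this
        stage's prompt against it in a fresh context, write a new file."""
        prompt = Path(prompt_path).read_text()
        stage_input = Path(input_path).read_text()
        result = call_model(prompt=prompt, data=stage_input)
        Path(output_path).write_text(result)
        return output_path  # becomes input_path for the next stage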

Protocol Engineering: A New Kind of AI Design

Prompt engineering gets you answers. What I ended up doing was essentially protocol engineering - and the end result was a solution that was both reliable and repeatable.

In a typical prompt chain, outputs just flow from one step to another. But with protocol engineering, each step is tested, typed, and validated - very much like a professionally developed software module.

Each stage in this protocol followed three simple rules:

  1. One purpose per stage
  2. One format per output
  3. Validation before execution (illustrated in the sketch below)
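
To illustrate rule 3 (this is an illustration, not the protocol's actual schema): before the Metrics Calculation stage runs, the JSON file produced by Data Discovery can be parsed and checked against an expected shape, for example with Python's jsonschema library:

    import json
    import jsonschema

    # Hypothetical schema - the real Data Discovery fields aren't shown here.
    DISCOVERY_SCHEMA = {
        "type": "object",
        "required": ["company", "periods", "accounts"],
        "properties": {
            "company":  {"type": "string"},
            "periods":  {"type": "array", "items": {"type": "string"}},
            "accounts": {"type": "array"},
        },
    }

    def validate_before_execution(path: str) -> dict:
        """Rule 3: refuse to start the next stage until the previous
        stage's output both parses and matches the expected shape."""
        with open(path) as f:
            data = json.load(f)  # fails loudly on malformed JSON
        jsonschema.validate(data, DISCOVERY_SCHEMA)
        return data  # safe to hand to Metrics Calculation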

When the process completes, it has produced a fully auditable intelligence package - seven artifacts in total, including data, metrics, analysis, a board-ready HTML report, a Q&A briefing, and a 90-day action plan.

Seeing It Come to Life

When I ran the process on a test instance, the results were remarkable.

It detected a 71% increase in the cash conversion cycle, traced it to rising inventory, and calculated that reducing DIO (days inventory outstanding) by 60 days could free up nearly $1 million in cash. It flagged margin compression, forecasted liquidity impacts, and generated board-ready talking points. And it did all of this in under 90 minutes.
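
For readers who want the arithmetic behind that kind of DIO claim: cash tied up in inventory is roughly DIO / 365 × annual COGS, so trimming DIO frees a proportional amount of cash. The article doesn't publish the underlying figures, so the COGS number below is purely hypothetical:

    # Back-of-envelope only - annual_cogs is an assumed figure, not data
    # from the actual assessment.
    annual_cogs = 6_000_000   # hypothetical annual cost of goods sold ($)
    dio_reduction = 60        # days shaved off days inventory outstanding

    # Cash freed = (days removed / 365) * annual COGS
    cash_freed = dio_reduction / 365 * annual_cogs
    print(f"${cash_freed:,.0f}")  # ~$986,301 - i.e., "nearly $1 million"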

The final HTML report is one of the most impressive documents I've ever generated with AI: polished layout, clear commentary, color-coded signals, and more.

Here's an example of a full Financial Health Assessment Report: View Example Report

What I Learned

This project helped confirm something I'd suspected for a while: professional prompt engineering is "real" engineering. The best results don't come from a cleverly worded prompt. They come from carefully designed systems of prompts.

Here are a few takeaways from this project:

  • Structure beats scale. Bigger context windows don't fix messy logic. Structure does.
  • Serialization builds trust. When every output is a file, not free text, you can audit it.
  • AI very often needs orchestration. Context resets and data checkpoints turn chaos into consistency.
  • Design for cognition. Each prompt should think like a role: analyst, strategist, communicator.

Wrapping Up

I started this project trying to create a better prompt. What I built was something more like a system - a blend of workflow, architecture, and philosophy.

It's convinced me that the future of AI won't just be about writing better prompts. It'll be about building better systems of prompts. Systems that can reason, explain, and deliver work that executives can actually use and trust.

The Financial Health Assessment Protocol was built for finance, but I think the pattern applies anywhere structured data meets expert reasoning: supply chain, HR, operations, even audit and compliance.

That's what excites me most. This isn't just a one-off workflow. It's a strategy for using AI for serious work.

I'm considering making the Financial Health Assessment Protocol available for purchase. If you're interested, drop a comment or DM me.