This article was originally published on LinkedIn.
A few weeks ago, a new NetSuite client reached out to me with this request: "Can you build a prompt to help us with year-end close? Our controller is buried."
Their situation was typical for growing companies. They run a multi-subsidiary NetSuite environment. They do around $10M in revenue. It's a strong business, but the controller was drowning during the close period.
The owner wanted confidence that their books were going to be clean before signing off at year end. They needed what auditors call a "pre-audit review" - a second set of eyes that catches issues before they become problems.
The owner also wanted an AI prompt that could be run quarterly, not just at year-end.
This article covers the prompt I created, the principles that make it work, and what it found on its first real test. I'm also sharing the full prompt at the end.
What Makes This Type of Prompt Challenging
With these types of AI prompt projects, the challenge isn't the analysis itself. It's making AI do it reliably and repeatably.
Challenge #1: Finance requires precision.
You can't have AI making up account balances or "estimating" depreciation schedules.
One hallucinated number undermines the entire report. The controller loses trust. The board loses confidence. The whole exercise becomes worthless.
Traditional AI tools are trained to be helpful, to fill in gaps, to keep the conversation going. That's death for financial analysis.
Challenge #2: Professional judgment is hard to encode.
When is a variance material? It depends on context. 5% of total assets? 10% of equity? And it depends on the account.
What requires an adjusting entry versus just disclosure? That depends on GAAP standards and the specific situation.
How confident should we be in the conclusion? It depends on data quality, completeness of documentation, whether we had to estimate anything.
This isn't the type of thing you can Google. It comes from years of experience.
Challenge #3: Output has to be board-ready.
You can't just hand over a big blob of text. And it can't be bullet points in a chat window.
The output needs to look like something that a CFO would present to the board. Professional formatting. Clear findings. Specific recommendations. Audit trail documentation.
Most AI outputs look like... well, AI outputs.
Here's what I set out to do. Instead of just giving the AI a task, I wanted the prompt to provide the model with:
- A complete role definition with specific expertise
- Analytical frameworks with step-by-step methodologies
- Quality standards and anti-hallucination rules
- Output templates with exact specifications
- Examples of good judgment in ambiguous situations
In short, the goal was a prompt that encodes the entire Year-End Close Review standard operating procedure.
Here's What I Built
The final prompt is 3,200 words long. It's not a prompt that just asks a question or starts a conversation. It's a complete analytical system.
It starts with role definition:
"You are a Senior FP&A / Finance Strategy Analyst with 15+ years of progressive finance experience in NetSuite-based organizations. You have deep specialization in multi-subsidiary, multi-currency environments, expert-level knowledge of GAAP accounting principles, and CFO-grade financial rigor with explicit risk-management and audit-readiness mindset..."
This isn't "flavor text." It sets the lens through which the AI views every piece of data.
Then come 13 modular analytical blocks:
Each block follows the same structure: Purpose → Methodology → Outcome.
- Balance sheet structural integrity (verify accounting equation, identify material accounts)
- Working capital deep dive (cash, AR with DSO calculation, inventory turnover, AP aging)
- Income statement performance analysis (revenue composition, margin analysis, expense ratios)
- Fixed assets & depreciation reconciliation
- Liabilities & obligations completeness (accruals checklist, sales tax verification)
- Equity section verification
- Revenue recognition compliance
- Intercompany reconciliation
- Technical accounting standards review
- And four more...
Each block has explicit instructions. Not "analyze accounts receivable" but something like this:
"Calculate Days Sales Outstanding as (AR / Revenue) × 365 days. Compare to 30-60 day industry benchmark. Calculate required bad debt reserve at 1-3% of total AR based on age and quality. Identify any credit balances potentially misclassified as customer deposits. Test: AR subledger must equal GL control account 1110."
Then comes what I refer to as an "anti-hallucination framework":
"CRITICAL: Never fabricate balances, dates, periods, or entities. Every data-backed assertion must reference a Data Lineage ID showing source system, query method, and timestamp. When you cannot verify information, flag it as HITM Required (Human-in-the-Middle verification needed). Do not guess. Do not estimate without explicitly stating your assumption and methodology."
The prompt includes multiple quality control layers:
- Data Lineage Table (every number traces to source)
- Assumptions Framework (separate facts from estimates)
- Verification Test Plan (proves the analysis was done correctly)
- Final Execution Checklist (40+ items to verify before output)
And it includes a professional output specification:
Not "make it pretty," but exact specifications:
- HTML format with specific color codes (#007AFF primary blue, #1B2838 deep navy)
- Inter font family, 11pt body text
- Component structures: severity badges (Critical/High/Medium/Low), finding boxes with left border colors, metric cards in grid layout
- Specific sections in specific order
- No placeholder text, no [TBD] markers
How It Works In Practice
Here's the workflow involved in using the prompt:
- Connect to the client's NetSuite via MCP (Model Context Protocol, which lets the AI query live data)
- The AI executes the 13 analytical blocks against real financial data
- It generates a complete HTML report in about 5 minutes
- The controller reviews the findings and decides on corrective actions
What makes this different from typical AI tools?
- It doesn't guess. It flags what it can't verify.
- It shows its work. Every number traces to source data.
- It calculates confidence. Transparent weighted methodology, not a black box.
- It produces professional output. My client commented that the final report "looks like it came from a Big 4 firm."
The First Real Test
For the initial test, I ran the prompt against my client's FY2024 data:
$10.36M in revenue, multiple subsidiaries, 1,000+ transactions, period ending December 31, 2024. Full balance sheet and income statement analysis. The works.
What it found (the good news):
The company is healthy.
Balance sheet balanced perfectly. $0.00 variance. That's actually a strong signal—it means journal entries were posted correctly all year.
Strong profitability: $871,754 net income on $10.36M revenue = 8.4% net margin. That's solid.
Excellent liquidity: 4.48:1 current ratio. Way above the 2:1 healthy threshold.
Healthy operational metrics: 31.9-day DSO (collections are good), 52 days of inventory on hand (reasonable for a product business), 36.1% gross margin (healthy).
This is a well-run company. No going concern issues. No red flags about the business model.
What it found (the issues):
But there were problems. Fixable problems, but they needed attention.
2 High-severity findings:
Finding #1: No bad debt reserve established despite $914,166 in accounts receivable.
This is a GAAP violation: you're required to estimate uncollectible receivables. The AI calculated that a 2% reserve would be appropriate based on the 31.9-day DSO (the middle of the 1-3% range).
Recommended adjusting journal entry: Debit Bad Debt Expense $18,283, Credit Allowance for Doubtful Accounts $18,283.
Impact: Reduces net income by $18,283 but ensures GAAP compliance.
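The arithmetic behind that entry is simple enough to verify by hand. Here it is as a quick Python check (the 2% rate is the AI's judgment call from the 1-3% range):

```python
# Verify Finding #1: a 2% bad debt reserve on the AR balance.
ar_balance = 914_165.60
reserve = round(ar_balance * 0.02)   # -> 18,283

print(f"DR  Bad Debt Expense                    ${reserve:,}")
print(f"    CR  Allowance for Doubtful Accounts     ${reserve:,}")
```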
Finding #2: Missing fixed asset schedule despite $9,236 in depreciation expense on the P&L.
The AI found depreciation expense but couldn't find the corresponding fixed asset accounts or accumulated depreciation. This creates a material documentation gap.
Auditors will 100% ask for the complete fixed asset schedule with depreciation calculations by asset class. If it doesn't exist, that's a finding.
3 Medium-severity findings:
Finding #3: Sales tax liability verification needed. Total balance: $73,212 across 6 jurisdictions (California, New York, Massachusetts, Kentucky, Illinois, Indiana).
The AI couldn't determine if these were current amounts awaiting January filing (acceptable) or past-due amounts (problematic). Flagged for HITM verification.
If any are past due, penalties range from 5-25% plus interest. That's potential exposure of $3,660-$18,303.
Finding #4: Accrued liabilities potentially understated by approximately $22,700.
The AI estimated this based on monthly run rates for rent, utilities, and professional fees. But it flagged this as an estimate requiring human validation with actual December invoices.
Finding #5: Inventory valuation methodology undocumented.
Inventory balance: $943,479 (24.7% of total assets). But the AI couldn't find documentation of whether the company uses FIFO, LIFO, or weighted average. It also couldn't confirm evidence of a year-end physical count.
Given that inventory exceeds 15% of assets, auditors will require both valuation method documentation and physical count evidence.
The deliverable:
- 49KB professional HTML report.
- Specific adjusting journal entries with GAAP rationale for each.
- Risk stratification showing which issues matter most and likelihood of audit detection.
- A 30/60/90-day process improvement roadmap.
And here's the key part: A 73% confidence score.
Not 95%. Not "I'm very confident."
Exactly 73%, calculated from weighted scores across categories:
- Balance Sheet Integrity: 100% (perfect)
- GAAP Compliance: 60% (bad debt reserve violation)
- Documentation Completeness: 50% (missing schedules)
- Overall: 73%
That score essentially says: "The numbers are right, but documentation gaps create moderate audit risk. Proceed with close, but fix these issues within 48 hours."
My Client's Reaction
I gave the report from the test run to the company's controller, and her response was: "This is exactly what our external auditors would have found, but we can fix it now instead of during field work."
She appreciated that it flagged uncertainty instead of making things up.
And that 73% confidence score? "That's honest. There ARE things we need to verify manually. If it had said 95%, I don't think I would have trusted it."
Here's the business impact:
- Issues identified before close, not during audit. Check.
- Specific, actionable remediation steps. Check.
- They can now run this quarterly, not just at year-end. So it serves as an "early warning system" for issues before they become problems.
In summary, my client was absolutely thrilled with the results.
What Makes The Prompt Work
Here are the prompt engineering principles that make it so effective.
Principle 1: Create a Persona, Not a Role
Don't say "act like a financial analyst."
Say "You are a Senior FP&A Analyst with 15+ years experience in multi-subsidiary NetSuite environments with expert knowledge of GAAP accounting principles, specializing in technical accounting standards and audit readiness, with CFO-grade financial rigor..."
Why it matters: Generic roles get generic output. Specific expertise gets specific analysis.
The AI needs to know not just WHAT to analyze but THROUGH WHAT LENS to view the data.
Principle 2: Modular Analytical Frameworks
Break complex analysis into blocks with Purpose → Methodology → Outcome structure.
Example from my prompt:
BLOCK 3: WORKING CAPITAL DEEP DIVE
Purpose: Assess liquidity, operational efficiency, and current asset/liability quality
Methodology:
A. Cash & Bank Accounts: Enumerate all accounts, identify negative balances, check clearing accounts for aged balances...
B. Accounts Receivable: Retrieve AR aging, calculate DSO = (AR / Revenue) × Days, analyze aging distribution, calculate bad debt reserve adequacy at 1-3%...
Outcome: Working capital health summary, list of reconciliation gaps, DSO/DIO metrics, reserve adequacy assessment
Why it matters: Prevents AI from drifting or skipping critical steps. Each block is self-contained with clear deliverables.
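If it helps to picture the modular structure, here's a sketch of Block 3 as a data structure. This is my illustration, not how the prompt is actually wired; the point is that every block shares the same three fields, so the full review is the same shape run thirteen times:

```python
# A sketch of the Purpose -> Methodology -> Outcome structure as code.
from dataclasses import dataclass

@dataclass
class AnalyticalBlock:
    name: str
    purpose: str
    methodology: list[str]   # explicit, ordered steps the AI must execute
    outcome: str             # the deliverable this block must produce

block_3 = AnalyticalBlock(
    name="Working Capital Deep Dive",
    purpose="Assess liquidity, efficiency, and current asset/liability quality",
    methodology=[
        "Enumerate cash accounts; flag negative and aged clearing balances",
        "Retrieve AR aging; calculate DSO = (AR / Revenue) x 365",
        "Assess bad debt reserve adequacy at 1-3% of AR",
    ],
    outcome="Working capital summary, reconciliation gaps, DSO/DIO, reserve adequacy",
)

for step in block_3.methodology:   # each step is explicit, so nothing gets skipped
    print("-", step)
```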
Principle 3: Explicit Anti-Hallucination Rules
"CRITICAL: Never fabricate balances, dates, periods, or entities. Every data-backed assertion must reference Data Lineage ID. When you cannot verify information, flag it as HITM Required (Human-in-the-Middle verification needed)."
Create specific flags for uncertainty:
- "Unable to verify - HITM Required"
- "Estimated based on [methodology] - requires validation"
- "Assumes [X] - confirm with Controller"
Why it matters: One hallucinated number destroys the credibility of the entire report. Trust is binary in finance.
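In structured-output terms, the rule might look something like this sketch (the class and field names are mine, purely illustrative):

```python
# A sketch of uncertainty flagging as structured output.
from dataclasses import dataclass

@dataclass
class Assertion:
    statement: str
    verified: bool
    flag: str = ""   # "HITM Required", "Estimated - requires validation", etc.

def flag_unverified(assertion: Assertion) -> Assertion:
    """Never let an unverified assertion pass through unflagged."""
    if not assertion.verified and not assertion.flag:
        assertion.flag = "Unable to verify - HITM Required"
    return assertion

a = flag_unverified(Assertion("Sales tax filings are current", verified=False))
print(a.flag)   # Unable to verify - HITM Required
```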
Principle 4: Materiality Thresholds
Don't leave "material" undefined.
Provide a concrete formula: "Calculate materiality threshold as MIN($10,000, 5% of total assets, 10% of equity). Flag all accounts exceeding this threshold."
Give specific rules: "If variance > materiality threshold AND variance > 10% of account balance, classify as HIGH severity finding."
Why it matters: Prevents flagging a $47 variance as material or missing a $50,000 issue. AI needs explicit judgment criteria.
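Here's that rule as executable logic, a minimal sketch using hypothetical balance sheet figures:

```python
# The materiality rule from Principle 4, as executable logic.
def materiality_threshold(total_assets: float, total_equity: float) -> float:
    return min(10_000, 0.05 * total_assets, 0.10 * total_equity)

def classify_variance(variance: float, account_balance: float,
                      threshold: float) -> str:
    if variance > threshold and variance > 0.10 * account_balance:
        return "HIGH"
    return "below materiality" if variance <= threshold else "review"

t = materiality_threshold(total_assets=3_800_000, total_equity=1_500_000)
print(f"Threshold: ${t:,.0f}")               # $10,000 on these inputs
print(classify_variance(47, 5_000, t))       # below materiality
print(classify_variance(50_000, 120_000, t)) # HIGH
```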
Principle 5: Decision Trees & Examples
Don't just describe what to do - show examples of good judgment.
- "If AR balance > $100,000 AND no bad debt reserve exists → HIGH severity finding (GAAP violation)"
- "If depreciation expense exists on P&L AND no fixed asset balances visible → HIGH severity finding (documentation gap)"
- "If sales tax payable > $50,000 AND filing status unknown → MEDIUM severity finding (compliance risk)"
Why it matters: Teaches AI to think like an experienced analyst, not just follow a checklist.
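These rules translate almost directly into a rule table. A sketch (the field names are mine; the conditions come straight from the examples above, and the sample data mirrors the test-run findings):

```python
# The judgment rules from Principle 5 as a rule table.
RULES = [
    (lambda d: d["ar"] > 100_000 and d["bad_debt_reserve"] == 0,
     ("HIGH", "No bad debt reserve (GAAP violation)")),
    (lambda d: d["depreciation_expense"] > 0 and d["fixed_assets"] == 0,
     ("HIGH", "Depreciation on P&L but no fixed asset balances (documentation gap)")),
    (lambda d: d["sales_tax_payable"] > 50_000 and not d["filing_status_known"],
     ("MEDIUM", "Large sales tax payable, filing status unknown (compliance risk)")),
]

data = {"ar": 914_166, "bad_debt_reserve": 0, "depreciation_expense": 9_236,
        "fixed_assets": 0, "sales_tax_payable": 73_212, "filing_status_known": False}

for condition, (severity, finding) in RULES:
    if condition(data):
        print(f"[{severity}] {finding}")
```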
Principle 6: Calculable Confidence Scoring
Not "I'm pretty confident."
Define the exact methodology:
- Balance Sheet Structural Integrity: 20% weight
- GAAP Compliance: 20% weight
- Documentation Completeness: 15% weight
- Working Capital Quality: 15% weight
- Income Statement Reasonableness: 15% weight
- Tax Compliance: 10% weight
- Control Environment: 5% weight
Score each category 0-100. Multiply by weight. Sum to get overall score.
Why it matters: Transparent, defensible, auditable. You can point to exactly why confidence is 73% not 85%.
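The calculation itself is a one-liner once the weights and scores exist. In this sketch, the weights are the ones listed above; three of the category scores are the ones from the test run, and the remaining four are illustrative placeholders:

```python
# Weighted confidence scoring per Principle 6.
WEIGHTS = {
    "Balance Sheet Structural Integrity": 0.20,
    "GAAP Compliance": 0.20,
    "Documentation Completeness": 0.15,
    "Working Capital Quality": 0.15,
    "Income Statement Reasonableness": 0.15,
    "Tax Compliance": 0.10,
    "Control Environment": 0.05,
}

scores = {  # each category scored 0-100; the last four are placeholders
    "Balance Sheet Structural Integrity": 100,
    "GAAP Compliance": 60,
    "Documentation Completeness": 50,
    "Working Capital Quality": 80,
    "Income Statement Reasonableness": 75,
    "Tax Compliance": 65,
    "Control Environment": 70,
}

confidence = sum(scores[k] * WEIGHTS[k] for k in WEIGHTS)
print(f"Overall confidence: {confidence:.0f}%")   # 73% on these inputs
```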
Principle 7: Professional Output Specification
Not "make it look nice."
Specify exact design elements:
- Primary color: #007AFF (Clarity Blue)
- Typography: Inter font, 11pt body, 28pt headlines
- Component structures: severity badges with specific colors (Critical=#FF3B30, High=#FF9500, Medium=#FFB800, Low=#34C759)
- Layout: finding boxes with 4px left border, metric cards in grid, tables with hover states
Why it matters: Board-ready deliverables require professional design. Can't look like an AI chat transcript.
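As a sketch of how pinned-down the spec is, here's the severity color map rendered as an HTML badge from Python. The colors come from the spec above; the markup details are my illustration:

```python
# Severity badge rendering per the output specification.
SEVERITY_COLORS = {
    "Critical": "#FF3B30",
    "High": "#FF9500",
    "Medium": "#FFB800",
    "Low": "#34C759",
}

def severity_badge(level: str) -> str:
    color = SEVERITY_COLORS[level]
    return (f'<span style="background:{color};color:#fff;'
            f'padding:2px 8px;border-radius:4px;font-family:Inter">'
            f'{level}</span>')

print(severity_badge("High"))
```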
Principle 8: Data Lineage Requirements
Every number must trace to: source system, query method, timestamp.
Create a Data Lineage Table in the output:
- L001: Account Balances → NetSuite GL → SuiteQL Query → 2024-12-17
- L002: Revenue Detail → NetSuite Transaction Lines → SuiteQL Query → 2024-12-17
Why it matters: Auditors (and controllers) need to verify where numbers came from. Creates accountability trail.
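A sketch of what those lineage entries look like as structured records (the field names are illustrative; the required contents are the source system, query method, and timestamp described above):

```python
# Data Lineage Table entries as structured records.
from dataclasses import dataclass

@dataclass
class LineageEntry:
    lineage_id: str
    description: str
    source_system: str
    query_method: str
    timestamp: str

LINEAGE = [
    LineageEntry("L001", "Account Balances", "NetSuite GL",
                 "SuiteQL Query", "2024-12-17"),
    LineageEntry("L002", "Revenue Detail", "NetSuite Transaction Lines",
                 "SuiteQL Query", "2024-12-17"),
]

for e in LINEAGE:
    print(f"{e.lineage_id}: {e.description} <- {e.source_system} "
          f"({e.query_method}, {e.timestamp})")
```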
Principle 9: Assumption Logging
Distinguish facts from estimates.
Facts: "AR balance is $914,165.60 per GL query L001"
Estimates: "Bad debt reserve should be 1-3% based on 31.9 DSO; using 2.0% = $18,283 (HITM validation required)"
Flag ALL estimates with: basis of estimate + requirement for human validation.
Why it matters: Separates what AI knows from what it's inferring. Manages risk.
Principle 10: Multi-Layer Quality Control
Build a Final Execution Checklist with 40+ verification items:
Before presenting output, verify:
- Balance sheet balances to $0.00
- All 13 analytical blocks executed
- Data Lineage Table complete
- Assumptions Framework complete
- No placeholder text ([TBD], [XXX])
- Confidence score calculated per methodology
- All sections present in exact order
- HTML renders correctly with specified colors
- ...and more.
Why it matters: AI needs guard rails, and lots of them. This catches errors before they reach the user.
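Conceptually, the checklist is a hard gate before output. A sketch, with just three of the 40+ checks abbreviated from the list above:

```python
# Final execution checklist as a hard gate before the report ships.
CHECKS = {
    "Balance sheet balances to $0.00":
        lambda r: r["bs_variance"] == 0,
    "All 13 analytical blocks executed":
        lambda r: len(r["blocks_run"]) == 13,
    "No placeholder text":
        lambda r: "[TBD]" not in r["html"] and "[XXX]" not in r["html"],
}

def final_gate(report: dict) -> list[str]:
    """Return the names of any checks that failed."""
    return [name for name, check in CHECKS.items() if not check(report)]

report = {"bs_variance": 0, "blocks_run": list(range(13)),
          "html": "<html>...</html>"}
failures = final_gate(report)
print("PASS" if not failures else f"BLOCKED: {failures}")
```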
The Bigger Implications
While this prompt is specifically about year-end close, the technique I used to create it also applies to:
- Monthly variance analysis (Why did OpEx spike in Q3?)
- Budget vs. actual reviews (Where are we off-plan and why?)
- KPI dashboards (Which metrics matter and what's changing?)
- Board reporting packages (Executive summaries with drill-down details)
- Forecast reviews (Is our revenue forecast still realistic?)
With well-engineered prompts, you can automate a wide range of repetitive, high-skill analysis.
These prompts don't replace the controller. Instead, they free up the controller's time for judgment calls, strategic decisions, and relationship management with stakeholders.
The Prompt
The complete 3,200-word Year-End Close Review prompt that I built is available here.
I invite you to download the prompt. Try it. Break it. Improve it.
And if you do give it a try, I'd love to hear your thoughts on it.
One more thing: While the prompt is designed for use with NetSuite, the framework applies to any ERP. You can adapt it to your specific needs.