Your AI looked great in the demo. In production, it's giving wrong answers, burning tokens, and eroding trust. We find exactly why — and hand you the fix list.
of "AI agents" are deterministic workflows in disguise — burning tokens on reasoning that should be an if/else statement.
tokens is where context actually degrades — not the 200K on the box. Your AI is losing data before hitting the limit.
cost difference between running every subagent on Opus vs. right-sizing to Haiku. Most teams use the expensive model for everything.
errors logged when tool descriptions silently misroute calls. No alerts. No logs. Just wrong answers with total confidence.
Not hourly consulting. A defined engagement with a concrete deliverable and a personal guarantee.
4 engineers. 8 months of development. A multi-agent system that "worked great in staging." In production: dropping order numbers during context summarization, looping on conflicting instructions, approving its own broken output, burning $47K/month in tokens because every subagent ran on the most expensive model.
The fix wasn't a rewrite. It was structural: replaced 3 autonomous agents with deterministic workflows (they never needed reasoning), added a sprint system to prevent context rot, separated builder from evaluator, right-sized model selection per task.
Agents parsing natural-language output for completion signals instead of checking stop_reason. Subagents assuming shared memory they don't have. Self-evaluation bias: the builder grading its own homework.
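What "checking stop_reason" looks like in practice, as a minimal sketch. It assumes an Anthropic-style response object with a structured stop_reason field; the `Response` dataclass and `is_done` helper here are hypothetical stand-ins, not a real SDK API.

```python
from dataclasses import dataclass

@dataclass
class Response:
    text: str
    stop_reason: str  # e.g. "end_turn", "tool_use", "max_tokens"

def is_done(resp: Response) -> bool:
    # Fragile: inferring completion from prose, e.g.
    #   "task complete" in resp.text.lower()
    # Robust: check the structured stop signal the API already returns.
    return resp.stop_reason == "end_turn"
```

A response whose text says "Task complete!" but whose stop_reason is "tool_use" or "max_tokens" is not done, and prose-parsing agents get exactly that case wrong.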
Progressive summarization destroying dollar amounts and order numbers. Lost-in-the-middle effect burying critical instructions. Memory contradictions.
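One way to stop summarization from eating identifiers: extract exact facts before compressing, then carry them verbatim. A sketch only; the ID/amount regex and the `summarize` callable are illustrative assumptions, not the actual fix from any engagement.

```python
import re

def summarize_with_anchors(history: str, summarize) -> str:
    # Pull exact identifiers out BEFORE lossy summarization, then append
    # them verbatim so compression can't mangle a dollar amount or order ID.
    # The pattern (ORD-style IDs, $-amounts) is illustrative only.
    anchors = re.findall(r"ORD-\d+|\$[\d,]+(?:\.\d{2})?", history)
    summary = summarize(history)
    if anchors:
        # dict.fromkeys deduplicates while preserving first-seen order.
        summary += "\nVerbatim facts: " + ", ".join(dict.fromkeys(anchors))
    return summary
```

The summarizer can be as lossy as it likes; the anchors survive untouched.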
Vague tool descriptions causing silent misroutes. More than 4-5 tools per agent degrading selection accuracy. No distinction between "nothing found" and "the API failed."
"Revenue" means 3 different things across 3 teams. The AI picks a table and gives confident, plausible, wrong answers.
Full file reads at 3,000 tokens when grep costs 200. MCP servers consuming 2,000-8,000 tokens before any work starts. No batch API usage.
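The grep-vs-full-read gap is easy to close in code. A sketch of a grep-style context read, assuming line-oriented text; the function name and window size are illustrative.

```python
def grep_context(text: str, pattern: str, window: int = 2) -> str:
    # Return only the matching lines plus a small window of surrounding
    # context, instead of feeding the entire file into the prompt.
    lines = text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if pattern in line:
            keep.update(range(max(0, i - window),
                              min(len(lines), i + window + 1)))
    return "\n".join(lines[i] for i in sorted(keep))
```

On a 1,000-line file with one match, this sends 5 lines instead of 1,000, which is the whole 3,000-vs-200-token difference.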
No structured logs. Same config for dev and prod. Silent catch blocks. Sentiment-based escalation. No session handoff artifacts.
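The structured-logs and silent-catch-block items combine into one small wrapper. A minimal sketch using the standard library; `call_tool` and the log fields are assumptions, not a prescribed schema.

```python
import json
import logging
import time

logger = logging.getLogger("agent")

def call_tool(name: str, fn, **kwargs):
    # One structured (JSON) log line per tool call: name, outcome, latency.
    # Failures are logged AND re-raised; never swallowed by a bare except.
    start = time.monotonic()
    try:
        result = fn(**kwargs)
        logger.info(json.dumps({
            "tool": name, "ok": True,
            "latency_ms": round((time.monotonic() - start) * 1000),
        }))
        return result
    except Exception as exc:
        logger.error(json.dumps({"tool": name, "ok": False, "error": repr(exc)}))
        raise
```

Grep-able JSON lines make the silent failure modes visible; the re-raise makes sure an error can never masquerade as a successful call.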
Consultants bill hours. This is a fixed-scope engagement: defined deliverable, defined price, defined timeline. You get a written report with a prioritized fix list — not a slide deck, not an ongoing retainer you can't exit. If I don't find 3+ issues, you pay nothing.
Read access to your AI-related code repositories, architecture docs (if they exist), and logging/monitoring dashboards. I don't need production credentials or customer data. Most teams grant a short-lived GitHub collaborator invite and a Datadog/Grafana viewer role.
Then you don't pay. That's the guarantee. I've never had to honor it — every engagement so far has found more than 3 production-impacting issues. The structural failures are that common.
Your internal team built the system. That's exactly why they can't objectively diagnose it. The same session that wrote the code can't evaluate the code — that's one of the 6 failure patterns. An external diagnostic gives your team the prioritized fix list without the sunk-cost bias.
The average full diagnostic finds 60% token waste and 3+ silent failure modes. A $15K diagnostic on a $47K/month AI spend typically pays for itself in the first month. The Quick Scan at $2,500 is designed to be a no-brainer for any team spending $5K+/month on AI.
I do. Kevin Luddy. This isn't a firm that sells and delegates. One person does the diagnostic, writes the report, and walks you through the findings. That's why I limit slots.
"A smarter model doesn't fix agent failures. A smarter environment does."
Model upgrades are the most expensive way to avoid fixing your architecture.
"90% of automation needs are workflows, not agents."
If you can draw the decision tree, you don't need AI. You need an if/else statement that costs nothing.
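A drawable decision tree, written down once. The thresholds and category names below are illustrative, not from any real system; the point is that this costs zero tokens and returns the same answer every time.

```python
def route_ticket(amount: float, category: str) -> str:
    # The decision tree an "agent" was re-deriving on every call.
    # (Threshold and categories are made-up examples.)
    if amount > 500:
        return "human_review"
    if category == "refund":
        return "refund_workflow"
    return "auto_reply"
```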
"Agents cannot reliably judge their own output."
Confirmation bias with a GPU is not evaluation.
"More data degrades AI performance when context isn't managed."
Past a threshold, more information makes it worse, not smarter.
Tell me what you're running and what's not working. I'll respond within 24 hours.