Productized audit · 48-hour turnaround · 3 slots per week

AI Agent Audit — 48-Hour Diagnostic

We run five Bell Tuning sensors against your AI's search and agent pipeline, then send you an 8-12 page PDF: bell curves showing what's happening underneath, the silent bugs we found, and a ranked fix list. Plus a walkthrough call and 7 days of Q&A. Fixed price. No scope negotiation.

$2,500 USD · flat · one-time
Book an audit · What's included
Two slots open this week

What you get

Deliverables, not deliberations. The report ships within 48 hours of data handoff; the walkthrough call and Q&A follow.

You receive

  • All 5 Bell Tuning sensors run against your data: context-inspector, retrieval-auditor, tool-call-grader, predictor-corrector, audit-report-generator
  • 8-12 page PDF report with bell curves showing the shape of your AI's data flow
  • Silent bugs found, sorted by type, with severity scores and real examples from your data
  • Ranked fix list (3-5 items), ordered by effort and impact
  • 30-minute Zoom walkthrough of findings
  • 7 days of Slack / email Q&A after delivery

You provide

  • Your search/retrieval setup, or read-only access to your AI pipeline
  • 10-20 sample queries that look like what real users ask
  • A small sample of your knowledge base (1,000-10,000 chunks)
  • 30-minute kickoff call to confirm scope
  • Any existing eval test data you already have (optional, useful)

Silent bugs the audit flags

Standard accuracy metrics (precision@K) miss all of these. They show up in the shape of your data before users start complaining.

Score miscalibration
The top results are still in the right order, but the actual score numbers have drifted as your knowledge base grew. Accuracy metrics can't see this.
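
To make the miscalibration case concrete, here is a minimal sketch. It is not how the Bell Tuning sensors work internally; the document names, the scores, and the fixed `SCORE_CUTOFF` are all made up for illustration, and the assumption is that something downstream filters results on a cutoff tuned at launch.

```python
# Hypothetical scores for the same query, at launch and after the knowledge base grew.
scores_at_launch = {"doc_a": 0.91, "doc_b": 0.84, "doc_c": 0.77}
scores_today = {"doc_a": 0.71, "doc_b": 0.64, "doc_c": 0.57}

SCORE_CUTOFF = 0.75  # assumed: a fixed cutoff that was tuned once, at launch


def ranking(scores_by_doc):
    """Document ids ordered from highest to lowest score."""
    return sorted(scores_by_doc, key=scores_by_doc.get, reverse=True)


def passing_cutoff(scores_by_doc, cutoff):
    """Document ids whose score is at or above the cutoff, in rank order."""
    return [doc for doc in ranking(scores_by_doc) if scores_by_doc[doc] >= cutoff]


same_order = ranking(scores_at_launch) == ranking(scores_today)
kept_at_launch = passing_cutoff(scores_at_launch, SCORE_CUTOFF)
kept_today = passing_cutoff(scores_today, SCORE_CUTOFF)

print("same order:", same_order)
print("kept at launch:", kept_at_launch)
print("kept today:", kept_today)
```

The order is identical in both snapshots, so an order-based metric reports no change, yet nothing clears the old cutoff any more.
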
Rank inversion
All the relevant documents are in the top results, but in reverse order. The AI reads them top-down and builds its answer on the least relevant one first.
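
A short sketch of why precision@K cannot see this. It uses the textbook definitions of precision@K and nDCG with hypothetical graded relevance labels (3 = best match); it illustrates the metric gap only and makes no claim about how the retrieval-auditor sensor scores rankings.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (label > 0). Ignores order."""
    return sum(1 for label in relevance[:k] if label > 0) / k


def dcg(relevance):
    """Discounted cumulative gain: labels lower in the list count for less."""
    return sum(label / math.log2(position + 2) for position, label in enumerate(relevance))


def ndcg_at_k(relevance, k):
    """DCG of the top-k, normalised by the DCG of the best possible ordering."""
    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for the same three documents in two orderings.
correct_order = [3, 2, 1]
inverted_order = [1, 2, 3]

for name, labels in (("correct", correct_order), ("inverted", inverted_order)):
    print(name, precision_at_k(labels, 3), round(ndcg_at_k(labels, 3), 2))
```

Both orderings score a perfect precision@3, while the order-aware metric drops for the inverted list.
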
Redundancy attacks
Near-duplicate documents push the truly distinct ones out of the top results. The answer comes back confidently incomplete.
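
One rough way to spot this yourself is to count how many of the top results are actually distinct. The sketch below uses word-overlap (Jaccard) similarity with an assumed threshold of 0.7; the function names, the threshold, and the sample results are illustrative, and a real pipeline would more likely compare embeddings.

```python
def jaccard(text_a, text_b):
    """Word-set overlap between two texts, from 0.0 (disjoint) to 1.0 (identical sets)."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def distinct_results(results, threshold=0.7):
    """Keep a result only if it is not a near-duplicate of one already kept."""
    kept = []
    for text in results:
        if all(jaccard(text, earlier) < threshold for earlier in kept):
            kept.append(text)
    return kept


# Hypothetical top-4 results: three near-identical chunks and one distinct chunk.
top_results = [
    "reset your password from the account settings page",
    "reset your password from the account settings page today",
    "reset your password from the account settings screen",
    "billing disputes are handled by the finance team",
]

unique = distinct_results(top_results)
print(f"{len(unique)} distinct results out of {len(top_results)}")
```

Here four slots carry only two distinct pieces of information, which is the pattern that leaves an answer incomplete.
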
Two-cluster search results
The search returns two groups of high-scoring documents. The wrong group wins. The AI confidently gives the wrong answer.
Contamination drift
Off-topic stuff slowly leaks into the top search results over time. The average relevance score drops, but nobody notices until complaints arrive.
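
A drop like this can be caught with very little machinery. The sketch below compares a baseline window of mean top-K relevance scores against the most recent window; the weekly numbers, window sizes, and 10% tolerance are assumptions chosen for illustration, not values the audit uses.

```python
from statistics import mean


def drift_alert(weekly_means, baseline_weeks=4, recent_weeks=2, max_drop=0.10):
    """True if the recent average fell more than max_drop (relative) below the baseline."""
    if len(weekly_means) < baseline_weeks + recent_weeks:
        return False  # not enough history to compare
    baseline = mean(weekly_means[:baseline_weeks])
    recent = mean(weekly_means[-recent_weeks:])
    if baseline <= 0:
        return False
    return (baseline - recent) / baseline > max_drop


# Hypothetical weekly mean relevance scores of the top-K results.
drifting = [0.82, 0.81, 0.83, 0.82, 0.80, 0.78, 0.75, 0.72, 0.70]
stable = [0.82, 0.81, 0.83, 0.82, 0.82, 0.81]

print("drifting:", drift_alert(drifting))
print("stable:", drift_alert(stable))
```

No single week in the drifting series looks alarming; only the comparison against the baseline does.
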
Tool fixation / agent loops
The AI gets stuck calling the same tool over and over. Detectable several turns before production breaks.
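
The simplest version of this check looks for a run of identical calls at the end of an agent trace. The sketch below assumes each call is logged as a `(tool_name, arguments)` tuple and treats three identical calls in a row as fixation; both the log shape and the threshold are hypothetical, and the tool-call-grader sensor may use a different signal entirely.

```python
def trailing_streak(calls):
    """Length of the run of identical calls at the end of the trace."""
    if not calls:
        return 0
    last = calls[-1]
    streak = 0
    for call in reversed(calls):
        if call != last:
            break
        streak += 1
    return streak


def is_fixated(calls, max_streak=3):
    """True once the agent has repeated the same call max_streak times in a row."""
    return trailing_streak(calls) >= max_streak


# Hypothetical agent trace: (tool name, arguments) per turn.
trace = [
    ("search", "refund policy"),
    ("fetch", "doc-12"),
    ("search", "refund policy 2024"),
    ("search", "refund policy 2024"),
    ("search", "refund policy 2024"),
]

print("streak:", trailing_streak(trace), "fixated:", is_fixated(trace))
```

Running the check after every turn flags the loop on the third repeat, before the agent burns through its turn budget.
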

48-hour timeline

Clock starts at data handoff, not at purchase. Kickoff call same day.

  1. Kickoff call · Same day (30 min)
  2. Data handoff · Hours 0-4
  3. Sensor analysis · Hours 4-36
  4. Report delivered · Hour 48
  5. Walkthrough · Day 3 (30 min)

Who this is for

Good fit

  • Your team is already running an AI system in production that pulls info from a knowledge base or coordinates multiple agents
  • Users are reporting "plausible-sounding but wrong" answers
  • Your test suite still passes, but the quality feels off
  • The knowledge base has grown a lot since you launched, or the agent system has gotten more complex
  • You need a defensible, numbers-backed read before making a bigger decision (rewriting the system, switching to a new embedding model, full rebuild)

Not the right fit

  • Still in prototype phase, no production traffic
  • Looking for implementation labor (this is diagnosis, not remediation)
  • Need a custom research project or bespoke framework
  • Under $10k annual AI budget (this is $2,500; the remediation work that follows costs more)

Book an audit

$2,500 flat. 48-hour turnaround. Two slots open this week.

Pay now · Discuss fit first

Not sure if your setup fits? Email first; we'll confirm fit before you pay.
Full framework: Bell Tuning manifesto + whitepapers + open-source sensors

How this relates to the free open-source tools

The five Bell Tuning sensors are MIT-licensed and free. Install with npx contrarianai-context-inspector --install-mcp. If you want to run them yourself, you don't need this audit.

The audit exists for teams that want the analysis done for them in 48 hours with a defensible report attached. You're paying for the turnaround, the read of what the sensor output means, and the ranked fix list — not the tools themselves.