March 21, 2026 / 4 min read

The cheapest quality check for AI work is another AI

Checking AI output quality gets easy when you point a second AI at the first. A small-business method for catching weak arguments before a client or investor does.

Here’s a workflow that’s simple, a little devious, and turns checking AI output quality into something almost free. Build something with AI (a slide deck, a proposal, a plan), then spin up a second AI whose only job is to tear it apart. It hunts for the weak arguments, the hand-wavy numbers, and the slide where a sharp investor or executive would stop and say “wait, explain this.” For a small team with no spare reviewer to lean on, that second, adversarial pass is where most of the value lives.

This is one of the more useful patterns to come out of working with these tools, and it generalizes far past slide decks. The mistake people make is treating AI as a single-shot generator. You ask, it produces, you ship. But generation and critique are different jobs, and an AI optimized to be helpful and produce a finished artifact is in exactly the wrong frame of mind for finding flaws. It wants to give you something complete. It doesn’t want to tell you your core argument has a hole in it.

Why one model can’t easily do both

When you ask a model to write a deck and then, in the same conversation, ask it to critique that deck, you get politeness. It’ll suggest tightening a headline or adding a stat. It will rarely say the strategic logic doesn’t hold, because it just spent its effort building that logic and is anchored to it. The momentum of having produced something makes honest critique hard for models, just as it is for people who have just finished a draft.

The fix is to break the frame. A fresh instance, given no context except “here is a deck, your job is to attack it as a skeptical CFO,” has nothing invested. It didn’t write the thing. It has no reason to protect it. You’ve converted a helpful assistant into an adversary, and adversaries find problems.

How to actually run the attack

The quality of the critique depends almost entirely on who you tell the second AI to be. “Give me feedback” produces mush. A specific hostile persona produces signal.

Cast a real skeptic. “You are the investor who has seen this pitch fifty times and is looking for the reason to pass.” Or “You are the engineering lead who has to build what this deck promises, and you think the timeline is fantasy.” The more concrete the role, the more concrete the objections. You’re not asking for feedback. You’re simulating the exact person you’re afraid of.

Make it find the weakest slide. Force a ranking. “Which single slide would make this audience lose confidence, and why?” A model asked to find the worst thing will commit to one, and that one is usually the thing you were quietly hoping nobody would notice.

Ask for the question you can’t answer. The most useful output isn’t a critique; it’s a question. “What’s the one question from this audience you have no good answer to?” That tells you exactly where to do more work before you’re in the room and someone asks it for real.

Run it on the strongest version, not the draft. Polish the deck first. You want the red-team AI attacking your best work, because the flaws that survive your own polishing are the ones that’ll actually bite you.
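
The steps above can be sketched in Python as a small prompt builder plus a loop that runs one independent critique per persona. Treat it as a minimal sketch under stated assumptions: the persona strings, the exact prompt wording, and the `ask_model` callable are all hypothetical. `ask_model` stands in for whatever model call you actually use, ideally one that opens a brand-new conversation on every call so the critic never sees the chat that produced the deck.

```python
from typing import Callable

# Hypothetical personas adapted from the examples in this post;
# swap in the specific skeptic you are actually worried about.
PERSONAS = [
    "the investor who has seen this pitch fifty times and is looking for the reason to pass",
    "the engineering lead who has to build what this deck promises and thinks the timeline is fantasy",
    "a skeptical CFO who distrusts every number that has no source",
]


def build_red_team_prompt(persona: str, artifact: str) -> str:
    """Compose one self-contained attack prompt: role, forced ranking, hard question."""
    return (
        f"You are {persona}.\n"
        "Your only job is to attack the document below. Do not offer to polish it.\n"
        "1. Name the single slide or section that would make you lose confidence, and say why.\n"
        "2. State the one question you would ask that the author has no good answer to.\n"
        "3. List any numbers or claims asserted without support.\n\n"
        "--- DOCUMENT ---\n"
        f"{artifact}\n"
        "--- END DOCUMENT ---"
    )


def run_attacks(
    artifact: str,
    personas: list[str],
    ask_model: Callable[[str], str],  # hypothetical: one prompt in, one reply out
) -> dict[str, str]:
    """Run one critique per persona; each call gets only its own prompt, no shared history."""
    return {p: ask_model(build_red_team_prompt(p, artifact)) for p in personas}
```

For a dry run, passing a stub such as `lambda prompt: prompt[:80]` as `ask_model` lets you check that every persona gets its own complete prompt before making real calls. Keeping the prompt builder separate from the model call means adding a tenth persona is a one-line change.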

The principle underneath

What’s really going on here is that you’re building an adversarial loop into a process that normally has none. Most work, solo work especially, has no built-in opponent. You write, you convince yourself it’s good, you ship, and reality is the first thing that ever pushes back. That’s an expensive place to discover your argument was weak.

A second AI acting as a cheap, tireless, slightly mean reviewer moves that moment of pushback earlier, to a point where it costs you nothing to fix. You can run the attack ten times with ten different personas before lunch. No human reviewer has that patience, and no human reviewer is free.

For anyone using these tools in a real business, this is the upgrade that separates getting output from getting good output. The first AI gives you a draft. The second AI, pointed at the first with instructions to be ruthless, gives you the thing you’d otherwise have learned the hard way in the meeting. Generate, then attack, then fix what the attack found. The deck that survives that is the one worth presenting, and the habit transfers to proposals, plans, emails, and pretty much anything you’d hate to be wrong about in public.