September 22, 2025

Dueling Models

A fun trick I’ve used lately is to have Claude Code and Gemini critique each other through a Markdown document conversation. It’s surprisingly effective.

In Claude Code, I’ll have it do some work without checking it in just yet. Then I’ll start up gemini in the same working directory and say:

You’re a code reviewer. You will not edit code. Review the current changes. They were intended to [bug/feature description]. Evaluate the solution and write your review to reviews/Review-1.md and I will inform the other coder.

It spins for a while, does its thing, then writes out a big good/bad review. Then I hop over to Claude and say:

Gemini reviewed your solution and left feedback. Evaluate the feedback, determine what to accept and what to push back on. Write your response to reviews/Review-1-Response.md. @reviews/Review-1.md

Then off it goes. Hop back to Gemini, point it at the response, and ask it to write its own response to Review-2.md. Repeat until Claude says “We’re good!” Now I’ll go read the file that has the latest plan, give it a once-over, and make any edits I need to. Then I tell Claude to read it again, generate a task list with TodoWrite, keep a development log in DevelopmentLog.md (if you didn’t know about that, try it), and go fix it all.
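If you lose track of whose turn it is after a few rounds, a tiny helper can work it out from the file names. This is only a sketch, and it assumes the naming used above (Review-N.md from Gemini, Review-N-Response.md from Claude); the function name next_step is made up for illustration and isn't part of either tool.

```python
import re
from pathlib import Path

# Matches Review-1.md and Review-1-Response.md (naming assumed from the prompts above).
REVIEW_RE = re.compile(r"^Review-(\d+)(-Response)?\.md$")


def next_step(reviews_dir):
    """Return (author, filename) for the next document in the exchange."""
    rounds = {}  # round number -> True once a response file exists for it
    for path in Path(reviews_dir).glob("Review-*.md"):
        match = REVIEW_RE.match(path.name)
        if match:
            number = int(match.group(1))
            has_response = bool(match.group(2))
            rounds[number] = rounds.get(number, False) or has_response
    if not rounds:
        # Nothing written yet: the reviewer opens the conversation.
        return ("Gemini", "Review-1.md")
    latest = max(rounds)
    if rounds[latest]:
        # The latest review has been answered, so the reviewer goes again.
        return ("Gemini", f"Review-{latest + 1}.md")
    # The latest review is still waiting on a response.
    return ("Claude", f"Review-{latest}-Response.md")
```

Calling next_step("reviews") from the working directory tells you which model to hop over to and which file to ask it to write. It doesn't run anything for you, which is the point: you still paste the prompt yourself.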

When it’s done, you guessed it, I ask Gemini to review the changes against the last document and read the development log to understand why certain things happened. More often than not it has a mix of “perfect job” and “you stupid idjit” comments, which I ask it to write to a file…

This is great for big, complicated changes because it keeps Claude honest. I’m aware there are MCP servers that let you do this automatically, but I’ve seen these things go off to La-La Land more than half the time I leave them to themselves, so I prefer a human-in-the-loop approach in general. This gives me both that level of control and a solid review of the code.

To be honest, this also plays right into the strengths of each model as they stand today. Gemini is a fantastic code reviewer, but I’m not too impressed with its actual code. Claude is more than willing to forgive itself and skip out on half of the work, so having another model hold it accountable is — as it would say — key.

movq

Wherein I Move A Lot of Words Around
