Episode Description
I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components. Through side-by-side experiments, I break down where each model shines—creative development versus code review—and share how I’m thinking about combining them to build a more effective AI engineering stack.
—
What you’ll learn:
- The strengths and weaknesses of OpenAI’s Codex vs. Anthropic’s Opus for different coding tasks
- How I shipped 44 PRs containing 98 commits across 1,088 files in just five days using these models
- Why Codex excels at code review but struggles with creative, greenfield work
- The surprising way Opus and Codex complement each other in a real-world engineering workflow
- How to use Git concepts like worktrees to maximize productivity with AI coding assistants
- Why Opus 4.6 Fast might be worth the 6x price increase (but be careful with your token budget)
—
Brought to you by:
WorkOS—Make your app enterprise-ready today
—
Detailed workflow walkthroughs from this episode:
• How I AI: GPT-5.3 Codex vs. Claude Opus 4.6—Shipping 44 PRs in 5 Days: https://www.chatprd.ai/how-i-ai/gpt-5-3-codex-vs-claude-opus-4-6
• How to Combine Claude Opus and GPT-5.3 Codex for High-Velocity Code Refactoring: https://www.chatprd.ai/how-i-ai/workflows/how-to-combine-claude-opus-and-gpt-5-3-codex-for-high-velocity-code-refactoring
• How to Redesign a Marketing Website Using Claude Opus 4.6 for Creative Development: https://www.chatprd.ai/how-i-ai/workflows/how-to-redesign-a-marketing-website-using-claude-opus-4-6-for-creative-development
—
In this episode, we cover:
(00:00) Introduction to new AI coding models
(02:13) My test methodology for comparing models
(03:30) Codex’s unique features: Git primitives, skills, and automations
(09:05) Testing GPT-5.3 Codex on a website redesign task
(10:40) Challenges with Codex’s literal interpretation of prompts
(15:00) Comparing the before and after with Codex
(16:23) Testing Opus 4.6 on the same website redesign task
(20:56) Comparing the visual results of both models
(21:30) Real-world engineering impact: 44 PRs in five days
(23:03) Refactoring components with Opus 4.6
(24:30) Using Codex for code review and architectural analysis
(26:55) Cost considerations for Opus 4.6 Fast
(28:52) Conclusion
—
Tools referenced:
• OpenAI’s GPT-5.3 Codex: https://openai.com/index/introducing-gpt-5-3-codex/
• Anthropic’s Claude Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
• Cursor: https://cursor.sh/
• GitHub: https://github.com/
—
Other references:
• Tailwind CSS: https://tailwindcss.com/
• Git: https://git-scm.com/
• Bugbot: https://cursor.com/bugbot
—
Where to find Claire Vo:
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
—
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email jordan@penname.co.