Thanksgiving week is usually quiet for tech. Not this time. After watching Dr. Matthew Jarvis test the latest models, I came away with a clear view: Anthropic’s Claude Opus 4.5 is the best practical choice for active software work, while Google’s Gemini 3 shines for first drafts and design flair. On images, Black Forest Labs’ Flux 2 is a strong step for open tools, but it still stumbles on dense text.
Coding: Where Claude Pulls Ahead
Jarvis did not mince words on Anthropic’s release. He put Claude Opus 4.5 through game builds, bug hunts, and a multi-day app project. His verdict was blunt and consistent: Claude now thinks across systems, not just lines.
“They just released the most impressive coding model I’ve ever personally tested.”
“It feels like the first time an AI model actually understands how all the pieces connect across a bigger system.”
Two features matter for real projects. First, the new “effort” setting lets developers trade speed for deeper reasoning without wasting tokens. Second, Claude Code’s “plan mode” asks clarifying questions, writes a plan file, then executes it. That sequence reduces rework and wandering.
Price cuts also matter. Opus-level performance now costs less per million tokens, and the model is available across apps, extensions, and major clouds. For teams that live in editors and terminals, cost plus reach equals adoption.
Jarvis compared Claude with Gemini 3 using the same stress tests. Gemini often nails a single-shot demo with slick visuals. But when projects get messy, Claude fixes bugs faster and avoids looped retries.
“Gemini 3 is better at one-shotting great apps… Opus is great for picking up where Gemini left off, fixing bugs, refactoring code, adding new features.”
Images: Flux 2 Impresses, With A Catch
On the visual side, Jarvis ran Black Forest Labs’ Flux 2 through product shots, posters, identity carry-over, and infographics. It handled style matching and scene control well. Multi-image reference support is solid. The open-weight track is a big win for local runs.
But the weak spot remains clear: long, precise text inside images still breaks down. Labels and step-by-step instructions drift into gibberish when the copy gets heavy.
“Visually not too bad. Text-wise falls off a cliff pretty quickly.”
He also tried a celebrity group scene with references. The likenesses blurred and mixed, trailing Google’s “Nano Banana Pro” on identity consistency. For posters and product scenes, Flux 2 looked good. For dense instructional graphics, not yet.
What This Means For Builders
These tests point to a simple playbook for shipping faster without losing quality. Use the right model at the right stage and keep feedback loops short.
- Start with Gemini for a quick first draft and a polished UI.
- Switch to Claude Opus 4.5 for debugging, refactors, and edge cases.
- Use Claude Code “plan mode” to cut guesswork before execution.
- Try Flux 2 for product imagery and style transfers, but limit dense text.
- Keep open-weight options handy for local, private workflows.
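The playbook above can be written down as a small lookup, so a team agrees on which tool owns which stage. This is only a sketch under stated assumptions: the stage names, the `Route` type, and the `pick_tool` helper are hypothetical and made up for illustration, and nothing here calls a real vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One playbook entry: which tool to reach for, and why."""
    tool: str
    note: str


# Hypothetical stage names mapped to the tools named in the list above.
PLAYBOOK = {
    "first_draft": Route("Gemini 3", "quick first draft and UI polish"),
    "planning": Route("Claude Code plan mode", "clarify and plan before executing"),
    "debugging": Route("Claude Opus 4.5", "bug fixes and edge cases"),
    "refactor": Route("Claude Opus 4.5", "refactors and new features"),
    "imagery": Route("Flux 2", "product imagery and style transfer"),
}


def pick_tool(stage: str, needs_dense_text: bool = False) -> Route:
    """Return the playbook entry for a stage.

    For imagery with heavy copy, keep the same tool but change the advice:
    generate the scene only and lay out the text separately.
    """
    if stage not in PLAYBOOK:
        raise ValueError(f"unknown stage: {stage!r}")
    route = PLAYBOOK[stage]
    if stage == "imagery" and needs_dense_text:
        return Route(route.tool, "render the scene only; add the copy as real text later")
    return route


if __name__ == "__main__":
    for stage in PLAYBOOK:
        chosen = pick_tool(stage)
        print(f"{stage}: {chosen.tool} ({chosen.note})")
```

The one conditional reflects the Flux 2 caveat: the tool stays the same, but dense copy is handled outside the image.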
Jarvis’s own journal app proved the point. He iterated for 72 hours, feature by feature: OCR for scans, audio transcription, mood filters, tag generation, streaks, even AI insights across entries. Claude stayed steady through the grind. That is the job that matters most to working developers.
The Rest Of The Week, In Short
There were other updates worth noting. NotebookLM now produces slide decks and infographics with strong visuals. ChatGPT added a new in-app voice interface and a shopping research mode. Perplexity shipped stronger memory. Microsoft revealed a small local model for computer control. Meta teased text-to-3D world generation. AI music companies kept striking deals. And LTX added “retake,” letting you tweak parts of a finished video without starting over.
Here’s my take: nice extras, but tool choice for real work still matters most. Claude’s practical strength and lower cost tilt the field for builders right now.
Final Thought
We do not need more demos. We need finished projects. Pair Gemini for scaffolding with Claude for the hard yards. Use Flux 2 when you control the text load. Push vendors for safer, cheaper, clearer tools. Then ship.
If you care about results, pick the stack that gets you to “done” this week, not someday.
Frequently Asked Questions
Q: Why pick Claude Opus 4.5 over Gemini for coding?
Gemini is great for a first pass and design polish. Claude is stronger at multi-step fixes, refactors, and avoiding repeat failures when projects get complex.
Q: Where does Flux 2 work best right now?
It shines on photoreal shots, posters, and style-consistent edits. For long or precise text in images, expect errors and plan to add real text later.
Q: Is the “effort” setting in Claude worth using?
Yes. Lower effort saves tokens for quick tasks. Higher effort improves reasoning on tricky bugs or cross-system logic without ballooning output size.
Q: How should teams split work between models?
Draft the UI with Gemini, then move to Claude for implementation, debugging, and feature growth. Keep Flux 2 for imagery when you can avoid dense copy.
Q: What about safety and alignment claims?
Jarvis found Claude harder to trick and more stable under attack prompts. Treat any claim with caution and continue red-teaming for your own use cases.