
This article originally appeared on Medium. Tim O’Brien has given us permission to repost it here on Radar.
When you’re working with AI tools like Cursor or GitHub Copilot, the real power isn’t just access to different models; it’s knowing when to use them. Some jobs are fine with Auto. Others need a stronger model. And sometimes you should bail and switch rather than keep spending money on a complex problem with a lower-quality model. If you don’t, you’ll waste both time and money.
And this is the missing discussion in code generation. There are a few “camps” here: most people writing about this seem to treat it as a fantastical, fun “vibe coding” experience, while a few are trying to use the technology to ship real products. If you are in that last category, you’ve probably started to realize that you can spend a fantastic amount of money if you don’t have a strategy for model selection.
Let’s make it very specific: if you sign up for Cursor, drop $20/month on a subscription using Auto, and you’re happy with the output, there’s not much to worry about. But if you’re starting to run agents in parallel and are paying for token consumption on top of a monthly subscription, this post will make sense. In my own experience, a single developer working alone can easily spend $200–$300/day (or four times that figure) if they’re trying to tackle a project and have opted for the most expensive model.
And if you are a company that gives its developers unlimited access to these tools, get ready for some surprises.
My Escalation Ladder for Models…
- Start here: Auto. Let Cursor route to a strong model with good capacity. If output quality degrades or a loop occurs, escalate. (Cursor explicitly says Auto selects among premium models and will switch when output is degraded.)
- Medium-complexity tasks: Sonnet 4/GPT-5/Gemini. Use these for focused tasks on a handful of files: robust unit tests, targeted refactors, API reworks.
- Heavy lift: Sonnet 4 with the 1 million-token context. If I need to do something that requires more context but I still don’t want to pay top dollar, I’ve started moving up to models that don’t quickly max out on context.
- Ultraheavy lift: Opus 4/4.1. Use this when the task spans multiple projects or requires long context and careful reasoning, then switch back once the big move is done. (Anthropic positions Opus 4 as a deep-reasoning, long-horizon model for coding and agent workflows.)
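If you want to make the ladder explicit, for instance in a team checklist or a wrapper script, it boils down to "pick the cheapest rung that fits, climb one rung at a time, and drop back down afterward." The sketch below is a hypothetical encoding of that habit: the rung labels mirror the list above, while `starting_rung`, `escalate`, and `after_big_move` are illustrative names, not anything Cursor provides or enforces.

```python
# Rung labels mirror the escalation ladder; the rules for choosing a starting
# rung are an illustrative reading of the list, not a Cursor feature.
LADDER = [
    "Auto",                       # start here
    "Sonnet 4 / GPT-5 / Gemini",  # focused work on a handful of files
    "Sonnet 4 (1M context)",      # more context, still not top dollar
    "Opus 4/4.1",                 # cross-project, long-context reasoning
]


def starting_rung(spans_projects: bool, needs_long_context: bool) -> int:
    """Pick the cheapest rung that plausibly fits the task up front."""
    if spans_projects:
        return 3
    if needs_long_context:
        return 2
    return 0  # everything else starts on Auto


def escalate(rung: int) -> int:
    """Move up one rung when output degrades or loops, capped at the top."""
    return min(rung + 1, len(LADDER) - 1)


def after_big_move() -> int:
    """Drop back to Auto once the heavy work is done."""
    return 0


if __name__ == "__main__":
    rung = starting_rung(spans_projects=False, needs_long_context=False)
    print(LADDER[rung])              # Auto
    rung = escalate(rung)            # Auto's output degraded, so climb one rung
    print(LADDER[rung])              # Sonnet 4 / GPT-5 / Gemini
    print(LADDER[after_big_move()])  # Auto
```

The one-rung-at-a-time `escalate` is a deliberate choice: jumping straight to the top rung is exactly the habit that produces the surprise bills described above.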
Auto works fine, but there are times when you can sense that it has chosen the wrong model. If you use these models enough, you can recognize Gemini Pro output by its verbosity, or the ChatGPT models by the way they go about solving a problem.
I’ll admit that my heavy and ultraheavy choices here are biased toward the models I’ve had more experience with; your own experience may differ. Still, you should have a similar escalation list. Start with Auto and only upgrade if you need to; otherwise, you will learn some lessons about how much this costs.
Watch Out for “Thinking” Model Costs
Some models support explicit “thinking” (longer reasoning). It’s useful, but more expensive. Cursor’s docs note that enabling thinking on specific Sonnet versions can count as two requests under team request accounting, and on the individual plans the same idea translates into more tokens burned. In short, thinking mode is excellent; use it when you need it.
And when do you need it? My rule of thumb is this: when I already understand what needs to be done, when I’m asking for a unit test to be polished or a method to be implemented following the pattern of another… I usually don’t need a thinking model. On the other hand, if I’m asking it to analyze a problem and propose various options for me to choose from, or (something I do often) asking it to challenge my decisions and play devil’s advocate, I’ll pay the premium for the best model.
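The rule of thumb and the request accounting combine into simple arithmetic. The sketch below assumes the 2× multiplier Cursor’s docs describe for certain Sonnet versions under team accounting (verify it for your own plan); the task labels and function names are hypothetical.

```python
# Hypothetical labels for the open-ended work where thinking pays off.
EXPLORATORY = {"analyze_and_propose_options", "devils_advocate"}

# Assumption: thinking counts as two requests, per Cursor's docs for certain
# Sonnet versions under team accounting. Check the current docs for your plan.
THINKING_MULTIPLIER = 2


def use_thinking(task_kind: str) -> bool:
    """Enable thinking only for open-ended analysis; routine work stays off."""
    return task_kind in EXPLORATORY


def request_units(task_kinds: list[str]) -> int:
    """Count requests, charging thinking tasks at the multiplier."""
    return sum(THINKING_MULTIPLIER if use_thinking(k) else 1 for k in task_kinds)


session = ["polish_unit_test", "devils_advocate", "implement_following_pattern"]
print(request_units(session))  # 4: two routine requests plus one doubled request
```

Leaving thinking off by default and opting in per task keeps the premium confined to the problems that actually need it.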
Max Mode and When to Use It
If you need large context windows or extended reasoning (e.g., sweeping changes across 20+ files), Max Mode can help, but it will consume more usage. Make Max Mode a temporary tool, not your default. If you find yourself constantly needing Max Mode turned on, there’s a good chance you’re “overapplying” this technology.
Does it need to consume a million tokens for hours on end? That’s usually a hint that you need another programmer. More on that later, but what I’ve seen too often are managers who think this is like the “vibe coding” they’re witnessing. Spoiler alert: Vibe coding is the thing people do in presentations because it takes five minutes to make a silly video game. It’s 100% not programming, and here’s the secret to using codegen: You have to understand how to program.
Max Mode and thinking models are not a shortcut, and neither are they a replacement for good programmers. If you think they are, you will be paying top dollar for code that will one day have to be rewritten by a good programmer using these same tools.
Most Important Tip: Watch Your Bill as It Happens
The most important tip is to regularly monitor your usage and usage rates in Cursor, since they appear within a minute or two of running something. You can see usage by the minute, the number of tokens consumed, and in some cases how much you’re being charged beyond your subscription. Make a habit of checking a couple of times a day, especially during heavy sessions, and ideally every half hour. This helps you catch runaway costs, like spending $100 an hour, before they get out of hand, which is entirely possible if you’re running many parallel agents or doing resource-intensive work. Paying attention keeps you in control of both your usage and your bill.
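Checking every half hour is most useful if you turn two readings into a rate. A minimal sketch, assuming you note the cumulative spend shown on Cursor’s usage page by hand at two or more points in a session (no export API is assumed); `UsageSample`, `hourly_burn_rate`, and the $100/hour default are illustrative, the last taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class UsageSample:
    minute: float          # minutes since the session started
    cumulative_usd: float  # total spend so far, as read off the usage page


def hourly_burn_rate(samples: list[UsageSample]) -> float:
    """Spend per hour between the first and last samples."""
    if len(samples) < 2:
        return 0.0
    first, last = samples[0], samples[-1]
    elapsed_hours = (last.minute - first.minute) / 60
    if elapsed_hours <= 0:
        return 0.0
    return (last.cumulative_usd - first.cumulative_usd) / elapsed_hours


def is_runaway(samples: list[UsageSample], limit_per_hour: float = 100.0) -> bool:
    """True when the observed rate meets or exceeds the hourly limit."""
    return hourly_burn_rate(samples) >= limit_per_hour


# Two half-hour check-ins: $10 at the start, $62 thirty minutes later.
session = [UsageSample(0, 10.0), UsageSample(30, 62.0)]
print(f"{hourly_burn_rate(session):.0f} USD/hour")  # 104 USD/hour
print(is_runaway(session))                          # True
```

A rate catches problems that a running total hides: $52 spent looks modest, but $52 in thirty minutes is already past $100 an hour.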
Keep Track and Avoid Loops
The other thing you need to do is keep track of what works and what doesn’t. Over time, you’ll find it’s very easy to make mistakes, and the models themselves can sometimes fall into loops. You give an instruction, and instead of resolving it, the system keeps running the same process over and over. If you’re not paying attention, you can burn through a lot of tokens, and a lot of money, without getting usable output. That’s why it’s essential to watch your sessions closely and be ready to interrupt when something looks stuck.
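"Looks stuck" can be made concrete: the same action keeps showing up in the last few steps. The sketch below is a hypothetical heuristic, assuming you have the agent’s recent steps as short text summaries (for example, copied from the session transcript); the window size and repeat threshold are arbitrary starting points, not recommendations from Cursor.

```python
from collections import Counter


def looks_stuck(steps: list[str], window: int = 6, max_repeats: int = 3) -> bool:
    """Flag a session whose recent steps keep repeating the same action."""
    # Normalize case and whitespace so trivially different strings still match.
    recent = [s.strip().lower() for s in steps[-window:]]
    if not recent:
        return False
    _, count = Counter(recent).most_common(1)[0]
    return count >= max_repeats


steps = ["run tests", "edit foo.py", "run tests", "edit foo.py", "Run tests"]
print(looks_stuck(steps))  # True: "run tests" appears three times in the window
```

A heuristic like this only tells you when to look; deciding whether to interrupt is still your call.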
Another pitfall is pushing the models beyond their limits. There are tasks they can’t handle well, and when that happens, it’s tempting to keep rephrasing the request and asking again, hoping for a better result. In practice, that often leads to the same cycle of failure, except you’re footing the bill for every attempt. Knowing where the limits are and when to stop is key.
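One way to enforce "when to stop" is to decide on a retry budget before you start rephrasing. A minimal sketch; `RetryBudget` and the default of three attempts are hypothetical choices for illustration.

```python
class RetryBudget:
    """Stop rephrasing a request once it has failed a set number of times."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.failures: dict[str, int] = {}

    def record_failure(self, task: str) -> None:
        self.failures[task] = self.failures.get(task, 0) + 1

    def should_stop(self, task: str) -> bool:
        return self.failures.get(task, 0) >= self.max_attempts


budget = RetryBudget(max_attempts=2)
budget.record_failure("migrate legacy config parser")
budget.record_failure("migrate legacy config parser")
print(budget.should_stop("migrate legacy config parser"))  # True: time to do it by hand
```

Setting the limit up front matters more than the exact number: it turns "one more try" into a decision you already made.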
A practical way to stay on top of this is to maintain a running diary of what worked and what didn’t. Record prompts, outcomes, and notes about efficiency so you can learn from experience instead of repeating expensive mistakes. Combined with keeping an eye on your live usage metrics, this habit will help you refine your approach and avoid wasting both time and money.
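The diary can be as simple as one JSON object per line in a text file, which is easy to append to and easy to summarize later. The sketch below is one possible format, not a prescribed one: the fields (`model`, `prompt`, `outcome`, `cost_usd`, `notes`) are assumptions based on what the paragraph suggests recording, and cost is entered by hand.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class DiaryEntry:
    model: str
    prompt: str
    outcome: str     # e.g. "worked", "partial", "failed"
    cost_usd: float  # copied by hand from the usage page
    notes: str = ""
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_jsonl_line(entry: DiaryEntry) -> str:
    return json.dumps(asdict(entry)) + "\n"


def from_jsonl_line(line: str) -> DiaryEntry:
    return DiaryEntry(**json.loads(line))


def append_entry(path: Path, entry: DiaryEntry) -> None:
    """Append one entry to a JSON Lines diary file."""
    with path.open("a", encoding="utf-8") as f:
        f.write(to_jsonl_line(entry))


def cost_by_outcome(entries: list[DiaryEntry]) -> dict[str, float]:
    """Total spend per outcome, to show where money went without results."""
    totals: dict[str, float] = {}
    for e in entries:
        totals[e.outcome] = totals.get(e.outcome, 0.0) + e.cost_usd
    return totals
```

Summarizing by outcome is the payoff: if "failed" keeps accumulating cost for the same kind of prompt, that is a limit worth writing down.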
