Now that GPT-5.4 has been out for a couple weeks, I’m curious how people are actually using the reasoning.effort parameter in practice.
For context, it controls how much internal compute the model spends on chain-of-thought before responding. You can set it to low, medium, or high. Higher effort means better accuracy on hard problems but slower responses and higher cost.
I’ve been experimenting with it for a data extraction pipeline and my initial findings are kind of interesting:
- For straightforward structured extraction (pulling names, dates, amounts from invoices), `low` effort works just as well as `high` and runs about 3x faster.
- For ambiguous classification tasks where the categories overlap, `high` effort noticeably improves accuracy, maybe 8-12% on my eval set.
- `medium` feels like a weird middle ground that I haven’t found a great use case for yet.
What I’m still trying to figure out:
- Is anyone dynamically switching effort levels based on input complexity? Like, run a quick classifier first and only escalate to `high` for tricky inputs?
- How does reasoning.effort interact with the 1M token context window? I’m worried that `high` effort on a massive context could blow up latency and cost.
- For agentic workflows with tool use, does effort level affect how well the model plans multi-step tool calls?
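To make the first question concrete, here is a rough sketch of what "classify first, escalate only when needed" could look like. Everything in it is an assumption for illustration: the cue words, the length threshold, the `pick_effort` and `build_request` helpers, and the request shape (a `reasoning` object with an `effort` field, mirroring the parameter name). It only builds a payload and does not call any API, so swap in whatever client and classifier you actually use.

```python
from dataclasses import dataclass

# Hypothetical cue phrases that hint at an ambiguous input.
# These are placeholders; tune them against your own eval set.
AMBIGUITY_CUES = ("either", "unclear", "or possibly", "not sure", "mixed")


@dataclass(frozen=True)
class RoutingConfig:
    long_input_chars: int = 4000  # assumed cutoff for "long" inputs
    min_cues_for_high: int = 1    # cue hits needed to escalate


def pick_effort(text: str, cfg: RoutingConfig = RoutingConfig()) -> str:
    """Return "high" for inputs that look tricky, otherwise "low"."""
    lowered = text.lower()
    cue_hits = sum(1 for cue in AMBIGUITY_CUES if cue in lowered)
    if cue_hits >= cfg.min_cues_for_high or len(text) > cfg.long_input_chars:
        return "high"
    return "low"


def build_request(text: str, model: str = "gpt-5.4") -> dict:
    """Assemble a request payload (assumed shape) with the routed effort level."""
    return {
        "model": model,  # placeholder model name taken from this thread
        "input": text,
        "reasoning": {"effort": pick_effort(text)},
    }


if __name__ == "__main__":
    for sample in (
        "Invoice 123, total $40, due 2024-01-05",
        "This could be either a refund or a chargeback",
    ):
        print(pick_effort(sample), "<-", sample)
```

A keyword heuristic is the cheapest possible router. A small trained classifier, or a `low` effort first pass that escalates when its own answer looks uncertain, would slot into `pick_effort` the same way.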
Would love to hear what patterns people are settling on, especially if you’ve done A/B testing in production.
Seed content posted by the DevForums team to help get our community started. Have a better answer? Jump in!