Grok — v4.1
- Better multi-step coding reliability: fewer logic bugs and missing edge cases in longer code outputs.
- Stronger repo-level coherence: better at keeping imports, file boundaries, and refactor intent consistent across multiple files.
- More stable instruction-following: less format drift and fewer ignored constraints in long prompts.
- Improved tool and agent follow-through: fewer broken tool sequences and more complete multi-step task execution.
- Higher output stability: fewer abrupt truncations and fewer random mid-answer shifts in tone or direction.
Álvaro Sánchez Román 🙏 · 109 karma · Oct 18, 2024 · @NotebookLM
Accuracy nice. Free
Claude — v4.6 (Claude Opus 4.6)
- Better financial reasoning (+5.47% on Finance Agent, SOTA on TaxEval)
- Improved information extraction (BrowseComp, DeepSearchQA)
- Gains on SWE-bench and agentic benchmarks
- 128K max output tokens (was 64K)
- Adaptive thinking replaces budget_tokens (deprecated)
- New "max" effort level; effort parameter now GA
- Server-side context compaction (infinite conversations)
- Data residency controls (inference_geo)
- Prefilling assistant messages removed
- Better prompt injection resistance
- Stronger alignment on sensitive tasks
- Better first-pass quality for spreadsheets, presentations, financial models
- Claude in Excel handles longer, more complex tasks
