Two bugs prevented subscription users from routing through Claude Code CLI:
1. Retry handler regex only matched "third-party" errors but actual error is
"You're out of extra usage" — fallback never triggered
2. auto-model-selection actively rerouted bare model IDs back to anthropic
even after startup migration set claude-code as the session provider
Prevent _tryClaudeCodeFallback from firing for non-Anthropic providers
that may produce similar error text, avoiding unintended provider drift.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Anthropic now blocks third-party apps from using Pro/Max subscription
quotas via direct API calls. This change makes the claude-code provider
(which delegates to the local claude CLI binary) the default path for
Anthropic subscription users — TOS-compliant because requests flow
through Anthropic's own infrastructure.
Changes:
- Enhanced readiness check to verify CLI auth status (not just binary)
- Startup migration: auto-switch anthropic → claude-code when CLI ready
- Error recovery: auto-switch on third-party 400 block error
- Onboarding: removed Anthropic from OAuth, added Claude CLI option
- Added claude-code to flat-rate providers (no dynamic routing benefit)
Closes#3772
Explicit model changes aborted the active backoff sleep but left already-queued retry callbacks alive. That let stale provider retries continue after a user or runtime model switch, which could keep surfacing the old provider's backoff errors in the new session state.
Cancel the full retry generation on explicit model switches by clearing queued continue callbacks in RetryHandler, then cover the behavior with regression tests for both the session switch ordering and the queued-retry cancellation path.
The "Extra usage is required for long context requests" error from
Anthropic is a billing gate, not a transient rate limit. Classify it as
quota_exhausted so the handler enters the fallback path instead of an
infinite backoff loop. When no cross-provider fallback exists, attempt a
[1m] to base model downgrade before stopping cleanly.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Server errors (500/502/503/504) are server-side failures — rotating
credentials doesn't help. Only rate_limit and quota_exhausted are
meaningfully credential-scoped. This prevents the cascading backoff
where a single 500 backs off the sole API key for 20s, causing all
subsequent retries to fail with "All credentials temporarily backed off".
Closes#2588
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract two self-contained subsystems from agent-session.ts (3,367 -> 2,737 lines):
- RetryHandler: auto-retry with exponential backoff, credential rotation,
and cross-provider fallback logic
- CompactionOrchestrator: manual/auto compaction, overflow recovery, and
extension integration for custom compaction providers
Also add shared getErrorMessage() utility to replace repeated
`err instanceof Error ? err.message : String(err)` patterns.
The extracted modules receive AgentSession state via dependency injection
interfaces, avoiding state duplication. AgentSession remains the coordinator
that delegates to these modules.