The Rust API layer rejected thinking-enabled responses because it only recognized text and tool_use content blocks. This commit extends the response and SSE parser types to accept reasoning-style content blocks and deltas, with regression coverage for both non-streaming and streaming responses.
Constraint: Keep parsing compatible with existing text and tool-use message flows
Rejected: Deserialize unknown content blocks into an untyped catch-all | would weaken protocol coverage and test precision
Confidence: high
Scope-risk: narrow
Directive: Keep new protocol variants covered at the API boundary so downstream code can explicitly choose whether to preserve or ignore them
Tested: cargo test -p api thinking -- --nocapture
Not-tested: Live API traffic from a real thinking-enabled model
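
Illustration: a minimal sketch of the typed approach described above, not the code from this commit. The names `ContentBlock`, `Delta`, `start_block`, and `apply_delta` are hypothetical, the tag strings `"text"` and `"thinking"` are assumptions, and tool_use is omitted for brevity. It shows why typed variants were preferred over an untyped catch-all: an unknown block type or a mismatched delta surfaces as an error at the API boundary instead of passing through silently.

```rust
// Hypothetical sketch: type names and tag strings are assumptions,
// not the actual definitions in the api crate.

/// Content blocks the parser understands, each accumulating streamed text.
#[derive(Debug, PartialEq)]
pub enum ContentBlock {
    Text(String),
    Thinking(String),
}

/// Incremental updates carried by streaming events.
#[derive(Debug, PartialEq)]
pub enum Delta {
    Text(String),
    Thinking(String),
}

/// Open an empty block for a `type` tag. Unknown tags are rejected
/// rather than folded into a catch-all variant.
pub fn start_block(kind: &str) -> Result<ContentBlock, String> {
    match kind {
        "text" => Ok(ContentBlock::Text(String::new())),
        "thinking" => Ok(ContentBlock::Thinking(String::new())),
        other => Err(format!("unsupported content block type: {}", other)),
    }
}

/// Append a delta to the block it belongs to. A delta of the wrong kind
/// is reported as an error instead of being merged into the wrong buffer.
pub fn apply_delta(block: &mut ContentBlock, delta: Delta) -> Result<(), String> {
    match (block, delta) {
        (ContentBlock::Text(buf), Delta::Text(chunk)) => {
            buf.push_str(&chunk);
            Ok(())
        }
        (ContentBlock::Thinking(buf), Delta::Thinking(chunk)) => {
            buf.push_str(&chunk);
            Ok(())
        }
        (block, delta) => Err(format!(
            "delta {:?} does not match block {:?}",
            delta, block
        )),
    }
}
```

Under these assumptions, a streamed thinking block is built by calling `start_block("thinking")` once and `apply_delta` for each chunk; regression tests can then assert on the exact variant that comes out.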
Extended thinking needed to travel end-to-end through the API,
runtime, and CLI so the client can request a thinking budget,
preserve streamed reasoning blocks, and present them in a
collapsed text-first form. The implementation keeps thinking
strictly opt-in, adds a session-local toggle, and reuses the
existing flag/slash-command/reporting surfaces instead of
introducing a new UI layer.
Constraint: Existing non-thinking text/tool flows had to remain backward compatible by default
Constraint: Terminal UX needed a lightweight collapsed representation rather than an interactive TUI widget
Rejected: Heuristic CLI-only parsing of reasoning text | brittle against structured stream payloads
Rejected: Expanded raw thinking output by default | too noisy for normal assistant responses
Confidence: medium
Scope-risk: moderate
Reversibility: clean
Directive: Keep thinking blocks structurally separate from answer text unless the upstream API contract changes
Tested: cargo fmt --all; cargo clippy --workspace --all-targets -- -D warnings; cargo test -q
Not-tested: Live upstream thinking payloads against the production API contract
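
Illustration: a rough sketch of the opt-in toggle and collapsed rendering described above, not the code from this commit. `Session`, `Block`, `render`, and the `DEFAULT_THINKING_BUDGET` value are all hypothetical, and the placeholder line format is an assumption. It shows thinking kept structurally separate from answer text, hidden behind a one-line summary unless expansion is requested.

```rust
// Hypothetical sketch: names, the budget value, and the output format
// are assumptions for illustration only.

/// Assumed token budget requested when thinking is switched on.
pub const DEFAULT_THINKING_BUDGET: u32 = 1024;

/// Session-local settings. Thinking is off by default (opt-in).
#[derive(Default)]
pub struct Session {
    thinking_enabled: bool,
}

impl Session {
    /// Flip the session-local toggle and return the new state.
    pub fn toggle_thinking(&mut self) -> bool {
        self.thinking_enabled = !self.thinking_enabled;
        self.thinking_enabled
    }

    /// Budget to send with a request, or `None` when thinking is off.
    pub fn thinking_budget(&self) -> Option<u32> {
        if self.thinking_enabled {
            Some(DEFAULT_THINKING_BUDGET)
        } else {
            None
        }
    }
}

/// Response blocks, with reasoning kept apart from answer text.
pub enum Block {
    Thinking(String),
    Text(String),
}

/// Render blocks as plain text. Thinking collapses to a one-line
/// summary unless `expand_thinking` is set.
pub fn render(blocks: &[Block], expand_thinking: bool) -> String {
    let mut lines = Vec::new();
    for block in blocks {
        match block {
            Block::Text(text) => lines.push(text.clone()),
            Block::Thinking(text) if expand_thinking => {
                lines.push(format!("[thinking]\n{}", text));
            }
            Block::Thinking(text) => {
                lines.push(format!(
                    "[thinking: {} chars, collapsed]",
                    text.chars().count()
                ));
            }
        }
    }
    lines.join("\n")
}
```

Because the collapsed form is plain text, a renderer like this could plug into existing reporting output without an interactive widget.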
The Rust CLI now recognizes explicit local image references in prompt text,
encodes supported image files as base64, and serializes mixed text/image
content blocks for the API. The request conversion path was kept narrow so
existing runtime/session structures remain stable while prompt mode and user
text conversion gain multimodal support.
Constraint: Must support PNG, JPG/JPEG, GIF, and WebP without adding broad runtime abstractions
Constraint: Existing text-only prompt behavior and API tool flows must keep working unchanged
Rejected: Add only explicit --image CLI flags | does not satisfy the requirement to auto-detect image refs in prompt text
Rejected: Persist native image blocks in runtime session model | broader refactor than needed for prompt support
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep image parsing scoped to outbound user prompt adaptation unless session persistence truly needs multimodal history
Tested: cargo fmt --all; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: Live remote multimodal request against the Anthropic API
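
Illustration: a minimal sketch of the prompt adaptation described above, not the code from this commit. `media_type`, `base64_encode`, `split_prompt`, and `Part` are hypothetical names, and treating any whitespace-delimited token with a supported extension as an image reference is an assumption about the detection rule. File reading is left out; the base64 helper would be applied to the file bytes before the image block is serialized.

```rust
// Hypothetical sketch: names and the token-based detection rule are
// assumptions, not the CLI's actual implementation.

/// Map a path's extension to a media type for the supported formats.
pub fn media_type(path: &str) -> Option<&'static str> {
    let ext = path.rsplit_once('.')?.1.to_ascii_lowercase();
    match ext.as_str() {
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        "gif" => Some("image/gif"),
        "webp" => Some("image/webp"),
        _ => None,
    }
}

/// Standard base64 (RFC 4648 alphabet, `=` padding).
pub fn base64_encode(bytes: &[u8]) -> String {
    const TABLE: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(TABLE[(n >> 18) as usize & 63] as char);
        out.push(TABLE[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { TABLE[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[n as usize & 63] as char } else { '=' });
    }
    out
}

/// One piece of a mixed text/image prompt.
#[derive(Debug, PartialEq)]
pub enum Part {
    Text(String),
    ImageRef { path: String, media_type: &'static str },
}

/// Split prompt text into text runs and image references, in order.
pub fn split_prompt(prompt: &str) -> Vec<Part> {
    let mut parts = Vec::new();
    let mut text: Vec<&str> = Vec::new();
    for token in prompt.split_whitespace() {
        match media_type(token) {
            Some(mt) => {
                // Flush pending text so block order matches the prompt.
                if !text.is_empty() {
                    parts.push(Part::Text(text.join(" ")));
                    text.clear();
                }
                parts.push(Part::ImageRef { path: token.to_string(), media_type: mt });
            }
            None => text.push(token),
        }
    }
    if !text.is_empty() {
        parts.push(Part::Text(text.join(" ")));
    }
    parts
}
```

In this sketch a prompt with no image references yields a single `Part::Text`, so text-only behavior is unchanged as long as the prompt's whitespace is already normalized.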