Messages API, Tool Use & Architecture
20% of the examMessages API, tool use, streaming, error handling and production patterns.
The Messages API
- Conversation = alternating list {role: user|assistant, content}. The system prompt is a separate parameter.
- Key params: model, max_tokens (required), temperature, stop_sequences, system, tools.
- Streaming (SSE) returns token-by-token for perceived latency.
Tool use
- Declare tools (name, description, input_schema JSON Schema). Claude returns a tool_use block.
- Loop: Claude → tool_use → your code runs it → you return tool_result → Claude continues.
- tool_choice forces/frees usage; tools guarantee structured outputs.
Robustness
- Handle 429 (rate limit) with exponential backoff + jitter; 529 (overloaded) with retry.
- Timeouts, request-id logging, graceful degradation, idempotency and caching.
Practice — 10 questions
- 1. Where do role/persistent rules go in the Messages API?
- 2. Correct tool-use cycle?
- 3. Guarantee reliable structured extraction?
- 4. The API returns 429s. Correct strategy?
- 5. Which parameter is required on every call?
- 6. Output is cut and stop_reason is 'max_tokens'. What to do?
- 7. Who is responsible for keeping conversation history?
- 8. Error 529 (overloaded). Appropriate reaction?
- 9. Can Claude request multiple tool calls in a single turn?
- 10. Best approach to reduce a chat's perceived latency?