Claude model fundamentals
20% of the examModel families, context windows, cost/latency/quality trade-offs and choosing the right model.
The Claude family
- Three tiers: Opus (most capable, complex reasoning), Sonnet (balanced cost/performance), Haiku (fast and cheap).
- Pick by task: Haiku for high-volume classification/extraction, Sonnet for most agentic work, Opus for hard reasoning and complex code.
- Architect's instinct: start small (Haiku/Sonnet) and only escalate to Opus when quality demands it.
Context window & tokens
- Large context window (up to 200K tokens, more on some versions) — input + output share that budget.
- More context ≠ better answer: noise degrades quality. Include only what's relevant.
- 1 token ≈ ~4 characters; you pay for input AND output tokens (output is pricier).
Architect trade-offs
- Cost, latency and quality form a triangle: optimize for the use case.
- Levers: model choice, context size, prompt caching, Batch API (async, cheaper), streaming (perceived latency).
Practice — 10 questions
- 1. An app must classify 2M tickets/day with simple logic. Which model by default?
- 2. A colleague wants to 'stuff' 200K tokens of context to maximize quality. Best response?
- 3. Which mechanism most reduces cost for a non-urgent batch of millions of requests?
- 4. Why does output usually cost more than input?
- 5. A multi-step reasoning task on complex code fails with Haiku. Which escalation?
- 6. A task requires analyzing images (screenshots). Which Claude capability?
- 7. For a tight budget while keeping good quality on varied tasks, best starting model?
- 8. Does streaming change a request's cost?
- 9. On a very long document, where to put the question for best results?
- 10. Reasonable token estimate for budgeting in English?