Claude model fundamentals

20% of the exam

Model families, context windows, cost/latency/quality trade-offs and choosing the right model.

The Claude family

Three tiers: Opus (most capable, complex reasoning), Sonnet (balanced cost/performance), Haiku (fast and cheap).
Pick by task: Haiku for high-volume classification/extraction, Sonnet for most agentic work, Opus for hard reasoning and complex code.
Architect's instinct: start small (Haiku/Sonnet) and only escalate to Opus when quality demands it.

Large context window (up to 200K tokens, more on some versions) — input + output share that budget.
More context ≠ better answer: noise degrades quality. Include only what's relevant.
1 token ≈ ~4 characters; you pay for input AND output tokens (output is pricier).

Cost, latency and quality form a triangle: optimize for the use case.
Levers: model choice, context size, prompt caching, Batch API (async, cheaper), streaming (perceived latency).

0/10 answered

1. An app must classify 2M tickets/day with simple logic. Which model by default?
2. A colleague wants to 'stuff' 200K tokens of context to maximize quality. Best response?
3. Which mechanism most reduces cost for a non-urgent batch of millions of requests?
4. Why does output usually cost more than input?
5. A multi-step reasoning task on complex code fails with Haiku. Which escalation?
6. A task requires analyzing images (screenshots). Which Claude capability?
7. For a tight budget while keeping good quality on varied tasks, best starting model?
8. Does streaming change a request's cost?
9. On a very long document, where to put the question for best results?
10. Reasonable token estimate for budgeting in English?