Claude model fundamentals

20% of the exam

Model families, context windows, cost/latency/quality trade-offs and choosing the right model.

The Claude family

  • Three tiers: Opus (most capable, complex reasoning), Sonnet (balanced cost/performance), Haiku (fast and cheap).
  • Pick by task: Haiku for high-volume classification/extraction, Sonnet for most agentic work, Opus for hard reasoning and complex code.
  • Architect's instinct: start small (Haiku/Sonnet) and only escalate to Opus when quality demands it.

Context window & tokens

  • Large context window (up to 200K tokens, more on some versions) — input + output share that budget.
  • More context ≠ better answer: noise degrades quality. Include only what's relevant.
  • 1 token ≈ ~4 characters; you pay for input AND output tokens (output is pricier).

Architect trade-offs

  • Cost, latency and quality form a triangle: optimize for the use case.
  • Levers: model choice, context size, prompt caching, Batch API (async, cheaper), streaming (perceived latency).

Practice — 10 questions

0/10 answered
  1. 1. An app must classify 2M tickets/day with simple logic. Which model by default?
  2. 2. A colleague wants to 'stuff' 200K tokens of context to maximize quality. Best response?
  3. 3. Which mechanism most reduces cost for a non-urgent batch of millions of requests?
  4. 4. Why does output usually cost more than input?
  5. 5. A multi-step reasoning task on complex code fails with Haiku. Which escalation?
  6. 6. A task requires analyzing images (screenshots). Which Claude capability?
  7. 7. For a tight budget while keeping good quality on varied tasks, best starting model?
  8. 8. Does streaming change a request's cost?
  9. 9. On a very long document, where to put the question for best results?
  10. 10. Reasonable token estimate for budgeting in English?

← Back to the Academy · Mock exam →