How to extract structured data with Claude
Turning a messy document into clean data is one of Claude's most rewarding uses: pulling the lines of an invoice, the contacts from a batch of emails, or the key figures from a report, as a reusable table or JSON. The key is not magic, but a precise instruction about the fields you want and systematic verification. Here is how to proceed, from the one-off case to automation.
In short: Give Claude the document and the exact list of fields you want with their format, then ask for a table or JSON with those precise columns or keys. State what to do with missing values (use null, do not invent) and verify critical items against the source. For large volumes, use the API.
Define exactly which fields to extract
First, decide on the output structure. Give Claude the precise list of fields you want and their format: for an invoice, for example, number, date, supplier, lines (description, quantity, unit price), net total, VAT, gross total. An effective instruction looks like: extract this information from the attached invoice and return a table with exactly these columns. The sharper the target, the more usable the result. Also specify the conventions (date format, decimal separator, currency) to avoid ambiguity and get data you can reuse directly.
Choose the right output format
For human use or a copy-paste into a spreadsheet, ask for a table (or a CSV format). To feed another piece of software, ask for JSON, specifying the exact field names: return a JSON object with the keys number, date, gross_total. State how to handle missing values (use null rather than invent) and edge cases (several addresses, duplicate lines). Asking for a strict structured format cuts the downstream cleanup and makes the output predictable, which is essential if you plan to process it automatically.
Make it reliable and verify
Extraction saves a great deal of time, but an error on an amount or a date can be costly. Ask Claude to flag explicitly when a piece of information is missing or unreadable, rather than guessing it. For critical items (totals, identifiers, due dates), verify by sampling against the source, especially at first. On repetitive documents, define a stable instruction and test it on a few varied examples before applying it at scale. The right habit: treat the output as a draft that is 90 percent reliable, to be checked on the 10 percent that matter.
Scale up with the API
For a few documents, the claude.ai interface is enough. To process hundreds regularly, Anthropic's API lets you send the documents programmatically and receive structured JSON directly, which your application stores in a database. The MCP protocol connects Claude to your data sources and tools in a standardised way. That is how you industrialise extraction: a proven instruction, a strict output format, and an automated verification step on sensitive fields. See our API guide and our page on MCP.
Frequently asked questions
How do I extract structured data with Claude?
Give Claude the document and the exact list of fields you want with their format, then ask for a table or JSON with those precise columns or keys. State what to do with missing values (use null, do not invent) and verify critical items against the source. For large volumes, use the API.
Can Claude output clean JSON?
Yes. Explicitly ask for a JSON object, specifying the expected key names and value formats, and tell it to use null for missing fields. A strict format makes the output predictable and reusable by another piece of software, which cuts downstream cleanup.
Is Claude's extraction 100% reliable?
No. It saves a lot of time but can be wrong on an amount, a date or an ambiguous field. Ask Claude to flag missing information rather than guess it, and verify critical items against the source by sampling, especially on new documents.
How do I extract data from many documents automatically?
Use Anthropic's API to send documents programmatically and receive structured JSON that your application stores in a database, and the MCP protocol to connect your sources directly. Define a stable instruction, test it on varied examples, then automate verification of the sensitive fields.
See also: the complete guide to Claude · Claude news in real time
Claude News is an independent publication, not affiliated with Anthropic.