SWE-bench
Definition: SWE-bench is a benchmark that evaluates a model's ability to resolve real software engineering issues, drawn from GitHub issues and their associated pull requests in open-source Python repositories.
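Given a snapshot of the repository and the issue text, the model must produce a patch. The patch counts as a fix only if the project's tests pass once it is applied, including the tests that failed before the fix. Because the tasks come from real projects rather than self-contained puzzles, SWE-bench is often used to compare coding-focused models and agents.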
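The pass/fail criterion can be sketched as follows. This is a minimal illustration, not the official evaluation harness: the names TaskResult, is_resolved, and resolved_rate are hypothetical, and the code assumes test outcomes have already been collected after applying the model's patch. The two test lists mirror the FAIL_TO_PASS and PASS_TO_PASS fields of the SWE-bench dataset (tests the fix should make pass, and tests that must keep passing).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskResult:
    """Hypothetical record of one task after the model's patch is applied."""
    # Test name -> True if the test passed with the patch applied.
    test_outcomes: Dict[str, bool]
    # Tests that failed before the fix and should now pass.
    fail_to_pass: List[str]
    # Tests that passed before the fix and must still pass.
    pass_to_pass: List[str]


def is_resolved(result: TaskResult) -> bool:
    """A task is resolved only if every required test passes.

    A test missing from test_outcomes (e.g. it never ran) counts as a failure.
    """
    required = result.fail_to_pass + result.pass_to_pass
    return all(result.test_outcomes.get(name, False) for name in required)


def resolved_rate(results: List[TaskResult]) -> float:
    """Fraction of tasks resolved; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


# Illustrative data only: one patch fixes the bug, the other breaks an existing test.
example_results = [
    TaskResult(
        test_outcomes={"test_bug": True, "test_existing": True},
        fail_to_pass=["test_bug"],
        pass_to_pass=["test_existing"],
    ),
    TaskResult(
        test_outcomes={"test_bug": True, "test_existing": False},
        fail_to_pass=["test_bug"],
        pass_to_pass=["test_existing"],
    ),
]
example_rate = resolved_rate(example_results)
```

The second example task shows why both lists matter: a patch that fixes the reported bug but breaks a previously passing test is not counted as resolved.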