Benchmark
Definition: A benchmark is a standardized test set used to measure and compare model performance on specific tasks.
A benchmark score gives a numeric point of reference, but only a partial one: a high score does not guarantee good real-world performance, and some models overfit the test data. Interpret scores with caution.
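
To make the caveat concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen only for illustration: the toy task (upper-casing a string), the names BENCHMARK and UNSEEN, and the two stand-in "models". It scores each model by accuracy on a fixed test set, then on inputs outside that set.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input, expected output)


def accuracy(model: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Return the fraction of examples the model answers exactly right."""
    if not examples:
        raise ValueError("the test set must contain at least one example")
    correct = sum(1 for text, expected in examples if model(text) == expected)
    return correct / len(examples)


# Hypothetical toy task: upper-case the input string.
BENCHMARK: Sequence[Example] = [("a", "A"), ("b", "B"), ("c", "C"), ("d", "D")]
# Inputs the benchmark does not fully cover (assumed for illustration).
UNSEEN: Sequence[Example] = [("e", "E"), ("f", "F"), ("a", "A"), ("z", "Z")]

_memorized = dict(BENCHMARK)


def memorizing_model(text: str) -> str:
    """Stand-in for an overfit model: knows only the benchmark answers."""
    return _memorized.get(text, text)


def general_model(text: str) -> str:
    """Stand-in for a model that has learned the actual task."""
    return text.upper()


if __name__ == "__main__":
    for name, model in [("memorizing", memorizing_model), ("general", general_model)]:
        print(name, accuracy(model, BENCHMARK), accuracy(model, UNSEEN))
```

Both models score 1.0 on the benchmark, so the benchmark alone cannot tell them apart. On the unseen inputs the memorizing model drops to 0.25 while the general one stays at 1.0, which is the gap between benchmark score and real-world performance described above.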