This is a DROP-based evaluation dataset designed to test the extrapolation capability of numerical reasoning models. The leaderboard for the original DROP dataset is available at the link below:

https://allenai.org/data/drop

To put things into perspective, we provide a sample of the original DROP dataset in the figure below (a list of data samples extracted from the original paper):

[Figure: sample of the original DROP dataset, taken from the original paper]

Evaluation Dataset (Perturbed Version)

The extrapolation dataset is divided into the following six versions, one for each perturbation type.