Fill out README. Update pyproject.toml.
- README.md +106 -18
- pyproject.toml +10 -2
- timebench_eval.py +31 -38
- uv.lock +11 -13
README.md
CHANGED
@@ -1,50 +1,138 @@
 ---
-title:
+title: TimeBench Eval
 datasets:
 - TimeBench
 tags:
 - evaluate
 - metric
-
+- temporal reasoning
+description: Evaluation metric for the TimeBench temporal reasoning benchmark by Chu et al. (2023).
 sdk: gradio
 sdk_version: 6.3.0
 app_file: app.py
 pinned: false
+emoji: ⏰
+colorFrom: purple
+colorTo: red
 ---
 
-# Metric Card for
-
-***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
+# Metric Card for TimeBench Eval
 
 ## Metric Description
-
+
+This metric is designed for the **TimeBench** benchmark (Chu et al., 2023), which evaluates the temporal reasoning abilities of large language models. It uses modified prompts from the [ADeLe paper by Zhou et al., 2025](https://arxiv.org/abs/2503.06378) and supports multiple task types with different evaluation strategies:
+
+- **TempReason, TimeQA, MenatQA**: SQuAD-style exact match and F1 scoring
+- **Date Arithmetic**: exact match on parsed and compared dates
+- **TimeDial**: set-based comparison of the selected multiple-choice options (A-D)
+
+The metric expects model outputs to contain the answer in the format `"Thus, the correct answer is: <answer>"`.
+
+It performs the following steps:
+
+1. Extracts the answer from the model's prediction string using the marker `"Thus, the correct answer is:"`.
+2. Applies task-specific evaluation:
+   - For QA tasks: computes SQuAD exact match and F1 scores
+   - For Date Arithmetic: parses dates and compares them (the day is normalized to 1)
+   - For TimeDial: extracts option letters (A-D) and computes set-based exact match and F1
 
 ## How to Use
-*Give general statement of how to use the metric*
-
+
+Load the metric with the `evaluate` library:
+
+```python
+import evaluate
+
+metric = evaluate.load("aauss/timebench_eval")
+
+# Example for the Date Arithmetic task
+predictions = [
+    "Let me solve this step by step... Thus, the correct answer is: Aug, 1987.",
+    "Calculating the date... Thus, the correct answer is: January 2020.",
+]
+references = ["Aug, 1987", "Feb, 2020"]
+
+result = metric.compute(
+    predictions=predictions,
+    references=references,
+    task="Date Arithmetic",
+)
+print(result)
+# {'exact_match': [1, 0]}
+
+# Example for the TempReason/TimeQA/MenatQA tasks
+predictions = [
+    "Based on the context... Thus, the correct answer is: Cardiff City.",
+    "The answer cannot be determined. Thus, the correct answer is: unanswerable",
+]
+references = ["Cardiff City", "unanswerable"]
+
+result = metric.compute(
+    predictions=predictions,
+    references=references,
+    task="MenatQA",
+)
+print(result)
+# {'exact_match': [1.0, 1.0], 'f1': [1.0, 1.0]}
+
+# Example for the TimeDial task (multiple choice)
+predictions = [
+    "Options B and C are correct. Thus, the correct answer is: B, C.",
+]
+references = ["B. No more than ten minutes && C. No more than five minutes"]
+
+result = metric.compute(
+    predictions=predictions,
+    references=references,
+    task="TimeDial",
+)
+print(result)
+# {'exact_match': [1], 'f1': [1.0]}
+```
 
 ### Inputs
-
-- **
+
+- **predictions** (`list` of `str`): List of predictions to score. Each prediction should be a string containing the model's response, which must include the answer after the marker `"Thus, the correct answer is:"`.
+- **references** (`list` of `str`): List of reference answers.
+- **task** (`str`): The task type being evaluated. Must be one of:
+  - `"TempReason"`: temporal reasoning QA
+  - `"TimeQA"`: time-based QA
+  - `"MenatQA"`: Multiple Sensitive Factors Time QA
+  - `"Date Arithmetic"`: date calculation tasks
+  - `"TimeDial"`: dialogue-based temporal multiple choice
 
 ### Output Values
 
-
-
+The metric returns a dictionary with the following keys (depending on the task):
+
+- **exact_match** (`list` of `float` or `int`): Exact match score for each prediction (0 or 1).
+- **f1** (`list` of `float`): F1 score for each prediction (0.0 to 1.0). Returned for all tasks except Date Arithmetic.
+
+Scores range from 0.0 to 1.0, with higher values indicating better performance.
 
 #### Values from Popular Papers
-*Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
-
-*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
+
+Refer to the [original TimeBench paper](https://arxiv.org/abs/2311.17667) for baseline performance values across various language models.
 
 ## Limitations and Bias
-
+
+- The metric relies on the marker `"Thus, the correct answer is:"` to extract answers. If the model output does not follow this exact format, extraction fails and returns `None`.
+- For Date Arithmetic, dates are parsed with `dateutil.parser` and the day is normalized to 1. Unparseable dates result in `None` comparisons.
+- For TimeDial, only options A-D are recognized; extraction looks for standalone letters at word boundaries.
+- The metric assumes predictions and references are aligned lists of the same length.
 
 ## Citation
-
+
+```bibtex
+@software{abbood2026timebench_eval,
+  title={TimeBench Eval},
+  author={Abbood, Auss},
+  year={2026},
+  url={https://huggingface.co/spaces/aauss/timebench_eval}
+}
+```
 
 ## Further References
-
+
+- [TimeBench paper](https://arxiv.org/abs/2311.17667)
+- [ADeLe battery, which adapts TimeBench](https://huggingface.co/datasets/CFI-Kinds-of-Intelligence/ADeLe_battery_v1dot0)
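The extraction and TimeDial scoring rules described in the card are easy to misread from prose alone. Below is a minimal sketch of the described behavior; the helper names `extract_answer` and `score_timedial` are hypothetical and this is not the Space's actual code:

```python
import re

MARKER = "Thus, the correct answer is:"


def extract_answer(response: str) -> str | None:
    """Return the text after the last marker occurrence, or None if absent."""
    idx = response.rfind(MARKER)
    if idx == -1:
        return None
    return response[idx + len(MARKER):].strip().rstrip(".")


def score_timedial(prediction: str, reference: str) -> tuple[int, float]:
    """Set-based exact match and F1 over standalone option letters A-D."""
    answer = extract_answer(prediction) or ""
    # Word boundaries keep letters inside words (e.g. "Cardiff") from matching.
    pred = set(re.findall(r"\b[A-D]\b", answer))
    gold = set(re.findall(r"\b[A-D]\b", reference))
    exact = int(pred == gold)
    overlap = len(pred & gold)
    if not pred or not gold or overlap == 0:
        return exact, 0.0  # empty or disjoint selections score 0.0 F1 here
    precision, recall = overlap / len(pred), overlap / len(gold)
    return exact, 2 * precision * recall / (precision + recall)
```

On the TimeDial example from the card, `score_timedial("Thus, the correct answer is: B, C.", "B. No more than ten minutes && C. No more than five minutes")` yields `(1, 1.0)`, matching the documented output.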
pyproject.toml
CHANGED
@@ -1,13 +1,21 @@
 [project]
 name = "timebench-eval"
 version = "0.1.0"
-description = "
+description = "Evaluation metric for the TimeBench temporal reasoning benchmark"
 readme = "README.md"
 requires-python = ">=3.12"
 dependencies = [
     "evaluate==0.4.6",
     "huggingface-hub<0.25",
-    "
+    "python-dateutil>=2.8",
+    "datasets>=2.0",
+]
+
+[project.optional-dependencies]
+dev = [
     "pytest>=9.0.2",
     "ruff>=0.14.11",
 ]
+
+[tool.pytest.ini_options]
+pythonpath = ["."]
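The new `[tool.pytest.ini_options]` table puts the project root on `pythonpath`, so tests can import `timebench_eval` directly. A minimal sketch of such a test; the file name and assertion are hypothetical, and it assumes `TimebenchEval` can be instantiated directly as an `evaluate.Metric` subclass:

```python
# test_timebench_eval.py -- hypothetical example, not part of the commit.
from timebench_eval import TimebenchEval  # resolves via pythonpath = ["."]


def test_date_arithmetic_exact_match():
    metric = TimebenchEval()  # loads the bundled "squad" metric on first use
    result = metric.compute(
        predictions=["Thus, the correct answer is: Aug, 1987."],
        references=["Aug, 1987"],
        task="Date Arithmetic",
    )
    # Expected shape taken from the docstring example in timebench_eval.py.
    assert result == {"exact_match": [1]}
```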
timebench_eval.py
CHANGED
@@ -11,86 +11,75 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-"""
+"""Evaluation metric for the TimeBench temporal reasoning benchmark."""
 
 import re
 from datetime import datetime
 
+import datasets
+import evaluate
 from dateutil import parser
 from dateutil.parser import ParserError
 
-import evaluate
-import datasets
-
-
-# TODO: Add BibTeX citation
 _CITATION = """\
-@
-title
-
-year={
+@software{abbood2026timebench_eval,
+  title={TimeBench Eval},
+  author={Abbood, Auss},
+  year={2026},
+  url={https://huggingface.co/spaces/aauss/timebench_eval}
 }
 """
 
-# TODO: Add description of the module here
 _DESCRIPTION = """\
-
+Evaluation metric for the TimeBench benchmark, which assesses temporal reasoning
+abilities in large language models. Supports multiple task types including TempReason,
+TimeQA, MenatQA, Date Arithmetic, and TimeDial.
 """
 
 
-# TODO: Add description of the arguments of the module here
 _KWARGS_DESCRIPTION = """
-Calculates
+Calculates evaluation metrics for temporal reasoning tasks.
 Args:
-    predictions: list of
-        should
-    references: list of reference
-
+    predictions: list of prediction strings from the model. Each prediction
+        should contain the marker "Thus, the correct answer is:" followed by the answer.
+    references: list of reference answer strings.
+    task: the task type, one of "TempReason", "TimeQA", "MenatQA", "Date Arithmetic", or "TimeDial".
 Returns:
-
-
+    exact_match: list of exact match scores (0 or 1) for each prediction.
+    f1: list of F1 scores for each prediction (for applicable tasks).
 Examples:
-
-
-
-    >>>
-    >>> results = my_new_module.compute(references=[0, 1], predictions=[0, 1])
+    >>> timebench_eval = evaluate.load("aauss/timebench_eval")
+    >>> predictions = ["Let me think... Thus, the correct answer is: Aug, 1987."]
+    >>> references = ["Aug, 1987"]
+    >>> results = timebench_eval.compute(predictions=predictions, references=references, task="Date Arithmetic")
     >>> print(results)
-    {'
+    {'exact_match': [1]}
 """
 
-# TODO: Define external resources urls if needed
-BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"
-
 
 @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
 class TimebenchEval(evaluate.Metric):
-    """
+    """Evaluation metric for TimeBench temporal reasoning tasks."""
 
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
         self.squad_metric = evaluate.load("squad")
 
     def _info(self):
-        # TODO: Specifies the evaluate.EvaluationModuleInfo object
         return evaluate.MetricInfo(
-            # This is the description that will appear on the modules page.
             module_type="metric",
             description=_DESCRIPTION,
             citation=_CITATION,
             inputs_description=_KWARGS_DESCRIPTION,
-            # This defines the format of each prediction and reference
             features=datasets.Features(
                 {
                     "predictions": datasets.Value("string"),
                     "references": datasets.Value("string"),
                 }
             ),
-
-
-
-            codebase_urls=["http://github.com/path/to/codebase/of/new_module"],
-            reference_urls=["http://path.to.reference.url/new_module"],
+            homepage="https://huggingface.co/spaces/aauss/timebench_eval",
+            codebase_urls=["https://huggingface.co/spaces/aauss/timebench_eval/tree/main"],
+            reference_urls=["https://huggingface.co/datasets/ulab-ai/Time-Bench"],
         )
 
     def _compute(
@@ -117,6 +106,10 @@ class TimebenchEval(evaluate.Metric):
             return self._compare_dates(predictions, references)
         elif task == "TimeDial":
            return self._compute_timedial(predictions, references)
+        else:
+            raise ValueError(
+                f"Unknown task: {task}. Expected one of: TempReason, TimeQA, MenatQA, Date Arithmetic, TimeDial"
+            )
 
     @staticmethod
     def _extract_answer(response: str) -> str | None:
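The Date Arithmetic path is the one behavior this diff describes but does not show. A minimal sketch of the day-normalized comparison, built on the module's `dateutil` imports; the helper names `parse_month` and `compare_dates` are hypothetical and operate on already-extracted answer strings:

```python
from dateutil import parser
from dateutil.parser import ParserError


def parse_month(text: str):
    """Parse a date string with the day normalized to 1; None if unparseable."""
    try:
        # "Aug, 1987" and "August 1987" both normalize to 1987-08-01.
        return parser.parse(text).replace(day=1)
    except (ParserError, OverflowError):
        return None


def compare_dates(predictions: list[str], references: list[str]) -> dict:
    """Per-example exact match: 1 iff both sides parse to the same month."""
    exact_match = []
    for pred, ref in zip(predictions, references):
        p, r = parse_month(pred), parse_month(ref)
        # Unparseable dates (None on either side) score 0 in this sketch.
        exact_match.append(int(p is not None and p == r))
    return {"exact_match": exact_match}
```

This reproduces the README example: "Aug, 1987" matches "Aug, 1987" (1), while "January 2020" against "Feb, 2020" parses to different months (0).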
uv.lock
CHANGED
@@ -628,15 +628,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/70/44/5191d2e4026f86a2a109053e194d3ba7a31a2d10a9c2348368c63ed4e85a/pandas-2.3.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:3869faf4bd07b3b66a9f462417d0ca3a9df29a9f6abd5d0d0dbab15dac7abe87", size = 13202175, upload-time = "2025-09-29T23:31:59.173Z" },
 ]
 
-[[package]]
-name = "pip"
-version = "25.3"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/fe/6e/74a3f0179a4a73a53d66ce57fdb4de0080a8baa1de0063de206d6167acc2/pip-25.3.tar.gz", hash = "sha256:8d0538dbbd7babbd207f261ed969c65de439f6bc9e5dbd3b3b9a77f25d95f343", size = 1803014, upload-time = "2025-10-25T00:55:41.394Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/44/3c/d717024885424591d5376220b5e836c2d5293ce2011523c9de23ff7bf068/pip-25.3-py3-none-any.whl", hash = "sha256:9655943313a94722b7774661c21049070f6bbb0a1516bf02f7c8d5d9201514cd", size = 1778622, upload-time = "2025-10-25T00:55:39.247Z" },
-]
-
 [[package]]
 name = "pluggy"
 version = "1.6.0"
@@ -920,21 +911,28 @@ name = "timebench-eval"
 version = "0.1.0"
 source = { virtual = "." }
 dependencies = [
+    { name = "datasets" },
     { name = "evaluate" },
     { name = "huggingface-hub" },
-    { name = "
+    { name = "python-dateutil" },
+]
+
+[package.optional-dependencies]
+dev = [
     { name = "pytest" },
     { name = "ruff" },
 ]
 
 [package.metadata]
 requires-dist = [
+    { name = "datasets", specifier = ">=2.0" },
     { name = "evaluate", specifier = "==0.4.6" },
     { name = "huggingface-hub", specifier = "<0.25" },
-    { name = "
-    { name = "
-    { name = "ruff", specifier = ">=0.14.11" },
+    { name = "pytest", marker = "extra == 'dev'", specifier = ">=9.0.2" },
+    { name = "python-dateutil", specifier = ">=2.8" },
+    { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.14.11" },
 ]
+provides-extras = ["dev"]
 
 [[package]]
 name = "tqdm"