sync benchmark reports
- benchmarks/LICENSE-DATA.md +35 -0
- benchmarks/results/.gitkeep +0 -0
- benchmarks/results/proofwriter_smoke/accuracy.png +0 -0
- benchmarks/results/proofwriter_smoke/latency.png +0 -0
- benchmarks/results/proofwriter_smoke/qubits.png +0 -0
- benchmarks/results/proofwriter_smoke/report.json +14 -0
- benchmarks/results/ruletaker_smoke/accuracy.png +0 -0
- benchmarks/results/ruletaker_smoke/latency.png +0 -0
- benchmarks/results/ruletaker_smoke/qubits.png +0 -0
- benchmarks/results/ruletaker_smoke/report.json +14 -0
benchmarks/LICENSE-DATA.md
ADDED
@@ -0,0 +1,35 @@
+# Dataset attribution and licensing
+
+QVerify benchmark runs ship with the following third-party datasets.
+Each dataset retains its upstream license; the harness that consumes
+them (the `qverify.eval` module) is Apache 2.0.
+
+## ProofWriter
+
+- **License:** Apache 2.0
+- **Source:** https://allenai.org/data/proofwriter
+- **Citation:** Tafjord, O., Dalvi, B., & Clark, P. (2021). ProofWriter:
+Generating Implications, Proofs, and Abductive Statements over Natural
+Language. *Findings of ACL 2021*.
+
+## RuleTaker
+
+- **License:** CC BY 4.0
+- **Source:** https://allenai.org/data/ruletaker
+- **Citation:** Clark, P., Tafjord, O., & Richardson, K. (2020).
+Transformers as Soft Reasoners over Language. *IJCAI 2020*.
+
+## Note about FOLIO
+
+FOLIO is intentionally not included in v0.1 because its CC BY-SA 4.0
+license requires derivative benchmark reports to also be released under
+CC BY-SA, which conflicts with this project's Apache 2.0 default. The
+constraint applies to any published artifact (JSON reports, PNG charts,
+README tables) that incorporates FOLIO outputs. Including it would force
+the entire benchmark report into CC BY-SA, including the parts derived
+from ProofWriter (Apache 2.0) and RuleTaker (CC BY 4.0). This may be
+revisited in a future release if a separate publishing path is set up
+for FOLIO results.
+
+`qverify.eval.datasets.load_folio` raises `NotImplementedError` to make
+the exclusion explicit at the call site.
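
The explicit exclusion described in the FOLIO note could look roughly like the sketch below. It is not the actual `qverify.eval.datasets` code, which this commit does not show. Only the name `load_folio` and the `NotImplementedError` come from LICENSE-DATA.md; the `split` parameter, the error message, and the `try_load` helper are hypothetical.

```python
def load_folio(split: str = "validation"):
    """Hypothetical stub: FOLIO is excluded from v0.1 for licensing reasons.

    The `split` argument is an assumption made for illustration only.
    """
    raise NotImplementedError(
        "FOLIO is not included in v0.1: its CC BY-SA 4.0 license conflicts "
        "with this project's Apache 2.0 default. See benchmarks/LICENSE-DATA.md."
    )


def try_load(name: str) -> str:
    """Hypothetical caller showing how the exclusion surfaces at the call site."""
    loaders = {"folio": load_folio}
    try:
        loaders[name]()
    except NotImplementedError as exc:
        # The stub fails loudly, so the caller can report why the dataset is absent.
        return f"skipped {name}: {exc}"
    return f"loaded {name}"


if __name__ == "__main__":
    print(try_load("folio"))
```

Raising at the call site, rather than silently returning an empty dataset, means a benchmark script that requests FOLIO fails immediately with the licensing reason attached.
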
benchmarks/results/.gitkeep
ADDED
File without changes
benchmarks/results/proofwriter_smoke/accuracy.png
ADDED
benchmarks/results/proofwriter_smoke/latency.png
ADDED
benchmarks/results/proofwriter_smoke/qubits.png
ADDED
benchmarks/results/proofwriter_smoke/report.json
ADDED
@@ -0,0 +1,14 @@
+{
+  "dataset": "proofwriter",
+  "backend": "default.qubit",
+  "n_examples": 0,
+  "n_skipped": 7,
+  "accuracy": 0.0,
+  "avg_seconds": 0.0,
+  "p95_seconds": 0.0,
+  "n_translated": 3,
+  "n_translation_failed": 7,
+  "avg_translation_seconds": 54.93012110267106,
+  "n_skipped_too_large": 3,
+  "results": []
+}
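
The counters in the ProofWriter report can be cross-checked with a few lines of Python. The sketch below assumes field meanings inferred from the key names, since this commit does not document them: `n_translated + n_translation_failed` is taken to be the number of attempted examples, and `n_translated - n_skipped_too_large` the number that actually reached the backend. The `reconcile` helper is hypothetical and not part of `qverify`.

```python
import json

# Copy of benchmarks/results/proofwriter_smoke/report.json from this commit.
REPORT_JSON = """
{
  "dataset": "proofwriter",
  "backend": "default.qubit",
  "n_examples": 0,
  "n_skipped": 7,
  "accuracy": 0.0,
  "avg_seconds": 0.0,
  "p95_seconds": 0.0,
  "n_translated": 3,
  "n_translation_failed": 7,
  "avg_translation_seconds": 54.93012110267106,
  "n_skipped_too_large": 3,
  "results": []
}
"""


def reconcile(report: dict) -> dict:
    """Cross-check the counters under the assumed (inferred) semantics."""
    attempted = report["n_translated"] + report["n_translation_failed"]
    runnable = report["n_translated"] - report["n_skipped_too_large"]
    consistent = (
        runnable == report["n_examples"]
        and len(report["results"]) == report["n_examples"]
    )
    return {"attempted": attempted, "runnable": runnable, "consistent": consistent}


if __name__ == "__main__":
    print(reconcile(json.loads(REPORT_JSON)))
```

Under these assumptions the smoke run attempted 10 examples, 7 failed translation, the remaining 3 were skipped as too large, and none were evaluated, which is why `results` is empty.
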
benchmarks/results/ruletaker_smoke/accuracy.png
ADDED
benchmarks/results/ruletaker_smoke/latency.png
ADDED
benchmarks/results/ruletaker_smoke/qubits.png
ADDED
benchmarks/results/ruletaker_smoke/report.json
ADDED
@@ -0,0 +1,14 @@
+{
+  "dataset": "ruletaker",
+  "backend": "default.qubit",
+  "n_examples": 0,
+  "n_skipped": 0,
+  "accuracy": 0.0,
+  "avg_seconds": 0.0,
+  "p95_seconds": 0.0,
+  "n_translated": 10,
+  "n_translation_failed": 0,
+  "avg_translation_seconds": 5.931748511801561,
+  "n_skipped_too_large": 10,
+  "results": []
+}
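
Both smoke reports record `"accuracy": 0.0` alongside `"n_examples": 0`, so the accuracy figure reflects an empty evaluation set rather than wrong answers. A reader consuming these files may want to treat that case as "no data". The sketch below is one way to do so; the helper names are hypothetical, and the reading of `n_translated` and `n_translation_failed` as per-example counts is an assumption based on the key names.

```python
from typing import Optional

# Relevant fields copied from the two report.json files in this commit.
PROOFWRITER = {"n_examples": 0, "accuracy": 0.0,
               "n_translated": 3, "n_translation_failed": 7}
RULETAKER = {"n_examples": 0, "accuracy": 0.0,
             "n_translated": 10, "n_translation_failed": 0}


def effective_accuracy(report: dict) -> Optional[float]:
    """Return the accuracy, or None when no examples were evaluated."""
    if report["n_examples"] == 0:
        return None  # 0.0 here would be a placeholder, not a measurement
    return report["accuracy"]


def translation_success_rate(report: dict) -> Optional[float]:
    """Fraction of attempted examples that were translated (assumed semantics)."""
    attempted = report["n_translated"] + report["n_translation_failed"]
    if attempted == 0:
        return None
    return report["n_translated"] / attempted


if __name__ == "__main__":
    for name, report in (("proofwriter", PROOFWRITER), ("ruletaker", RULETAKER)):
        print(name, effective_accuracy(report), translation_success_rate(report))
```

Returning `None` instead of `0.0` keeps downstream charts or tables from plotting an empty run as a genuine 0% result.
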