Laborator committed on
Commit 771bc66 · verified · 1 Parent(s): c3ae39d

sync benchmark reports

benchmarks/LICENSE-DATA.md ADDED
@@ -0,0 +1,35 @@
+ # Dataset attribution and licensing
+
+ QVerify benchmark runs ship with the following third-party datasets.
+ Each dataset retains its upstream license; the harness that consumes
+ them (the `qverify.eval` module) is Apache 2.0.
+
+ ## ProofWriter
+
+ - **License:** Apache 2.0
+ - **Source:** https://allenai.org/data/proofwriter
+ - **Citation:** Tafjord, O., Dalvi, B., & Clark, P. (2021). ProofWriter:
+ Generating Implications, Proofs, and Abductive Statements over Natural
+ Language. *Findings of ACL 2021*.
+
+ ## RuleTaker
+
+ - **License:** CC BY 4.0
+ - **Source:** https://allenai.org/data/ruletaker
+ - **Citation:** Clark, P., Tafjord, O., & Richardson, K. (2020).
+ Transformers as Soft Reasoners over Language. *IJCAI 2020*.
+
+ ## Note about FOLIO
+
+ FOLIO is intentionally not included in v0.1 because its CC BY-SA 4.0
+ license requires derivative benchmark reports to also be released under
+ CC BY-SA, which conflicts with this project's Apache 2.0 default. The
+ constraint applies to any published artifact (JSON reports, PNG charts,
+ README tables) that incorporates FOLIO outputs. Including it would force
+ the entire benchmark report into CC BY-SA, including the parts derived
+ from ProofWriter (Apache 2.0) and RuleTaker (CC BY 4.0). This may be
+ revisited in a future release if a separate publishing path is set up
+ for FOLIO results.
+
+ `qverify.eval.datasets.load_folio` raises `NotImplementedError` to make
+ the exclusion explicit at the call site.
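
For readers who want to picture the call-site behaviour described in the FOLIO note, a minimal sketch follows. The actual body of `qverify.eval.datasets.load_folio` is not part of this diff, so the signature (accepting arbitrary arguments) and the error message are assumptions made for illustration only.

```python
def load_folio(*args, **kwargs):
    """Hypothetical stand-in for qverify.eval.datasets.load_folio.

    The real signature is not shown in this commit; accepting any
    arguments is an assumption so that every call fails the same way.
    """
    # Fail loudly instead of returning an empty dataset, so the licensing
    # exclusion is visible exactly where a caller tries to use FOLIO.
    raise NotImplementedError(
        "FOLIO (CC BY-SA 4.0) is not included in v0.1; "
        "see benchmarks/LICENSE-DATA.md for the rationale."
    )


if __name__ == "__main__":
    try:
        load_folio(split="validation")  # 'split' is a made-up argument
    except NotImplementedError as exc:
        print(f"FOLIO unavailable: {exc}")
```

Raising rather than returning an empty list means a benchmark script cannot silently report results for a dataset that was never loaded.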
benchmarks/results/.gitkeep ADDED
File without changes
benchmarks/results/proofwriter_smoke/accuracy.png ADDED
benchmarks/results/proofwriter_smoke/latency.png ADDED
benchmarks/results/proofwriter_smoke/qubits.png ADDED
benchmarks/results/proofwriter_smoke/report.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "dataset": "proofwriter",
+   "backend": "default.qubit",
+   "n_examples": 0,
+   "n_skipped": 7,
+   "accuracy": 0.0,
+   "avg_seconds": 0.0,
+   "p95_seconds": 0.0,
+   "n_translated": 3,
+   "n_translation_failed": 7,
+   "avg_translation_seconds": 54.93012110267106,
+   "n_skipped_too_large": 3,
+   "results": []
+ }
benchmarks/results/ruletaker_smoke/accuracy.png ADDED
benchmarks/results/ruletaker_smoke/latency.png ADDED
benchmarks/results/ruletaker_smoke/qubits.png ADDED
benchmarks/results/ruletaker_smoke/report.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "dataset": "ruletaker",
+   "backend": "default.qubit",
+   "n_examples": 0,
+   "n_skipped": 0,
+   "accuracy": 0.0,
+   "avg_seconds": 0.0,
+   "p95_seconds": 0.0,
+   "n_translated": 10,
+   "n_translation_failed": 0,
+   "avg_translation_seconds": 5.931748511801561,
+   "n_skipped_too_large": 10,
+   "results": []
+ }
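
Both smoke reports record `"n_examples": 0`, so the `0.0` accuracy and latency figures describe an empty evaluation set, not a failed one. The sketch below shows one way to flag this when reading a report. The field meanings are inferred from their names, not documented in this commit: it assumes that attempted examples equal `n_translated + n_translation_failed`, and that `n_skipped_too_large` counts translated examples dropped before evaluation. `summarize_report` is a hypothetical helper, not part of `qverify`.

```python
# Counts copied from the two report.json files added in this commit.
PROOFWRITER = {
    "dataset": "proofwriter",
    "n_examples": 0,
    "n_translated": 3,
    "n_translation_failed": 7,
    "n_skipped_too_large": 3,
    "accuracy": 0.0,
}
RULETAKER = {
    "dataset": "ruletaker",
    "n_examples": 0,
    "n_translated": 10,
    "n_translation_failed": 0,
    "n_skipped_too_large": 10,
    "accuracy": 0.0,
}


def summarize_report(report: dict) -> dict:
    """Hypothetical helper: derive counts that make an empty run obvious."""
    # Assumption: every attempted example either translated or failed.
    attempted = report["n_translated"] + report["n_translation_failed"]
    # Assumption: oversized examples are a subset of the translated ones.
    dropped = report["n_translation_failed"] + report["n_skipped_too_large"]
    evaluated = report["n_examples"]
    return {
        "dataset": report["dataset"],
        "attempted": attempted,
        "dropped": dropped,
        "evaluated": evaluated,
        # Accuracy over zero examples carries no information.
        "accuracy_is_meaningful": evaluated > 0,
    }


if __name__ == "__main__":
    for report in (PROOFWRITER, RULETAKER):
        print(summarize_report(report))
```

Under these assumptions, every attempted example in both runs is accounted for by a translation failure or a size skip, which is consistent with the empty `results` arrays.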