Commit f2c1075
Parent(s): 18a607c
docs: Add evaluation results
- README.md +1 -0
- docs/evaluation-results.md +12 -0
README.md
CHANGED
@@ -336,3 +336,4 @@ Integration tests are excluded from the default `pytest` run via `pyproject.toml
| 336 | [tool.pytest.ini_options]
| 337 | addopts = "-m 'not integration'"
| 338 | ```
| 339 | +
docs/evaluation-results.md
ADDED
@@ -0,0 +1,12 @@
# Evaluation results

| # | Model                                  | Precision | Recall | F1 (micro) | Time (s) |
|---|----------------------------------------|-----------|--------|------------|----------|
| 1 | [gemini] gemini-2.5-flash              | 0.932     | 0.919  | 0.925      | 5.1      |
| 2 | [gemini] gemini-2.5-flash-lite         | 0.915     | 0.878  | 0.897      | 3.1      |
| 3 | [kisski] devstral-2-123b-instruct-2512 | 0.868     | 0.892  | 0.880      | 32.2     |
| 4 | [kisski] qwen3-coder-30b-a3b-instruct  | 0.816     | 0.838  | 0.827      | 12.6     |
| 5 | [kisski] internvl3.5-30b-a3b           | 0.877     | 0.770  | 0.820      | 8.3      |
| 6 | [kisski] qwen3-vl-30b-a3b-instruct     | 0.806     | 0.784  | 0.795      | 11       |
| 7 | [kisski] qwen3-omni-30b-a3b-instruct   | 0.800     | 0.649  | 0.716      | 10.3     |
| 8 | [kisski] apertus-70b-instruct-2509     | 0.763     | 0.392  | 0.518      | 13.6     |
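
The F1 (micro) column in the table above can be sanity-checked against the Precision and Recall columns. The sketch below assumes both columns are micro-averaged, in which case F1 is their harmonic mean. `f1_from_pr` is a hypothetical helper for illustration, not part of this repository. Because the published inputs are rounded to three decimals, a recomputed value may differ from the table in the last digit.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the first three table rows.
rows = [
    (0.932, 0.919, 0.925),
    (0.915, 0.878, 0.897),
    (0.868, 0.892, 0.880),
]

for p, r, reported in rows:
    recomputed = f1_from_pr(p, r)
    # Inputs are rounded, so compare by eye or with a small tolerance.
    print(f"recomputed {recomputed:.3f} vs reported {reported:.3f}")
```

A tolerance of about 0.0015 is enough to absorb the rounding of the inputs when comparing recomputed and reported values.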