cmboulanger committed on
Commit f2c1075 · 1 Parent(s): 18a607c

docs: Add evaluation results

Files changed (2)
  1. README.md +1 -0
  2. docs/evaluation-results.md +12 -0
README.md CHANGED
@@ -336,3 +336,4 @@ Integration tests are excluded from the default `pytest` run via `pyproject.toml
  [tool.pytest.ini_options]
  addopts = "-m 'not integration'"
  ```
+
docs/evaluation-results.md ADDED
@@ -0,0 +1,12 @@
+ # Evaluation results
+
+ | # | Model                                  | Precision | Recall | F1 (micro) | Time (s) |
+ |---|----------------------------------------|-----------|--------|------------|----------|
+ | 1 | [gemini] gemini-2.5-flash              | 0.932     | 0.919  | 0.925      | 5.1      |
+ | 2 | [gemini] gemini-2.5-flash-lite         | 0.915     | 0.878  | 0.897      | 3.1      |
+ | 3 | [kisski] devstral-2-123b-instruct-2512 | 0.868     | 0.892  | 0.880      | 32.2     |
+ | 4 | [kisski] qwen3-coder-30b-a3b-instruct  | 0.816     | 0.838  | 0.827      | 12.6     |
+ | 5 | [kisski] internvl3.5-30b-a3b           | 0.877     | 0.770  | 0.820      | 8.3      |
+ | 6 | [kisski] qwen3-vl-30b-a3b-instruct     | 0.806     | 0.784  | 0.795      | 11       |
+ | 7 | [kisski] qwen3-omni-30b-a3b-instruct   | 0.800     | 0.649  | 0.716      | 10.3     |
+ | 8 | [kisski] apertus-70b-instruct-2509     | 0.763     | 0.392  | 0.518      | 13.6     |
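
As a reading aid for the table above: micro-averaged F1 is the harmonic mean of micro-averaged precision and recall. The sketch below recomputes F1 from two of the reported rows, assuming the Precision and Recall columns are also micro-averaged (the commit does not say). The helper name `f1_score` is hypothetical and not part of the repository. Because the table values are rounded to three decimals, a recomputed F1 can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, precision, recall, reported F1) copied from the table above.
rows = [
    ("gemini-2.5-flash", 0.932, 0.919, 0.925),
    ("apertus-70b-instruct-2509", 0.763, 0.392, 0.518),
]

for name, precision, recall, reported in rows:
    computed = f1_score(precision, recall)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

The last row illustrates why F1 is worth reporting alongside precision: a recall of 0.392 pulls F1 down to about 0.52 even though precision stays above 0.76.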