mahernaija committed · Commit 0f64c6d · verified · 1 Parent(s): 5e81299

Upload README.md with huggingface_hub
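
The Improvement and Delta columns in the tables touched by this commit are absolute differences in percentage points (for example, 66.7 - 26.5 = 40.2). The sketch below rechecks that arithmetic and the regression labels. The row values are copied from the diff. The helper names and the 1.0-point tolerance are assumptions made for illustration, since the README does not state the threshold it used.

```python
# Rows copied from the tables in this diff: (metric, base %, fine-tuned %).
ROWS = [
    ("Gretel Execution Accuracy", 26.5, 66.7),
    ("Gretel Valid SQL", 37.8, 89.4),
    ("Spider Execution Accuracy", 46.6, 55.1),
    ("Spider Exact Match", 0.0, 22.2),
    ("Spider Keyword Score", 45.5, 85.4),
    ("MMLU (humanities)", 81.9, 83.5),
    ("MMLU (STEM)", 87.2, 86.7),
    ("MMLU (social sciences)", 92.0, 92.0),
    ("MMLU (other)", 87.5, 87.5),
    ("GSM8K (math, strict)", 60.4, 35.4),
]

# Hypothetical tolerance in percentage points; the README does not state one.
TOLERANCE_PP = 1.0


def delta_pp(base, tuned):
    """Absolute change in percentage points, rounded to one decimal."""
    return round(tuned - base, 1)


def is_regression(base, tuned, tolerance_pp=TOLERANCE_PP):
    """True when the score drops by more than the assumed tolerance."""
    return delta_pp(base, tuned) < -tolerance_pp


def summarize(rows=ROWS):
    """Return (metric, delta in points, regression flag) for each row."""
    return [(name, delta_pp(b, t), is_regression(b, t)) for name, b, t in rows]


if __name__ == "__main__":
    for name, delta, regressed in summarize():
        label = "regression" if regressed else "no regression"
        print(f"{name}: {delta:+.1f} pp ({label})")
```

With this assumed tolerance, only the GSM8K row is flagged, which matches the labels in the Regression table.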

Files changed (1)
  1. README.md +13 -2
README.md CHANGED
@@ -24,11 +24,22 @@ Fine-tuned [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) for **Tex
 
  | Metric | Base Model | This Model | Improvement |
  |---|---|---|---|
- | **SQL Execution Accuracy** | 19.5% | **61.0%** | **+41.5%** |
- | **Valid SQL Output** | 41.5% | **90.2%** | +48.7% |
  | **Spider Exact Match** | 0.0% | **22.2%** | +22.2% |
  | **Spider Keyword Score** | 45.5% | **85.4%** | +39.9% |
 
  ## Usage
 
  ```python
 
 
  | Metric | Base Model | This Model | Improvement |
  |---|---|---|---|
+ | **Gretel Execution Accuracy (3,492 samples)** | 26.5% | **66.7%** | **+40.2%** |
+ | **Gretel Valid SQL** | 37.8% | **89.4%** | +51.6% |
+ | **Spider Execution Accuracy (1,032, gold std)** | 46.6% | **55.1%** | +8.5% |
  | **Spider Exact Match** | 0.0% | **22.2%** | +22.2% |
  | **Spider Keyword Score** | 45.5% | **85.4%** | +39.9% |
 
+ ### Regression (standard benchmarks via lm-eval-harness)
+
+ | Benchmark | Base | This Model | Delta |
+ |---|---|---|---|
+ | **MMLU (humanities)** | 81.9% | 83.5% | +1.6% (no regression) |
+ | **MMLU (STEM)** | 87.2% | 86.7% | -0.5% (no regression) |
+ | **MMLU (social sciences)** | 92.0% | 92.0% | 0% (no regression) |
+ | **MMLU (other)** | 87.5% | 87.5% | 0% (no regression) |
+ | **GSM8K (math, strict)** | 60.4% | 35.4% | **-25.0% (regression)** |
+
  ## Usage
 
  ```python