Upload README.md with huggingface_hub
README.md CHANGED
@@ -24,11 +24,22 @@ Fine-tuned [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) for **Tex
| Metric | Base Model | This Model | Improvement |
|---|---|---|---|
-| **
-| **Valid SQL
+| **Gretel Execution Accuracy (3,492 samples)** | 26.5% | **66.7%** | **+40.2%** |
+| **Gretel Valid SQL** | 37.8% | **89.4%** | +51.6% |
+| **Spider Execution Accuracy (1,032, gold std)** | 46.6% | **55.1%** | +8.5% |
| **Spider Exact Match** | 0.0% | **22.2%** | +22.2% |
| **Spider Keyword Score** | 45.5% | **85.4%** | +39.9% |
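
The Improvement column reads as an absolute difference in percentage points (This Model minus Base Model), not a relative gain. A minimal sketch that recomputes it from the scores in the table, assuming that interpretation; the helper names `point_gain` and `relative_gain` are illustrative and not part of the model card:

```python
# Scores copied from the table above:
# metric -> (base model %, this model %, reported improvement in points).
RESULTS = {
    "Gretel Execution Accuracy": (26.5, 66.7, 40.2),
    "Gretel Valid SQL": (37.8, 89.4, 51.6),
    "Spider Execution Accuracy": (46.6, 55.1, 8.5),
    "Spider Exact Match": (0.0, 22.2, 22.2),
    "Spider Keyword Score": (45.5, 85.4, 39.9),
}


def point_gain(base, tuned):
    """Absolute change in percentage points, rounded to one decimal."""
    return round(tuned - base, 1)


def relative_gain(base, tuned):
    """Relative change in percent, or None when the base score is zero."""
    if base == 0:
        return None
    return round(100 * (tuned - base) / base, 1)


for name, (base, tuned, reported) in RESULTS.items():
    gain = point_gain(base, tuned)
    print(f"{name}: {gain:+.1f} points (reported {reported:+.1f})")
```

Under this reading every reported figure matches the subtraction. A relative gain is undefined for Spider Exact Match, where the base score is 0.0%, which is one reason to report points rather than ratios.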

+### Regression (standard benchmarks via lm-eval-harness)
+
+| Benchmark | Base | This Model | Delta |
+|---|---|---|---|
+| **MMLU (humanities)** | 81.9% | 83.5% | +1.6% (no regression) |
+| **MMLU (STEM)** | 87.2% | 86.7% | -0.5% (no regression) |
+| **MMLU (social sciences)** | 92.0% | 92.0% | 0% (no regression) |
+| **MMLU (other)** | 87.5% | 87.5% | 0% (no regression) |
+| **GSM8K (math, strict)** | 60.4% | 35.4% | **-25.0% (regression)** |
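
The Delta column is This Model minus Base, in percentage points. The table labels a 0.5-point drop as no regression, which implies some tolerance. The sketch below assumes a hypothetical tolerance of 1.0 point (the model card does not state one) and flags any benchmark that falls by more than that. `TOLERANCE_POINTS` and `flag_regressions` are illustrative names.

```python
# Scores copied from the regression table: benchmark -> (base %, this model %).
SCORES = {
    "MMLU (humanities)": (81.9, 83.5),
    "MMLU (STEM)": (87.2, 86.7),
    "MMLU (social sciences)": (92.0, 92.0),
    "MMLU (other)": (87.5, 87.5),
    "GSM8K (math, strict)": (60.4, 35.4),
}

# Assumption: drops of up to 1.0 percentage point count as noise, not regression.
TOLERANCE_POINTS = 1.0


def flag_regressions(scores, tolerance=TOLERANCE_POINTS):
    """Return {benchmark: delta} for benchmarks that drop by more than `tolerance` points."""
    flagged = {}
    for name, (base, tuned) in scores.items():
        delta = round(tuned - base, 1)
        if delta < -tolerance:
            flagged[name] = delta
    return flagged


for name, delta in flag_regressions(SCORES).items():
    print(f"{name}: {delta:+.1f} points")
```

With the assumed tolerance only GSM8K is flagged, which agrees with the labels in the table. Setting the tolerance to zero would also flag MMLU (STEM).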
+
## Usage

```python