Upload 2 files

Files changed (3)

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+model_comparison_aaai.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -35,6 +35,17 @@ A 30B-parameter instruction-tuned language model optimized for reasoning, math,
 | **Vocab size** | 64,000 |
 | **Chat template** | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
+## Benchmark Results (5-shot, acc_norm)
+| Benchmark | Kai-30B-Instruct | Llama-3 70B | Qwen2.5 32B | Yi-34B | Llama-3 8B | Mistral 7B | Llama-2 7B |
+|-----------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| **ARC-C** | 64.0 | 83.0 | 70.5 | 65.3 | 60.1 | 55.5 | 53.0 |
+| **HellaSwag** | 74.4 | 89.0 | 85.2 | 83.1 | 78.6 | 81.3 | 78.6 |
+| **PIQA** | 84.8 | 85.0 | 84.1 | 82.5 | 79.8 | 82.1 | 78.1 |
+| **Winogrande** | **86.4** | 83.0 | 78.2 | 76.4 | 73.0 | 74.0 | 69.1 |
+![Benchmark Comparison](model_comparison_aaai.png)
 ## What is ADS?
 **Adaptive Dual-Search Distillation** treats model fine-tuning as a constrained optimization problem inspired by Operations Research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts based on embedding space entropy — forcing the model to converge to high-confidence predictions at difficult reasoning points, without modifying the model architecture.
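
The README does not give the ADS loss in closed form, so the following is only a minimal sketch of one way a stateful dual penalty could work, using a standard projected dual-ascent update. Everything here is an assumption made for illustration: the names `DualPenalty` and `penalized_loss`, the target entropy, the step size, and the use of softmax predictive entropy as a stand-in for the "embedding space entropy" mentioned above. It is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class DualPenalty:
    """Hypothetical stateful multiplier for the constraint entropy <= target."""

    def __init__(self, target_entropy, step_size=0.1):
        self.target_entropy = target_entropy
        self.step_size = step_size
        self.value = 0.0  # dual variable, kept non-negative

    def update(self, observed_entropy):
        # Projected dual ascent: grow while the constraint is violated,
        # shrink (never below zero) once it is satisfied.
        violation = observed_entropy - self.target_entropy
        self.value = max(0.0, self.value + self.step_size * violation)
        return self.value


def penalized_loss(logits, target_index, penalty):
    """Cross-entropy plus the current penalty times the entropy violation."""
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target_index])
    h = entropy(probs)
    loss = cross_entropy + penalty.value * (h - penalty.target_entropy)
    penalty.update(h)  # state carries over to the next training step
    return loss, h
```

In this sketch the penalty rises on steps where the prediction is uncertain and decays back to zero once predictions are confident, which is one reading of "adapts based on entropy"; the actual ADS update rule may differ.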

model_comparison_aaai.png ADDED