Add detailed README with benchmarks, architecture, and dataset cross-links
README.md
CHANGED
|
@@ -1,40 +1,100 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
language:
|
| 4 |
-
- zh
|
| 5 |
---
|
| 6 |
|
| 7 |
# SocratTeachLLM
|
| 8 |
|
| 9 |
-
A fine-tuned [GLM4-9B-Chat](https://huggingface.co/THUDM/glm-4-9b-chat) model trained to act as a **Socratic teacher** in structured educational dialogues. It generates heuristic questions and formative feedback that guide students through a principled sequence of reasoning stages, following the [KELE framework](https://aclanthology.org/2025.findings-emnlp.
|
| 10 |
|
| 11 |
-
> **Original model:** [yuanpan/SocratTeachLLM](https://huggingface.co/yuanpan/SocratTeachLLM)
|
| 12 |
|
| 13 |
---
|
| 14 |
|
| 15 |
## What It Does
|
| 16 |
|
| 17 |
-
SocratTeachLLM is designed for the **teacher role** in a dual-agent Socratic tutoring system. A separate consultant agent (e.g., GPT-4o) selects a teaching strategy from a predefined set of 34 Socratic rules (SocRule); SocratTeachLLM then generates the actual dialogue turn implementing that strategy.
|
| 18 |
|
| 19 |
-
Teaching proceeds through five stages:
|
| 20 |
|
| 21 |
-
| Stage | Name | Description |
|
| 22 |
-
|------
|
| 23 |
-
|
|
| 24 |
-
|
|
| 25 |
-
|
|
| 26 |
-
|
|
| 27 |
-
|
|
| 28 |
|
| 29 |
-
The model was fine-tuned
|
| 30 |
|
| 31 |
---
|
| 32 |
|
| 33 |
## Model Architecture
|
| 34 |
|
| 35 |
| Parameter | Value |
|
| 36 |
-
|---
|
| 37 |
| Base model | GLM4-9B-Chat (`ChatGLMForConditionalGeneration`) |
|
|
|
|
| 38 |
| Layers | 40 |
|
| 39 |
| Hidden size | 4,096 |
|
| 40 |
| Attention heads | 32 |
|
|
@@ -44,15 +104,16 @@ The model was fine-tuned (LoRA) on **SocratDataset**: 6,803 multi-turn Socratic
|
|
| 44 |
| Max context length | 131,072 tokens (128K) |
|
| 45 |
| Storage dtype | bfloat16 |
|
| 46 |
| Attention | Multi-query (2 groups), RoPE (ratio 500) |
|
| 47 |
-
| Normalization | RMSNorm
|
| 48 |
-
|
|
| 49 |
-
| Weight files | 4 × safetensors shards (~18.8 GB total) |
|
| 50 |
|
| 51 |
-
**Generation defaults:** temperature 0.8, top-p 0.8
|
| 52 |
|
| 53 |
---
|
| 54 |
|
| 55 |
-
##
|
|
|
|
|
|
|
| 56 |
|
| 57 |
The model uses custom modeling code, so `trust_remote_code=True` is required.
|
| 58 |
|
|
@@ -70,10 +131,7 @@ model = AutoModelForCausalLM.from_pretrained(
|
|
| 70 |
trust_remote_code=True,
|
| 71 |
)
|
| 72 |
|
| 73 |
-
messages = [
|
| 74 |
-
{"role": "user", "content": "What do you think causes the seasons to change?"}
|
| 75 |
-
]
|
| 76 |
-
|
| 77 |
inputs = tokenizer.apply_chat_template(
|
| 78 |
messages, add_generation_prompt=True, return_tensors="pt"
|
| 79 |
).to(model.device)
|
|
@@ -82,7 +140,7 @@ outputs = model.generate(inputs, max_new_tokens=512, temperature=0.8, top_p=0.8)
|
|
| 82 |
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
|
| 83 |
```
|
| 84 |
|
| 85 |
-
###
|
| 86 |
|
| 87 |
```python
|
| 88 |
from transformers import BitsAndBytesConfig
|
|
@@ -93,7 +151,6 @@ bnb_config = BitsAndBytesConfig(
|
|
| 93 |
bnb_4bit_use_double_quant=True,
|
| 94 |
bnb_4bit_quant_type="nf4",
|
| 95 |
)
|
| 96 |
-
|
| 97 |
model = AutoModelForCausalLM.from_pretrained(
|
| 98 |
model_id,
|
| 99 |
quantization_config=bnb_config,
|
|
@@ -102,50 +159,39 @@ model = AutoModelForCausalLM.from_pretrained(
|
|
| 102 |
)
|
| 103 |
```
|
| 104 |
|
| 105 |
-
-
|
| 106 |
-
|
| 107 |
-
## Running Locally with Ollama
|
| 108 |
-
|
| 109 |
-
This repo includes a `Modelfile` for Ollama (auto-generated by LlamaFactory). It sets a 4,096-token context window and the correct stop sequences for the ChatGLM4 chat format.
|
| 110 |
|
| 111 |
```bash
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
ollama run SocratTeachLLM
|
| 117 |
```
|
| 118 |
|
| 119 |
-
|
| 120 |
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
### vLLM (full bfloat16, ~19 GB VRAM)
|
| 124 |
|
| 125 |
```bash
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
--dtype bfloat16 \
|
| 129 |
-
--trust-remote-code
|
| 130 |
```
|
| 131 |
|
| 132 |
-
|
| 133 |
|
| 134 |
---
|
| 135 |
|
| 136 |
## Built With This Model
|
| 137 |
|
| 138 |
-
**[csen-346](https://github.com/ulises-c/csen-346)** is a downstream course project (CSEN 346 NLP, Santa Clara University) that reproduces and extends the KELE framework using
|
| 139 |
|
| 140 |
Key integration details:
|
| 141 |
-
|
| 142 |
-
- **
|
| 143 |
-
- **
|
| 144 |
-
- **
|
| 145 |
-
- **API surface:** OpenAI-compatible chat completions endpoint (`TEACHER_MODEL_NAME=SocratTeachLLM`)
|
| 146 |
|
| 147 |
```bash
|
| 148 |
-
# Download the model for use in csen-346
|
| 149 |
hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
|
| 150 |
```
|
| 151 |
|
|
@@ -154,16 +200,18 @@ hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
|
|
| 154 |
## Training Data
|
| 155 |
|
| 156 |
| Property | Value |
|
| 157 |
-
|---
|
| 158 |
-
| Dataset | SocratDataset |
|
| 159 |
| Dialogues | 6,803 |
|
| 160 |
| Turns | 42,000+ |
|
| 161 |
-
| Domain | Elementary school science |
|
| 162 |
-
| Language |
|
| 163 |
-
| Train split |
|
| 164 |
| Test split | 680 dialogues (10%) |
|
| 165 |
| Strategies | 34 SocRule teaching strategies |
|
| 166 |
|
|
|
|
|
|
|
| 167 |
---
|
| 168 |
|
| 169 |
## Citation
|
|
@@ -171,16 +219,30 @@ hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
|
|
| 171 |
If you use this model, please cite the original KELE paper:
|
| 172 |
|
| 173 |
```bibtex
|
| 174 |
-
@inproceedings{
|
| 175 |
-
title = {KELE: A Multi-Agent Framework for Structured
|
| 176 |
author = {Peng, Yuan and others},
|
| 177 |
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
|
| 178 |
year = {2025},
|
|
|
|
| 179 |
}
|
| 180 |
```
|
| 181 |
|
| 182 |
---
|
| 183 |
|
| 184 |
## License
|
| 185 |
|
| 186 |
[Apache 2.0](LICENSE)
|
|
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
language:
|
| 4 |
+
- zh
|
| 5 |
+
- en
|
| 6 |
+
tags:
|
| 7 |
+
- education
|
| 8 |
+
- socratic-teaching
|
| 9 |
+
- dialogue
|
| 10 |
+
- fine-tuned
|
| 11 |
+
- glm4
|
| 12 |
+
- kele
|
| 13 |
+
- lora
|
| 14 |
+
base_model: THUDM/glm-4-9b-chat
|
| 15 |
---
|
| 16 |
|
| 17 |
# SocratTeachLLM
|
| 18 |
|
| 19 |
+
A LoRA fine-tuned [GLM4-9B-Chat](https://huggingface.co/THUDM/glm-4-9b-chat) model trained to act as a **Socratic teacher** in structured educational dialogues. It generates heuristic questions and formative feedback that guide students through a principled sequence of reasoning stages, following the [KELE framework](https://aclanthology.org/2025.findings-emnlp.888) (Peng et al., EMNLP 2025 Findings).
|
| 20 |
|
| 21 |
+
> **Original model:** [yuanpan/SocratTeachLLM](https://huggingface.co/yuanpan/SocratTeachLLM) — this repository is a copy with an expanded README.
|
| 22 |
|
| 23 |
---
|
| 24 |
|
| 25 |
## What It Does
|
| 26 |
|
| 27 |
+
SocratTeachLLM is designed for the **teacher role** in a dual-agent Socratic tutoring system. A separate **consultant agent** (e.g., GPT-4o or Qwen) selects a teaching strategy from a predefined set of 34 Socratic rules (SocRule); SocratTeachLLM then generates the actual dialogue turn implementing that strategy.
|
| 28 |
|
| 29 |
+
Teaching proceeds through five stages (SocRule):
|
| 30 |
|
| 31 |
+
| Stage | Name | State codes | Description |
|
| 32 |
+
|---|---|---|---|
|
| 33 |
+
| a | Initiation | a1 | Student poses the question; dialogue begins |
|
| 34 |
+
| b | Concept Probing | b2–b7 | Teacher probes prior knowledge and surfaces misconceptions |
|
| 35 |
+
| c | Inductive Reasoning | c8–c29 | Core teaching stage: teacher guides the student toward generalizations; may repeat over many turns |
|
| 36 |
+
| d | Answer Derivation | d30–d33 | Teacher helps the student arrive at the correct answer |
|
| 37 |
+
| e | Summary | e34 | Teacher consolidates and reinforces what was learned |
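
The stage and state-code layout in the table can be expressed as a small lookup. The sketch below is illustrative only: `STAGE_RANGES`, `parse_state_code`, and `stage_name` are hypothetical names, not part of the KELE codebase, and the code format (one stage letter followed by the state number) is inferred from the table.

```python
# Stage boundaries transcribed from the table above (state numbers are inclusive).
# All names here are hypothetical; they are not taken from the KELE repository.
STAGE_RANGES = {
    "a": ("Initiation", 1, 1),
    "b": ("Concept Probing", 2, 7),
    "c": ("Inductive Reasoning", 8, 29),
    "d": ("Answer Derivation", 30, 33),
    "e": ("Summary", 34, 34),
}


def parse_state_code(code: str) -> tuple[str, int]:
    """Split a code such as 'c12' into ('c', 12) and check it against the table."""
    letter, digits = code[:1], code[1:]
    if letter not in STAGE_RANGES or not digits.isdigit():
        raise ValueError(f"malformed state code: {code!r}")
    number = int(digits)
    _, low, high = STAGE_RANGES[letter]
    if not low <= number <= high:
        raise ValueError(f"state {number} is outside stage {letter!r} ({low}-{high})")
    return letter, number


def stage_name(code: str) -> str:
    """Return the human-readable stage name for a state code."""
    letter, _ = parse_state_code(code)
    return STAGE_RANGES[letter][0]


# The five ranges together cover the 34 SocRule states.
TOTAL_STATES = sum(high - low + 1 for _, low, high in STAGE_RANGES.values())
```

For example, `stage_name("c12")` resolves to the Inductive Reasoning stage, while a code such as `"b9"` is rejected because state 9 belongs to stage c.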
|
| 38 |
|
| 39 |
+
The model was fine-tuned on **SocratDataset**: 6,803 multi-turn Socratic dialogues covering 42,000+ interaction turns across elementary school science topics in Chinese.
|
| 40 |
+
|
| 41 |
+
---
|
| 42 |
+
|
| 43 |
+
## Published Performance
|
| 44 |
+
|
| 45 |
+
Results from Table 1 of the KELE paper (test set: 680 dialogues, 4,245 single-turn examples):
|
| 46 |
+
|
| 47 |
+
| Model | ROUGE-1 | ROUGE-2 | BLEU-4 | PRR | NDAR | SPR | IAR | Guidance | Logicality | Flexibility |
|
| 48 |
+
|---|---|---|---|---|---|---|---|---|---|---|
|
| 49 |
+
| GPT-4o | 38.25 | 22.35 | 29.93 | 72.13 | 81.19 | 85.00 | 87.74 | 4.35 | 4.50 | 4.33 |
|
| 50 |
+
| Qwen2.5-7B | 40.95 | 15.27 | 24.96 | 59.02 | 80.52 | 60.00 | 76.45 | 3.87 | 3.96 | 3.87 |
|
| 51 |
+
| Qwen2.5-14B | 43.79 | 17.06 | 26.63 | 65.21 | 78.57 | 74.00 | 80.81 | 3.99 | 4.15 | 4.03 |
|
| 52 |
+
| Qwen2.5-32B | 46.22 | 19.90 | 28.85 | 65.57 | 83.13 | 81.00 | 84.68 | 4.12 | 4.44 | 4.21 |
|
| 53 |
+
| EduChat-13B | 34.75 | 9.91 | 21.11 | 47.62 | 90.73 | 51.00 | 69.02 | 2.93 | 3.42 | 3.18 |
|
| 54 |
+
| SocraticLM-7B | 18.63 | 5.56 | 10.93 | 26.83 | 30.26 | 36.00 | 27.05 | 2.62 | 2.88 | 2.78 |
|
| 55 |
+
| **SocratTeachLLM (this model)** | **57.40** | **33.63** | **41.96** | **75.13** | **94.71** | **87.00** | **89.03** | **4.66** | **4.53** | **4.45** |
|
| 56 |
+
|
| 57 |
+
**Metric definitions:**
|
| 58 |
+
- **PRR** — Problem Relevance Rate: teacher question relates directly to the problem
|
| 59 |
+
- **NDAR** — No Direct Answer Rate: teacher avoids giving away the answer
|
| 60 |
+
- **SPR** — Summary Pass Rate: correct and complete final summary
|
| 61 |
+
- **IAR** — Instruction Adherence Rate: teacher follows the consultant's recommended strategy
|
| 62 |
+
- **Guidance / Logicality / Flexibility** — GPT-4o judge scores on a 1–5 scale (B.5 rubric)
|
| 63 |
+
|
| 64 |
+
SocratTeachLLM scores higher than GPT-4o on every metric in this table, despite having only ~9.4B parameters.
|
| 65 |
+
|
| 66 |
+
---
|
| 67 |
+
|
| 68 |
+
## Training Details
|
| 69 |
+
|
| 70 |
+
| Setting | Value |
|
| 71 |
+
|---|---|
|
| 72 |
+
| Base model | GLM4-9B-Chat |
|
| 73 |
+
| Method | LoRA |
|
| 74 |
+
| Epochs | 3 |
|
| 75 |
+
| Learning rate | 5e-5 |
|
| 76 |
+
| Batch size | 16 |
|
| 77 |
+
| Train split | 6,123 dialogues (90%) |
|
| 78 |
+
| Test split | 680 dialogues (10%) |
|
| 79 |
+
| Hardware | 2× NVIDIA A800 80GB |
|
| 80 |
+
| Dataset | SocratDataset (6,803 records, Chinese) |
|
| 81 |
+
|
| 82 |
+
### Training Objective
|
| 83 |
+
|
| 84 |
+
```
|
| 85 |
+
P(teacher_response | dialogue_history, evaluation, action)
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
The `evaluation` (the consultant's stage/state assessment) and `action` (the recommended strategy) fields are required conditioning signals. At inference time, a consultant agent produces both before the teacher agent generates its response. Without them, the model is prompted differently from how it was trained and is likely to underperform the published results.
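
To make the conditioning concrete, here is a minimal sketch of how a caller might attach the consultant's outputs to the dialogue before invoking the teacher. The exact prompt template used in training is not documented in this README, so the layout below (a system message carrying `Evaluation:` and `Action:` lines) is an assumption for illustration, and `build_teacher_messages` is a hypothetical helper. Check the KELE repository for the real format before relying on it.

```python
def build_teacher_messages(history, evaluation, action):
    """Prepend the consultant's outputs to the dialogue history.

    ASSUMPTION: the conditioning is passed as a system message. The real
    training template may differ; this only illustrates that both fields
    must reach the teacher model alongside the dialogue history.
    """
    if not evaluation or not action:
        raise ValueError("both evaluation and action are required conditioning signals")
    conditioning = f"Evaluation: {evaluation}\nAction: {action}"
    # Build a new list so the caller's history is not modified.
    return [{"role": "system", "content": conditioning}, *history]


history = [{"role": "user", "content": "What do you think causes the seasons to change?"}]
messages = build_teacher_messages(
    history,
    evaluation="b3",  # hypothetical consultant assessment (a Concept Probing state)
    action="Ask what the student already knows about Earth's tilt.",  # hypothetical strategy
)
```

The resulting `messages` list can then be passed to `tokenizer.apply_chat_template` in place of a bare user turn.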
|
| 89 |
|
| 90 |
---
|
| 91 |
|
| 92 |
## Model Architecture
|
| 93 |
|
| 94 |
| Parameter | Value |
|
| 95 |
+
|---|---|
|
| 96 |
| Base model | GLM4-9B-Chat (`ChatGLMForConditionalGeneration`) |
|
| 97 |
+
| Total parameters | ~9.4B |
|
| 98 |
| Layers | 40 |
|
| 99 |
| Hidden size | 4,096 |
|
| 100 |
| Attention heads | 32 |
|
|
|
|
| 104 |
| Max context length | 131,072 tokens (128K) |
|
| 105 |
| Storage dtype | bfloat16 |
|
| 106 |
| Attention | Multi-query (2 groups), RoPE (ratio 500) |
|
| 107 |
+
| Normalization | RMSNorm |
|
| 108 |
+
| Weight files | 4× safetensors shards (~18.8 GB total) |
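
The shard size follows directly from the parameter count and the storage dtype. The short calculation below is a back-of-the-envelope check rather than a measurement; `weight_gb` is a hypothetical helper, and the 0.5 bytes per parameter used for NF4 ignores quantization constants and any layers left unquantized.

```python
PARAMS = 9.4e9            # total parameters, from the table above (approximate)
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each weight in 16 bits
BYTES_PER_PARAM_NF4 = 0.5 # 4-bit quantization, weights only (assumption: no overhead)


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


bf16_gb = weight_gb(PARAMS, BYTES_PER_PARAM_BF16)
nf4_gb = weight_gb(PARAMS, BYTES_PER_PARAM_NF4)
print(f"bfloat16 weights: ~{bf16_gb:.1f} GB, NF4 weights: ~{nf4_gb:.1f} GB")
```

The bfloat16 estimate matches the ~18.8 GB shard total. The NF4 estimate covers weights only, so the ~6.5 GB VRAM figure quoted for 4-bit loading presumably also includes quantization metadata, activations, and the KV cache.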
|
|
|
|
| 109 |
|
| 110 |
+
**Generation defaults:** temperature 0.8, top-p 0.8.
|
| 111 |
|
| 112 |
---
|
| 113 |
|
| 114 |
+
## Usage
|
| 115 |
+
|
| 116 |
+
### Transformers (recommended, ~19 GB VRAM)
|
| 117 |
|
| 118 |
The model uses custom modeling code, so `trust_remote_code=True` is required.
|
| 119 |
|
|
|
|
| 131 |
trust_remote_code=True,
|
| 132 |
)
|
| 133 |
|
| 134 |
+
messages = [{"role": "user", "content": "What do you think causes the seasons to change?"}]
|
| 135 |
inputs = tokenizer.apply_chat_template(
|
| 136 |
messages, add_generation_prompt=True, return_tensors="pt"
|
| 137 |
).to(model.device)
|
|
|
|
| 140 |
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
|
| 141 |
```
|
| 142 |
|
| 143 |
+
### 4-bit NF4 via bitsandbytes (~6.5 GB VRAM)
|
| 144 |
|
| 145 |
```python
|
| 146 |
from transformers import BitsAndBytesConfig
|
|
|
|
| 151 |
bnb_4bit_use_double_quant=True,
|
| 152 |
bnb_4bit_quant_type="nf4",
|
| 153 |
)
|
|
|
|
| 154 |
model = AutoModelForCausalLM.from_pretrained(
|
| 155 |
model_id,
|
| 156 |
quantization_config=bnb_config,
|
|
|
|
| 159 |
)
|
| 160 |
```
|
| 161 |
|
| 162 |
+
### vLLM (OpenAI-compatible endpoint)
|
| 163 |
|
| 164 |
```bash
|
| 165 |
+
vllm serve ulises-c/SocratTeachLLM \
|
| 166 |
+
--served-model-name SocratTeachLLM \
|
| 167 |
+
--dtype bfloat16 \
|
| 168 |
+
--trust-remote-code
|
|
|
|
| 169 |
```
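
Once the server is running, it can be queried like any OpenAI-compatible chat completions endpoint. The sketch below uses only the Python standard library. It assumes vLLM's default address (`http://localhost:8000`) and reuses the generation defaults listed above; `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default host and port


def build_chat_request(messages, base_url=BASE_URL):
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": "SocratTeachLLM",  # must match --served-model-name
        "messages": messages,
        "temperature": 0.8,
        "top_p": 0.8,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(messages, base_url=BASE_URL):
    """Send the request and return the teacher's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(messages, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask([{"role": "user", "content": "What do you think causes the seasons to change?"}])` returns the generated teacher turn as a string, provided the server started by the command above is reachable.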
|
| 170 |
|
| 171 |
+
### Ollama
|
| 172 |
|
| 173 |
+
This repo includes a `Modelfile` (auto-generated by LlamaFactory) with the correct ChatGLM4 stop sequences and a 4,096-token context window.
|
|
|
|
|
|
|
| 174 |
|
| 175 |
```bash
|
| 176 |
+
ollama create SocratTeachLLM -f Modelfile
|
| 177 |
+
ollama run SocratTeachLLM
|
|
|
|
|
|
|
| 178 |
```
|
| 179 |
|
| 180 |
+
> **Note:** The bundled `Modelfile` sets the context window to 4,096 tokens. For the full 128K context, use Transformers or vLLM.
|
| 181 |
|
| 182 |
---
|
| 183 |
|
| 184 |
## Built With This Model
|
| 185 |
|
| 186 |
+
**[csen-346](https://github.com/ulises-c/csen-346)** is a downstream course project (CSEN 346 NLP, Santa Clara University) that reproduces and extends the KELE framework using this model as the teacher agent.
|
| 187 |
|
| 188 |
Key integration details:
|
| 189 |
+
- **Teacher:** SocratTeachLLM, served via FastAPI (4-bit on RTX 3070) or vLLM (bfloat16 on RTX 5090 / SCU WAVE cluster L40S)
|
| 190 |
+
- **Consultant:** GPT-4o (baseline) or Qwen3.5-9B (local variant)
|
| 191 |
+
- **Evaluation:** 680-dialogue test split of SocratDataset, automated with ROUGE, BLEU, and GPT-4o judge (B.5 rubric)
|
| 192 |
+
- **English extension:** An English translation of the training dataset is available at [ulises-c/SocratDataset-EN](https://huggingface.co/datasets/ulises-c/SocratDataset-EN)
|
|
|
|
| 193 |
|
| 194 |
```bash
|
|
|
|
| 195 |
hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
|
| 196 |
```
|
| 197 |
|
|
|
|
| 200 |
## Training Data
|
| 201 |
|
| 202 |
| Property | Value |
|
| 203 |
+
|---|---|
|
| 204 |
+
| Dataset | [ulises-c/SocratDataset](https://huggingface.co/datasets/ulises-c/SocratDataset) |
|
| 205 |
| Dialogues | 6,803 |
|
| 206 |
| Turns | 42,000+ |
|
| 207 |
+
| Domain | Elementary school science (grades 1–6) |
|
| 208 |
+
| Language | Chinese (Simplified) |
|
| 209 |
+
| Train split | 6,123 dialogues (90%) |
|
| 210 |
| Test split | 680 dialogues (10%) |
|
| 211 |
| Strategies | 34 SocRule teaching strategies |
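
The split sizes and turn counts reported in this README can be cross-checked with a few lines of arithmetic. This is only a consistency check on the published figures, not a recomputation from the dataset; the 42,000 value is the lower bound given in the table, and 4,245 is the number of single-turn test examples quoted in the performance section.

```python
TOTAL_DIALOGUES = 6803
TRAIN_DIALOGUES = 6123
TEST_DIALOGUES = 680
MIN_TOTAL_TURNS = 42_000  # README reports "42,000+", so this is a lower bound
TEST_TURNS = 4245         # single-turn examples in the test set

# The two splits should partition the dataset exactly.
splits_add_up = TRAIN_DIALOGUES + TEST_DIALOGUES == TOTAL_DIALOGUES

train_pct = 100 * TRAIN_DIALOGUES / TOTAL_DIALOGUES
test_pct = 100 * TEST_DIALOGUES / TOTAL_DIALOGUES

# Average turns per dialogue, overall (lower bound) and on the test split.
min_turns_per_dialogue = MIN_TOTAL_TURNS / TOTAL_DIALOGUES
test_turns_per_dialogue = TEST_TURNS / TEST_DIALOGUES

print(f"train {train_pct:.1f}% / test {test_pct:.1f}%")
print(f"turns per dialogue: >= {min_turns_per_dialogue:.2f} overall, "
      f"{test_turns_per_dialogue:.2f} on the test split")
```

Both averages come out at roughly six turns per dialogue, which suggests the test split is similar in dialogue length to the dataset as a whole.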
|
| 212 |
|
| 213 |
+
An English translation of the training data is available at [ulises-c/SocratDataset-EN](https://huggingface.co/datasets/ulises-c/SocratDataset-EN).
|
| 214 |
+
|
| 215 |
---
|
| 216 |
|
| 217 |
## Citation
|
|
|
|
| 219 |
If you use this model, please cite the original KELE paper:
|
| 220 |
|
| 221 |
```bibtex
|
| 222 |
+
@inproceedings{peng-etal-2025-kele,
|
| 223 |
+
title = {{KELE}: A Multi-Agent Framework for Structured {S}ocratic Teaching with Large Language Models},
|
| 224 |
author = {Peng, Yuan and others},
|
| 225 |
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
|
| 226 |
year = {2025},
|
| 227 |
+
url = {https://aclanthology.org/2025.findings-emnlp.888/}
|
| 228 |
}
|
| 229 |
```
|
| 230 |
|
| 231 |
---
|
| 232 |
|
| 233 |
+
## Related Resources
|
| 234 |
+
|
| 235 |
+
| Resource | Link |
|
| 236 |
+
|---|---|
|
| 237 |
+
| KELE paper (EMNLP 2025 Findings) | https://aclanthology.org/2025.findings-emnlp.888/ |
|
| 238 |
+
| KELE GitHub repository | https://github.com/yuanpan1020/KELE |
|
| 239 |
+
| Original model | https://huggingface.co/yuanpan/SocratTeachLLM |
|
| 240 |
+
| Training data (Chinese) | https://huggingface.co/datasets/ulises-c/SocratDataset |
|
| 241 |
+
| Training data (English translation) | https://huggingface.co/datasets/ulises-c/SocratDataset-EN |
|
| 242 |
+
| Evaluation + inference code | https://github.com/ulises-c/csen-346 |
|
| 243 |
+
|
| 244 |
+
---
|
| 245 |
+
|
| 246 |
## License
|
| 247 |
|
| 248 |
[Apache 2.0](LICENSE)
|