---
license: apache-2.0
language:
- zh
- en
tags:
- education
- socratic-teaching
- dialogue
- fine-tuned
- glm4
- kele
- lora
base_model: THUDM/glm-4-9b-chat
---
# SocratTeachLLM
A LoRA fine-tuned [GLM4-9B-Chat](https://huggingface.co/THUDM/glm-4-9b-chat) model trained to act as a **Socratic teacher** in structured educational dialogues. It generates heuristic questions and formative feedback that guide students through a principled sequence of reasoning stages, following the [KELE framework](https://aclanthology.org/2025.findings-emnlp.888) (Peng et al., EMNLP 2025 Findings).
> **Original model:** [yuanpan/SocratTeachLLM](https://huggingface.co/yuanpan/SocratTeachLLM) — this repository is a copy with an expanded README.
---
## What It Does
SocratTeachLLM is designed for the **teacher role** in a dual-agent Socratic tutoring system. A separate **consultant agent** (e.g., GPT-4o or Qwen) selects a teaching strategy from a predefined set of 34 Socratic rules (SocRule); SocratTeachLLM then generates the actual dialogue turn implementing that strategy.
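The division of labor between the two agents can be sketched as follows. This is a minimal illustration rather than the KELE implementation: `ConsultantDecision`, `run_turn`, and both stub agents are hypothetical names, and the example strings are invented for the demo.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class ConsultantDecision:
    """Hypothetical container for the consultant agent's output."""
    evaluation: str  # assessment of the current stage/state, e.g. "b2"
    action: str      # the teaching strategy the teacher should carry out


def run_turn(
    history: List[Message],
    consultant: Callable[[List[Message]], ConsultantDecision],
    teacher: Callable[[List[Message], ConsultantDecision], str],
) -> str:
    """One teacher turn: the consultant picks a strategy, the teacher words it."""
    decision = consultant(history)
    reply = teacher(history, decision)
    history.append({"role": "assistant", "content": reply})
    return reply


# Stubs standing in for the consultant (e.g., GPT-4o) and SocratTeachLLM.
def stub_consultant(history: List[Message]) -> ConsultantDecision:
    return ConsultantDecision(evaluation="b2", action="Probe prior knowledge")


def stub_teacher(history: List[Message], decision: ConsultantDecision) -> str:
    return f"[{decision.evaluation}] What do you already know about this topic?"


history = [{"role": "user", "content": "Why do the seasons change?"}]
reply = run_turn(history, stub_consultant, stub_teacher)
```

In a real deployment the two stubs would be replaced by calls to the consultant model and to SocratTeachLLM; the control flow stays the same.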
Teaching proceeds through five stages (SocRule):
| Stage | Name | State codes | Description |
|---|---|---|---|
| a | Initiation | a1 | Student poses the question; dialogue begins |
| b | Concept Probing | b2–b7 | Teacher probes prior knowledge and surfaces misconceptions |
| c | Inductive Reasoning | c8–c29 | Core teaching stage — guides the student toward generalizations; can repeat many turns |
| d | Answer Derivation | d30–d33 | Help the student arrive at the correct answer |
| e | Summary | e34 | Consolidate and reinforce learning |
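The state codes in the table form contiguous ranges, so the stage can be looked up from a state number. A small sketch, assuming the ranges are exactly as listed above (`STAGES`, `stage_for_state`, and `state_code` are illustrative names, not part of the KELE code base):

```python
# Inclusive state-number ranges per stage, copied from the table above.
STAGES = {
    "a": ("Initiation", 1, 1),
    "b": ("Concept Probing", 2, 7),
    "c": ("Inductive Reasoning", 8, 29),
    "d": ("Answer Derivation", 30, 33),
    "e": ("Summary", 34, 34),
}


def stage_for_state(state: int) -> str:
    """Return the stage letter whose range contains the numeric state (1-34)."""
    for letter, (_name, first, last) in STAGES.items():
        if first <= state <= last:
            return letter
    raise ValueError(f"state must be in 1..34, got {state}")


def state_code(state: int) -> str:
    """Format a numeric state as a code such as 'c12'."""
    return f"{stage_for_state(state)}{state}"


# The five ranges together account for all 34 states.
total_states = sum(last - first + 1 for _name, first, last in STAGES.values())
```

For example, `state_code(12)` yields `"c12"`, which falls in the Inductive Reasoning stage.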
The model was fine-tuned on **SocratDataset**: 6,803 multi-turn Socratic dialogues covering 42,000+ interaction turns across elementary school science topics in Chinese.
---
## Published Performance
Results from Table 1 of the KELE paper (test set: 680 dialogues, 4,245 single-turn examples):
| Model | ROUGE-1 | ROUGE-2 | BLEU-4 | PRR | NDAR | SPR | IAR | Guidance | Logicality | Flexibility |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | 38.25 | 22.35 | 29.93 | 72.13 | 81.19 | 85.00 | 87.74 | 4.35 | 4.50 | 4.33 |
| Qwen2.5-7B | 40.95 | 15.27 | 24.96 | 59.02 | 80.52 | 60.00 | 76.45 | 3.87 | 3.96 | 3.87 |
| Qwen2.5-14B | 43.79 | 17.06 | 26.63 | 65.21 | 78.57 | 74.00 | 80.81 | 3.99 | 4.15 | 4.03 |
| Qwen2.5-32B | 46.22 | 19.90 | 28.85 | 65.57 | 83.13 | 81.00 | 84.68 | 4.12 | 4.44 | 4.21 |
| EduChat-13B | 34.75 | 9.91 | 21.11 | 47.62 | 90.73 | 51.00 | 69.02 | 2.93 | 3.42 | 3.18 |
| SocraticLM-7B | 18.63 | 5.56 | 10.93 | 26.83 | 30.26 | 36.00 | 27.05 | 2.62 | 2.88 | 2.78 |
| **SocratTeachLLM (this model)** | **57.40** | **33.63** | **41.96** | **75.13** | **94.71** | **87.00** | **89.03** | **4.66** | **4.53** | **4.45** |
**Metric definitions:**
- **PRR** — Problem Relevance Rate: teacher question relates directly to the problem
- **NDAR** — No Direct Answer Rate: teacher avoids giving away the answer
- **SPR** — Summary Pass Rate: correct and complete final summary
- **IAR** — Instruction Adherence Rate: teacher follows the consultant's recommended strategy
- **Guidance / Logicality / Flexibility** — GPT-4o judge scores on a 1–5 scale (B.5 rubric)
SocratTeachLLM scores higher than GPT-4o on every metric in this table despite having only ~9.4B parameters.
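The size of each margin over GPT-4o can be read off the table with a few lines of arithmetic. The sketch below only restates the two table rows; the variable names are illustrative.

```python
METRICS = ["ROUGE-1", "ROUGE-2", "BLEU-4", "PRR", "NDAR", "SPR", "IAR",
           "Guidance", "Logicality", "Flexibility"]
# Rows copied from the results table above.
GPT_4O = [38.25, 22.35, 29.93, 72.13, 81.19, 85.00, 87.74, 4.35, 4.50, 4.33]
SOCRAT = [57.40, 33.63, 41.96, 75.13, 94.71, 87.00, 89.03, 4.66, 4.53, 4.45]

# Absolute difference per metric (SocratTeachLLM minus GPT-4o).
margins = {m: round(s - g, 2) for m, g, s in zip(METRICS, GPT_4O, SOCRAT)}

for metric, delta in margins.items():
    print(f"{metric:<12} {delta:+.2f}")

# Metric with the narrowest absolute margin.
smallest = min(margins, key=margins.get)
```

Note that the first seven metrics are percentages while the last three are 1–5 judge scores, so the absolute margins are not directly comparable across the two groups. The narrowest margin is on Logicality (4.53 vs. 4.50).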
---
## Training Details
| Setting | Value |
|---|---|
| Base model | GLM4-9B-Chat |
| Method | LoRA |
| Epochs | 3 |
| Learning rate | 5e-5 |
| Batch size | 16 |
| Train split | 6,123 dialogues (90%) |
| Test split | 680 dialogues (10%) |
| Hardware | 2× NVIDIA A800 80GB |
| Dataset | SocratDataset (6,803 records, Chinese) |
### Training Objective
```
P(teacher_response | dialogue_history, evaluation, action)
```
The `evaluation` (consultant's stage/state assessment) and `action` (recommended strategy) fields are required conditioning signals. At inference time, a consultant agent produces these before the teacher agent generates its response. Without the consultant outputs as conditioning, the model will underperform.
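A minimal sketch of how the two conditioning signals might be attached to the dialogue history before calling the teacher. The prompt layout, the bracketed field labels, and the function name `build_teacher_messages` are assumptions made for illustration; the template actually used in training is defined by SocratDataset and the KELE repository and should be checked there.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_teacher_messages(
    history: List[Message], evaluation: str, action: str
) -> List[Message]:
    """Append the consultant's outputs to the latest student turn.

    NOTE: this layout is hypothetical; match it to the format the model
    was actually trained on before relying on it.
    """
    if not history or history[-1]["role"] != "user":
        raise ValueError("history must end with a student (user) turn")
    conditioning = f"[Evaluation] {evaluation}\n[Action] {action}"
    last = history[-1]
    conditioned = {"role": "user", "content": f"{last['content']}\n\n{conditioning}"}
    # Copy earlier turns so the caller's history is not mutated.
    return [dict(m) for m in history[:-1]] + [conditioned]


messages = build_teacher_messages(
    [{"role": "user", "content": "Why is the sky blue?"}],
    evaluation="Stage b: the student has not yet stated prior knowledge.",
    action="Ask what the student already knows about sunlight.",
)
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of a bare user message.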
---
## Model Architecture
| Parameter | Value |
|---|---|
| Base model | GLM4-9B-Chat (`ChatGLMForConditionalGeneration`) |
| Total parameters | ~9.4B |
| Layers | 40 |
| Hidden size | 4,096 |
| Attention heads | 32 |
| FFN hidden size | 13,696 |
| KV channels | 128 |
| Vocabulary size | 151,552 |
| Max context length | 131,072 tokens (128K) |
| Storage dtype | bfloat16 |
| Attention | Multi-query (2 groups), RoPE (ratio 500) |
| Normalization | RMSNorm |
| Weight files | 4× safetensors shards (~18.8 GB total) |
**Generation defaults:** temperature 0.8, top-p 0.8.
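Several figures in the architecture table can be cross-checked against each other, and they give a rough memory estimate. The sketch below assumes a bfloat16 KV cache and a single sequence; real usage adds activations and framework overhead, so treat the numbers as lower bounds.

```python
# Values copied from the architecture table above.
layers = 40
hidden_size = 4096
attention_heads = 32
kv_groups = 2            # multi-query groups
total_params = 9.4e9     # approximate
bytes_per_value = 2      # bfloat16
max_context = 131_072

head_dim = hidden_size // attention_heads            # matches "KV channels"
weights_gb = total_params * bytes_per_value / 1e9    # decimal gigabytes

# Keys and values are cached for every layer, KV group, and head dimension.
kv_bytes_per_token = 2 * layers * kv_groups * head_dim * bytes_per_value
kv_gib_full_context = kv_bytes_per_token * max_context / 2**30

print(f"head_dim={head_dim}, weights about {weights_gb:.1f} GB")
print(f"KV cache: {kv_bytes_per_token} B/token, "
      f"{kv_gib_full_context:.1f} GiB at full context")
```

Under these assumptions the weights come to about 18.8 GB, consistent with the shard total in the table, and a full 128K-token context would add roughly 5 GiB of KV cache.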
---
## Usage
### Transformers (recommended, ~19 GB VRAM)
The model uses custom modeling code, so `trust_remote_code=True` is required.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ulises-c/SocratTeachLLM"

# The repo ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What do you think causes the seasons to change?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Sampling must be enabled for temperature/top_p to apply.
outputs = model.generate(
    **inputs, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.8
)

# Decode only the newly generated tokens, not the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```
### 4-bit NF4 via bitsandbytes (~6.5 GB VRAM)
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ulises-c/SocratTeachLLM"

# NF4 weights with double quantization; compute runs in float16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```
### vLLM (OpenAI-compatible endpoint)
```bash
vllm serve ulises-c/SocratTeachLLM \
  --served-model-name SocratTeachLLM \
  --dtype bfloat16 \
  --trust-remote-code
```
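Once the server is running, it can be queried like any OpenAI-compatible chat endpoint. The sketch below uses only the standard library and assumes vLLM's default address (`http://localhost:8000`); `build_chat_request` and `ask_teacher` are illustrative helper names, and the sampling values mirror the generation defaults listed earlier.

```python
import json
import urllib.request

# Assumed default: vLLM listening on localhost:8000 with the OpenAI-style route.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(messages, model="SocratTeachLLM",
                       temperature=0.8, top_p=0.8, max_tokens=512):
    """Build the JSON payload for POST /v1/chat/completions."""
    return {
        "model": model,  # must match --served-model-name
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }


def ask_teacher(messages, base_url=BASE_URL, timeout=120):
    """Send one chat request and return the teacher's reply text."""
    body = json.dumps(build_chat_request(messages)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


payload = build_chat_request(
    [{"role": "user", "content": "What do you think causes the seasons to change?"}]
)
```

Calling `ask_teacher(messages)` with the server up returns the generated teacher turn as a string; adjust `BASE_URL` if the server was started with a different host or port.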
### Ollama
This repo includes a `Modelfile` (auto-generated by LlamaFactory) with the correct ChatGLM4 stop sequences and a 4,096-token context window.
```bash
ollama create SocratTeachLLM -f Modelfile
ollama run SocratTeachLLM
```
> **Note:** The included `Modelfile` limits context to 4,096 tokens. For the full 128K context, use Transformers or vLLM.
---
## Built With This Model
**[csen-346](https://github.com/ulises-c/csen-346)** is a downstream course project (CSEN 346 NLP, Santa Clara University) that reproduces and extends the KELE framework using this model as the teacher agent.
Key integration details:
- **Teacher:** SocratTeachLLM, served via FastAPI (4-bit on RTX 3070) or vLLM (bfloat16 on RTX 5090 / SCU WAVE cluster L40S)
- **Consultant:** GPT-4o (baseline) or Qwen3.5-9B (local variant)
- **Evaluation:** 680-dialogue test split of SocratDataset, automated with ROUGE, BLEU, and GPT-4o judge (B.5 rubric)
- **English extension:** An English translation of the training dataset is available at [ulises-c/SocratDataset-EN](https://huggingface.co/datasets/ulises-c/SocratDataset-EN)
To download the weights for local serving:
```bash
hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
```
---
## Training Data
| Property | Value |
|---|---|
| Dataset | [ulises-c/SocratDataset](https://huggingface.co/datasets/ulises-c/SocratDataset) |
| Dialogues | 6,803 |
| Turns | 42,000+ |
| Domain | Elementary school science (grades 1–6) |
| Language | Chinese (Simplified) |
| Train split | 6,123 dialogues (90%) |
| Test split | 680 dialogues (10%) |
| Strategies | 34 SocRule teaching strategies |
An English translation of the training data is available at [ulises-c/SocratDataset-EN](https://huggingface.co/datasets/ulises-c/SocratDataset-EN).
---
## Citation
If you use this model, please cite the original KELE paper:
```bibtex
@inproceedings{peng-etal-2025-kele,
title = {{KELE}: A Multi-Agent Framework for Structured {S}ocratic Teaching with Large Language Models},
author = {Peng, Yuan and others},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
year = {2025},
url = {https://aclanthology.org/2025.findings-emnlp.888/}
}
```
---
## Related Resources
| Resource | Link |
|---|---|
| KELE paper (EMNLP 2025 Findings) | https://aclanthology.org/2025.findings-emnlp.888/ |
| KELE GitHub repository | https://github.com/yuanpan1020/KELE |
| Original model | https://huggingface.co/yuanpan/SocratTeachLLM |
| Training data (Chinese) | https://huggingface.co/datasets/ulises-c/SocratDataset |
| Training data (English translation) | https://huggingface.co/datasets/ulises-c/SocratDataset-EN |
| Evaluation + inference code | https://github.com/ulises-c/csen-346 |
---
## License
[Apache 2.0](LICENSE)