---
license: apache-2.0
language:
  - zh
  - en
tags:
  - education
  - socratic-teaching
  - dialogue
  - fine-tuned
  - glm4
  - kele
  - lora
base_model: THUDM/glm-4-9b-chat
---

# SocratTeachLLM

A LoRA fine-tuned [GLM4-9B-Chat](https://huggingface.co/THUDM/glm-4-9b-chat) model trained to act as a **Socratic teacher** in structured educational dialogues. It generates heuristic questions and formative feedback that guide students through a principled sequence of reasoning stages, following the [KELE framework](https://aclanthology.org/2025.findings-emnlp.888) (Peng et al., EMNLP 2025 Findings).

> **Original model:** [yuanpan/SocratTeachLLM](https://huggingface.co/yuanpan/SocratTeachLLM) — this repository is a copy with an expanded README.

---

## What It Does

SocratTeachLLM is designed for the **teacher role** in a dual-agent Socratic tutoring system. A separate **consultant agent** (e.g., GPT-4o or Qwen) selects a teaching strategy from a predefined set of 34 Socratic rules (SocRule); SocratTeachLLM then generates the actual dialogue turn implementing that strategy.

Teaching proceeds through five stages (SocRule):

| Stage | Name | State codes | Description |
|---|---|---|---|
| a | Initiation | a1 | Student poses the question; dialogue begins |
| b | Concept Probing | b2–b7 | Teacher probes prior knowledge and surfaces misconceptions |
| c | Inductive Reasoning | c8–c29 | Core teaching stage — guides the student toward generalizations; can repeat many turns |
| d | Answer Derivation | d30–d33 | Help the student arrive at the correct answer |
| e | Summary | e34 | Consolidate and reinforce learning |
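
The state ranges in the table can be encoded as a small lookup, which is handy when parsing a consultant's output. This is a minimal sketch: the names `STAGES`, `stage_for_state`, and `state_code` are illustrative and do not come from the KELE code base.

```python
# Stage boundaries copied from the table above:
# (stage letter, stage name, first state number, last state number).
STAGES = [
    ("a", "Initiation", 1, 1),
    ("b", "Concept Probing", 2, 7),
    ("c", "Inductive Reasoning", 8, 29),
    ("d", "Answer Derivation", 30, 33),
    ("e", "Summary", 34, 34),
]


def stage_for_state(state: int) -> tuple[str, str]:
    """Return (stage letter, stage name) for a numeric state in 1..34."""
    for letter, name, first, last in STAGES:
        if first <= state <= last:
            return letter, name
    raise ValueError(f"state must be in 1..34, got {state}")


def state_code(state: int) -> str:
    """Format a state the way the table writes it, e.g. 8 -> 'c8'."""
    letter, _ = stage_for_state(state)
    return f"{letter}{state}"


print(state_code(8), stage_for_state(8)[1])  # c8 Inductive Reasoning
```

The five ranges cover states 1 through 34 with no gaps, which matches the 34 SocRule strategies.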

The model was fine-tuned on **SocratDataset**: 6,803 multi-turn Socratic dialogues covering 42,000+ interaction turns across elementary school science topics in Chinese.

---

## Published Performance

Results from Table 1 of the KELE paper (test set: 680 dialogues, 4,245 single-turn examples):

| Model | ROUGE-1 | ROUGE-2 | BLEU-4 | PRR | NDAR | SPR | IAR | Guidance | Logicality | Flexibility |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | 38.25 | 22.35 | 29.93 | 72.13 | 81.19 | 85.00 | 87.74 | 4.35 | 4.50 | 4.33 |
| Qwen2.5-7B | 40.95 | 15.27 | 24.96 | 59.02 | 80.52 | 60.00 | 76.45 | 3.87 | 3.96 | 3.87 |
| Qwen2.5-14B | 43.79 | 17.06 | 26.63 | 65.21 | 78.57 | 74.00 | 80.81 | 3.99 | 4.15 | 4.03 |
| Qwen2.5-32B | 46.22 | 19.90 | 28.85 | 65.57 | 83.13 | 81.00 | 84.68 | 4.12 | 4.44 | 4.21 |
| EduChat-13B | 34.75 | 9.91 | 21.11 | 47.62 | 90.73 | 51.00 | 69.02 | 2.93 | 3.42 | 3.18 |
| SocraticLM-7B | 18.63 | 5.56 | 10.93 | 26.83 | 30.26 | 36.00 | 27.05 | 2.62 | 2.88 | 2.78 |
| **SocratTeachLLM (this model)** | **57.40** | **33.63** | **41.96** | **75.13** | **94.71** | **87.00** | **89.03** | **4.66** | **4.53** | **4.45** |

**Metric definitions:**
- **PRR** — Problem Relevance Rate: teacher question relates directly to the problem
- **NDAR** — No Direct Answer Rate: teacher avoids giving away the answer
- **SPR** — Summary Pass Rate: correct and complete final summary
- **IAR** — Instruction Adherence Rate: teacher follows the consultant's recommended strategy
- **Guidance / Logicality / Flexibility** — GPT-4o judge scores on a 1–5 scale (B.5 rubric)

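SocratTeachLLM scores higher than GPT-4o on every metric in the table despite being a much smaller model. GPT-4o's parameter count is not public, so the ~40× size ratio is an estimate rather than a measured figure.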
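
The comparison with GPT-4o can be checked directly from the two table rows. The snippet below only restates numbers from the table above; the variable names are illustrative.

```python
METRICS = ["ROUGE-1", "ROUGE-2", "BLEU-4", "PRR", "NDAR", "SPR", "IAR",
           "Guidance", "Logicality", "Flexibility"]
# Rows copied from the results table.
GPT_4O = [38.25, 22.35, 29.93, 72.13, 81.19, 85.00, 87.74, 4.35, 4.50, 4.33]
SOCRAT = [57.40, 33.63, 41.96, 75.13, 94.71, 87.00, 89.03, 4.66, 4.53, 4.45]

# Margin of SocratTeachLLM over GPT-4o, per metric.
margins = {m: round(s - g, 2) for m, g, s in zip(METRICS, GPT_4O, SOCRAT)}
wins_everywhere = all(v > 0 for v in margins.values())

for metric, margin in margins.items():
    print(f"{metric:<12} +{margin:.2f}")
print("higher on every metric:", wins_everywhere)
```

Note that the first seven columns are percentages while the last three are 1–5 judge scores, so the margins are only comparable within each group.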

---

## Training Details

| Setting | Value |
|---|---|
| Base model | GLM4-9B-Chat |
| Method | LoRA |
| Epochs | 3 |
| Learning rate | 5e-5 |
| Batch size | 16 |
| Train split | 6,123 dialogues (90%) |
| Test split | 680 dialogues (10%) |
| Hardware | 2× NVIDIA A800 80GB |
| Dataset | SocratDataset (6,803 records, Chinese) |

### Training Objective

```
P(teacher_response | dialogue_history, evaluation, action)
```

The `evaluation` (consultant's stage/state assessment) and `action` (recommended strategy) fields are required conditioning signals. At inference time, a consultant agent produces these before the teacher agent generates its response. Without the consultant outputs as conditioning, the model will underperform.
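
One way to pass the conditioning signals at inference time is to prepend them to the dialogue as a system message. The sketch below is an assumption, not the prompt template used in training: the `[Evaluation]` / `[Action]` layout, the `ConsultantOutput` class, and `build_teacher_messages` are hypothetical, so check the KELE repository for the exact format before relying on it.

```python
from dataclasses import dataclass


@dataclass
class ConsultantOutput:
    evaluation: str  # consultant's stage/state assessment
    action: str      # recommended teaching strategy


def build_teacher_messages(history, consultant):
    """Prepend the consultant's guidance to the dialogue history.

    `history` is a list of {"role": ..., "content": ...} dicts.
    The guidance layout is hypothetical (see the note above).
    """
    guidance = (
        f"[Evaluation] {consultant.evaluation}\n"
        f"[Action] {consultant.action}"
    )
    return [{"role": "system", "content": guidance}] + list(history)


history = [{"role": "user", "content": "Why does ice float on water?"}]
consultant = ConsultantOutput(
    evaluation="Stage b (Concept Probing): the student has not stated any prior knowledge.",
    action="Ask what the student already knows about how heavy ice is compared with water.",
)
messages = build_teacher_messages(history, consultant)
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` in the same way as a plain dialogue.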

---

## Model Architecture

| Parameter | Value |
|---|---|
| Base model | GLM4-9B-Chat (`ChatGLMForConditionalGeneration`) |
| Total parameters | ~9.4B |
| Layers | 40 |
| Hidden size | 4,096 |
| Attention heads | 32 |
| FFN hidden size | 13,696 |
| KV channels | 128 |
| Vocabulary size | 151,552 |
| Max context length | 131,072 tokens (128K) |
| Storage dtype | bfloat16 |
| Attention | Multi-query (2 groups), RoPE (ratio 500) |
| Normalization | RMSNorm |
| Weight files | 4× safetensors shards (~18.8 GB total) |

**Generation defaults:** temperature 0.8, top-p 0.8.
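
The table values are enough for a rough memory estimate. The sketch below assumes a bfloat16 KV cache and a standard grouped layout (one key and one value vector per KV group, per layer, per token); actual usage also depends on activations and framework overhead.

```python
PARAMS = 9.4e9         # total parameters (approximate, from the table)
LAYERS = 40
KV_GROUPS = 2          # multi-query groups
KV_CHANNELS = 128      # channels per KV head
BYTES_BF16 = 2
MAX_CONTEXT = 131_072  # tokens

weights_gb = PARAMS * BYTES_BF16 / 1e9

# Per token, each layer stores one key and one value vector per KV group.
kv_bytes_per_token = LAYERS * 2 * KV_GROUPS * KV_CHANNELS * BYTES_BF16
kv_gib_full_context = kv_bytes_per_token * MAX_CONTEXT / 2**30

print(f"bf16 weights: {weights_gb:.1f} GB")  # bf16 weights: 18.8 GB
print(f"KV cache: {kv_bytes_per_token} bytes/token, "
      f"{kv_gib_full_context:.1f} GiB at 128K tokens")  # 40960 bytes/token, 5.0 GiB
```

Under these assumptions the weight estimate agrees with the ~18.8 GB shard total, and a single full-length 128K sequence adds about 5 GiB of KV cache on top of the weights.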

---

## Usage

### Transformers (recommended, ~19 GB VRAM)

The model uses custom modeling code, so `trust_remote_code=True` is required.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ulises-c/SocratTeachLLM"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What do you think causes the seasons to change?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# do_sample=True ensures temperature and top_p are actually applied.
outputs = model.generate(
    inputs, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.8
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

### 4-bit NF4 via bitsandbytes (~6.5 GB VRAM)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ulises-c/SocratTeachLLM"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

### vLLM (OpenAI-compatible endpoint)

```bash
vllm serve ulises-c/SocratTeachLLM \
  --served-model-name SocratTeachLLM \
  --dtype bfloat16 \
  --trust-remote-code
```
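
Once the server is running, it can be queried like any OpenAI-compatible endpoint. The sketch below uses only the Python standard library and assumes vLLM's default address (`localhost:8000`); `build_chat_request` and `chat` are illustrative helper names, not part of any library.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default host and port


def build_chat_request(messages, model="SocratTeachLLM", max_tokens=512):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,  # must match --served-model-name
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": 0.8,
        "top_p": 0.8,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires the server started by the command above:
# reply = chat([{"role": "user", "content": "What do you think causes the seasons to change?"}])
```

The `model` field has to match the name given to `--served-model-name`, otherwise the server rejects the request.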

### Ollama

This repo includes a `Modelfile` (auto-generated by LlamaFactory) with the correct ChatGLM4 stop sequences and a 4,096-token context window.

```bash
ollama create SocratTeachLLM -f Modelfile
ollama run SocratTeachLLM
```

> **Note:** The included `Modelfile` limits the context window to 4,096 tokens. For the full 128K context, use Transformers or vLLM.

---

## Built With This Model

**[csen-346](https://github.com/ulises-c/csen-346)** is a downstream course project (CSEN 346 NLP, Santa Clara University) that reproduces and extends the KELE framework using this model as the teacher agent.

Key integration details:
- **Teacher:** SocratTeachLLM, served via FastAPI (4-bit on RTX 3070) or vLLM (bfloat16 on RTX 5090 / SCU WAVE cluster L40S)
- **Consultant:** GPT-4o (baseline) or Qwen3.5-9B (local variant)
- **Evaluation:** 680-dialogue test split of SocratDataset, automated with ROUGE, BLEU, and GPT-4o judge (B.5 rubric)
- **English extension:** An English translation of the training dataset is available at [ulises-c/SocratDataset-EN](https://huggingface.co/datasets/ulises-c/SocratDataset-EN)

To download the model weights for local serving:

```bash
hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
```

---

## Training Data

| Property | Value |
|---|---|
| Dataset | [ulises-c/SocratDataset](https://huggingface.co/datasets/ulises-c/SocratDataset) |
| Dialogues | 6,803 |
| Turns | 42,000+ |
| Domain | Elementary school science (grades 1–6) |
| Language | Chinese (Simplified) |
| Train split | 6,123 dialogues (90%) |
| Test split | 680 dialogues (10%) |
| Strategies | 34 SocRule teaching strategies |

An English translation of the training data is available at [ulises-c/SocratDataset-EN](https://huggingface.co/datasets/ulises-c/SocratDataset-EN).

---

## Citation

If you use this model, please cite the original KELE paper:

```bibtex
@inproceedings{peng-etal-2025-kele,
  title     = {{KELE}: A Multi-Agent Framework for Structured {S}ocratic Teaching with Large Language Models},
  author    = {Peng, Yuan and others},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
  year      = {2025},
  url       = {https://aclanthology.org/2025.findings-emnlp.888/}
}
```

---

## Related Resources

| Resource | Link |
|---|---|
| KELE paper (EMNLP 2025 Findings) | https://aclanthology.org/2025.findings-emnlp.888/ |
| KELE GitHub repository | https://github.com/yuanpan1020/KELE |
| Original model | https://huggingface.co/yuanpan/SocratTeachLLM |
| Training data (Chinese) | https://huggingface.co/datasets/ulises-c/SocratDataset |
| Training data (English translation) | https://huggingface.co/datasets/ulises-c/SocratDataset-EN |
| Evaluation + inference code | https://github.com/ulises-c/csen-346 |

---

## License

[Apache 2.0](LICENSE)