ulises-c committed on
Commit ddf2ed9 · verified · 1 Parent(s): 2375c4c

Add detailed README with benchmarks, architecture, and dataset cross-links

Files changed (1)
  1. README.md +120 -58
README.md CHANGED
@@ -1,40 +1,100 @@
  ---
  license: apache-2.0
  language:
- - zh
  ---

  # SocratTeachLLM

- A fine-tuned [GLM4-9B-Chat](https://huggingface.co/THUDM/glm-4-9b-chat) model trained to act as a **Socratic teacher** in structured educational dialogues. It generates heuristic questions and formative feedback that guide students through a principled sequence of reasoning stages, following the [KELE framework](https://aclanthology.org/2025.findings-emnlp.XXX) (Peng et al., EMNLP 2025 Findings).

- > **Original model:** [yuanpan/SocratTeachLLM](https://huggingface.co/yuanpan/SocratTeachLLM)

  ---

  ## What It Does

- SocratTeachLLM is designed for the **teacher role** in a dual-agent Socratic tutoring system. A separate consultant agent (e.g., GPT-4o) selects a teaching strategy from a predefined set of 34 Socratic rules (SocRule); SocratTeachLLM then generates the actual dialogue turn implementing that strategy.

- Teaching proceeds through five stages:

- | Stage | Name | Description |
- |-------|------|-------------|
- | A | Student Questioning | Elicit prior knowledge and surface misconceptions |
- | B | Concept Probing | Probe understanding of core concepts |
- | C | Inductive Reasoning | Guide the student toward generalizations |
- | D | Rule Construction | Help the student articulate a principle or rule |
- | E | Summary | Consolidate and reinforce learning |

- The model was fine-tuned (LoRA) on **SocratDataset**: 6,803 multi-turn Socratic dialogues covering 42,000+ interaction turns across elementary school science topics, primarily in Chinese.

  ---

  ## Model Architecture

  | Parameter | Value |
- |-----------|-------|
  | Base model | GLM4-9B-Chat (`ChatGLMForConditionalGeneration`) |
  | Layers | 40 |
  | Hidden size | 4,096 |
  | Attention heads | 32 |
@@ -44,15 +104,16 @@ The model was fine-tuned (LoRA) on **SocratDataset**: 6,803 multi-turn Socratic
  | Max context length | 131,072 tokens (128K) |
  | Storage dtype | bfloat16 |
  | Attention | Multi-query (2 groups), RoPE (ratio 500) |
- | Normalization | RMSNorm, post-layer-norm |
- | Total parameters | ~9.4B |
- | Weight files | 4 × safetensors shards (~18.8 GB total) |

- **Generation defaults:** temperature 0.8, top-p 0.8, max length 128K.

  ---

- ## Loading with Transformers

  The model uses custom modeling code, so `trust_remote_code=True` is required.

@@ -70,10 +131,7 @@ model = AutoModelForCausalLM.from_pretrained(
  trust_remote_code=True,
  )

- messages = [
- {"role": "user", "content": "What do you think causes the seasons to change?"}
- ]
-
  inputs = tokenizer.apply_chat_template(
  messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)
@@ -82,7 +140,7 @@ outputs = model.generate(inputs, max_new_tokens=512, temperature=0.8, top_p=0.8)
  print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
  ```

- ### Low-VRAM (4-bit NF4 via bitsandbytes, ~6.5 GB)

  ```python
  from transformers import BitsAndBytesConfig
@@ -93,7 +151,6 @@ bnb_config = BitsAndBytesConfig(
  bnb_4bit_use_double_quant=True,
  bnb_4bit_quant_type="nf4",
  )
-
  model = AutoModelForCausalLM.from_pretrained(
  model_id,
  quantization_config=bnb_config,
@@ -102,50 +159,39 @@ model = AutoModelForCausalLM.from_pretrained(
  )
  ```

- ---
-
- ## Running Locally with Ollama
-
- This repo includes a `Modelfile` for Ollama (auto-generated by LlamaFactory). It sets a 4,096-token context window and the correct stop sequences for the ChatGLM4 chat format.

  ```bash
- # Create the Ollama model from the local Modelfile
- ollama create SocratTeachLLM -f Modelfile
-
- # Run interactively
- ollama run SocratTeachLLM
  ```

- Stop sequences used: `<|user|>`, `<|endoftext|>`, `<|observation|>`.

- > **Note:** Ollama currently caps the context at 4,096 tokens. For the full 128K context, use the Transformers or vLLM path.
-
- ### vLLM (full bfloat16, ~19 GB VRAM)

  ```bash
- vllm serve /path/to/SocratTeachLLM \
- --served-model-name SocratTeachLLM \
- --dtype bfloat16 \
- --trust-remote-code
  ```

- This exposes an OpenAI-compatible endpoint at `http://localhost:8000/v1`.

  ---

  ## Built With This Model

- **[csen-346](https://github.com/ulises-c/csen-346)** is a downstream course project (CSEN 346 NLP, Santa Clara University) that reproduces and extends the KELE framework using SocratTeachLLM as the teacher agent.

  Key integration details:
-
- - **Teacher agent:** SocratTeachLLM, served locally via FastAPI (4-bit on RTX 3070) or vLLM (bfloat16 on RTX 5090 / SCU WAVE cluster)
- - **Consultant agent:** GPT-4o (baseline) or Qwen3.5-9B (local variant) selects Socratic strategies from SocRule and passes them to the teacher
- - **Evaluation:** 680-dialogue test split of SocratDataset
- - **API surface:** OpenAI-compatible chat completions endpoint (`TEACHER_MODEL_NAME=SocratTeachLLM`)

  ```bash
- # Download the model for use in csen-346
  hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
  ```

@@ -154,16 +200,18 @@ hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
  ## Training Data

  | Property | Value |
- |----------|-------|
- | Dataset | SocratDataset |
  | Dialogues | 6,803 |
  | Turns | 42,000+ |
- | Domain | Elementary school science |
- | Language | Primarily Chinese |
- | Train split | 5,723 dialogues (90%) |
  | Test split | 680 dialogues (10%) |
  | Strategies | 34 SocRule teaching strategies |

  ---

  ## Citation
@@ -171,16 +219,30 @@ hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
  If you use this model, please cite the original KELE paper:

  ```bibtex
- @inproceedings{peng2025kele,
- title = {KELE: A Multi-Agent Framework for Structured Socratic Teaching with Large Language Models},
  author = {Peng, Yuan and others},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
  year = {2025},
  }
  ```

  ---

  ## License

  [Apache 2.0](LICENSE)
 
  ---
  license: apache-2.0
  language:
+ - zh
+ - en
+ tags:
+ - education
+ - socratic-teaching
+ - dialogue
+ - fine-tuned
+ - glm4
+ - kele
+ - lora
+ base_model: THUDM/glm-4-9b-chat
  ---

  # SocratTeachLLM

+ A LoRA fine-tuned [GLM4-9B-Chat](https://huggingface.co/THUDM/glm-4-9b-chat) model trained to act as a **Socratic teacher** in structured educational dialogues. It generates heuristic questions and formative feedback that guide students through a principled sequence of reasoning stages, following the [KELE framework](https://aclanthology.org/2025.findings-emnlp.888) (Peng et al., EMNLP 2025 Findings).

+ > **Original model:** [yuanpan/SocratTeachLLM](https://huggingface.co/yuanpan/SocratTeachLLM) — this repository is a copy with an expanded README.

  ---

  ## What It Does

+ SocratTeachLLM is designed for the **teacher role** in a dual-agent Socratic tutoring system. A separate **consultant agent** (e.g., GPT-4o or Qwen) selects a teaching strategy from a predefined set of 34 Socratic rules (SocRule); SocratTeachLLM then generates the actual dialogue turn implementing that strategy.

+ Teaching proceeds through five stages (SocRule):

+ | Stage | Name | State codes | Description |
+ |---|---|---|---|
+ | a | Initiation | a1 | Student poses the question; dialogue begins |
+ | b | Concept Probing | b2–b7 | Teacher probes prior knowledge and surfaces misconceptions |
+ | c | Inductive Reasoning | c8–c29 | Core teaching stage — guides the student toward generalizations; can repeat many turns |
+ | d | Answer Derivation | d30–d33 | Help the student arrive at the correct answer |
+ | e | Summary | e34 | Consolidate and reinforce learning |
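The state codes in this table form one numbered sequence, 1 to 34, split across the five stages. A minimal Python sketch that maps a code such as `c12` back to its stage and rejects codes outside their stage's range; the `stage_of` helper and the assumed `<letter><number>` code format are illustrative, not taken from the KELE code:

```python
# Hypothetical helper: map a SocRule state code such as "c12" to its stage.
# The state ranges are copied from the table; the "<letter><number>" code
# format is an assumption about how codes are written.
STAGES = {
    "a": ("Initiation", range(1, 2)),            # a1
    "b": ("Concept Probing", range(2, 8)),       # b2-b7
    "c": ("Inductive Reasoning", range(8, 30)),  # c8-c29
    "d": ("Answer Derivation", range(30, 34)),   # d30-d33
    "e": ("Summary", range(34, 35)),             # e34
}


def stage_of(code: str) -> str:
    """Return the stage name for a state code, rejecting malformed codes."""
    if len(code) < 2 or not code[1:].isdigit():
        raise ValueError(f"malformed state code: {code!r}")
    letter, number = code[0].lower(), int(code[1:])
    if letter not in STAGES:
        raise ValueError(f"unknown stage letter: {letter!r}")
    name, states = STAGES[letter]
    if number not in states:
        raise ValueError(f"state {number} is not part of stage {letter!r}")
    return name


# Sanity check: states 1..34 are each assigned to exactly one stage.
assert sorted(n for _, states in STAGES.values() for n in states) == list(range(1, 35))

print(stage_of("c12"))  # Inductive Reasoning
```

A check like this could, for example, validate a consultant's stage/state assessment before it is passed to the teacher.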

+ The model was fine-tuned on **SocratDataset**: 6,803 multi-turn Socratic dialogues covering 42,000+ interaction turns across elementary school science topics in Chinese.
+
+ ---
+
+ ## Published Performance
+
+ Results from Table 1 of the KELE paper (test set: 680 dialogues, 4,245 single-turn examples):
+
+ | Model | ROUGE-1 | ROUGE-2 | BLEU-4 | PRR | NDAR | SPR | IAR | Guidance | Logicality | Flexibility |
+ |---|---|---|---|---|---|---|---|---|---|---|
+ | GPT-4o | 38.25 | 22.35 | 29.93 | 72.13 | 81.19 | 85.00 | 87.74 | 4.35 | 4.50 | 4.33 |
+ | Qwen2.5-7B | 40.95 | 15.27 | 24.96 | 59.02 | 80.52 | 60.00 | 76.45 | 3.87 | 3.96 | 3.87 |
+ | Qwen2.5-14B | 43.79 | 17.06 | 26.63 | 65.21 | 78.57 | 74.00 | 80.81 | 3.99 | 4.15 | 4.03 |
+ | Qwen2.5-32B | 46.22 | 19.90 | 28.85 | 65.57 | 83.13 | 81.00 | 84.68 | 4.12 | 4.44 | 4.21 |
+ | EduChat-13B | 34.75 | 9.91 | 21.11 | 47.62 | 90.73 | 51.00 | 69.02 | 2.93 | 3.42 | 3.18 |
+ | SocraticLM-7B | 18.63 | 5.56 | 10.93 | 26.83 | 30.26 | 36.00 | 27.05 | 2.62 | 2.88 | 2.78 |
+ | **SocratTeachLLM (this model)** | **57.40** | **33.63** | **41.96** | **75.13** | **94.71** | **87.00** | **89.03** | **4.66** | **4.53** | **4.45** |
+
+ **Metric definitions:**
+ - **PRR** — Problem Relevance Rate: teacher question relates directly to the problem
+ - **NDAR** — No Direct Answer Rate: teacher avoids giving away the answer
+ - **SPR** — Summary Pass Rate: correct and complete final summary
+ - **IAR** — Instruction Adherence Rate: teacher follows the consultant's recommended strategy
+ - **Guidance / Logicality / Flexibility** — GPT-4o judge scores on a 1–5 scale (B.5 rubric)
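PRR, NDAR, SPR, and IAR are all pass rates: the percentage of judged items that satisfy a yes/no criterion. A sketch of the aggregation step only, assuming the per-item yes/no labels already exist; how the paper obtains those labels is not reproduced here, the `TurnJudgment` record is hypothetical, and treating SPR as one label per dialogue is an assumption:

```python
from dataclasses import dataclass


@dataclass
class TurnJudgment:
    """Hypothetical per-turn labels; field names are illustrative only."""
    problem_relevant: bool   # counts toward PRR
    no_direct_answer: bool   # counts toward NDAR
    follows_action: bool     # counts toward IAR


def rate(flags: list[bool]) -> float:
    """Percentage of True flags, rounded to two decimals like the table."""
    if not flags:
        raise ValueError("no judgments to aggregate")
    return round(100.0 * sum(flags) / len(flags), 2)


def turn_metrics(judgments: list[TurnJudgment]) -> dict[str, float]:
    """Aggregate per-turn labels into the three turn-level rates."""
    return {
        "PRR": rate([j.problem_relevant for j in judgments]),
        "NDAR": rate([j.no_direct_answer for j in judgments]),
        "IAR": rate([j.follows_action for j in judgments]),
    }


def summary_pass_rate(summary_ok: list[bool]) -> float:
    """SPR, assuming one pass/fail label per dialogue's final summary."""
    return rate(summary_ok)


demo = [
    TurnJudgment(True, True, True),
    TurnJudgment(True, False, True),
    TurnJudgment(False, True, True),
    TurnJudgment(True, True, False),
]
print(turn_metrics(demo))  # {'PRR': 75.0, 'NDAR': 75.0, 'IAR': 75.0}
```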
+
+ SocratTeachLLM scores higher than every other listed model, GPT-4o included, on all ten metrics.
+
+ ---
+
+ ## Training Details
+
+ | Setting | Value |
+ |---|---|
+ | Base model | GLM4-9B-Chat |
+ | Method | LoRA |
+ | Epochs | 3 |
+ | Learning rate | 5e-5 |
+ | Batch size | 16 |
+ | Train split | 6,123 dialogues (90%) |
+ | Test split | 680 dialogues (10%) |
+ | Hardware | 2× NVIDIA A800 80GB |
+ | Dataset | SocratDataset (6,803 records, Chinese) |
+
+ ### Training Objective
+
+ ```
+ P(teacher_response | dialogue_history, evaluation, action)
+ ```
+
+ The `evaluation` (the consultant's stage/state assessment) and `action` (its recommended strategy) fields are required conditioning signals. At inference time, a consultant agent produces both before the teacher agent generates its response. Without them, the input no longer matches the format the model was trained on, so it is likely to underperform.
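In other words, each training example pairs the dialogue so far with the consultant's two outputs, and the teacher's reply is the target. A sketch of assembling such a conditioned input as chat messages; the function name, the `Evaluation:`/`Action:` layout, and the system prompt are hypothetical placeholders rather than the template used in training (the KELE repository is the authority on that), and the English strings are illustrative even though the training data is Chinese:

```python
def build_teacher_messages(history, evaluation, action):
    """Build a hypothetical chat prompt conditioned on consultant outputs.

    history: list of (speaker, text) pairs, oldest first.
    evaluation: consultant's stage/state assessment, e.g. "b2".
    action: consultant's recommended SocRule strategy, as free text.
    """
    if not evaluation or not action:
        # The model was trained with both signals present, so refuse to
        # build a prompt that omits either one.
        raise ValueError("evaluation and action are both required")
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    prompt = (
        f"Dialogue history:\n{transcript}\n\n"
        f"Evaluation: {evaluation}\n"
        f"Action: {action}\n\n"
        "Write the teacher's next turn."
    )
    return [
        {"role": "system", "content": "You are a Socratic teacher."},  # placeholder
        {"role": "user", "content": prompt},
    ]


messages = build_teacher_messages(
    [("Student", "Why is it hot in summer and cold in winter?")],
    evaluation="b2",
    action="Ask what the student already knows about Earth's orbit.",
)
```

The resulting `messages` list has the same shape as the one passed to `tokenizer.apply_chat_template` in the Transformers example, so it should drop in there directly.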

  ---

  ## Model Architecture

  | Parameter | Value |
+ |---|---|
  | Base model | GLM4-9B-Chat (`ChatGLMForConditionalGeneration`) |
+ | Total parameters | ~9.4B |
  | Layers | 40 |
  | Hidden size | 4,096 |
  | Attention heads | 32 |
@@ -44,15 +104,16 @@
  | Max context length | 131,072 tokens (128K) |
  | Storage dtype | bfloat16 |
  | Attention | Multi-query (2 groups), RoPE (ratio 500) |
+ | Normalization | RMSNorm |
+ | Weight files | 4× safetensors shards (~18.8 GB total) |
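The weight-file total in this table follows from the parameter count, since bfloat16 stores each parameter in 2 bytes. A back-of-the-envelope sketch of weight memory only (decimal GB; the 0.5 bytes per parameter for 4-bit ignores quantization constants). The gap to the ~19 GB and ~6.5 GB VRAM figures quoted under Usage is overhead this sketch does not model, for example layers left unquantized, activations, and the KV cache:

```python
PARAMS = 9.4e9  # approximate total parameter count from the table

BYTES_PER_PARAM = {
    "bfloat16": 2.0,  # 16-bit storage
    "nf4": 0.5,       # 4-bit weights, ignoring per-block quantization constants
}


def weight_gb(params: float, dtype: str) -> float:
    """Weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("bfloat16", "nf4"):
    print(f"{dtype}: ~{weight_gb(PARAMS, dtype):.1f} GB")
# bfloat16: ~18.8 GB
# nf4: ~4.7 GB
```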
 

+ **Generation defaults:** temperature 0.8, top-p 0.8.

  ---

+ ## Usage
+
+ ### Transformers (recommended, ~19 GB VRAM)

  The model uses custom modeling code, so `trust_remote_code=True` is required.

@@ -70,10 +131,7 @@
  trust_remote_code=True,
  )

+ messages = [{"role": "user", "content": "What do you think causes the seasons to change?"}]
  inputs = tokenizer.apply_chat_template(
  messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)
@@ -82,7 +140,7 @@
  print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
  ```

+ ### 4-bit NF4 via bitsandbytes (~6.5 GB VRAM)

  ```python
  from transformers import BitsAndBytesConfig
@@ -93,7 +151,6 @@
  bnb_4bit_use_double_quant=True,
  bnb_4bit_quant_type="nf4",
  )
  model = AutoModelForCausalLM.from_pretrained(
  model_id,
  quantization_config=bnb_config,
@@ -102,50 +159,39 @@
  )
  ```

+ ### vLLM (OpenAI-compatible endpoint)

  ```bash
+ vllm serve ulises-c/SocratTeachLLM \
+ --served-model-name SocratTeachLLM \
+ --dtype bfloat16 \
+ --trust-remote-code
  ```
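vLLM serves the OpenAI chat-completions API, by default at `http://localhost:8000/v1`. A standard-library sketch of a client, assuming that default host and port and the `--served-model-name` above; the sampling values mirror the README's generation defaults, `max_tokens` mirrors the Transformers example, and the helper names are hypothetical:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: vLLM's default host and port


def chat_payload(messages, model="SocratTeachLLM", temperature=0.8, top_p=0.8,
                 max_tokens=512):
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,  # must match --served-model-name
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }


def chat(messages, base_url=BASE_URL, timeout=120):
    """POST the request to /chat/completions and return the first choice's text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_payload(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat([{"role": "user", "content": "..."}])` returns the teacher's reply; the official `openai` Python client pointed at the same base URL is an equivalent alternative.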

+ ### Ollama

+ This repo includes a `Modelfile` (auto-generated by LlamaFactory) with the correct ChatGLM4 stop sequences and a 4,096-token context window.

  ```bash
+ ollama create SocratTeachLLM -f Modelfile
+ ollama run SocratTeachLLM
  ```

+ > **Note:** The bundled Modelfile limits the context window to 4,096 tokens. For the full 128K context, use Transformers or vLLM.

  ---

  ## Built With This Model

+ **[csen-346](https://github.com/ulises-c/csen-346)** is a downstream course project (CSEN 346 NLP, Santa Clara University) that reproduces and extends the KELE framework using this model as the teacher agent.

  Key integration details:
+ - **Teacher:** SocratTeachLLM, served via FastAPI (4-bit on RTX 3070) or vLLM (bfloat16 on RTX 5090 / SCU WAVE cluster L40S)
+ - **Consultant:** GPT-4o (baseline) or Qwen3.5-9B (local variant)
+ - **Evaluation:** 680-dialogue test split of SocratDataset, automated with ROUGE, BLEU, and a GPT-4o judge (B.5 rubric)
+ - **English extension:** An English translation of the training dataset is available at [ulises-c/SocratDataset-EN](https://huggingface.co/datasets/ulises-c/SocratDataset-EN)

  ```bash
  hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
  ```

@@ -154,16 +200,18 @@
  ## Training Data

  | Property | Value |
+ |---|---|
+ | Dataset | [ulises-c/SocratDataset](https://huggingface.co/datasets/ulises-c/SocratDataset) |
  | Dialogues | 6,803 |
  | Turns | 42,000+ |
+ | Domain | Elementary school science (grades 1–6) |
+ | Language | Chinese (Simplified) |
+ | Train split | 6,123 dialogues (90%) |
  | Test split | 680 dialogues (10%) |
  | Strategies | 34 SocRule teaching strategies |
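The split sizes are consistent: 680 is 10% of 6,803 rounded to the nearest dialogue, and 6,803 − 680 = 6,123. A sketch of a seeded 90/10 split done at the dialogue level, so that turns from one dialogue never land in both splits; the seed and shuffle procedure are assumptions, so this reproduces the counts, not the paper's actual partition:

```python
import random


def split_dialogues(dialogue_ids, test_frac=0.1, seed=0):
    """Shuffle dialogue IDs with a fixed seed and hold out test_frac of them."""
    ids = list(dialogue_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)  # 6,803 * 0.1 = 680.3 -> 680
    return ids[n_test:], ids[:n_test]  # (train, test)


train_ids, test_ids = split_dialogues(range(6803))
print(len(train_ids), len(test_ids))  # 6123 680
```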

+ An English translation of the training data is available at [ulises-c/SocratDataset-EN](https://huggingface.co/datasets/ulises-c/SocratDataset-EN).
+
  ---

  ## Citation
@@ -171,16 +219,30 @@
  If you use this model, please cite the original KELE paper:

  ```bibtex
+ @inproceedings{peng-etal-2025-kele,
+ title = {{KELE}: A Multi-Agent Framework for Structured {S}ocratic Teaching with Large Language Models},
  author = {Peng, Yuan and others},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
  year = {2025},
+ url = {https://aclanthology.org/2025.findings-emnlp.888/}
  }
  ```

  ---

+ ## Related Resources
+
+ | Resource | Link |
+ |---|---|
+ | KELE paper (EMNLP 2025 Findings) | https://aclanthology.org/2025.findings-emnlp.888/ |
+ | KELE GitHub repository | https://github.com/yuanpan1020/KELE |
+ | Original model | https://huggingface.co/yuanpan/SocratTeachLLM |
+ | Training data (Chinese) | https://huggingface.co/datasets/ulises-c/SocratDataset |
+ | Training data (English translation) | https://huggingface.co/datasets/ulises-c/SocratDataset-EN |
+ | Evaluation + inference code | https://github.com/ulises-c/csen-346 |
+
+ ---
+
  ## License

  [Apache 2.0](LICENSE)