ulises-c Claude Sonnet 4.6 committed on
Commit 6c8c78e · verified · 1 Parent(s): 0ead800

Add comprehensive README documenting model architecture and usage

Replaces the minimal HuggingFace metadata stub with a full model card
covering architecture specs, Transformers/Ollama/vLLM usage, the KELE
framework context, training dataset details, and the csen-346 downstream
project that builds on this model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (1)
  1. README.md +182 -1
README.md CHANGED
@@ -2,4 +2,185 @@
2
  license: apache-2.0
3
  language:
4
  - zh
5
- ---
5
+ ---
6
+
7
+ # SocratTeachLLM
8
+
9
+ A fine-tuned [GLM4-9B-Chat](https://huggingface.co/THUDM/glm-4-9b-chat) model trained to act as a **Socratic teacher** in structured educational dialogues. It generates heuristic questions and formative feedback that guide students through a principled sequence of reasoning stages, following the [KELE framework](https://aclanthology.org/2025.findings-emnlp.XXX) (Peng et al., EMNLP 2025 Findings).
10
+
11
+ > **Original model:** [yuanpan/SocratTeachLLM](https://huggingface.co/yuanpan/SocratTeachLLM)
12
+
13
+ ---
14
+
15
+ ## What It Does
16
+
17
+ SocratTeachLLM is designed for the **teacher role** in a dual-agent Socratic tutoring system. A separate consultant agent (e.g., GPT-4o) selects a teaching strategy from a predefined set of 34 Socratic rules (SocRule); SocratTeachLLM then generates the actual dialogue turn implementing that strategy.
18
+
19
+ Teaching proceeds through five stages:
20
+
21
+ | Stage | Name | Description |
22
+ |-------|------|-------------|
23
+ | A | Student Questioning | Elicit prior knowledge and surface misconceptions |
24
+ | B | Concept Probing | Probe understanding of core concepts |
25
+ | C | Inductive Reasoning | Guide the student toward generalizations |
26
+ | D | Rule Construction | Help the student articulate a principle or rule |
27
+ | E | Summary | Consolidate and reinforce learning |
28
+
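The hand-off between the two agents can be sketched as follows. This is a minimal illustration, not the KELE implementation: the `ConsultantDecision` class, the `build_teacher_messages` helper, and the system-message wording are all hypothetical, and the prompt format the model was actually trained on may differ. Only the stage letters and names come from the table above.

```python
from dataclasses import dataclass

# Stage letters and names copied from the table above.
STAGES = {
    "A": "Student Questioning",
    "B": "Concept Probing",
    "C": "Inductive Reasoning",
    "D": "Rule Construction",
    "E": "Summary",
}


@dataclass
class ConsultantDecision:
    """Hypothetical container for what the consultant hands to the teacher."""
    stage: str     # one of "A".."E"
    strategy: str  # free-text description of the chosen SocRule strategy


def build_teacher_messages(decision, history):
    """Prepend the consultant's decision as a system message (assumed format)."""
    if decision.stage not in STAGES:
        raise ValueError(f"unknown stage: {decision.stage!r}")
    instruction = (
        f"Current stage: {decision.stage} ({STAGES[decision.stage]}). "
        f"Teaching strategy: {decision.strategy}"
    )
    return [{"role": "system", "content": instruction}, *history]


history = [{"role": "user", "content": "Why is the sky blue?"}]
decision = ConsultantDecision(stage="A", strategy="Ask what the student already believes.")
messages = build_teacher_messages(decision, history)
```

The resulting `messages` list has the same shape as the chat input used in the Transformers example, so it could be passed to `apply_chat_template` or to an OpenAI-compatible endpoint.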
29
+ The model was fine-tuned (LoRA) on **SocratDataset**: 6,803 multi-turn Socratic dialogues covering 42,000+ interaction turns across elementary school science topics, primarily in Chinese.
30
+
31
+ ---
32
+
33
+ ## Model Architecture
34
+
35
+ | Parameter | Value |
36
+ |-----------|-------|
37
+ | Base model | GLM4-9B-Chat (`ChatGLMForConditionalGeneration`) |
38
+ | Layers | 40 |
39
+ | Hidden size | 4,096 |
40
+ | Attention heads | 32 |
41
+ | FFN hidden size | 13,696 |
42
+ | KV channels | 128 |
43
+ | Vocabulary size | 151,552 |
44
+ | Max context length | 131,072 tokens (128K) |
45
+ | Storage dtype | bfloat16 |
46
+ | Attention | Multi-query (2 groups), RoPE (ratio 500) |
47
+ | Normalization | RMSNorm, post-layer-norm |
48
+ | Total parameters | ~9.4B |
49
+ | Weight files | 4 × safetensors shards (~18.8 GB total) |
50
+
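The storage and memory figures follow from the table by simple arithmetic. The sketch below is a back-of-the-envelope estimate only: it assumes the KV cache is held in bfloat16 and ignores activations, quantization metadata, and framework overhead. The helper names are illustrative.

```python
# Values copied from the architecture table above.
N_PARAMS = 9.4e9       # approximate total parameters
N_LAYERS = 40
KV_GROUPS = 2          # multi-query attention groups
KV_CHANNELS = 128      # channels per KV head
MAX_CONTEXT = 131_072  # tokens
BYTES_BF16 = 2         # assumption: KV cache stored in bfloat16


def weight_bytes(n_params, bits_per_param):
    """Raw weight storage, ignoring quantization metadata and runtime overhead."""
    return n_params * bits_per_param / 8


def kv_cache_bytes_per_token(n_layers, kv_groups, kv_channels, bytes_per_value):
    """One key vector and one value vector per KV group per layer."""
    return 2 * n_layers * kv_groups * kv_channels * bytes_per_value


bf16_gb = weight_bytes(N_PARAMS, 16) / 1e9
nf4_gb = weight_bytes(N_PARAMS, 4) / 1e9
per_token = kv_cache_bytes_per_token(N_LAYERS, KV_GROUPS, KV_CHANNELS, BYTES_BF16)
full_context_gib = per_token * MAX_CONTEXT / 2**30

print(f"bf16 weights: {bf16_gb:.1f} GB")
print(f"4-bit weights: {nf4_gb:.1f} GB (before quantization overhead)")
print(f"KV cache: {per_token} bytes/token, {full_context_gib:.1f} GiB at full context")
```

Under these assumptions the bfloat16 weights come to 18.8 GB, matching the shard total, and the KV cache costs 40,960 bytes per token, or 5.0 GiB for a single 131,072-token sequence. The raw 4-bit estimate (4.7 GB) is lower than the ~6.5 GB quoted for the bitsandbytes path, presumably because some tensors stay unquantized and the runtime needs extra buffers.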
51
+ **Generation defaults:** temperature 0.8, top-p 0.8, max length 128K.
52
+
53
+ ---
54
+
55
+ ## Loading with Transformers
56
+
57
+ The model uses custom modeling code, so `trust_remote_code=True` is required.
58
+
59
+ ```python
60
+ from transformers import AutoTokenizer, AutoModelForCausalLM
61
+ import torch
62
+
63
+ model_id = "ulises-c/SocratTeachLLM"
64
+
65
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
66
+ model = AutoModelForCausalLM.from_pretrained(
67
+ model_id,
68
+ torch_dtype=torch.bfloat16,
69
+ device_map="auto",
70
+ trust_remote_code=True,
71
+ )
72
+
73
+ messages = [
74
+ {"role": "user", "content": "What do you think causes the seasons to change?"}
75
+ ]
76
+
77
+ inputs = tokenizer.apply_chat_template(
78
+ messages, add_generation_prompt=True, return_tensors="pt"
79
+ ).to(model.device)
80
+
81
+ outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.8)
82
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
83
+ ```
84
+
85
+ ### Low-VRAM (4-bit NF4 via bitsandbytes, ~6.5 GB)
86
+
87
+ ```python
88
+ from transformers import BitsAndBytesConfig
89
+
90
+ bnb_config = BitsAndBytesConfig(
91
+ load_in_4bit=True,
92
+ bnb_4bit_compute_dtype=torch.float16,
93
+ bnb_4bit_use_double_quant=True,
94
+ bnb_4bit_quant_type="nf4",
95
+ )
96
+
97
+ model = AutoModelForCausalLM.from_pretrained(
98
+ model_id,
99
+ quantization_config=bnb_config,
100
+ device_map="auto",
101
+ trust_remote_code=True,
102
+ )
103
+ ```
104
+
105
+ ---
106
+
107
+ ## Running Locally with Ollama
108
+
109
+ This repo includes a `Modelfile` for Ollama (auto-generated by LlamaFactory). It sets a 4,096-token context window and the correct stop sequences for the ChatGLM4 chat format.
110
+
111
+ ```bash
112
+ # Create the Ollama model from the local Modelfile
113
+ ollama create SocratTeachLLM -f Modelfile
114
+
115
+ # Run interactively
116
+ ollama run SocratTeachLLM
117
+ ```
118
+
119
+ Stop sequences used: `<|user|>`, `<|endoftext|>`, `<|observation|>`.
120
+
121
+ > **Note:** The included `Modelfile` limits the context to 4,096 tokens. For the full 128K context, use the Transformers or vLLM path.
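Once the model is created, it can also be queried over Ollama's local REST API. The sketch below assumes Ollama's default address (`http://localhost:11434`) and its `/api/chat` endpoint; the helper names are illustrative. The `options.num_ctx` field requests a different context window for a single call, though whether a larger value actually fits depends on available memory and has not been tested with this model.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local address


def build_ollama_request(prompt, num_ctx=4096, model="SocratTeachLLM"):
    """Build a non-streaming chat payload; num_ctx is a per-request option."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def chat(payload, url=OLLAMA_CHAT_URL):
    """POST the payload and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


payload = build_ollama_request("What causes the seasons to change?", num_ctx=8192)
# reply = chat(payload)  # requires a running Ollama server with the model created
```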
122
+
123
+ ## Serving with vLLM (full bfloat16, ~19 GB VRAM)
124
+
125
+ ```bash
126
+ vllm serve /path/to/SocratTeachLLM \
127
+ --served-model-name SocratTeachLLM \
128
+ --dtype bfloat16 \
129
+ --trust-remote-code
130
+ ```
131
+
132
+ This exposes an OpenAI-compatible endpoint at `http://localhost:8000/v1`.
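A client can talk to that endpoint with a plain HTTP request to the chat completions route. The sketch below uses only the standard library. The helper names and the `max_tokens` value are illustrative, and the sampling values mirror the generation defaults listed earlier rather than anything the server requires.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # endpoint stated above


def build_chat_request(messages, model="SocratTeachLLM"):
    """Build a chat completions payload; model must match --served-model-name."""
    return {
        "model": model,
        "messages": messages,
        "temperature": 0.8,
        "top_p": 0.8,
        "max_tokens": 512,
    }


def complete(payload, base_url=BASE_URL):
    """POST to /chat/completions and return the first choice's text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    return body["choices"][0]["message"]["content"]


conversation = [{"role": "user", "content": "What causes the seasons to change?"}]
request_payload = build_chat_request(conversation)
# reply = complete(request_payload)  # requires the vLLM server to be running
# conversation.append({"role": "assistant", "content": reply})  # keep multi-turn state
```

Appending each reply to `conversation` before the next request is what keeps a multi-turn dialogue coherent, since the endpoint itself is stateless.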
133
+
134
+ ---
135
+
136
+ ## Built With This Model
137
+
138
+ **[csen-346](https://github.com/ulises-c/csen-346)** is a downstream course project (CSEN 346 NLP, Santa Clara University) that reproduces and extends the KELE framework using SocratTeachLLM as the teacher agent.
139
+
140
+ Key integration details:
141
+
142
+ - **Teacher agent:** SocratTeachLLM, served locally via FastAPI (4-bit on RTX 3070) or vLLM (bfloat16 on RTX 5090 / SCU WAVE cluster)
143
+ - **Consultant agent:** GPT-4o (baseline) or Qwen3.5-9B (local variant) — selects Socratic strategies from SocRule and passes them to the teacher
144
+ - **Evaluation:** 680-dialogue test split of SocratDataset
145
+ - **API surface:** OpenAI-compatible chat completions endpoint (`TEACHER_MODEL_NAME=SocratTeachLLM`)
146
+
147
+ ```bash
148
+ # Download the model for use in csen-346
149
+ hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
150
+ ```
151
+
152
+ ---
153
+
154
+ ## Training Data
155
+
156
+ | Property | Value |
157
+ |----------|-------|
158
+ | Dataset | SocratDataset |
159
+ | Dialogues | 6,803 |
160
+ | Turns | 42,000+ |
161
+ | Domain | Elementary school science |
162
+ | Language | Primarily Chinese |
163
+ | Train split | 6,123 dialogues (90%) |
164
+ | Test split | 680 dialogues (10%) |
165
+ | Strategies | 34 SocRule teaching strategies |
166
+
167
+ ---
168
+
169
+ ## Citation
170
+
171
+ If you use this model, please cite the original KELE paper:
172
+
173
+ ```bibtex
174
+ @inproceedings{peng2025kele,
175
+ title = {KELE: A Multi-Agent Framework for Structured Socratic Teaching with Large Language Models},
176
+ author = {Peng, Yuan and others},
177
+ booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
178
+ year = {2025},
179
+ }
180
+ ```
181
+
182
+ ---
183
+
184
+ ## License
185
+
186
+ [Apache 2.0](LICENSE)