Add comprehensive README documenting model architecture and usage

Replaces the minimal HuggingFace metadata stub with a full model card
covering architecture specs, Transformers/Ollama/vLLM usage, the KELE
framework context, training dataset details, and the csen-346 downstream
project that builds on this model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (1)

README.md +182 -1

README.md CHANGED

@@ -2,4 +2,185 @@
 license: apache-2.0
 language:
 - zh
----
+---
+# SocratTeachLLM
+A fine-tuned [GLM4-9B-Chat](https://huggingface.co/THUDM/glm-4-9b-chat) model trained to act as a **Socratic teacher** in structured educational dialogues. It generates heuristic questions and formative feedback that guide students through a principled sequence of reasoning stages, following the [KELE framework](https://aclanthology.org/2025.findings-emnlp.XXX) (Peng et al., EMNLP 2025 Findings).
+> **Original model:** [yuanpan/SocratTeachLLM](https://huggingface.co/yuanpan/SocratTeachLLM)
+---
+## What It Does
+SocratTeachLLM is designed for the **teacher role** in a dual-agent Socratic tutoring system. A separate consultant agent (e.g., GPT-4o) selects a teaching strategy from a predefined set of 34 Socratic rules (SocRule); SocratTeachLLM then generates the actual dialogue turn implementing that strategy.
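+
+A minimal sketch of one tutoring turn under this split. The names (`Strategy`, `socratic_turn`, the stub agents) and the strategy encoding are hypothetical, and the real KELE prompts, including how the consultant's choice is passed to the teacher, are not shown here:
+```python
+from dataclasses import dataclass
+from typing import Callable, List, Tuple
+
+NUM_SOCRULE_STRATEGIES = 34  # size of the SocRule strategy set
+
+
+@dataclass
+class Strategy:
+    stage: str        # teaching stage letter, "A" through "E" (hypothetical encoding)
+    rule_id: int      # 1-based index into SocRule (hypothetical encoding)
+    instruction: str  # natural-language description handed to the teacher
+
+
+def socratic_turn(
+    history: List[str],
+    consultant: Callable[[List[str]], Strategy],
+    teacher: Callable[[List[str], Strategy], str],
+) -> Tuple[Strategy, str]:
+    """Run one turn: the consultant picks a strategy, the teacher voices it."""
+    strategy = consultant(history)
+    if not 1 <= strategy.rule_id <= NUM_SOCRULE_STRATEGIES:
+        raise ValueError(f"unknown SocRule strategy: {strategy.rule_id}")
+    reply = teacher(history, strategy)
+    return strategy, reply
+
+
+# Example with stub agents standing in for GPT-4o and SocratTeachLLM
+def stub_consultant(history):
+    return Strategy("B", 5, "Ask the student to define the key concept.")
+
+
+def stub_teacher(history, strategy):
+    return f"[{strategy.stage}] What do you mean by that term?"
+
+
+print(socratic_turn(["Why is it hot in summer?"], stub_consultant, stub_teacher)[1])
+# prints: [B] What do you mean by that term?
+```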
+Teaching proceeds through five stages:
+| Stage | Name | Description |
+|-------|------|-------------|
+| A | Student Questioning | Elicit prior knowledge and surface misconceptions |
+| B | Concept Probing | Probe understanding of core concepts |
+| C | Inductive Reasoning | Guide the student toward generalizations |
+| D | Rule Construction | Help the student articulate a principle or rule |
+| E | Summary | Consolidate and reinforce learning |
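+
+If a dialogue moves through these stages in order, the stage labels of a transcript can be checked with a few lines. This is a hypothetical helper that assumes stages never move backwards; whether KELE allows returning to an earlier stage is not specified here:
+```python
+# Stage letters and names from the table above, in teaching order
+STAGES = {
+    "A": "Student Questioning",
+    "B": "Concept Probing",
+    "C": "Inductive Reasoning",
+    "D": "Rule Construction",
+    "E": "Summary",
+}
+ORDER = {letter: i for i, letter in enumerate(STAGES)}
+
+
+def is_forward_progression(stage_labels):
+    """Return True if the stage labels never step back to an earlier stage."""
+    positions = [ORDER[label] for label in stage_labels]
+    return all(a <= b for a, b in zip(positions, positions[1:]))
+
+
+print(is_forward_progression(["A", "A", "B", "C", "D", "E"]))  # True
+print(is_forward_progression(["A", "C", "B"]))  # False
+```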
+
+The model was fine-tuned (LoRA) on **SocratDataset**: 6,803 multi-turn Socratic dialogues covering 42,000+ interaction turns across elementary school science topics, primarily in Chinese.
+
+---
+## Model Architecture
+| Parameter | Value |
+|-----------|-------|
+| Base model | GLM4-9B-Chat (`ChatGLMForConditionalGeneration`) |
+| Layers | 40 |
+| Hidden size | 4,096 |
+| Attention heads | 32 |
+| FFN hidden size | 13,696 |
+| KV channels | 128 |
+| Vocabulary size | 151,552 |
+| Max context length | 131,072 tokens (128K) |
+| Storage dtype | bfloat16 |
+| Attention | Grouped-query (2 key/value groups), RoPE (ratio 500) |
+| Normalization | RMSNorm (pre-norm blocks plus a final RMSNorm) |
+| Total parameters | ~9.4B |
+| Weight files | 4 × safetensors shards (~18.8 GB total) |
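+
+The parameter count and file size follow from the table. The sketch below reproduces them under assumptions about the GLM-4 layer layout that this card does not list: a fused QKV projection with bias, bias-free attention-output and MLP projections, a SwiGLU MLP whose gate and up projections share one matrix, and an output layer not tied to the input embedding. It also sizes the bfloat16 KV cache for a single sequence (ignoring allocator overhead), which grows linearly with context length:
+```python
+# Hyperparameters from the architecture table
+VOCAB, HIDDEN, LAYERS = 151_552, 4_096, 40
+HEADS, KV_GROUPS, HEAD_DIM, FFN = 32, 2, 128, 13_696
+BF16_BYTES = 2
+
+
+def param_count():
+    qkv_out = HEADS * HEAD_DIM + 2 * KV_GROUPS * HEAD_DIM  # 4096 query + 2 x 256 key/value
+    attn = HIDDEN * qkv_out + qkv_out          # fused QKV weight + bias (assumed)
+    attn += HEADS * HEAD_DIM * HIDDEN          # attention output projection, no bias (assumed)
+    mlp = HIDDEN * 2 * FFN + FFN * HIDDEN      # SwiGLU: fused gate/up, then down (assumed)
+    norms = 2 * HIDDEN                         # two RMSNorm weight vectors per layer
+    per_layer = attn + mlp + norms
+    embeddings = 2 * VOCAB * HIDDEN            # input embedding + untied output layer (assumed)
+    return LAYERS * per_layer + embeddings + HIDDEN  # + final RMSNorm
+
+
+def kv_cache_bytes(tokens):
+    # K and V per layer, one HEAD_DIM vector per KV group, stored in bfloat16
+    return tokens * 2 * LAYERS * KV_GROUPS * HEAD_DIM * BF16_BYTES
+
+
+n = param_count()
+print(f"{n / 1e9:.2f}B parameters")  # 9.40B parameters
+print(f"{n * BF16_BYTES / 1e9:.1f} GB of bf16 weights")  # 18.8 GB of bf16 weights
+print(f"{kv_cache_bytes(131_072) / 2**30:.1f} GiB KV cache at 128K")  # 5.0 GiB KV cache at 128K
+```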
+
+**Generation defaults:** temperature 0.8, top-p 0.8, max length 128K.
+
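+
+Temperature and top-p act on the next-token distribution in sequence: logits are divided by the temperature before the softmax, and top-p then keeps only the most probable tokens whose combined probability first reaches 0.8. A minimal pure-Python sketch of this standard nucleus-sampling recipe (library implementations differ in tie-breaking and boundary details):
+```python
+import math
+import random
+
+
+def nucleus_distribution(logits, temperature=0.8, top_p=0.8):
+    """Temperature-scaled softmax followed by top-p (nucleus) filtering."""
+    scaled = [x / temperature for x in logits]
+    peak = max(scaled)  # subtract the max for numerical stability
+    exps = [math.exp(x - peak) for x in scaled]
+    total = sum(exps)
+    probs = [e / total for e in exps]
+    # Walk tokens from most to least likely until the kept mass reaches top_p
+    kept, mass = [], 0.0
+    for i in sorted(range(len(probs)), key=probs.__getitem__, reverse=True):
+        kept.append(i)
+        mass += probs[i]
+        if mass >= top_p:
+            break
+    # Renormalize over the nucleus; all other tokens get probability 0
+    filtered = [0.0] * len(probs)
+    for i in kept:
+        filtered[i] = probs[i] / mass
+    return filtered
+
+
+def sample_token(logits, rng, temperature=0.8, top_p=0.8):
+    dist = nucleus_distribution(logits, temperature, top_p)
+    return rng.choices(range(len(dist)), weights=dist, k=1)[0]
+
+
+# With these toy logits only the two most likely tokens survive the top-p cut
+print(nucleus_distribution([2.0, 1.0, 0.0, -1.0]))
+print(sample_token([2.0, 1.0, 0.0, -1.0], random.Random(0)))
+```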
+---
+## Loading with Transformers
+The model uses custom modeling code, so `trust_remote_code=True` is required.
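+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "ulises-c/SocratTeachLLM"
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True,
+)
+# The user turn plays the student; the model answers as the Socratic teacher
+messages = [
+    {"role": "user", "content": "What do you think causes the seasons to change?"}
+]
+# return_dict=True returns input_ids together with the attention_mask
+inputs = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    tokenize=True,
+    return_tensors="pt",
+    return_dict=True,
+).to(model.device)
+# do_sample=True makes temperature/top_p take effect
+outputs = model.generate(
+    **inputs, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.8
+)
+# Decode only the newly generated tokens, not the echoed prompt
+prompt_len = inputs["input_ids"].shape[-1]
+print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
+```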
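+
+Multi-turn dialogues follow the same pattern, with earlier turns passed as alternating messages. A minimal sketch, assuming the student maps to the `user` role and SocratTeachLLM to `assistant`; the `"student"`/`"teacher"` speaker labels and the `build_messages` helper are hypothetical:
+```python
+ROLE_FOR_SPEAKER = {"student": "user", "teacher": "assistant"}
+
+
+def build_messages(turns):
+    """Convert (speaker, text) pairs into chat-template messages."""
+    messages = [{"role": ROLE_FOR_SPEAKER[speaker], "content": text} for speaker, text in turns]
+    # The teacher should speak next, so the history must end on a student turn
+    if not messages or messages[-1]["role"] != "user":
+        raise ValueError("the dialogue history must end with a student turn")
+    return messages
+
+
+history = [
+    ("student", "Why is it hotter in summer?"),
+    ("teacher", "What do you notice about the Sun's height at noon in summer?"),
+    ("student", "It is higher in the sky."),
+]
+messages = build_messages(history)  # pass to tokenizer.apply_chat_template as above
+```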
+### Low-VRAM (4-bit NF4 via bitsandbytes, ~6.5 GB)
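+```python
+# Requires the bitsandbytes package (typically on a CUDA GPU)
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+model_id = "ulises-c/SocratTeachLLM"
+# NF4 4-bit weights with double quantization; matmuls are computed in float16
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.float16,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+)
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    quantization_config=bnb_config,
+    device_map="auto",
+    trust_remote_code=True,
+)
+```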
+---
+## Running Locally with Ollama
+This repo includes a `Modelfile` for Ollama (auto-generated by LlamaFactory). It sets a 4,096-token context window and the correct stop sequences for the ChatGLM4 chat format.
+```bash
+# Create the Ollama model from the local Modelfile
+ollama create SocratTeachLLM -f Modelfile
+# Run interactively
+ollama run SocratTeachLLM
+```
+Stop sequences used: `<|user|>`, `<|endoftext|>`, `<|observation|>`.
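+
+Ollama ends a completion at the first of these strings. When raw model text is consumed without a server-side stop list, the same rule can be applied on the client; `truncate_at_stop` below is a hypothetical helper, not an Ollama API:
+```python
+STOP_SEQUENCES = ("<|user|>", "<|endoftext|>", "<|observation|>")
+
+
+def truncate_at_stop(text, stops=STOP_SEQUENCES):
+    """Return text up to (not including) the earliest stop sequence."""
+    cut = len(text)
+    for stop in stops:
+        idx = text.find(stop)
+        if idx != -1:
+            cut = min(cut, idx)
+    return text[:cut]
+
+
+print(truncate_at_stop("What makes you think so?<|user|>Because"))  # What makes you think so?
+```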
+> **Note:** The bundled `Modelfile` sets the context window to 4,096 tokens (`num_ctx`). Raising it increases memory use; for the full 128K context, use the Transformers or vLLM path.
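+
+Besides the interactive CLI, Ollama serves a REST API (by default at `http://localhost:11434`). A minimal standard-library sketch of a non-streaming `/api/chat` call, where `options` carries per-request parameters such as `num_ctx`, `temperature`, and `top_p` (set here to the Modelfile window and the generation defaults listed earlier); the helper names are hypothetical:
+```python
+import json
+import urllib.request
+
+OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
+
+
+def build_ollama_request(messages, num_ctx=4096):
+    payload = {
+        "model": "SocratTeachLLM",  # name given to `ollama create`
+        "messages": messages,
+        "stream": False,  # return one JSON object instead of a stream
+        "options": {"num_ctx": num_ctx, "temperature": 0.8, "top_p": 0.8},
+    }
+    return urllib.request.Request(
+        OLLAMA_CHAT_URL,
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def ollama_chat(messages, num_ctx=4096):
+    """Send the request to a running Ollama server and return the reply text."""
+    with urllib.request.urlopen(build_ollama_request(messages, num_ctx)) as resp:
+        return json.loads(resp.read())["message"]["content"]
+```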
+---
+## Serving with vLLM (full bfloat16, ~19 GB of weights)
+```bash
+vllm serve /path/to/SocratTeachLLM \
+  --served-model-name SocratTeachLLM \
+  --dtype bfloat16 \
+  --trust-remote-code
+```
+This exposes an OpenAI-compatible API at `http://localhost:8000/v1` (vLLM's default port); the `model` field in requests must match `--served-model-name`.
+
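+
+Any OpenAI-compatible client can talk to this server, including the official `openai` Python package pointed at that base URL. A dependency-free sketch of a chat completions request (the helper names are hypothetical):
+```python
+import json
+import urllib.request
+
+VLLM_BASE_URL = "http://localhost:8000/v1"
+
+
+def build_chat_request(messages, max_tokens=512):
+    payload = {
+        "model": "SocratTeachLLM",  # must match --served-model-name
+        "messages": messages,
+        "max_tokens": max_tokens,
+        "temperature": 0.8,
+        "top_p": 0.8,
+    }
+    return urllib.request.Request(
+        f"{VLLM_BASE_URL}/chat/completions",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def teacher_reply(messages):
+    """POST to the running vLLM server and return the first choice's text."""
+    with urllib.request.urlopen(build_chat_request(messages)) as resp:
+        body = json.loads(resp.read())
+    return body["choices"][0]["message"]["content"]
+```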
+---
+## Built With This Model
+**[csen-346](https://github.com/ulises-c/csen-346)** is a downstream course project (CSEN 346 NLP, Santa Clara University) that reproduces and extends the KELE framework using SocratTeachLLM as the teacher agent.
+
+Key integration details:
+- **Teacher agent:** SocratTeachLLM, served locally via FastAPI (4-bit on RTX 3070) or vLLM (bfloat16 on RTX 5090 / SCU WAVE cluster)
+- **Consultant agent:** GPT-4o (baseline) or Qwen3.5-9B (local variant) — selects Socratic strategies from SocRule and passes them to the teacher
+- **Evaluation:** 680-dialogue test split of SocratDataset
+- **API surface:** OpenAI-compatible chat completions endpoint (`TEACHER_MODEL_NAME=SocratTeachLLM`)
+```bash
+# Download the model for use in csen-346
+hf download ulises-c/SocratTeachLLM --local-dir ~/hf_models/SocratTeachLLM
+```
+---
+## Training Data
+| Property | Value |
+|----------|-------|
+| Dataset | SocratDataset |
+| Dialogues | 6,803 |
+| Turns | 42,000+ |
+| Domain | Elementary school science |
+| Language | Primarily Chinese |
+| Train split | 5,723 dialogues |
+| Test split | 680 dialogues (~10%) |
+| Strategies | 34 SocRule teaching strategies |
+---
+## Citation
+If you use this model, please cite the original KELE paper:
+```bibtex
+@inproceedings{peng2025kele,
+  title     = {KELE: A Multi-Agent Framework for Structured Socratic Teaching with Large Language Models},
+  author    = {Peng, Yuan and others},
+  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
+  year      = {2025},
+}
+```
+---
+## License
+[Apache 2.0](LICENSE)