Instructions to use ulises-c/SocratesLM-31B-stage2b-QLoRA with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use ulises-c/SocratesLM-31B-stage2b-QLoRA with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
SocratesLM-31B-stage2b-QLoRA
Stage 2b SFT checkpoint β Gemma 4 31B QLoRA fine-tuned for Socratic teaching dialogue (MELE project).
This is a LoRA adapter trained on the Socratic teaching SFT task (Stage 2b: structural, state-conditioned teacher responses) as part of the MELE project β a reproduction and extension of the KELE multi-agent Socratic teaching framework. The base model is google/gemma-4-31b-it loaded via the unsloth NF4 community checkpoint.
Status: training checkpoint / proof-of-concept. This repo contains intermediate checkpoints from an in-progress training run (step 1230 and 1220, ~25% of epoch 1). Training is stalled pending an upstream rocBLAS fix for gfx1201 (RDNA4) β ROCm/rocm-libraries #7992. Checkpoints are published here primarily for backup and reproducibility.
MELE vs. KELE: MELE replaces KELE's per-turn LLM-consultant bottleneck with a fast deterministic state classifier (
ulises-c/socrates-state-classifier-qwen3.5-lora) and extends the pipeline to Socratic SFT of a large generative model β something the original KELE paper did not attempt. The training code and pipeline are at github.com/ulises-c/csen-346.
Built as part of CSEN 346 (Natural Language Processing), Santa Clara University, 2026.
Model Summary
| Property | Value |
|---|---|
| Base model | google/gemma-4-31b-it (Gemma ToS) |
| Quantization | NF4 (bitsandbytes), loaded via unsloth community checkpoint |
| Method | QLoRA β LoRA r=16, Ξ±=32, dropout=0.05 |
| Target modules | All attention + MLP projections: q/k/v/o_proj, gate/up/down_proj |
| Task | Socratic teacher response generation (state-conditioned SFT) |
| Stage | 2b only β structural responses with state conditioning |
| Training data | SocratDataset-EN + SocratDataset (EN + ZH, 77,202 train / 8,578 eval) |
| Epochs planned | 1 |
| Steps completed | 1230 / |
| Loss at step 1230 | 0.6465 |
| Token accuracy at step 1230 | 0.798 |
| Hardware | AMD Radeon AI PRO R9700 (gfx1201 / RDNA4, 32 GB), ROCm 7.2 |
Checkpoints in This Repo
See SFT_STAGE2B_CHECKPOINT_LOG.md for the full, auto-updated checkpoint table (one row per pushed checkpoint β every 50 steps). For a more detailed live training log including loss, grad norm, lr, token accuracy per checkpoint, and crash/resume events, see csen-346 issue #120.
Each checkpoint directory includes the LoRA adapter weights (adapter_model.safetensors), optimizer and scheduler states (optimizer.pt, scheduler.pt), and all files needed to resume training.
Training Setup
# Effective batch size: 16 (batch_size=1, grad_accum=16)
# Learning rate: 5e-5
# Sequence length: see configs/train-sft-stage2-gemma4-31b.env
# Backend: TORCH_USE_HIPBLASLT=0 (rocBLAS Tensile path)
make train-gemma4-31b-stage2-unsloth
Full training configuration and scripts are at github.com/ulises-c/csen-346.
Why training is paused
A reproducible GPU page fault (Memory access fault β¦ Page not present) occurs in the rocBLAS Tensile GEMM kernel (ISA1201 MT64x64x64 DTVB1) during the QLoRA backward pass on gfx1201/RDNA4. The bug has been reported upstream at ROCm/rocm-libraries #7992. Training resumes via checkpoint crawl (saves every 10 steps) but stalls when the GPU requires a driver reset. See csen-346 issue #113 for the full diagnostic.
How to Use
Loading requires the base model in NF4 format plus the LoRA adapter. The simplest path uses the unsloth community checkpoint as the base:
import torch
from transformers import AutoTokenizer
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
"unsloth/gemma-4-31B-it-unsloth-bnb-4bit",
quantization_config=bnb_config,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
"ulises-c/SocratesLM-31B-stage2b-QLoRA",
subfolder="checkpoint-1230",
)
model = PeftModel.from_pretrained(base, "ulises-c/SocratesLM-31B-stage2b-QLoRA",
subfolder="checkpoint-1230")
Note: This checkpoint is at ~25% of epoch 1 and has not converged β inference quality will be limited. It is published primarily as a training backup.
Stage 2b Context (MELE Pipeline)
The MELE pipeline extends KELE with three stages of SFT:
| Stage | Description | Status |
|---|---|---|
| 1 | General instruction following warm-up | Not trained (skipped as PoC) |
| 2a | Breadth β all Socratic responses, no state conditioning | Skipped as PoC |
| 2b | Structural β state-conditioned teacher responses | This repo (in progress) |
Stage 2b fine-tunes the model to generate teacher turns conditioned on the active SocRule state (provided by the state classifier at inference time), producing structurally correct Socratic dialogue without needing an LLM consultant at every turn.
Citation
If you use this checkpoint, cite the KELE paper and this work:
@inproceedings{peng-etal-2025-kele,
title = {{KELE}: A Multi-Agent Framework for Structured {S}ocratic Teaching with Large Language Models},
author = {Peng, Yuan and others},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
year = {2025},
url = {https://aclanthology.org/2025.findings-emnlp.888/}
}
@misc{chavarria-2026-socratesLM-stage2b,
title = {{SocratesLM-31B-stage2b-QLoRA}: Stage 2b {QLoRA} Checkpoint for Socratic Teaching ({MELE})},
author = {Chavarria, Ulises},
year = {2026},
url = {https://huggingface.co/ulises-c/SocratesLM-31B-stage2b-QLoRA}
}
Related Resources
| Resource | Link |
|---|---|
| KELE paper (EMNLP 2025 Findings) | https://aclanthology.org/2025.findings-emnlp.888/ |
| KELE GitHub | https://github.com/yuanpan1020/KELE |
| SocratTeachLLM (original KELE model) | https://huggingface.co/yuanpan/SocratTeachLLM |
| State classifier (MELE consultant) | https://huggingface.co/ulises-c/socrates-state-classifier-qwen3.5-lora |
| Training dataset β SocratDataset-EN | https://huggingface.co/datasets/ulises-c/SocratDataset-EN |
| Training dataset β SocratDataset (ZH) | https://huggingface.co/datasets/ulises-c/SocratDataset |
| Training + pipeline code | https://github.com/ulises-c/csen-346 |
| rocBLAS gfx1201 bug (upstream) | https://github.com/ROCm/rocm-libraries/issues/7992 |
License
The adapter weights are released under the Gemma Terms of Use, inherited from the base model. Use of this checkpoint must also cite the KELE paper, whose SocRule framework defines the training objective.
- Downloads last month
- -