SocratesLM-31B-stage2b-QLoRA

Stage 2b SFT checkpoint — Gemma 4 31B QLoRA fine-tuned for Socratic teaching dialogue (MELE project).

This is a LoRA adapter trained on the Socratic teaching SFT task (Stage 2b: structural, state-conditioned teacher responses) as part of the MELE project — a reproduction and extension of the KELE multi-agent Socratic teaching framework. The base model is google/gemma-4-31b-it loaded via the unsloth NF4 community checkpoint.

Status: training checkpoint / proof-of-concept. This repo contains intermediate checkpoints from an in-progress training run (step 1230 and 1220, ~25% of epoch 1). Training is stalled pending an upstream rocBLAS fix for gfx1201 (RDNA4) — ROCm/rocm-libraries #7992. Checkpoints are published here primarily for backup and reproducibility.

MELE vs. KELE: MELE replaces KELE's per-turn LLM-consultant bottleneck with a fast deterministic state classifier (ulises-c/socrates-state-classifier-qwen3.5-lora) and extends the pipeline to Socratic SFT of a large generative model — something the original KELE paper did not attempt. The training code and pipeline are at github.com/ulises-c/csen-346.

Built as part of CSEN 346 (Natural Language Processing), Santa Clara University, 2026.

Model Summary

Property	Value
Base model	`google/gemma-4-31b-it` (Gemma ToS)
Quantization	NF4 (bitsandbytes), loaded via unsloth community checkpoint
Method	QLoRA — LoRA r=16, α=32, dropout=0.05
Target modules	All attention + MLP projections: `q/k/v/o_proj`, `gate/up/down_proj`
Task	Socratic teacher response generation (state-conditioned SFT)
Stage	2b only — structural responses with state conditioning
Training data	`SocratDataset-EN` + `SocratDataset` (EN + ZH, 77,202 train / 8,578 eval)
Epochs planned	1
Steps completed	1230 / ~~4826 (~~25.5% of epoch 1)
Loss at step 1230	0.6465
Token accuracy at step 1230	0.798
Hardware	AMD Radeon AI PRO R9700 (gfx1201 / RDNA4, 32 GB), ROCm 7.2

Checkpoints in This Repo

See SFT_STAGE2B_CHECKPOINT_LOG.md for the full, auto-updated checkpoint table (one row per pushed checkpoint — every 50 steps). For a more detailed live training log including loss, grad norm, lr, token accuracy per checkpoint, and crash/resume events, see csen-346 issue #120.

Each checkpoint directory includes the LoRA adapter weights (adapter_model.safetensors), optimizer and scheduler states (optimizer.pt, scheduler.pt), and all files needed to resume training.

Training Setup

# Effective batch size: 16 (batch_size=1, grad_accum=16)
# Learning rate: 5e-5
# Sequence length: see configs/train-sft-stage2-gemma4-31b.env
# Backend: TORCH_USE_HIPBLASLT=0 (rocBLAS Tensile path)
make train-gemma4-31b-stage2-unsloth

Full training configuration and scripts are at github.com/ulises-c/csen-346.

Why training is paused

A reproducible GPU page fault (Memory access fault … Page not present) occurs in the rocBLAS Tensile GEMM kernel (ISA1201 MT64x64x64 DTVB1) during the QLoRA backward pass on gfx1201/RDNA4. The bug has been reported upstream at ROCm/rocm-libraries #7992. Training resumes via checkpoint crawl (saves every 10 steps) but stalls when the GPU requires a driver reset. See csen-346 issue #113 for the full diagnostic.

How to Use

Loading requires the base model in NF4 format plus the LoRA adapter. The simplest path uses the unsloth community checkpoint as the base:

import torch
from transformers import AutoTokenizer
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/gemma-4-31B-it-unsloth-bnb-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "ulises-c/SocratesLM-31B-stage2b-QLoRA",
    subfolder="checkpoint-1230",
)
model = PeftModel.from_pretrained(base, "ulises-c/SocratesLM-31B-stage2b-QLoRA",
                                  subfolder="checkpoint-1230")

Note: This checkpoint is at ~25% of epoch 1 and has not converged — inference quality will be limited. It is published primarily as a training backup.

Stage 2b Context (MELE Pipeline)

The MELE pipeline extends KELE with three stages of SFT:

Stage	Description	Status
1	General instruction following warm-up	Not trained (skipped as PoC)
2a	Breadth — all Socratic responses, no state conditioning	Skipped as PoC
2b	Structural — state-conditioned teacher responses	This repo (in progress)

Stage 2b fine-tunes the model to generate teacher turns conditioned on the active SocRule state (provided by the state classifier at inference time), producing structurally correct Socratic dialogue without needing an LLM consultant at every turn.

Citation

If you use this checkpoint, cite the KELE paper and this work:

@inproceedings{peng-etal-2025-kele,
  title     = {{KELE}: A Multi-Agent Framework for Structured {S}ocratic Teaching with Large Language Models},
  author    = {Peng, Yuan and others},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
  year      = {2025},
  url       = {https://aclanthology.org/2025.findings-emnlp.888/}
}

@misc{chavarria-2026-socratesLM-stage2b,
  title  = {{SocratesLM-31B-stage2b-QLoRA}: Stage 2b {QLoRA} Checkpoint for Socratic Teaching ({MELE})},
  author = {Chavarria, Ulises},
  year   = {2026},
  url    = {https://huggingface.co/ulises-c/SocratesLM-31B-stage2b-QLoRA}
}

Related Resources

Resource	Link
KELE paper (EMNLP 2025 Findings)	https://aclanthology.org/2025.findings-emnlp.888/
KELE GitHub	https://github.com/yuanpan1020/KELE
SocratTeachLLM (original KELE model)	https://huggingface.co/yuanpan/SocratTeachLLM
State classifier (MELE consultant)	https://huggingface.co/ulises-c/socrates-state-classifier-qwen3.5-lora
Training dataset — SocratDataset-EN	https://huggingface.co/datasets/ulises-c/SocratDataset-EN
Training dataset — SocratDataset (ZH)	https://huggingface.co/datasets/ulises-c/SocratDataset
Training + pipeline code	https://github.com/ulises-c/csen-346
rocBLAS gfx1201 bug (upstream)	https://github.com/ROCm/rocm-libraries/issues/7992

License

The adapter weights are released under the Gemma Terms of Use, inherited from the base model. Use of this checkpoint must also cite the KELE paper, whose SocRule framework defines the training objective.

Downloads last month: -

Datasets used to train ulises-c/SocratesLM-31B-stage2b-QLoRA

Collection including ulises-c/SocratesLM-31B-stage2b-QLoRA

KELE-v2

Collection

SCU CSEN-346 NLP Project • 6 items • Updated 3 days ago