Dev-the-dev91/sna-ml-adapter-v2

A LoRA adapter for Qwen/Qwen3-8B, fine-tuned to teach LLM and transformer concepts through mnemonic anchors drawn from the learner's own Netflix viewing history, music listening data (Spotify/Apple Music), and social media activity.

This model is part of the SNA Learning system, which builds a social network analysis graph from Netflix, music, and social data, identifies high-value "memory anchors" across all three modalities, and uses them to generate memorable explanations of LLM and transformer concepts.

Model Details

Base model: Qwen/Qwen3-8B
Fine-tuning method: LoRA (Low-Rank Adaptation) via PEFT
LoRA rank: 32
LoRA alpha: 32
Target modules: q_proj, up_proj, v_proj, gate_proj, o_proj, k_proj, down_proj, lm_head
Domain: LLMs, transformers, and supporting ML foundations
Language: English
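To make the rank and alpha values concrete, here is a rough, stdlib-only calculation of what a rank-32 LoRA adds. Each adapted linear layer of shape d_in × d_out gains two low-rank factors, A (r × d_in) and B (d_out × r), so it adds r·(d_in + d_out) trainable parameters, and the low-rank update is scaled by alpha / r (1.0 here, since both are 32). The layer dimensions below are assumptions taken from the public Qwen3-8B config (hidden size 4096, MLP size 12288, 8 KV heads of dimension 128, 36 layers, vocabulary 151,936) and should be checked against the model's config.json; PEFT's `print_trainable_parameters()` gives the authoritative count.

```python
# Rough LoRA parameter arithmetic for the target modules listed above.
# ASSUMPTION: dimensions are copied from the public Qwen3-8B config; verify them
# against the model's config.json before relying on the numbers.

RANK = 32
ALPHA = 32

HIDDEN = 4096          # hidden_size
INTERMEDIATE = 12288   # MLP intermediate_size
KV_DIM = 8 * 128       # num_key_value_heads * head_dim (grouped-query attention)
N_LAYERS = 36
VOCAB = 151936

# (d_in, d_out) for each adapted linear module inside one decoder layer.
PER_LAYER_MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A is (rank x d_in) and B is (d_out x rank): rank * (d_in + d_out) parameters."""
    return rank * (d_in + d_out)


def total_lora_params(rank: int = RANK) -> int:
    per_layer = sum(lora_params(i, o, rank) for i, o in PER_LAYER_MODULES.values())
    # lm_head is adapted once, not once per decoder layer.
    return per_layer * N_LAYERS + lora_params(HIDDEN, VOCAB, rank)


scaling = ALPHA / RANK  # multiplier applied to B @ A in the LoRA update
print(f"scaling = {scaling}")
print(f"trainable LoRA params ~ {total_lora_params():,}")
```

Under these assumed dimensions the adapter adds roughly 92 million trainable parameters, a small fraction of the 8B base, and an alpha equal to the rank leaves the update unscaled.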

Intended Uses

This adapter supports three generation tasks:

Task	Description
explain	Structured LLM/transformer concept explanation using personal anchors (HOOK, MOVE, BRIDGE, CONSOLIDATE format)
mnemonic	Memory device linking an LLM/transformer concept to a familiar anchor
song	Educational song/rhyme for concept retention
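For the `explain` task, a quick post-generation check that the four sections appear in the expected order can catch malformed outputs. This is a sketch under an assumption the card does not confirm: each section starts on its own line with the section name followed by a colon (optionally in markdown bold or after a `#`), e.g. `HOOK:` or `**HOOK:**`. The function name `missing_or_misordered` is hypothetical; adjust the pattern to the real output format.

```python
import re

SECTIONS = ("HOOK", "MOVE", "BRIDGE", "CONSOLIDATE")

# ASSUMPTION: each section header is the section name at the start of a line,
# optionally wrapped in markdown bold or prefixed by '#', followed by a colon.
HEADER_RE = re.compile(r"^[#*\s]*(HOOK|MOVE|BRIDGE|CONSOLIDATE)\b[*\s]*:", re.MULTILINE)


def missing_or_misordered(text: str) -> list[str]:
    """Return a list of problems; an empty list means all four sections appear in order."""
    found = [m.group(1) for m in HEADER_RE.finditer(text)]
    problems = [f"missing {name}" for name in SECTIONS if name not in found]
    # Compare the order of first occurrences against the expected section order.
    first_seen = list(dict.fromkeys(found))
    expected = [name for name in SECTIONS if name in first_seen]
    if first_seen != expected:
        problems.append("sections out of order: " + " -> ".join(first_seen))
    return problems


print(missing_or_misordered("HOOK: ...\nMOVE: ...\nBRIDGE: ...\nCONSOLIDATE: ..."))
```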

Intended audience: The individual learner whose Netflix, music, and social data were used to build the anchor lexicon.

Out of scope: General-purpose question answering, factual retrieval outside the LLM/transformer domain, and use with anchors built from another person's data.

Bias, Risks, and Limitations

Personalized to a single user: Anchors are derived from one person's Netflix viewing history, music listening data, and social media activity. Explanations will reference shows, movies, songs, artists, and social interactions that may not resonate with other users.
English only: All training data and outputs are in English.
Domain-limited: Trained on LLM/transformer concepts and supporting ML foundations (6 tiers, ~67 topics). May hallucinate outside this distribution.
Hallucination risk: While grounding validation is applied during training, the model may still generate inaccurate technical claims or fabricated anchor references.

How to Get Started

Load with PEFT

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

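# Load the base model in its checkpoint dtype (bf16); some transformers versions otherwise default to fp32.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto")
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base, "Dev-the-dev91/sna-ml-adapter-v2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "system", "content": "You are an LLM/transformer tutor. Use the learner's personal anchors to explain concepts."},
    {"role": "user", "content": "Explain attention mechanisms"},
]
# Render the conversation with the chat template and open an assistant turn.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens so the prompt is not echoed back.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))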
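Because the SFT data contains `<think>` chain-of-thought blocks and Qwen3's chat template enables thinking by default, the decoded text will often start with a reasoning block before the explanation. The helper below is a hypothetical sketch (the name `split_thinking` is illustrative) that assumes the reasoning, if present, ends with a literal `</think>` tag in the decoded string; it separates the reasoning from the answer so only the explanation is shown to the learner.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    HYPOTHETICAL helper: assumes any reasoning ends with a literal '</think>'
    tag and that everything after the last such tag is the user-facing answer.
    """
    marker = "</think>"
    if marker not in text:
        return "", text.strip()
    reasoning, answer = text.rsplit(marker, 1)
    # Drop the opening tag if the model emitted it.
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()


reasoning, answer = split_thinking("<think>\nplan the hook\n</think>\n\nHOOK: ...")
print(answer)
```

Applied to the decoded string from the snippet above, `answer` holds the structured explanation and `reasoning` can be logged or discarded.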

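API Serving

To serve the adapter over HTTP, launch the project's serving app with uvicorn, setting SNA_BASE_MODEL to the base model ID and SNA_ADAPTER_DIR to a local directory containing the adapter weights: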

SNA_BASE_MODEL=Qwen/Qwen3-8B SNA_ADAPTER_DIR=./adapter \
  uvicorn serving.agent_api:app --host 0.0.0.0 --port 8080

Training Details

Data Pipeline

Ingest Netflix viewing history, music listening data (Spotify/Apple Music), and social media activity, and compute temporal decay weights
Construct a graph with NetworkX, adding cross-modal edges between Netflix, music, and social nodes based on Wikipedia category overlap, genre similarity, and co-occurrence
Filter anchors using betweenness centrality, cross-modal filtering, Louvain community detection, and semantic deduplication
Enrich anchors with TMDB, Wikipedia, and music metadata, plus LLM-generated memory hooks
Export structured chat JSONL with system/user/assistant turns
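The card does not specify the decay function or the edge rule, so the following is only an illustrative sketch of the first two steps. It assumes an exponential decay with a configurable half-life for the temporal weights, and links nodes from different modalities when the Jaccard overlap of their Wikipedia categories reaches a threshold. The `Node` schema, the 90-day half-life, and the 0.2 threshold are hypothetical; the real pipeline also uses genre similarity and co-occurrence and builds a NetworkX graph rather than a plain edge list.

```python
from dataclasses import dataclass, field
from datetime import date
from itertools import combinations


@dataclass
class Node:
    # HYPOTHETICAL schema: one watched title, track/artist, or social item.
    name: str
    modality: str                      # "netflix", "music", or "social"
    last_seen: date
    categories: set[str] = field(default_factory=set)


def decay_weight(last_seen: date, today: date, half_life_days: float = 90.0) -> float:
    """ASSUMED exponential decay: the weight halves every half_life_days."""
    age = (today - last_seen).days
    return 0.5 ** (max(age, 0) / half_life_days)


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cross_modal_edges(nodes: list[Node], today: date, threshold: float = 0.2):
    """Yield (u, v, weight) for node pairs from different modalities with enough category overlap."""
    for u, v in combinations(nodes, 2):
        if u.modality == v.modality:
            continue  # only cross-modal links are built here
        overlap = jaccard(u.categories, v.categories)
        if overlap >= threshold:
            # Scale the category overlap by how recent both anchors are.
            weight = overlap * decay_weight(u.last_seen, today) * decay_weight(v.last_seen, today)
            yield u.name, v.name, weight
```

Multiplying the two decay weights means an edge fades if either endpoint has not been seen recently; summing or averaging them would be an equally plausible design.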

Training Procedure

Stage 1 (SFT): Supervised fine-tuning on structured explanations that include <think> chain-of-thought blocks
Stage 2 (preference tuning): DPO and/or reward-model training on chosen vs. rejected pairs, where rejected responses are structure-stripped, cross-row, or generic
Stage 3 (GRPO): Group Relative Policy Optimization for alignment
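For readers new to the preference stage, the standard DPO objective for one chosen/rejected pair is sketched below on summed sequence log-probabilities. This is the textbook formula, not code from this project; the card does not report the DPO `beta` (the strength of the implicit KL constraint), so 0.1 is only a commonly used placeholder.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, given summed log-probs of each response.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(m)) == softplus(-m), computed stably for large |m|.
    return softplus(-margin)


# The policy raised the chosen response's log-prob and lowered the rejected one's.
print(dpo_loss(policy_chosen=-10.0, policy_rejected=-30.0, ref_chosen=-20.0, ref_rejected=-20.0))
```

With identical policy and reference log-probs the loss is log 2; it falls as the policy prefers the chosen response more strongly than the reference model does.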

Hyperparameters

Parameter	Value
LoRA rank	32
LoRA alpha	32
Learning rate	1e-4
Epochs	3.0
Batch size	1
Gradient accumulation	8
Max sequence length	4096
Optimizer	AdamW
Quantization	QLoRA 4-bit NF4 (when CUDA is available)
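Assuming "Batch size" is the per-device batch size and training runs on a single GPU, each optimizer update sees 1 × 8 = 8 sequences, and at the 4096-token maximum at most 8 × 4096 = 32,768 tokens. The sketch below turns that into an approximate number of optimizer updates; the training-set size is not given in this card, so `num_examples` is a placeholder, and a real trainer may round a final partial accumulation window differently.

```python
import math

# Values from the hyperparameter table above.
BATCH_SIZE = 1          # ASSUMPTION: per-device batch size
GRAD_ACCUM = 8
EPOCHS = 3
MAX_SEQ_LEN = 4096
NUM_DEVICES = 1         # ASSUMPTION: single-GPU training


def effective_batch_size() -> int:
    """Sequences contributing to one optimizer update."""
    return BATCH_SIZE * GRAD_ACCUM * NUM_DEVICES


def max_tokens_per_update() -> int:
    """Upper bound on tokens per update if every sequence hits the length cap."""
    return effective_batch_size() * MAX_SEQ_LEN


def approx_optimizer_steps(num_examples: int) -> int:
    """Approximate optimizer updates over all epochs, rounding a partial window up."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size())
    return steps_per_epoch * EPOCHS


print(effective_batch_size(), max_tokens_per_update(), approx_optimizer_steps(2000))
```

For a hypothetical 2,000-example dataset this gives 250 updates per epoch and 750 in total.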

Reward Model

Parameter	Value
Learning rate	8e-6
Epochs	1.0
LoRA rank	16
Max length	3072
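The card lists only the reward model's hyperparameters. Reward models trained on chosen/rejected pairs typically minimize a Bradley-Terry pairwise loss, -log σ(r_chosen - r_rejected), which is also the objective used by TRL's RewardTrainer; the sketch below assumes this project does the same and averages the loss over a batch of scalar rewards. Pairwise accuracy is included as a simple diagnostic.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_reward_loss(chosen: list[float], rejected: list[float]) -> float:
    """Mean Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected) over all pairs."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equal-length, non-empty reward lists")
    return sum(softplus(-(c - r)) for c, r in zip(chosen, rejected)) / len(chosen)


def pairwise_accuracy(chosen: list[float], rejected: list[float]) -> float:
    """Fraction of pairs where the chosen response receives the higher reward."""
    return sum(c > r for c, r in zip(chosen, rejected)) / len(chosen)


print(pairwise_reward_loss([2.0, 1.0], [0.0, 3.0]), pairwise_accuracy([2.0, 1.0], [0.0, 3.0]))
```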

GRPO Alignment

Parameter	Value
Learning rate	5e-6
Epochs	1.0
Num generations	4
Max completion length	1536
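GRPO replaces a learned value baseline with a group baseline: for each prompt it samples a group of completions (4 here, per "Num generations"), scores them, and normalizes each reward by the group's mean and standard deviation. A minimal sketch of that advantage computation follows; the small epsilon and the use of the population standard deviation are implementation choices that vary between libraries.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Advantage of each completion relative to its own sampling group.

    A_i = (r_i - mean(r)) / (std(r) + eps). Uses the population std; some
    implementations use the sample std instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored by a reward function (made-up values).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their group's average receive positive advantages and are reinforced; if all four completions score the same, every advantage is zero and that prompt contributes no reward signal to the update.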

Evaluation

No evaluation results available yet. Run golden eval to populate this section:

PYTHONPATH=. python training/train_lora.py --eval-golden data/eval/golden_eval.jsonl

Technical Specifications

Framework: PyTorch + Hugging Face Transformers + TRL + PEFT
Training library: TRL SFTTrainer
Tokenization: tokenizer.apply_chat_template, the same chat template used at inference

Citation

@misc{sna-learning-dev-the-dev91-sna-ml-adapter-v2,
  title={SNA Learning: Personalized ML Education via Mnemonic Anchors},
  year={2026},
  howpublished={\url{https://huggingface.co/Dev-the-dev91/sna-ml-adapter-v2}},
}


Model tree for Dev-the-dev91/sna-ml-adapter-v2

Qwen/Qwen3-8B-Base (pretrained base model)
Qwen/Qwen3-8B (fine-tuned from Qwen/Qwen3-8B-Base)
Dev-the-dev91/sna-ml-adapter-v2 (LoRA adapter on Qwen/Qwen3-8B; this model)