Dev-the-dev91/sna-ml-adapter-v2
A LoRA adapter fine-tuned for personalized LLM and transformer concept education using mnemonic anchors derived from the learner's own Netflix viewing history, music listening data (Spotify/Apple Music), and social media activity.
This model is part of the SNA Learning system, which builds a social network analysis graph from Netflix, music, and social data, identifies high-value "memory anchors" across all three modalities, and uses them to generate memorable explanations of LLM and transformer concepts.
Model Details
- Base model: Qwen/Qwen3-8B
- Fine-tuning method: LoRA (Low-Rank Adaptation) via PEFT
- LoRA rank: 32
- LoRA alpha: 32
- Target modules: q_proj, up_proj, v_proj, gate_proj, o_proj, k_proj, down_proj, lm_head
- Domain: LLMs, transformers, and supporting ML foundations
- Language: English
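To make the rank and alpha values concrete, here is a minimal sketch of what they imply for one adapted linear layer. It assumes the standard LoRA formulation, where the low-rank update is scaled by alpha / rank and the two factors have shapes rank x d_in and d_out x rank. The 4096 x 4096 layer size is hypothetical and chosen only for illustration; it is not read from the Qwen3-8B config.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added to one linear layer: A (rank x d_in) plus B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Values from this card: rank 32, alpha 32.
scale = lora_scaling(alpha=32, rank=32)

# Hypothetical 4096 x 4096 projection, for illustration only.
added = lora_trainable_params(d_in=4096, d_out=4096, rank=32)
full = 4096 * 4096

print(f"scaling = {scale}")
print(f"adapter params = {added:,} vs full matrix = {full:,}")
```

With rank equal to alpha, the update is applied at scale 1.0, so the learning rate alone controls how strongly the adapter moves the base weights.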
Intended Uses
This adapter supports three generation tasks:
| Task | Description |
|---|---|
| explain | Structured LLM/transformer concept explanation using personal anchors (HOOK, MOVE, BRIDGE, CONSOLIDATE format) |
| mnemonic | Memory device linking an LLM/transformer concept to a familiar anchor |
| song | Educational song/rhyme for concept retention |
Intended audience: The individual learner whose Netflix, music, and social data were used to build the anchor lexicon.
Out of scope: General-purpose question answering, factual retrieval outside the LLM/transformers domain, use with anchors from a different person's data.
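Because the explain task is described as following a HOOK, MOVE, BRIDGE, CONSOLIDATE format, a quick structural check on generated text can be useful. The sketch below is an assumption about how such a check might look: it presumes the four labels appear as uppercase words in the output, which this card does not specify, and the function name is hypothetical.

```python
import re

SECTIONS = ["HOOK", "MOVE", "BRIDGE", "CONSOLIDATE"]


def missing_or_misordered(text: str) -> list[str]:
    """Return section labels that are absent or appear out of order."""
    problems = []
    cursor = 0  # only search after the last label that was found
    for label in SECTIONS:
        match = re.search(rf"\b{label}\b", text[cursor:])
        if match is None:
            problems.append(label)
        else:
            cursor += match.end()
    return problems


good = "HOOK: ...\nMOVE: ...\nBRIDGE: ...\nCONSOLIDATE: ..."
partial = "HOOK: ...\nBRIDGE: ..."
print(missing_or_misordered(good))
print(missing_or_misordered(partial))
```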
Bias, Risks, and Limitations
- Personalized to a single user: Anchors are derived from one person's Netflix viewing history, music listening data, and social media activity. Explanations will reference shows, movies, songs, artists, and social interactions that may not resonate with other users.
- English only: All training data and outputs are in English.
- Domain-limited: Trained on LLM/transformer concepts and supporting ML foundations (6 tiers, ~67 topics). May hallucinate outside this distribution.
- Hallucination risk: While grounding validation is applied during training, the model may still generate inaccurate technical claims or fabricated anchor references.
How to Get Started
Load with PEFT
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", device_map="auto")
model = PeftModel.from_pretrained(base, "Dev-the-dev91/sna-ml-adapter-v2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
messages = [
    {"role": "system", "content": "You are an LLM/transformer tutor. Use the learner's personal anchors to explain concepts."},
    {"role": "user", "content": "Explain attention mechanisms"},
]
# return_dict=True returns input_ids together with the attention mask.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
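Stage 1 of training uses <think> chain-of-thought blocks, so decoded output may begin with one. The following is a minimal sketch for separating that reasoning from the final answer. It assumes the tags survive decoding as literal text, which depends on tokenizer settings, and the helper name is hypothetical.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Keep everything outside the first <think>...</think> span as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>Pick an anchor first.</think>\nHOOK: ..."
reasoning, answer = split_reasoning(sample)
print(answer)
```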
API Serving
SNA_BASE_MODEL=Qwen/Qwen3-8B SNA_ADAPTER_DIR=./adapter \
uvicorn serving.agent_api:app --host 0.0.0.0 --port 8080
Training Details
Data Pipeline
- Ingest Netflix viewing history, music listening data (Spotify/Apple Music), and social media activity; compute temporal decay weights
- Build the graph with NetworkX, adding cross-modal edges between Netflix, music, and social nodes based on Wikipedia category overlap, genre similarity, and co-occurrence
- Filter candidate anchors using betweenness centrality, a cross-modal filter, Louvain community detection, and semantic deduplication
- Enrich anchors with TMDB, Wikipedia, music metadata, and LLM-generated memory hooks
- Export structured chat JSONL with system/user/assistant turns
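The first and last pipeline steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: the exponential form of the decay, the 180-day half-life, the "messages" key, and both function names are hypothetical, since the card only says that temporal decay weights are computed and that chat JSONL with system/user/assistant turns is exported.

```python
import json
from datetime import date


def decay_weight(event_day: date, today: date, half_life_days: float = 180.0) -> float:
    """Exponential decay: an event half_life_days old gets weight 0.5."""
    age_days = (today - event_day).days
    return 0.5 ** (age_days / half_life_days)


def to_chat_record(system: str, user: str, assistant: str) -> str:
    """Serialize one training example as a single JSONL line."""
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]
    return json.dumps({"messages": messages}, ensure_ascii=False)


today = date(2026, 1, 1)  # hypothetical reference date
print(decay_weight(date(2025, 7, 5), today))  # event is 180 days old
record = to_chat_record(
    "You are an LLM/transformer tutor.",
    "Explain attention mechanisms",
    "HOOK: ...",
)
print(record)
```

A half-life parameterization keeps the weight easy to reason about: recent viewing or listening events dominate, while older ones fade smoothly instead of being cut off.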
Training Procedure
- Stage 1 (SFT): Supervised fine-tuning on structured explanations with <think> chain-of-thought blocks
- Stage 2 (Preference tuning): DPO/reward model on chosen vs. rejected pairs (structure-stripped, cross-row, generic rejections)
- Stage 3 (GRPO): Group Relative Policy Optimization for alignment
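One possible reading of the three rejection types in Stage 2 is sketched below. The interpretation is an assumption: "structure-stripped" is taken to mean the same text with section labels removed, "cross-row" a response borrowed from a different prompt, and "generic" a canned, non-personalized answer. The function names, the generic sentence, and the sample rows are all hypothetical; only the prompt/chosen/rejected layout follows the common preference-dataset convention.

```python
import re

SECTION_LABEL = re.compile(r"^(HOOK|MOVE|BRIDGE|CONSOLIDATE):\s*", re.MULTILINE)
GENERIC_REJECTION = "Attention lets a model weigh different parts of its input."  # hypothetical


def strip_structure(response: str) -> str:
    """Structure-stripped rejection: same content, section labels removed."""
    return SECTION_LABEL.sub("", response)


def build_pairs(rows: list[dict]) -> list[dict]:
    """Build up to three rejected variants per row; rows need 'prompt' and 'response'."""
    pairs = []
    for i, row in enumerate(rows):
        # Cross-row rejection: the answer written for a different prompt.
        other = rows[(i + 1) % len(rows)]["response"]
        for rejected in (strip_structure(row["response"]), other, GENERIC_REJECTION):
            if rejected != row["response"]:  # never pair a response with itself
                pairs.append(
                    {"prompt": row["prompt"], "chosen": row["response"], "rejected": rejected}
                )
    return pairs


rows = [
    {"prompt": "Explain attention", "response": "HOOK: a\nMOVE: b"},
    {"prompt": "Explain LoRA", "response": "HOOK: c\nMOVE: d"},
]
pairs = build_pairs(rows)
print(len(pairs))
```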
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Learning rate | 0.0001 |
| Epochs | 3.0 |
| Batch size | 1 |
| Gradient accumulation | 8 |
| Max sequence length | 4096 |
| Optimizer | AdamW |
| Quantization | QLoRA 4-bit NF4 (when CUDA available) |
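The batch size and gradient accumulation values combine into an effective batch size, which in turn sets the approximate number of optimizer steps. The arithmetic below is a rough sketch: the single-device setup and the dataset size of 2000 examples are assumptions for illustration, and a real trainer may round steps slightly differently.

```python
import math

per_device_batch_size = 1   # from the table
gradient_accumulation = 8   # from the table
epochs = 3                  # from the table
num_devices = 1             # assumption: single GPU
num_examples = 2000         # hypothetical dataset size

# Gradients from 8 micro-batches are summed before each optimizer step.
effective_batch = per_device_batch_size * gradient_accumulation * num_devices
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)
```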
Reward Model
| Parameter | Value |
|---|---|
| Learning rate | 8e-06 |
| Epochs | 1.0 |
| LoRA rank | 16 |
| Max length | 3072 |
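The card does not state the reward model's training objective. Assuming the common pairwise (Bradley-Terry) formulation, where the model should score the chosen response above the rejected one, the per-pair loss can be sketched as follows. The function name and the example scores are hypothetical.

```python
import math


def pairwise_reward_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry loss: -log(sigmoid(score_chosen - score_rejected))."""
    margin = score_chosen - score_rejected
    # log1p(exp(-margin)) equals -log(sigmoid(margin)); the branch avoids exp() overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


print(pairwise_reward_loss(2.0, 0.0))  # chosen ranked higher: small loss
print(pairwise_reward_loss(0.0, 2.0))  # chosen ranked lower: large loss
```

The loss depends only on the score difference, so the reward model learns a relative ranking rather than an absolute scale.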
GRPO Alignment
| Parameter | Value |
|---|---|
| Learning rate | 5e-06 |
| Epochs | 1.0 |
| Num generations | 4 |
| Max completion length | 1536 |
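The "Num generations" setting is the group size GRPO uses to compare completions for the same prompt. As a minimal sketch of the core idea, each completion's reward is normalized against the mean and standard deviation of its own group. The reward values below are hypothetical, and the use of population standard deviation and the eps value are simplifying assumptions; library implementations may differ in detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for the 4 completions sampled for one prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions that beat their group average get a positive advantage and are reinforced; if all four score the same, every advantage is zero and that prompt contributes no learning signal.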
Evaluation
No evaluation results are available yet. Run the golden eval to populate this section:
PYTHONPATH=. python training/train_lora.py --eval-golden data/eval/golden_eval.jsonl
Technical Specifications
- Framework: PyTorch + HuggingFace Transformers + TRL + PEFT
- Training library: TRL SFTTrainer
- Tokenization: apply_chat_template (matches inference)
Citation
@misc{sna-learning-dev-the-dev91-sna-ml-adapter-v2,
title={SNA Learning: Personalized ML Education via Mnemonic Anchors},
year={2026},
howpublished={\url{https://huggingface.co/Dev-the-dev91/sna-ml-adapter-v2}},
}