---
license: apache-2.0
base_model: Qwen/Qwen3-8B
library_name: peft
tags:
- sna-learning
- lora
- education
- personalized
- mnemonic
- ml-concepts
language:
- en
pipeline_tag: text-generation
---

# Dev-the-dev91/sna-ml-adapter-v2

A LoRA adapter fine-tuned to teach LLM and transformer concepts through personalized mnemonic anchors derived from the learner's own Netflix viewing history, music listening data (Spotify/Apple Music), and social media activity.

This model is part of the **SNA Learning** system, which builds a social network analysis graph from Netflix, music, and social data, identifies high-value "memory anchors" across all three modalities, and uses them to generate memorable explanations of LLM and transformer concepts.

## Model Details

- **Base model:** [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
- **Fine-tuning method:** LoRA (Low-Rank Adaptation) via PEFT
- **LoRA rank:** 32
- **LoRA alpha:** 32
- **Target modules:** q_proj, up_proj, v_proj, gate_proj, o_proj, k_proj, down_proj, lm_head
- **Domain:** LLMs, transformers, and supporting ML foundations
- **Language:** English
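
As a rough sketch of what these settings imply: in the standard LoRA formulation the low-rank update is scaled by alpha / rank, so rank 32 with alpha 32 gives a scaling factor of 1.0, and each adapted linear layer gains rank × (d_in + d_out) trainable parameters. The 4096 × 4096 layer shape below is an illustrative assumption, not a value taken from this card.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


scaling = lora_scaling(alpha=32, rank=32)
# Illustrative layer shape (assumption): a square 4096 x 4096 projection.
extra_params = lora_param_count(d_in=4096, d_out=4096, rank=32)
print(f"scaling={scaling}, extra params per layer={extra_params}")
```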

## Intended Uses

This adapter supports three generation tasks:

| Task | Description |
|------|-------------|
| **explain** | Structured LLM/transformer concept explanation using personal anchors (HOOK, MOVE, BRIDGE, CONSOLIDATE format) |
| **mnemonic** | Memory device linking an LLM/transformer concept to a familiar anchor |
| **song** | Educational song/rhyme for concept retention |

**Intended audience:** The individual learner whose Netflix, music, and social data were used to build the anchor lexicon.

**Out of scope:** General-purpose question answering, factual retrieval outside the LLM/transformers domain, use with anchors from a different person's data.
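
A minimal sketch of how an **explain** output could be checked for the four-part format. It assumes each section label appears literally in upper case in the generated text; the exact label formatting is not specified in this card, and `has_explain_structure` is a hypothetical helper rather than part of the SNA Learning code.

```python
SECTIONS = ("HOOK", "MOVE", "BRIDGE", "CONSOLIDATE")


def has_explain_structure(text: str) -> bool:
    """Return True if all four section labels appear, in order."""
    position = 0
    for label in SECTIONS:
        found = text.find(label, position)
        if found == -1:
            return False
        # Continue searching after this label so order is enforced.
        position = found + len(label)
    return True


sample = "HOOK: ...\nMOVE: ...\nBRIDGE: ...\nCONSOLIDATE: ..."
print(has_explain_structure(sample))
print(has_explain_structure("BRIDGE: ...\nHOOK: ..."))
```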

## Bias, Risks, and Limitations

- **Personalized to a single user:** Anchors are derived from one person's Netflix viewing history, music listening data, and social media activity. Explanations will reference shows, movies, songs, artists, and social interactions that may not resonate with other users.
- **English only:** All training data and outputs are in English.
- **Domain-limited:** Trained on LLM/transformer concepts and supporting ML foundations (6 tiers, ~67 topics). May hallucinate outside this distribution.
- **Hallucination risk:** While grounding validation is applied during training, the model may still generate inaccurate technical claims or fabricated anchor references.

## How to Get Started

### Load with PEFT

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", device_map="auto")
model = PeftModel.from_pretrained(base, "Dev-the-dev91/sna-ml-adapter-v2")
model.eval()

messages = [
    {"role": "system", "content": "You are an LLM/transformer tutor. Use the learner's personal anchors to explain concepts."},
    {"role": "user", "content": "Explain attention mechanisms"},
]
# return_dict=True yields input_ids plus attention_mask for generate().
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

### API Serving

```bash
SNA_BASE_MODEL=Qwen/Qwen3-8B SNA_ADAPTER_DIR=./adapter \
  uvicorn serving.agent_api:app --host 0.0.0.0 --port 8080
```


## Training Details

### Data Pipeline

1. **Ingest** Netflix viewing history, music listening data (Spotify/Apple Music), and social media activity; compute temporal decay weights
2. **Graph construction** using NetworkX — cross-modal edges from Netflix, music, and social nodes via Wikipedia category overlap, genre similarity, and co-occurrence
3. **Filter** via betweenness centrality, cross-modal filtering, Louvain community detection, semantic deduplication
4. **Enrich** anchors with TMDB, Wikipedia, music metadata, and LLM-generated memory hooks
5. **Export** structured chat JSONL with system/user/assistant turns
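
Steps 1 and 5 can be sketched with the standard library alone. The exponential half-life decay, the 180-day half-life, and the `messages` record layout are assumptions for illustration; the card does not specify the decay function or the exact JSONL schema.

```python
import json
from datetime import date


def decay_weight(event_day: date, today: date, half_life_days: float = 180.0) -> float:
    """Exponential decay: an event one half-life old gets weight 0.5."""
    age_days = (today - event_day).days
    return 0.5 ** (age_days / half_life_days)


def to_chat_record(system: str, user: str, assistant: str) -> str:
    """Serialize one training example as a single JSONL line."""
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]
    return json.dumps({"messages": messages}, ensure_ascii=False)


# A viewing event exactly 180 days before the reference date.
weight = decay_weight(date(2025, 1, 1), today=date(2025, 6, 30))
line = to_chat_record(
    "You are an LLM/transformer tutor.",
    "Explain attention mechanisms",
    "HOOK: ...",
)
print(round(weight, 3))
print(line)
```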

### Training Procedure

- **Stage 1 — SFT:** Supervised fine-tuning on structured explanations with `<think>` chain-of-thought blocks
- **Stage 2 — Preference tuning:** DPO/reward model on chosen vs. rejected pairs (structure-stripped, cross-row, generic rejections)
- **Stage 3 — GRPO:** Group Relative Policy Optimization for alignment
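
Because Stage 1 trains on `<think>` blocks, generated text may contain a reasoning segment before the final answer. A minimal sketch for separating the two, assuming at most one well-formed `<think>...</think>` pair per output (`split_think` is a hypothetical helper, not part of the SNA Learning code):

```python
import re

THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no block is present."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Remove the whole <think> block and keep everything around it.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_think("<think>Pick an anchor.</think>\nHOOK: ...")
print(reasoning)
print(answer)
```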

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Learning rate | 0.0001 |
| Epochs | 3.0 |
| Batch size | 1 |
| Gradient accumulation | 8 |
| Max sequence length | 4096 |
| Optimizer | AdamW |
| Quantization | QLoRA 4-bit NF4 (when CUDA available) |
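
For orientation, batch size 1 with 8 gradient-accumulation steps yields an effective batch of 8 sequences per optimizer step on a single device. The sketch below works through that arithmetic; the dataset size of 1,000 examples is a made-up placeholder, since the card does not report one, and real trainers may count steps slightly differently.

```python
import math

batch_size = 1
grad_accum = 8
epochs = 3
num_examples = 1000  # hypothetical dataset size, not from this card

# Sequences contributing to each optimizer update.
effective_batch = batch_size * grad_accum
# Approximate optimizer steps, rounding the last partial batch up.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
print(effective_batch, steps_per_epoch, total_steps)
```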

### Reward Model

| Parameter | Value |
|-----------|-------|
| Learning rate | 8e-06 |
| Epochs | 1.0 |
| LoRA rank | 16 |
| Max length | 3072 |
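
The card does not state the reward-model objective. A common choice for chosen vs. rejected pairs is the pairwise Bradley-Terry loss, sketched here on plain floats as an assumption rather than a description of the actual training code:

```python
import math


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), in a numerically stable form."""
    margin = reward_chosen - reward_rejected
    # softplus(-margin) equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


print(pairwise_loss(2.0, 0.0))  # chosen scored higher: smaller loss
print(pairwise_loss(0.0, 2.0))  # rejected scored higher: larger loss
```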


### GRPO Alignment

| Parameter | Value |
|-----------|-------|
| Learning rate | 5e-06 |
| Epochs | 1.0 |
| Num generations | 4 |
| Max completion length | 1536 |
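
With 4 generations per prompt, GRPO scores each completion relative to its own group. A minimal sketch of that group-relative advantage, assuming the common mean and standard-deviation normalization; the reward values are invented for illustration:

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


rewards = [0.2, 0.9, 0.5, 0.4]  # hypothetical rewards for 4 generations
advantages = group_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions that beat the group mean receive a positive advantage and the rest a negative one, so no separate value network is needed to form the baseline.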


## Evaluation

_No evaluation results are available yet. Run the golden eval to populate this section:_

```bash
PYTHONPATH=. python training/train_lora.py --eval-golden data/eval/golden_eval.jsonl
```


## Technical Specifications

- **Framework:** PyTorch + HuggingFace Transformers + TRL + PEFT
- **Training library:** [TRL](https://github.com/huggingface/trl) SFTTrainer
- **Tokenization:** `apply_chat_template` (matches inference)

## Citation

```bibtex
@misc{sna-learning-dev-the-dev91-sna-ml-adapter-v2,
  title={SNA Learning: Personalized ML Education via Mnemonic Anchors},
  year={2026},
  howpublished={\url{https://huggingface.co/Dev-the-dev91/sna-ml-adapter-v2}},
}
```