Dev-the-dev91/sna-ml-adapter-v2

A LoRA adapter fine-tuned to teach LLM and transformer concepts through mnemonic anchors drawn from the learner's own Netflix viewing history, music listening data (Spotify/Apple Music), and social media activity.

This model is part of the SNA Learning system, which builds a social network analysis graph from Netflix, music, and social data, identifies high-value "memory anchors" across all three modalities, and uses them to generate memorable explanations of LLM and transformer concepts.

Model Details

  • Base model: Qwen/Qwen3-8B
  • Fine-tuning method: LoRA (Low-Rank Adaptation) via PEFT
  • LoRA rank: 32
  • LoRA alpha: 32
  • Target modules: q_proj, up_proj, v_proj, gate_proj, o_proj, k_proj, down_proj, lm_head
  • Domain: LLMs, transformers, and supporting ML foundations
  • Language: English
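
As a rough sanity check on the rank and alpha listed above: in standard LoRA the low-rank update is scaled by alpha / r before it is added to the frozen weight. The sketch below records the settings in a plain dataclass and computes that factor. `AdapterSettings` is a hypothetical container for illustration, not the project's actual config object, and the calculation assumes standard LoRA scaling rather than a variant such as rsLoRA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSettings:
    """Hypothetical container mirroring the values listed in Model Details."""
    rank: int = 32
    alpha: int = 32
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "lm_head",
    )

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.alpha / self.rank


settings = AdapterSettings()
print(f"scaling = {settings.scaling}")
print(f"{len(settings.target_modules)} target modules")
```

With rank and alpha both at 32 the factor is 1.0, so the adapter's update is applied at its learned magnitude.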

Intended Uses

This adapter supports three generation tasks:

Task      Description
explain   Structured LLM/transformer concept explanation using personal anchors (HOOK, MOVE, BRIDGE, CONSOLIDATE format)
mnemonic  Memory device linking an LLM/transformer concept to a familiar anchor
song      Educational song/rhyme for concept retention
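
One way to sanity-check an explain output is to confirm that the four section labels appear in order. The sketch below assumes each label shows up literally in the text as an uppercase word; the exact delimiter style the adapter emits is not specified in this card. The function name `has_explain_structure` and the sample text are made up for illustration.

```python
import re

SECTIONS = ("HOOK", "MOVE", "BRIDGE", "CONSOLIDATE")


def has_explain_structure(text: str) -> bool:
    """Return True if all four section labels occur, in order, as whole words."""
    position = 0
    for label in SECTIONS:
        match = re.compile(rf"\b{label}\b").search(text, position)
        if match is None:
            return False
        position = match.end()
    return True


# Invented sample output, not a real generation from the adapter.
sample = (
    "HOOK: Remember the heist crew splitting up the job?\n"
    "MOVE: Attention lets each token weigh every other token.\n"
    "BRIDGE: Each crew member checks in with whoever matters most.\n"
    "CONSOLIDATE: Attention = weighted lookups across the sequence."
)
print(has_explain_structure(sample))
print(has_explain_structure("MOVE first, then HOOK"))
```

A check like this only verifies the skeleton; it says nothing about whether the technical content or the anchor references are accurate.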

Intended audience: The individual learner whose Netflix, music, and social data were used to build the anchor lexicon.

Out of scope: General-purpose question answering, factual retrieval outside the LLM/transformer domain, and use with anchors from a different person's data.

Bias, Risks, and Limitations

  • Personalized to a single user: Anchors are derived from one person's Netflix viewing history, music listening data, and social media activity. Explanations will reference shows, movies, songs, artists, and social interactions that may not resonate with other users.
  • English only: All training data and outputs are in English.
  • Domain-limited: Trained on LLM/transformer concepts and supporting ML foundations (6 tiers, ~67 topics). The model may hallucinate on topics outside this distribution.
  • Hallucination risk: While grounding validation is applied during training, the model may still generate inaccurate technical claims or fabricated anchor references.

How to Get Started

Load with PEFT

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", device_map="auto")
model = PeftModel.from_pretrained(base, "Dev-the-dev91/sna-ml-adapter-v2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "system", "content": "You are an LLM/transformer tutor. Use the learner's personal anchors to explain concepts."},
    {"role": "user", "content": "Explain attention mechanisms"},
]
# return_dict=True returns input_ids together with the attention mask,
# so generate() receives both regardless of the library's default.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))

API Serving

SNA_BASE_MODEL=Qwen/Qwen3-8B SNA_ADAPTER_DIR=./adapter \
  uvicorn serving.agent_api:app --host 0.0.0.0 --port 8080

Training Details

Data Pipeline

  1. Ingest Netflix viewing history, music listening data (Spotify/Apple Music), and social media activity, then compute temporal decay weights
  2. Build the graph with NetworkX, adding cross-modal edges between Netflix, music, and social nodes based on Wikipedia category overlap, genre similarity, and co-occurrence
  3. Select anchors using betweenness centrality, cross-modal filtering, Louvain community detection, and semantic deduplication
  4. Enrich anchors with TMDB, Wikipedia, music metadata, and LLM-generated memory hooks
  5. Export structured chat JSONL with system/user/assistant turns
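
The first and last steps above can be sketched with the standard library alone. The card does not specify the decay function or the JSONL schema, so both are assumptions here: an exponential decay with a hypothetical 180-day half-life, and one JSON object per line holding a `messages` list of system/user/assistant turns. The function names are illustrative.

```python
import json
from datetime import date


def decay_weight(event_day: date, today: date, half_life_days: float = 180.0) -> float:
    """Exponential decay: an event half_life_days old gets weight 0.5."""
    age_days = max((today - event_day).days, 0)
    return 0.5 ** (age_days / half_life_days)


def to_chat_record(system: str, user: str, assistant: str) -> str:
    """Serialize one training example as a single JSONL line."""
    record = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }
    return json.dumps(record, ensure_ascii=False)


today = date(2026, 1, 1)
# 2025-07-05 is exactly one half-life (180 days) before `today`.
print(decay_weight(date(2025, 7, 5), today))
line = to_chat_record(
    "You are an LLM/transformer tutor.",
    "Explain attention mechanisms",
    "HOOK: ...",
)
print(line)
```

A shorter half-life would favor recent viewing and listening more strongly; the right value depends on how quickly the learner's associations fade, which this card does not report.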

Training Procedure

  • Stage 1 — SFT: Supervised fine-tuning on structured explanations with <think> chain-of-thought blocks
  • Stage 2 — Preference tuning: DPO/reward model on chosen vs. rejected pairs (structure-stripped, cross-row, generic rejections)
  • Stage 3 — GRPO: Group Relative Policy Optimization for alignment
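
The three rejection types named in Stage 2 can be read as three ways of producing a worse answer for the same prompt. The sketch below is one interpretation, not the project's actual pipeline: "structure-stripped" is taken to mean the chosen answer with its section labels removed, "cross-row" an answer borrowed from a different prompt's row, and "generic" a canned reply with no personal anchors. All function names, field names, and sample rows are hypothetical.

```python
import re

LABELS = re.compile(r"\b(HOOK|MOVE|BRIDGE|CONSOLIDATE):\s*")
GENERIC_REPLY = "This concept is an important part of how transformers work."


def build_pairs(rows: list[dict]) -> list[dict]:
    """rows: [{"prompt": str, "chosen": str}, ...] -> preference pairs."""
    pairs = []
    for i, row in enumerate(rows):
        rejections = {
            # Same content, but the section labels are removed.
            "structure_stripped": LABELS.sub("", row["chosen"]),
            # A well-formed answer to a different prompt (next row, wrapping around).
            "cross_row": rows[(i + 1) % len(rows)]["chosen"],
            # A bland answer that uses no personal anchors.
            "generic": GENERIC_REPLY,
        }
        for kind, rejected in rejections.items():
            if rejected != row["chosen"]:  # skip degenerate pairs
                pairs.append({"prompt": row["prompt"], "chosen": row["chosen"],
                              "rejected": rejected, "kind": kind})
    return pairs


# Placeholder rows; real rows would hold full explanations.
rows = [
    {"prompt": "Explain attention", "chosen": "HOOK: a MOVE: b BRIDGE: c CONSOLIDATE: d"},
    {"prompt": "Explain tokenization", "chosen": "HOOK: w MOVE: x BRIDGE: y CONSOLIDATE: z"},
]
pairs = build_pairs(rows)
print(len(pairs))
print(pairs[0]["rejected"])
```

Under this reading, each rejection type targets a different failure: lost structure, an off-topic answer, and an answer that ignores the learner's anchors.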

Hyperparameters

Parameter              Value
LoRA rank              32
LoRA alpha             32
Learning rate          0.0001
Epochs                 3.0
Batch size             1
Gradient accumulation  8
Max sequence length    4096
Optimizer              AdamW
Quantization           QLoRA 4-bit NF4 (when CUDA available)
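
The batch size and gradient accumulation values combine into an effective batch size per optimizer step. The arithmetic below assumes a single device and treats "Batch size" as the per-device value; the dataset size passed to `optimizer_steps` is invented, since the card does not report one, and exact step counts depend on how the trainer handles a final partial accumulation window.

```python
import math

per_device_batch_size = 1
gradient_accumulation_steps = 8
epochs = 3

# Examples contributing to each optimizer step on one device.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps


def optimizer_steps(num_examples: int) -> int:
    """Total optimizer steps, counting a final partial window in each epoch."""
    return math.ceil(num_examples / effective_batch_size) * epochs


print(effective_batch_size)
print(optimizer_steps(1000))  # 1000 is a made-up dataset size
```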

Reward Model

Parameter      Value
Learning rate  8e-06
Epochs         1.0
LoRA rank      16
Max length     3072

GRPO Alignment

Parameter              Value
Learning rate          5e-06
Epochs                 1.0
Num generations        4
Max completion length  1536
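
For context on "Num generations": GRPO samples a group of completions per prompt and scores each one relative to the rest of its group, rather than against a learned value function. The sketch below shows that normalization for one group of four. The reward values are invented, and the use of the population standard deviation and the `eps` value are assumptions; training libraries differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rewards for four completions of one prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.2f}")
```

Completions that beat their group's average get a positive advantage and are reinforced; if all four score the same, every advantage is zero and that prompt contributes no gradient signal.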

Evaluation

No evaluation results are available yet. Run the golden eval to populate this section:

PYTHONPATH=. python training/train_lora.py --eval-golden data/eval/golden_eval.jsonl

Technical Specifications

  • Framework: PyTorch + HuggingFace Transformers + TRL + PEFT
  • Training library: TRL SFTTrainer
  • Tokenization: apply_chat_template (matches inference)

Citation

@misc{sna-learning-dev-the-dev91-sna-ml-adapter-v2,
  title={SNA Learning: Personalized ML Education via Mnemonic Anchors},
  year={2026},
  howpublished={\url{https://huggingface.co/Dev-the-dev91/sna-ml-adapter-v2}},
}