# Code Sources & References

Every code snippet, technique, and configuration used in this project traced back to its original source.
Use this when writing your paper to cite where each technique came from.

---

## 1. Liquid AI — Model & Architecture

### LFM2.5-1.2B-Instruct (Our Model)
```python
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
```
- **What:** 1.2 billion parameter instruction-tuned language model
- **HuggingFace:** https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
- **Company:** https://www.liquid.ai/
- **Architecture:** Liquid Neural Network — hybrid state-space + attention + conv, inspired by biological neural circuits (C. elegans)
- **Paper:** arXiv:2511.23404 — LFM2 technical report
- **Why we use it:** Small enough for a laptop (2.4 GB in bf16), instruction-tuned, HuggingFace compatible

### Liquid AI Official Documentation
- **Main docs:** https://docs.liquid.ai
- **Transformers inference guide:** https://docs.liquid.ai/deployment/gpu-inference/transformers
- **Fine-tuning with TRL:** https://docs.liquid.ai/customization/finetuning-frameworks/trl
- **Fine-tuning with Unsloth:** https://docs.liquid.ai/customization/finetuning-frameworks/unsloth
- **Dataset formats:** https://docs.liquid.ai/customization/finetuning-frameworks/datasets
- **Customization overview:** https://docs.liquid.ai/customization/getting-started/welcome

### Liquid AI Official Cookbook (GitHub)
- **Repository:** https://github.com/Liquid4All/cookbook
- **SFT with TRL notebook:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
  - This is the primary source for our LoRA configuration and training setup
  - Defines target modules for LFM2 architecture: attention + GLU + conv layers
- **SFT with Unsloth notebook:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_unsloth.ipynb
  - Alternative fine-tuning approach using Unsloth for 2-5x faster training
  - Uses 16-bit LoRA with gradient checkpointing

### Other Liquid AI Models (Evaluated, Not Used)
- **LFM2-8B-A1B (MoE):** https://huggingface.co/LiquidAI/LFM2-8B-A1B
  - 8B total params, 1B active (Mixture of Experts)
  - Considered as teacher model but too large for 24 GB Mac (~16 GB for weights alone)
- **LFM2-2.6B:** https://huggingface.co/LiquidAI/LFM2-2.6B
  - Evaluated as larger alternative, would fit (~5.2 GB) but tight with LoRA + optimizer
- **Full model catalog:** https://huggingface.co/LiquidAI

---

## 2. Fine-Tuning Framework

### TRL — SFTTrainer (Supervised Fine-Tuning)
```python
from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(model=model, args=training_args, peft_config=peft_config, ...)
```
- **What:** HuggingFace library for training language models with reinforcement learning and SFT
- **Docs:** https://huggingface.co/docs/trl
- **Source:** https://github.com/huggingface/trl
- **SFTTrainer guide:** https://huggingface.co/docs/trl/sft_trainer
- **Why we use it:** Liquid AI's officially recommended fine-tuning method
- **Key feature:** Automatically handles chat template application, tokenization, and prompt masking
- **Version note:** TRL v0.29 renamed `max_seq_length` to `max_length` in SFTConfig

### PEFT — LoRA (Low-Rank Adaptation)
```python
from peft import LoraConfig, PeftModel
```
- **What:** Parameter-Efficient Fine-Tuning library — adds small trainable adapters to frozen models
- **Docs:** https://huggingface.co/docs/peft
- **Source:** https://github.com/huggingface/peft
- **LoRA conceptual guide:** https://huggingface.co/docs/peft/conceptual_guides/lora
- **LoRA paper:** Hu, E., et al. (2021). "LoRA: Low-Rank Adaptation of Large Language Models." arXiv:2106.09685
- **Why we use it:** Trains only ~1-5% of parameters — makes fine-tuning possible on a laptop

### LoRA Configuration (from Liquid AI Cookbook)
```python
peft_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "w1", "w2", "w3", "in_proj"],
)
```
- **Source:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
- **Target modules explained:**
  - `q_proj, k_proj, v_proj, out_proj` — Multi-Head Attention layers
  - `w1, w2, w3` — GLU (Gated Linear Unit) feed-forward layers
  - `in_proj` — Conv block input projection (unique to Liquid AI architecture)
- **Why these modules:** Liquid AI's architecture is not a standard transformer — it has additional conv and GLU layers. Adapting all layer types gives better results than attention-only LoRA.
- **Note:** Standard transformer LoRA typically only targets `q_proj` and `v_proj`. The expanded target list is specific to LFM2 models.

---

## 3. PyTorch & Apple Silicon

### PyTorch MPS Backend
```python
import torch
torch.backends.mps.is_available()  # True on Apple Silicon
```
- **What:** Metal Performance Shaders — PyTorch's backend for Apple Silicon GPU acceleration
- **Docs:** https://pytorch.org/docs/stable/notes/mps.html
- **Why we use it:** Enables GPU-accelerated training on Mac without NVIDIA hardware
- **Key finding:** MPS saturates at batch size 4 for this model — batch size 8 showed no speed improvement (steps halved but each step took 2x longer)

### HuggingFace Accelerate
```python
# device_map="auto" uses accelerate under the hood
model = AutoModelForCausalLM.from_pretrained(..., device_map="auto")
```
- **What:** Automatic device placement library
- **Docs:** https://huggingface.co/docs/accelerate
- **Why we use it:** Automatically places model on MPS (Mac), CUDA (NVIDIA), or CPU

---

## 4. HuggingFace Transformers

### AutoModelForCausalLM / AutoTokenizer
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
```
- **What:** Auto-classes that load any causal language model from HuggingFace Hub
- **Docs:** https://huggingface.co/docs/transformers
- **Source:** https://github.com/huggingface/transformers
- **Chat templates:** https://huggingface.co/docs/transformers/en/chat_templating
- **Why we use it:** Standard interface for loading and running Liquid AI models

### HuggingFace Datasets
```python
from datasets import Dataset
dataset = Dataset.from_list(examples)
```
- **What:** Library for loading and processing datasets
- **Docs:** https://huggingface.co/docs/datasets
- **Why we use it:** SFTTrainer expects HuggingFace Dataset objects with a "messages" column

---

## 5. Training Data

### Dataset Source
- **Origin:** Generated by the MLX sibling project using Qwen3-VL-32B
- **HuggingFace dataset:** `FaroukMoc2/email_spam-qwen3-vl-32b`
  - Source: https://huggingface.co/datasets/FaroukMoc2/email_spam-qwen3-vl-32b
- **Size:** 3,200 training + 800 test examples
- **Format:** JSONL with chat-style messages (`system`, `user`, `assistant` roles)
- **Why reused:** The JSONL chat format is model-agnostic — works with any model that supports chat templates

### Original Email Dataset
- **Source:** Kaggle spam email dataset (193,852 emails)
- **CSV path:** `data/spam_Emails_data.csv` (symlinked from spam-xai-project)

---

## 6. Gradio Web Interface

### Gradio
```python
import gradio as gr
with gr.Blocks() as demo:
    ...
demo.launch()
```
- **What:** Python library for building ML web interfaces
- **Docs:** https://www.gradio.app/docs
- **Source:** https://github.com/gradio-app/gradio
- **Why we use it:** Quick web UI for email classification — same as MLX version for consistency

---

## 7. Performance Findings (Empirical)

These findings were discovered during development on a MacBook Pro M4 Pro with 24 GB unified memory:

| Finding | Details |
|---------|---------|
| MPS batch size sweet spot | Batch size 4 is optimal. Batch size 8 halved steps but doubled time per step — GPU saturated. |
| Memory usage | ~7-8 GB during training (1.2B model bf16 + LoRA + optimizer + activations) |
| Training speed | ~0.34 it/s at batch size 4 on MPS |
| Model load time | 30-60 seconds for initial model loading into memory |
| MLX vs PyTorch MPS | MLX (used in sibling project) is significantly faster for Apple Silicon — purpose-built vs compatibility layer |
| No orphaned ports | Unlike MLX version (which spawns llama-server), PyTorch loads in-process — clean shutdown |
| TRL v0.29 breaking change | `max_seq_length` renamed to `max_length` in SFTConfig |
| LFM2 layer names | Uses `out_proj` (not `o_proj` like standard transformers) |

---

## 8. Comparison with MLX Version

| Aspect | MLX Version | Liquid AI Version |
|--------|-------------|-------------------|
| Model | Qwen3.5-0.8B (4-bit quantized) | LFM2.5-1.2B-Instruct (bf16) |
| Architecture | Transformer | Liquid Neural Network (state-space + attention + conv) |
| Framework | Apple MLX + mlx-lm | PyTorch + HuggingFace Transformers + TRL + PEFT |
| Fine-tuning tool | mlx-lm LoRA CLI | TRL SFTTrainer + PEFT LoRA |
| Training speed | ~10-20 min | ~37 min (1 epoch), ~2 hrs (3 epochs) |
| Memory usage | ~3-4 GB | ~7-8 GB |
| Platform | Apple Silicon only | Any platform (Mac MPS, NVIDIA CUDA, CPU) |
| Model serving | Spawns llama-server (can leak ports) | In-process PyTorch (clean shutdown) |
| LoRA targets | Attention layers only | Attention + GLU + Conv (8 module types) |
| Training data | Same (model-agnostic JSONL format) | Same (copied from MLX project) |
| Gradio UI | Identical | Identical |

---

## Academic Citations (for Paper)

```
Hu, E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021).
  LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.

Liquid AI. (2025). LFM2: Liquid Foundation Models 2. arXiv:2511.23404.

Liquid AI. (2026). Liquid AI Cookbook: Fine-tuning notebooks.
  https://github.com/Liquid4All/cookbook

Liquid AI. (2026). LFM2.5-1.2B-Instruct model card.
  https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct

von Werra, L., et al. (2020). TRL: Transformer Reinforcement Learning.
  https://github.com/huggingface/trl

Mangrulkar, S., et al. (2022). PEFT: Parameter-Efficient Fine-Tuning.
  https://github.com/huggingface/peft

Wolf, T., et al. (2020). Transformers: State-of-the-Art Natural Language Processing.
  Proceedings of EMNLP 2020 (Systems Demonstrations), pp. 38-45.
  https://github.com/huggingface/transformers

Paszke, A., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library.
  Advances in Neural Information Processing Systems 32, pp. 8024-8035.
```