spam-classifier-liquid / docs /07-code-sources-reference.md

Upload folder using huggingface_hub

92c0ea5 verified about 2 months ago

10.4 kB

	# Code Sources & References

	Every code snippet, technique, and configuration used in this project traced back to its original source.
	Use this when writing your paper to cite where each technique came from.

	---

	## 1. Liquid AI — Model & Architecture

	### LFM2.5-1.2B-Instruct (Our Model)
	```python
	model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
	```
	- What: 1.2 billion parameter instruction-tuned language model
	- HuggingFace: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
	- Company: https://www.liquid.ai/
	- Architecture: Liquid Neural Network — hybrid state-space + attention + conv, inspired by biological neural circuits (C. elegans)
	- Paper: arXiv:2511.23404 — LFM2 technical report
	- Why we use it: Small enough for a laptop (2.4 GB in bf16), instruction-tuned, HuggingFace compatible

	### Liquid AI Official Documentation
	- Main docs: https://docs.liquid.ai
	- Transformers inference guide: https://docs.liquid.ai/deployment/gpu-inference/transformers
	- Fine-tuning with TRL: https://docs.liquid.ai/customization/finetuning-frameworks/trl
	- Fine-tuning with Unsloth: https://docs.liquid.ai/customization/finetuning-frameworks/unsloth
	- Dataset formats: https://docs.liquid.ai/customization/finetuning-frameworks/datasets
	- Customization overview: https://docs.liquid.ai/customization/getting-started/welcome

	### Liquid AI Official Cookbook (GitHub)
	- Repository: https://github.com/Liquid4All/cookbook
	- SFT with TRL notebook: https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
	- This is the primary source for our LoRA configuration and training setup
	- Defines target modules for LFM2 architecture: attention + GLU + conv layers
	- SFT with Unsloth notebook: https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_unsloth.ipynb
	- Alternative fine-tuning approach using Unsloth for 2-5x faster training
	- Uses 16-bit LoRA with gradient checkpointing

	### Other Liquid AI Models (Evaluated, Not Used)
	- LFM2-8B-A1B (MoE): https://huggingface.co/LiquidAI/LFM2-8B-A1B
	- 8B total params, 1B active (Mixture of Experts)
	- Considered as teacher model but too large for 24 GB Mac (~16 GB for weights alone)
	- LFM2-2.6B: https://huggingface.co/LiquidAI/LFM2-2.6B
	- Evaluated as larger alternative, would fit (~5.2 GB) but tight with LoRA + optimizer
	- Full model catalog: https://huggingface.co/LiquidAI

	---

	## 2. Fine-Tuning Framework

	### TRL — SFTTrainer (Supervised Fine-Tuning)
	```python
	from trl import SFTConfig, SFTTrainer
	trainer = SFTTrainer(model=model, args=training_args, peft_config=peft_config, ...)
	```
	- What: HuggingFace library for training language models with reinforcement learning and SFT
	- Docs: https://huggingface.co/docs/trl
	- Source: https://github.com/huggingface/trl
	- SFTTrainer guide: https://huggingface.co/docs/trl/sft_trainer
	- Why we use it: Liquid AI's officially recommended fine-tuning method
	- Key feature: Automatically handles chat template application, tokenization, and prompt masking
	- Version note: TRL v0.29 renamed `max_seq_length` to `max_length` in SFTConfig

	### PEFT — LoRA (Low-Rank Adaptation)
	```python
	from peft import LoraConfig, PeftModel
	```
	- What: Parameter-Efficient Fine-Tuning library — adds small trainable adapters to frozen models
	- Docs: https://huggingface.co/docs/peft
	- Source: https://github.com/huggingface/peft
	- LoRA conceptual guide: https://huggingface.co/docs/peft/conceptual_guides/lora
	- LoRA paper: Hu, E., et al. (2021). "LoRA: Low-Rank Adaptation of Large Language Models." arXiv:2106.09685
	- Why we use it: Trains only ~1-5% of parameters — makes fine-tuning possible on a laptop

	### LoRA Configuration (from Liquid AI Cookbook)
	```python
	peft_config = LoraConfig(
	r=8, lora_alpha=16, lora_dropout=0.1,
	target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "w1", "w2", "w3", "in_proj"],
	)
	```
	- Source: https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
	- Target modules explained:
	- `q_proj, k_proj, v_proj, out_proj` — Multi-Head Attention layers
	- `w1, w2, w3` — GLU (Gated Linear Unit) feed-forward layers
	- `in_proj` — Conv block input projection (unique to Liquid AI architecture)
	- Why these modules: Liquid AI's architecture is not a standard transformer — it has additional conv and GLU layers. Adapting all layer types gives better results than attention-only LoRA.
	- Note: Standard transformer LoRA typically only targets `q_proj` and `v_proj`. The expanded target list is specific to LFM2 models.

	---

	## 3. PyTorch & Apple Silicon

	### PyTorch MPS Backend
	```python
	import torch
	torch.backends.mps.is_available() # True on Apple Silicon
	```
	- What: Metal Performance Shaders — PyTorch's backend for Apple Silicon GPU acceleration
	- Docs: https://pytorch.org/docs/stable/notes/mps.html
	- Why we use it: Enables GPU-accelerated training on Mac without NVIDIA hardware
	- Key finding: MPS saturates at batch size 4 for this model — batch size 8 showed no speed improvement (steps halved but each step took 2x longer)

	### HuggingFace Accelerate
	```python
	# device_map="auto" uses accelerate under the hood
	model = AutoModelForCausalLM.from_pretrained(..., device_map="auto")
	```
	- What: Automatic device placement library
	- Docs: https://huggingface.co/docs/accelerate
	- Why we use it: Automatically places model on MPS (Mac), CUDA (NVIDIA), or CPU

	---

	## 4. HuggingFace Transformers

	### AutoModelForCausalLM / AutoTokenizer
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	```
	- What: Auto-classes that load any causal language model from HuggingFace Hub
	- Docs: https://huggingface.co/docs/transformers
	- Source: https://github.com/huggingface/transformers
	- Chat templates: https://huggingface.co/docs/transformers/en/chat_templating
	- Why we use it: Standard interface for loading and running Liquid AI models

	### HuggingFace Datasets
	```python
	from datasets import Dataset
	dataset = Dataset.from_list(examples)
	```
	- What: Library for loading and processing datasets
	- Docs: https://huggingface.co/docs/datasets
	- Why we use it: SFTTrainer expects HuggingFace Dataset objects with a "messages" column

	---

	## 5. Training Data

	### Dataset Source
	- Origin: Generated by the MLX sibling project using Qwen3-VL-32B
	- HuggingFace dataset: `FaroukMoc2/email_spam-qwen3-vl-32b`
	- Source: https://huggingface.co/datasets/FaroukMoc2/email_spam-qwen3-vl-32b
	- Size: 3,200 training + 800 test examples
	- Format: JSONL with chat-style messages (`system`, `user`, `assistant` roles)
	- Why reused: The JSONL chat format is model-agnostic — works with any model that supports chat templates

	### Original Email Dataset
	- Source: Kaggle spam email dataset (193,852 emails)
	- CSV path: `data/spam_Emails_data.csv` (symlinked from spam-xai-project)

	---

	## 6. Gradio Web Interface

	### Gradio
	```python
	import gradio as gr
	with gr.Blocks() as demo:
	...
	demo.launch()
	```
	- What: Python library for building ML web interfaces
	- Docs: https://www.gradio.app/docs
	- Source: https://github.com/gradio-app/gradio
	- Why we use it: Quick web UI for email classification — same as MLX version for consistency

	---

	## 7. Performance Findings (Empirical)

	These findings were discovered during development on a MacBook Pro M4 Pro with 24 GB unified memory:

	\| Finding \| Details \|
	\|---------\|---------\|
	\| MPS batch size sweet spot \| Batch size 4 is optimal. Batch size 8 halved steps but doubled time per step — GPU saturated. \|
	\| Memory usage \| ~7-8 GB during training (1.2B model bf16 + LoRA + optimizer + activations) \|
	\| Training speed \| ~0.34 it/s at batch size 4 on MPS \|
	\| Model load time \| 30-60 seconds for initial model loading into memory \|
	\| MLX vs PyTorch MPS \| MLX (used in sibling project) is significantly faster for Apple Silicon — purpose-built vs compatibility layer \|
	\| No orphaned ports \| Unlike MLX version (which spawns llama-server), PyTorch loads in-process — clean shutdown \|
	\| TRL v0.29 breaking change \| `max_seq_length` renamed to `max_length` in SFTConfig \|
	\| LFM2 layer names \| Uses `out_proj` (not `o_proj` like standard transformers) \|

	---

	## 8. Comparison with MLX Version

	\| Aspect \| MLX Version \| Liquid AI Version \|
	\|--------\|-------------\|-------------------\|
	\| Model \| Qwen3.5-0.8B (4-bit quantized) \| LFM2.5-1.2B-Instruct (bf16) \|
	\| Architecture \| Transformer \| Liquid Neural Network (state-space + attention + conv) \|
	\| Framework \| Apple MLX + mlx-lm \| PyTorch + HuggingFace Transformers + TRL + PEFT \|
	\| Fine-tuning tool \| mlx-lm LoRA CLI \| TRL SFTTrainer + PEFT LoRA \|
	\| Training speed \| ~10-20 min \| ~37 min (1 epoch), ~2 hrs (3 epochs) \|
	\| Memory usage \| ~3-4 GB \| ~7-8 GB \|
	\| Platform \| Apple Silicon only \| Any platform (Mac MPS, NVIDIA CUDA, CPU) \|
	\| Model serving \| Spawns llama-server (can leak ports) \| In-process PyTorch (clean shutdown) \|
	\| LoRA targets \| Attention layers only \| Attention + GLU + Conv (8 module types) \|
	\| Training data \| Same (model-agnostic JSONL format) \| Same (copied from MLX project) \|
	\| Gradio UI \| Identical \| Identical \|

	---

	## Academic Citations (for Paper)

	```
	Hu, E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021).
	LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.

	Liquid AI. (2025). LFM2: Liquid Foundation Models 2. arXiv:2511.23404.

	Liquid AI. (2026). Liquid AI Cookbook: Fine-tuning notebooks.
	https://github.com/Liquid4All/cookbook

	Liquid AI. (2026). LFM2.5-1.2B-Instruct model card.
	https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct

	von Werra, L., et al. (2020). TRL: Transformer Reinforcement Learning.
	https://github.com/huggingface/trl

	Mangrulkar, S., et al. (2022). PEFT: Parameter-Efficient Fine-Tuning.
	https://github.com/huggingface/peft

	Wolf, T., et al. (2020). Transformers: State-of-the-Art Natural Language Processing.
	Proceedings of EMNLP 2020 (Systems Demonstrations), pp. 38-45.
	https://github.com/huggingface/transformers

	Paszke, A., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library.
	Advances in Neural Information Processing Systems 32, pp. 8024-8035.
	```