Instructions to use VoltageVagabond/spam-classifier-liquid with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use VoltageVagabond/spam-classifier-liquid with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct") model = PeftModel.from_pretrained(base_model, "VoltageVagabond/spam-classifier-liquid") - Notebooks
- Google Colab
- Kaggle
File size: 10,446 Bytes
92c0ea5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 | # Code Sources & References
Every code snippet, technique, and configuration used in this project traced back to its original source.
Use this when writing your paper to cite where each technique came from.
---
## 1. Liquid AI — Model & Architecture
### LFM2.5-1.2B-Instruct (Our Model)
```python
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
```
- **What:** 1.2 billion parameter instruction-tuned language model
- **HuggingFace:** https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
- **Company:** https://www.liquid.ai/
- **Architecture:** Liquid Neural Network — hybrid state-space + attention + conv, inspired by biological neural circuits (C. elegans)
- **Paper:** arXiv:2511.23404 — LFM2 technical report
- **Why we use it:** Small enough for a laptop (2.4 GB in bf16), instruction-tuned, HuggingFace compatible
### Liquid AI Official Documentation
- **Main docs:** https://docs.liquid.ai
- **Transformers inference guide:** https://docs.liquid.ai/deployment/gpu-inference/transformers
- **Fine-tuning with TRL:** https://docs.liquid.ai/customization/finetuning-frameworks/trl
- **Fine-tuning with Unsloth:** https://docs.liquid.ai/customization/finetuning-frameworks/unsloth
- **Dataset formats:** https://docs.liquid.ai/customization/finetuning-frameworks/datasets
- **Customization overview:** https://docs.liquid.ai/customization/getting-started/welcome
### Liquid AI Official Cookbook (GitHub)
- **Repository:** https://github.com/Liquid4All/cookbook
- **SFT with TRL notebook:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
- This is the primary source for our LoRA configuration and training setup
- Defines target modules for LFM2 architecture: attention + GLU + conv layers
- **SFT with Unsloth notebook:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_unsloth.ipynb
- Alternative fine-tuning approach using Unsloth for 2-5x faster training
- Uses 16-bit LoRA with gradient checkpointing
### Other Liquid AI Models (Evaluated, Not Used)
- **LFM2-8B-A1B (MoE):** https://huggingface.co/LiquidAI/LFM2-8B-A1B
- 8B total params, 1B active (Mixture of Experts)
- Considered as teacher model but too large for 24 GB Mac (~16 GB for weights alone)
- **LFM2-2.6B:** https://huggingface.co/LiquidAI/LFM2-2.6B
- Evaluated as larger alternative, would fit (~5.2 GB) but tight with LoRA + optimizer
- **Full model catalog:** https://huggingface.co/LiquidAI
---
## 2. Fine-Tuning Framework
### TRL — SFTTrainer (Supervised Fine-Tuning)
```python
from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(model=model, args=training_args, peft_config=peft_config, ...)
```
- **What:** HuggingFace library for training language models with reinforcement learning and SFT
- **Docs:** https://huggingface.co/docs/trl
- **Source:** https://github.com/huggingface/trl
- **SFTTrainer guide:** https://huggingface.co/docs/trl/sft_trainer
- **Why we use it:** Liquid AI's officially recommended fine-tuning method
- **Key feature:** Automatically handles chat template application, tokenization, and prompt masking
- **Version note:** TRL v0.29 renamed `max_seq_length` to `max_length` in SFTConfig
### PEFT — LoRA (Low-Rank Adaptation)
```python
from peft import LoraConfig, PeftModel
```
- **What:** Parameter-Efficient Fine-Tuning library — adds small trainable adapters to frozen models
- **Docs:** https://huggingface.co/docs/peft
- **Source:** https://github.com/huggingface/peft
- **LoRA conceptual guide:** https://huggingface.co/docs/peft/conceptual_guides/lora
- **LoRA paper:** Hu, E., et al. (2021). "LoRA: Low-Rank Adaptation of Large Language Models." arXiv:2106.09685
- **Why we use it:** Trains only ~1-5% of parameters — makes fine-tuning possible on a laptop
### LoRA Configuration (from Liquid AI Cookbook)
```python
peft_config = LoraConfig(
r=8, lora_alpha=16, lora_dropout=0.1,
target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "w1", "w2", "w3", "in_proj"],
)
```
- **Source:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
- **Target modules explained:**
- `q_proj, k_proj, v_proj, out_proj` — Multi-Head Attention layers
- `w1, w2, w3` — GLU (Gated Linear Unit) feed-forward layers
- `in_proj` — Conv block input projection (unique to Liquid AI architecture)
- **Why these modules:** Liquid AI's architecture is not a standard transformer — it has additional conv and GLU layers. Adapting all layer types gives better results than attention-only LoRA.
- **Note:** Standard transformer LoRA typically only targets `q_proj` and `v_proj`. The expanded target list is specific to LFM2 models.
---
## 3. PyTorch & Apple Silicon
### PyTorch MPS Backend
```python
import torch
torch.backends.mps.is_available() # True on Apple Silicon
```
- **What:** Metal Performance Shaders — PyTorch's backend for Apple Silicon GPU acceleration
- **Docs:** https://pytorch.org/docs/stable/notes/mps.html
- **Why we use it:** Enables GPU-accelerated training on Mac without NVIDIA hardware
- **Key finding:** MPS saturates at batch size 4 for this model — batch size 8 showed no speed improvement (steps halved but each step took 2x longer)
### HuggingFace Accelerate
```python
# device_map="auto" uses accelerate under the hood
model = AutoModelForCausalLM.from_pretrained(..., device_map="auto")
```
- **What:** Automatic device placement library
- **Docs:** https://huggingface.co/docs/accelerate
- **Why we use it:** Automatically places model on MPS (Mac), CUDA (NVIDIA), or CPU
---
## 4. HuggingFace Transformers
### AutoModelForCausalLM / AutoTokenizer
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
```
- **What:** Auto-classes that load any causal language model from HuggingFace Hub
- **Docs:** https://huggingface.co/docs/transformers
- **Source:** https://github.com/huggingface/transformers
- **Chat templates:** https://huggingface.co/docs/transformers/en/chat_templating
- **Why we use it:** Standard interface for loading and running Liquid AI models
### HuggingFace Datasets
```python
from datasets import Dataset
dataset = Dataset.from_list(examples)
```
- **What:** Library for loading and processing datasets
- **Docs:** https://huggingface.co/docs/datasets
- **Why we use it:** SFTTrainer expects HuggingFace Dataset objects with a "messages" column
---
## 5. Training Data
### Dataset Source
- **Origin:** Generated by the MLX sibling project using Qwen3-VL-32B
- **HuggingFace dataset:** `FaroukMoc2/email_spam-qwen3-vl-32b`
- Source: https://huggingface.co/datasets/FaroukMoc2/email_spam-qwen3-vl-32b
- **Size:** 3,200 training + 800 test examples
- **Format:** JSONL with chat-style messages (`system`, `user`, `assistant` roles)
- **Why reused:** The JSONL chat format is model-agnostic — works with any model that supports chat templates
### Original Email Dataset
- **Source:** Kaggle spam email dataset (193,852 emails)
- **CSV path:** `data/spam_Emails_data.csv` (symlinked from spam-xai-project)
---
## 6. Gradio Web Interface
### Gradio
```python
import gradio as gr
with gr.Blocks() as demo:
...
demo.launch()
```
- **What:** Python library for building ML web interfaces
- **Docs:** https://www.gradio.app/docs
- **Source:** https://github.com/gradio-app/gradio
- **Why we use it:** Quick web UI for email classification — same as MLX version for consistency
---
## 7. Performance Findings (Empirical)
These findings were discovered during development on a MacBook Pro M4 Pro with 24 GB unified memory:
| Finding | Details |
|---------|---------|
| MPS batch size sweet spot | Batch size 4 is optimal. Batch size 8 halved steps but doubled time per step — GPU saturated. |
| Memory usage | ~7-8 GB during training (1.2B model bf16 + LoRA + optimizer + activations) |
| Training speed | ~0.34 it/s at batch size 4 on MPS |
| Model load time | 30-60 seconds for initial model loading into memory |
| MLX vs PyTorch MPS | MLX (used in sibling project) is significantly faster for Apple Silicon — purpose-built vs compatibility layer |
| No orphaned ports | Unlike MLX version (which spawns llama-server), PyTorch loads in-process — clean shutdown |
| TRL v0.29 breaking change | `max_seq_length` renamed to `max_length` in SFTConfig |
| LFM2 layer names | Uses `out_proj` (not `o_proj` like standard transformers) |
---
## 8. Comparison with MLX Version
| Aspect | MLX Version | Liquid AI Version |
|--------|-------------|-------------------|
| Model | Qwen3.5-0.8B (4-bit quantized) | LFM2.5-1.2B-Instruct (bf16) |
| Architecture | Transformer | Liquid Neural Network (state-space + attention + conv) |
| Framework | Apple MLX + mlx-lm | PyTorch + HuggingFace Transformers + TRL + PEFT |
| Fine-tuning tool | mlx-lm LoRA CLI | TRL SFTTrainer + PEFT LoRA |
| Training speed | ~10-20 min | ~37 min (1 epoch), ~2 hrs (3 epochs) |
| Memory usage | ~3-4 GB | ~7-8 GB |
| Platform | Apple Silicon only | Any platform (Mac MPS, NVIDIA CUDA, CPU) |
| Model serving | Spawns llama-server (can leak ports) | In-process PyTorch (clean shutdown) |
| LoRA targets | Attention layers only | Attention + GLU + Conv (8 module types) |
| Training data | Same (model-agnostic JSONL format) | Same (copied from MLX project) |
| Gradio UI | Identical | Identical |
---
## Academic Citations (for Paper)
```
Hu, E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021).
LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.
Liquid AI. (2025). LFM2: Liquid Foundation Models 2. arXiv:2511.23404.
Liquid AI. (2026). Liquid AI Cookbook: Fine-tuning notebooks.
https://github.com/Liquid4All/cookbook
Liquid AI. (2026). LFM2.5-1.2B-Instruct model card.
https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
von Werra, L., et al. (2020). TRL: Transformer Reinforcement Learning.
https://github.com/huggingface/trl
Mangrulkar, S., et al. (2022). PEFT: Parameter-Efficient Fine-Tuning.
https://github.com/huggingface/peft
Wolf, T., et al. (2020). Transformers: State-of-the-Art Natural Language Processing.
Proceedings of EMNLP 2020 (Systems Demonstrations), pp. 38-45.
https://github.com/huggingface/transformers
Paszke, A., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library.
Advances in Neural Information Processing Systems 32, pp. 8024-8035.
```
|