Instructions to use VoltageVagabond/spam-classifier-liquid with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use VoltageVagabond/spam-classifier-liquid with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct") model = PeftModel.from_pretrained(base_model, "VoltageVagabond/spam-classifier-liquid") - Notebooks
- Google Colab
- Kaggle
| # Code Sources & References | |
| Every code snippet, technique, and configuration used in this project traced back to its original source. | |
| Use this when writing your paper to cite where each technique came from. | |
| --- | |
| ## 1. Liquid AI — Model & Architecture | |
| ### LFM2.5-1.2B-Instruct (Our Model) | |
| ```python | |
| model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct") | |
| ``` | |
| - **What:** 1.2 billion parameter instruction-tuned language model | |
| - **HuggingFace:** https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct | |
| - **Company:** https://www.liquid.ai/ | |
| - **Architecture:** Liquid Neural Network — hybrid state-space + attention + conv, inspired by biological neural circuits (C. elegans) | |
| - **Paper:** arXiv:2511.23404 — LFM2 technical report | |
| - **Why we use it:** Small enough for a laptop (2.4 GB in bf16), instruction-tuned, HuggingFace compatible | |
| ### Liquid AI Official Documentation | |
| - **Main docs:** https://docs.liquid.ai | |
| - **Transformers inference guide:** https://docs.liquid.ai/deployment/gpu-inference/transformers | |
| - **Fine-tuning with TRL:** https://docs.liquid.ai/customization/finetuning-frameworks/trl | |
| - **Fine-tuning with Unsloth:** https://docs.liquid.ai/customization/finetuning-frameworks/unsloth | |
| - **Dataset formats:** https://docs.liquid.ai/customization/finetuning-frameworks/datasets | |
| - **Customization overview:** https://docs.liquid.ai/customization/getting-started/welcome | |
| ### Liquid AI Official Cookbook (GitHub) | |
| - **Repository:** https://github.com/Liquid4All/cookbook | |
| - **SFT with TRL notebook:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb | |
| - This is the primary source for our LoRA configuration and training setup | |
| - Defines target modules for LFM2 architecture: attention + GLU + conv layers | |
| - **SFT with Unsloth notebook:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_unsloth.ipynb | |
| - Alternative fine-tuning approach using Unsloth for 2-5x faster training | |
| - Uses 16-bit LoRA with gradient checkpointing | |
| ### Other Liquid AI Models (Evaluated, Not Used) | |
| - **LFM2-8B-A1B (MoE):** https://huggingface.co/LiquidAI/LFM2-8B-A1B | |
| - 8B total params, 1B active (Mixture of Experts) | |
| - Considered as teacher model but too large for 24 GB Mac (~16 GB for weights alone) | |
| - **LFM2-2.6B:** https://huggingface.co/LiquidAI/LFM2-2.6B | |
| - Evaluated as larger alternative, would fit (~5.2 GB) but tight with LoRA + optimizer | |
| - **Full model catalog:** https://huggingface.co/LiquidAI | |
| --- | |
| ## 2. Fine-Tuning Framework | |
| ### TRL — SFTTrainer (Supervised Fine-Tuning) | |
| ```python | |
| from trl import SFTConfig, SFTTrainer | |
| trainer = SFTTrainer(model=model, args=training_args, peft_config=peft_config, ...) | |
| ``` | |
| - **What:** HuggingFace library for training language models with reinforcement learning and SFT | |
| - **Docs:** https://huggingface.co/docs/trl | |
| - **Source:** https://github.com/huggingface/trl | |
| - **SFTTrainer guide:** https://huggingface.co/docs/trl/sft_trainer | |
| - **Why we use it:** Liquid AI's officially recommended fine-tuning method | |
| - **Key feature:** Automatically handles chat template application, tokenization, and prompt masking | |
| - **Version note:** TRL v0.29 renamed `max_seq_length` to `max_length` in SFTConfig | |
| ### PEFT — LoRA (Low-Rank Adaptation) | |
| ```python | |
| from peft import LoraConfig, PeftModel | |
| ``` | |
| - **What:** Parameter-Efficient Fine-Tuning library — adds small trainable adapters to frozen models | |
| - **Docs:** https://huggingface.co/docs/peft | |
| - **Source:** https://github.com/huggingface/peft | |
| - **LoRA conceptual guide:** https://huggingface.co/docs/peft/conceptual_guides/lora | |
| - **LoRA paper:** Hu, E., et al. (2021). "LoRA: Low-Rank Adaptation of Large Language Models." arXiv:2106.09685 | |
| - **Why we use it:** Trains only ~1-5% of parameters — makes fine-tuning possible on a laptop | |
| ### LoRA Configuration (from Liquid AI Cookbook) | |
| ```python | |
| peft_config = LoraConfig( | |
| r=8, lora_alpha=16, lora_dropout=0.1, | |
| target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "w1", "w2", "w3", "in_proj"], | |
| ) | |
| ``` | |
| - **Source:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb | |
| - **Target modules explained:** | |
| - `q_proj, k_proj, v_proj, out_proj` — Multi-Head Attention layers | |
| - `w1, w2, w3` — GLU (Gated Linear Unit) feed-forward layers | |
| - `in_proj` — Conv block input projection (unique to Liquid AI architecture) | |
| - **Why these modules:** Liquid AI's architecture is not a standard transformer — it has additional conv and GLU layers. Adapting all layer types gives better results than attention-only LoRA. | |
| - **Note:** Standard transformer LoRA typically only targets `q_proj` and `v_proj`. The expanded target list is specific to LFM2 models. | |
| --- | |
| ## 3. PyTorch & Apple Silicon | |
| ### PyTorch MPS Backend | |
| ```python | |
| import torch | |
| torch.backends.mps.is_available() # True on Apple Silicon | |
| ``` | |
| - **What:** Metal Performance Shaders — PyTorch's backend for Apple Silicon GPU acceleration | |
| - **Docs:** https://pytorch.org/docs/stable/notes/mps.html | |
| - **Why we use it:** Enables GPU-accelerated training on Mac without NVIDIA hardware | |
| - **Key finding:** MPS saturates at batch size 4 for this model — batch size 8 showed no speed improvement (steps halved but each step took 2x longer) | |
| ### HuggingFace Accelerate | |
| ```python | |
| # device_map="auto" uses accelerate under the hood | |
| model = AutoModelForCausalLM.from_pretrained(..., device_map="auto") | |
| ``` | |
| - **What:** Automatic device placement library | |
| - **Docs:** https://huggingface.co/docs/accelerate | |
| - **Why we use it:** Automatically places model on MPS (Mac), CUDA (NVIDIA), or CPU | |
| --- | |
| ## 4. HuggingFace Transformers | |
| ### AutoModelForCausalLM / AutoTokenizer | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| ``` | |
| - **What:** Auto-classes that load any causal language model from HuggingFace Hub | |
| - **Docs:** https://huggingface.co/docs/transformers | |
| - **Source:** https://github.com/huggingface/transformers | |
| - **Chat templates:** https://huggingface.co/docs/transformers/en/chat_templating | |
| - **Why we use it:** Standard interface for loading and running Liquid AI models | |
| ### HuggingFace Datasets | |
| ```python | |
| from datasets import Dataset | |
| dataset = Dataset.from_list(examples) | |
| ``` | |
| - **What:** Library for loading and processing datasets | |
| - **Docs:** https://huggingface.co/docs/datasets | |
| - **Why we use it:** SFTTrainer expects HuggingFace Dataset objects with a "messages" column | |
| --- | |
| ## 5. Training Data | |
| ### Dataset Source | |
| - **Origin:** Generated by the MLX sibling project using Qwen3-VL-32B | |
| - **HuggingFace dataset:** `FaroukMoc2/email_spam-qwen3-vl-32b` | |
| - Source: https://huggingface.co/datasets/FaroukMoc2/email_spam-qwen3-vl-32b | |
| - **Size:** 3,200 training + 800 test examples | |
| - **Format:** JSONL with chat-style messages (`system`, `user`, `assistant` roles) | |
| - **Why reused:** The JSONL chat format is model-agnostic — works with any model that supports chat templates | |
| ### Original Email Dataset | |
| - **Source:** Kaggle spam email dataset (193,852 emails) | |
| - **CSV path:** `data/spam_Emails_data.csv` (symlinked from spam-xai-project) | |
| --- | |
| ## 6. Gradio Web Interface | |
| ### Gradio | |
| ```python | |
| import gradio as gr | |
| with gr.Blocks() as demo: | |
| ... | |
| demo.launch() | |
| ``` | |
| - **What:** Python library for building ML web interfaces | |
| - **Docs:** https://www.gradio.app/docs | |
| - **Source:** https://github.com/gradio-app/gradio | |
| - **Why we use it:** Quick web UI for email classification — same as MLX version for consistency | |
| --- | |
| ## 7. Performance Findings (Empirical) | |
| These findings were discovered during development on a MacBook Pro M4 Pro with 24 GB unified memory: | |
| | Finding | Details | | |
| |---------|---------| | |
| | MPS batch size sweet spot | Batch size 4 is optimal. Batch size 8 halved steps but doubled time per step — GPU saturated. | | |
| | Memory usage | ~7-8 GB during training (1.2B model bf16 + LoRA + optimizer + activations) | | |
| | Training speed | ~0.34 it/s at batch size 4 on MPS | | |
| | Model load time | 30-60 seconds for initial model loading into memory | | |
| | MLX vs PyTorch MPS | MLX (used in sibling project) is significantly faster for Apple Silicon — purpose-built vs compatibility layer | | |
| | No orphaned ports | Unlike MLX version (which spawns llama-server), PyTorch loads in-process — clean shutdown | | |
| | TRL v0.29 breaking change | `max_seq_length` renamed to `max_length` in SFTConfig | | |
| | LFM2 layer names | Uses `out_proj` (not `o_proj` like standard transformers) | | |
| --- | |
| ## 8. Comparison with MLX Version | |
| | Aspect | MLX Version | Liquid AI Version | | |
| |--------|-------------|-------------------| | |
| | Model | Qwen3.5-0.8B (4-bit quantized) | LFM2.5-1.2B-Instruct (bf16) | | |
| | Architecture | Transformer | Liquid Neural Network (state-space + attention + conv) | | |
| | Framework | Apple MLX + mlx-lm | PyTorch + HuggingFace Transformers + TRL + PEFT | | |
| | Fine-tuning tool | mlx-lm LoRA CLI | TRL SFTTrainer + PEFT LoRA | | |
| | Training speed | ~10-20 min | ~37 min (1 epoch), ~2 hrs (3 epochs) | | |
| | Memory usage | ~3-4 GB | ~7-8 GB | | |
| | Platform | Apple Silicon only | Any platform (Mac MPS, NVIDIA CUDA, CPU) | | |
| | Model serving | Spawns llama-server (can leak ports) | In-process PyTorch (clean shutdown) | | |
| | LoRA targets | Attention layers only | Attention + GLU + Conv (8 module types) | | |
| | Training data | Same (model-agnostic JSONL format) | Same (copied from MLX project) | | |
| | Gradio UI | Identical | Identical | | |
| --- | |
| ## Academic Citations (for Paper) | |
| ``` | |
| Hu, E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). | |
| LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685. | |
| Liquid AI. (2025). LFM2: Liquid Foundation Models 2. arXiv:2511.23404. | |
| Liquid AI. (2026). Liquid AI Cookbook: Fine-tuning notebooks. | |
| https://github.com/Liquid4All/cookbook | |
| Liquid AI. (2026). LFM2.5-1.2B-Instruct model card. | |
| https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct | |
| von Werra, L., et al. (2020). TRL: Transformer Reinforcement Learning. | |
| https://github.com/huggingface/trl | |
| Mangrulkar, S., et al. (2022). PEFT: Parameter-Efficient Fine-Tuning. | |
| https://github.com/huggingface/peft | |
| Wolf, T., et al. (2020). Transformers: State-of-the-Art Natural Language Processing. | |
| Proceedings of EMNLP 2020 (Systems Demonstrations), pp. 38-45. | |
| https://github.com/huggingface/transformers | |
| Paszke, A., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. | |
| Advances in Neural Information Processing Systems 32, pp. 8024-8035. | |
| ``` | |