spam-classifier-liquid / docs /07-code-sources-reference.md
VoltageVagabond's picture
Upload folder using huggingface_hub
92c0ea5 verified
|
raw
history blame
10.4 kB

Code Sources & References

Every code snippet, technique, and configuration used in this project traced back to its original source. Use this when writing your paper to cite where each technique came from.


1. Liquid AI — Model & Architecture

LFM2.5-1.2B-Instruct (Our Model)

model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
  • What: 1.2 billion parameter instruction-tuned language model
  • HuggingFace: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
  • Company: https://www.liquid.ai/
  • Architecture: Liquid Neural Network — hybrid state-space + attention + conv, inspired by biological neural circuits (C. elegans)
  • Paper: arXiv:2511.23404 — LFM2 technical report
  • Why we use it: Small enough for a laptop (2.4 GB in bf16), instruction-tuned, HuggingFace compatible

Liquid AI Official Documentation

Liquid AI Official Cookbook (GitHub)

Other Liquid AI Models (Evaluated, Not Used)


2. Fine-Tuning Framework

TRL — SFTTrainer (Supervised Fine-Tuning)

from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(model=model, args=training_args, peft_config=peft_config, ...)

PEFT — LoRA (Low-Rank Adaptation)

from peft import LoraConfig, PeftModel

LoRA Configuration (from Liquid AI Cookbook)

peft_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "w1", "w2", "w3", "in_proj"],
)
  • Source: https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
  • Target modules explained:
    • q_proj, k_proj, v_proj, out_proj — Multi-Head Attention layers
    • w1, w2, w3 — GLU (Gated Linear Unit) feed-forward layers
    • in_proj — Conv block input projection (unique to Liquid AI architecture)
  • Why these modules: Liquid AI's architecture is not a standard transformer — it has additional conv and GLU layers. Adapting all layer types gives better results than attention-only LoRA.
  • Note: Standard transformer LoRA typically only targets q_proj and v_proj. The expanded target list is specific to LFM2 models.

3. PyTorch & Apple Silicon

PyTorch MPS Backend

import torch
torch.backends.mps.is_available()  # True on Apple Silicon
  • What: Metal Performance Shaders — PyTorch's backend for Apple Silicon GPU acceleration
  • Docs: https://pytorch.org/docs/stable/notes/mps.html
  • Why we use it: Enables GPU-accelerated training on Mac without NVIDIA hardware
  • Key finding: MPS saturates at batch size 4 for this model — batch size 8 showed no speed improvement (steps halved but each step took 2x longer)

HuggingFace Accelerate

# device_map="auto" uses accelerate under the hood
model = AutoModelForCausalLM.from_pretrained(..., device_map="auto")

4. HuggingFace Transformers

AutoModelForCausalLM / AutoTokenizer

from transformers import AutoModelForCausalLM, AutoTokenizer

HuggingFace Datasets

from datasets import Dataset
dataset = Dataset.from_list(examples)
  • What: Library for loading and processing datasets
  • Docs: https://huggingface.co/docs/datasets
  • Why we use it: SFTTrainer expects HuggingFace Dataset objects with a "messages" column

5. Training Data

Dataset Source

  • Origin: Generated by the MLX sibling project using Qwen3-VL-32B
  • HuggingFace dataset: FaroukMoc2/email_spam-qwen3-vl-32b
  • Size: 3,200 training + 800 test examples
  • Format: JSONL with chat-style messages (system, user, assistant roles)
  • Why reused: The JSONL chat format is model-agnostic — works with any model that supports chat templates

Original Email Dataset

  • Source: Kaggle spam email dataset (193,852 emails)
  • CSV path: data/spam_Emails_data.csv (symlinked from spam-xai-project)

6. Gradio Web Interface

Gradio

import gradio as gr
with gr.Blocks() as demo:
    ...
demo.launch()

7. Performance Findings (Empirical)

These findings were discovered during development on a MacBook Pro M4 Pro with 24 GB unified memory:

Finding Details
MPS batch size sweet spot Batch size 4 is optimal. Batch size 8 halved steps but doubled time per step — GPU saturated.
Memory usage ~7-8 GB during training (1.2B model bf16 + LoRA + optimizer + activations)
Training speed ~0.34 it/s at batch size 4 on MPS
Model load time 30-60 seconds for initial model loading into memory
MLX vs PyTorch MPS MLX (used in sibling project) is significantly faster for Apple Silicon — purpose-built vs compatibility layer
No orphaned ports Unlike MLX version (which spawns llama-server), PyTorch loads in-process — clean shutdown
TRL v0.29 breaking change max_seq_length renamed to max_length in SFTConfig
LFM2 layer names Uses out_proj (not o_proj like standard transformers)

8. Comparison with MLX Version

Aspect MLX Version Liquid AI Version
Model Qwen3.5-0.8B (4-bit quantized) LFM2.5-1.2B-Instruct (bf16)
Architecture Transformer Liquid Neural Network (state-space + attention + conv)
Framework Apple MLX + mlx-lm PyTorch + HuggingFace Transformers + TRL + PEFT
Fine-tuning tool mlx-lm LoRA CLI TRL SFTTrainer + PEFT LoRA
Training speed ~10-20 min ~37 min (1 epoch), ~2 hrs (3 epochs)
Memory usage ~3-4 GB ~7-8 GB
Platform Apple Silicon only Any platform (Mac MPS, NVIDIA CUDA, CPU)
Model serving Spawns llama-server (can leak ports) In-process PyTorch (clean shutdown)
LoRA targets Attention layers only Attention + GLU + Conv (8 module types)
Training data Same (model-agnostic JSONL format) Same (copied from MLX project)
Gradio UI Identical Identical

Academic Citations (for Paper)

Hu, E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021).
  LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.

Liquid AI. (2025). LFM2: Liquid Foundation Models 2. arXiv:2511.23404.

Liquid AI. (2026). Liquid AI Cookbook: Fine-tuning notebooks.
  https://github.com/Liquid4All/cookbook

Liquid AI. (2026). LFM2.5-1.2B-Instruct model card.
  https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct

von Werra, L., et al. (2020). TRL: Transformer Reinforcement Learning.
  https://github.com/huggingface/trl

Mangrulkar, S., et al. (2022). PEFT: Parameter-Efficient Fine-Tuning.
  https://github.com/huggingface/peft

Wolf, T., et al. (2020). Transformers: State-of-the-Art Natural Language Processing.
  Proceedings of EMNLP 2020 (Systems Demonstrations), pp. 38-45.
  https://github.com/huggingface/transformers

Paszke, A., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library.
  Advances in Neural Information Processing Systems 32, pp. 8024-8035.