spam-classifier-liquid / docs /07-code-sources-reference.md

VoltageVagabond

Upload folder using huggingface_hub

92c0ea5 verified about 2 months ago

preview code

raw

history blame

10.4 kB

Code Sources & References

Every code snippet, technique, and configuration used in this project traced back to its original source. Use this when writing your paper to cite where each technique came from.

1. Liquid AI — Model & Architecture

LFM2.5-1.2B-Instruct (Our Model)

model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")

What: 1.2 billion parameter instruction-tuned language model
HuggingFace: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
Company: https://www.liquid.ai/
Architecture: Liquid Neural Network — hybrid state-space + attention + conv, inspired by biological neural circuits (C. elegans)
Paper: arXiv:2511.23404 — LFM2 technical report
Why we use it: Small enough for a laptop (2.4 GB in bf16), instruction-tuned, HuggingFace compatible

Liquid AI Official Documentation

Main docs: https://docs.liquid.ai
Transformers inference guide: https://docs.liquid.ai/deployment/gpu-inference/transformers
Fine-tuning with TRL: https://docs.liquid.ai/customization/finetuning-frameworks/trl
Fine-tuning with Unsloth: https://docs.liquid.ai/customization/finetuning-frameworks/unsloth
Dataset formats: https://docs.liquid.ai/customization/finetuning-frameworks/datasets
Customization overview: https://docs.liquid.ai/customization/getting-started/welcome

Liquid AI Official Cookbook (GitHub)

Repository: https://github.com/Liquid4All/cookbook
SFT with TRL notebook: https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
- This is the primary source for our LoRA configuration and training setup
- Defines target modules for LFM2 architecture: attention + GLU + conv layers
SFT with Unsloth notebook: https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_unsloth.ipynb
- Alternative fine-tuning approach using Unsloth for 2-5x faster training
- Uses 16-bit LoRA with gradient checkpointing

Other Liquid AI Models (Evaluated, Not Used)

LFM2-8B-A1B (MoE): https://huggingface.co/LiquidAI/LFM2-8B-A1B
- 8B total params, 1B active (Mixture of Experts)
- Considered as teacher model but too large for 24 GB Mac (~16 GB for weights alone)
LFM2-2.6B: https://huggingface.co/LiquidAI/LFM2-2.6B
- Evaluated as larger alternative, would fit (~5.2 GB) but tight with LoRA + optimizer
Full model catalog: https://huggingface.co/LiquidAI

2. Fine-Tuning Framework

TRL — SFTTrainer (Supervised Fine-Tuning)

from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(model=model, args=training_args, peft_config=peft_config, ...)

What: HuggingFace library for training language models with reinforcement learning and SFT
Docs: https://huggingface.co/docs/trl
Source: https://github.com/huggingface/trl
SFTTrainer guide: https://huggingface.co/docs/trl/sft_trainer
Why we use it: Liquid AI's officially recommended fine-tuning method
Key feature: Automatically handles chat template application, tokenization, and prompt masking
Version note: TRL v0.29 renamed max_seq_length to max_length in SFTConfig

PEFT — LoRA (Low-Rank Adaptation)

from peft import LoraConfig, PeftModel

What: Parameter-Efficient Fine-Tuning library — adds small trainable adapters to frozen models
Docs: https://huggingface.co/docs/peft
Source: https://github.com/huggingface/peft
LoRA conceptual guide: https://huggingface.co/docs/peft/conceptual_guides/lora
LoRA paper: Hu, E., et al. (2021). "LoRA: Low-Rank Adaptation of Large Language Models." arXiv:2106.09685
Why we use it: Trains only ~1-5% of parameters — makes fine-tuning possible on a laptop

LoRA Configuration (from Liquid AI Cookbook)

peft_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "w1", "w2", "w3", "in_proj"],
)

Source: https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
Target modules explained:
- q_proj, k_proj, v_proj, out_proj — Multi-Head Attention layers
- w1, w2, w3 — GLU (Gated Linear Unit) feed-forward layers
- in_proj — Conv block input projection (unique to Liquid AI architecture)
Why these modules: Liquid AI's architecture is not a standard transformer — it has additional conv and GLU layers. Adapting all layer types gives better results than attention-only LoRA.
Note: Standard transformer LoRA typically only targets q_proj and v_proj. The expanded target list is specific to LFM2 models.

3. PyTorch & Apple Silicon

PyTorch MPS Backend

import torch
torch.backends.mps.is_available()  # True on Apple Silicon

What: Metal Performance Shaders — PyTorch's backend for Apple Silicon GPU acceleration
Docs: https://pytorch.org/docs/stable/notes/mps.html
Why we use it: Enables GPU-accelerated training on Mac without NVIDIA hardware
Key finding: MPS saturates at batch size 4 for this model — batch size 8 showed no speed improvement (steps halved but each step took 2x longer)

HuggingFace Accelerate

# device_map="auto" uses accelerate under the hood
model = AutoModelForCausalLM.from_pretrained(..., device_map="auto")

What: Automatic device placement library
Docs: https://huggingface.co/docs/accelerate
Why we use it: Automatically places model on MPS (Mac), CUDA (NVIDIA), or CPU

4. HuggingFace Transformers

AutoModelForCausalLM / AutoTokenizer

from transformers import AutoModelForCausalLM, AutoTokenizer

What: Auto-classes that load any causal language model from HuggingFace Hub
Docs: https://huggingface.co/docs/transformers
Source: https://github.com/huggingface/transformers
Chat templates: https://huggingface.co/docs/transformers/en/chat_templating
Why we use it: Standard interface for loading and running Liquid AI models

HuggingFace Datasets

from datasets import Dataset
dataset = Dataset.from_list(examples)

What: Library for loading and processing datasets
Docs: https://huggingface.co/docs/datasets
Why we use it: SFTTrainer expects HuggingFace Dataset objects with a "messages" column

5. Training Data

Dataset Source

Origin: Generated by the MLX sibling project using Qwen3-VL-32B
HuggingFace dataset: FaroukMoc2/email_spam-qwen3-vl-32b
- Source: https://huggingface.co/datasets/FaroukMoc2/email_spam-qwen3-vl-32b
Size: 3,200 training + 800 test examples
Format: JSONL with chat-style messages (system, user, assistant roles)
Why reused: The JSONL chat format is model-agnostic — works with any model that supports chat templates

Original Email Dataset

Source: Kaggle spam email dataset (193,852 emails)
CSV path: data/spam_Emails_data.csv (symlinked from spam-xai-project)

6. Gradio Web Interface

Gradio

import gradio as gr
with gr.Blocks() as demo:
    ...
demo.launch()

What: Python library for building ML web interfaces
Docs: https://www.gradio.app/docs
Source: https://github.com/gradio-app/gradio
Why we use it: Quick web UI for email classification — same as MLX version for consistency

7. Performance Findings (Empirical)

These findings were discovered during development on a MacBook Pro M4 Pro with 24 GB unified memory:

Finding	Details
MPS batch size sweet spot	Batch size 4 is optimal. Batch size 8 halved steps but doubled time per step — GPU saturated.
Memory usage	~7-8 GB during training (1.2B model bf16 + LoRA + optimizer + activations)
Training speed	~0.34 it/s at batch size 4 on MPS
Model load time	30-60 seconds for initial model loading into memory
MLX vs PyTorch MPS	MLX (used in sibling project) is significantly faster for Apple Silicon — purpose-built vs compatibility layer
No orphaned ports	Unlike MLX version (which spawns llama-server), PyTorch loads in-process — clean shutdown
TRL v0.29 breaking change	`max_seq_length` renamed to `max_length` in SFTConfig
LFM2 layer names	Uses `out_proj` (not `o_proj` like standard transformers)

8. Comparison with MLX Version

Aspect	MLX Version	Liquid AI Version
Model	Qwen3.5-0.8B (4-bit quantized)	LFM2.5-1.2B-Instruct (bf16)
Architecture	Transformer	Liquid Neural Network (state-space + attention + conv)
Framework	Apple MLX + mlx-lm	PyTorch + HuggingFace Transformers + TRL + PEFT
Fine-tuning tool	mlx-lm LoRA CLI	TRL SFTTrainer + PEFT LoRA
Training speed	~10-20 min	~37 min (1 epoch), ~2 hrs (3 epochs)
Memory usage	~3-4 GB	~7-8 GB
Platform	Apple Silicon only	Any platform (Mac MPS, NVIDIA CUDA, CPU)
Model serving	Spawns llama-server (can leak ports)	In-process PyTorch (clean shutdown)
LoRA targets	Attention layers only	Attention + GLU + Conv (8 module types)
Training data	Same (model-agnostic JSONL format)	Same (copied from MLX project)
Gradio UI	Identical	Identical

Academic Citations (for Paper)

Hu, E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021).
  LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.

Liquid AI. (2025). LFM2: Liquid Foundation Models 2. arXiv:2511.23404.

Liquid AI. (2026). Liquid AI Cookbook: Fine-tuning notebooks.
  https://github.com/Liquid4All/cookbook

Liquid AI. (2026). LFM2.5-1.2B-Instruct model card.
  https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct

von Werra, L., et al. (2020). TRL: Transformer Reinforcement Learning.
  https://github.com/huggingface/trl

Mangrulkar, S., et al. (2022). PEFT: Parameter-Efficient Fine-Tuning.
  https://github.com/huggingface/peft

Wolf, T., et al. (2020). Transformers: State-of-the-Art Natural Language Processing.
  Proceedings of EMNLP 2020 (Systems Demonstrations), pp. 38-45.
  https://github.com/huggingface/transformers

Paszke, A., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library.
  Advances in Neural Information Processing Systems 32, pp. 8024-8035.