# Code Sources & References

Every code snippet, technique, and configuration used in this project is traced back to its original source here. Use this when writing your paper to cite where each technique came from.

---

## 1. Liquid AI: Model & Architecture

### LFM2.5-1.2B-Instruct (Our Model)

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
```

- **What:** 1.2-billion-parameter instruction-tuned language model
- **HuggingFace:** https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
- **Company:** https://www.liquid.ai/
- **Architecture:** Liquid Neural Network: hybrid state-space + attention + conv, inspired by biological neural circuits (C. elegans)
- **Paper:** arXiv:2511.23404 (LFM2 technical report)
- **Why we use it:** Small enough for a laptop (2.4 GB in bf16), instruction-tuned, HuggingFace compatible

### Liquid AI Official Documentation

- **Main docs:** https://docs.liquid.ai
- **Transformers inference guide:** https://docs.liquid.ai/deployment/gpu-inference/transformers
- **Fine-tuning with TRL:** https://docs.liquid.ai/customization/finetuning-frameworks/trl
- **Fine-tuning with Unsloth:** https://docs.liquid.ai/customization/finetuning-frameworks/unsloth
- **Dataset formats:** https://docs.liquid.ai/customization/finetuning-frameworks/datasets
- **Customization overview:** https://docs.liquid.ai/customization/getting-started/welcome

### Liquid AI Official Cookbook (GitHub)

- **Repository:** https://github.com/Liquid4All/cookbook
- **SFT with TRL notebook:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
  - This is the primary source for our LoRA configuration and training setup
  - Defines target modules for the LFM2 architecture: attention + GLU + conv layers
- **SFT with Unsloth notebook:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_unsloth.ipynb
  - Alternative fine-tuning approach using Unsloth for 2-5x faster training
  - Uses 16-bit LoRA with gradient
checkpointing

### Other Liquid AI Models (Evaluated, Not Used)

- **LFM2-8B-A1B (MoE):** https://huggingface.co/LiquidAI/LFM2-8B-A1B
  - 8B total params, 1B active (Mixture of Experts)
  - Considered as a teacher model but too large for a 24 GB Mac (~16 GB for weights alone)
- **LFM2-2.6B:** https://huggingface.co/LiquidAI/LFM2-2.6B
  - Evaluated as a larger alternative; would fit (~5.2 GB) but tight with LoRA + optimizer
- **Full model catalog:** https://huggingface.co/LiquidAI

---

## 2. Fine-Tuning Framework

### TRL: SFTTrainer (Supervised Fine-Tuning)

```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(model=model, args=training_args, peft_config=peft_config, ...)
```

- **What:** HuggingFace library for training language models with reinforcement learning and SFT
- **Docs:** https://huggingface.co/docs/trl
- **Source:** https://github.com/huggingface/trl
- **SFTTrainer guide:** https://huggingface.co/docs/trl/sft_trainer
- **Why we use it:** Liquid AI's officially recommended fine-tuning method
- **Key feature:** Automatically handles chat template application, tokenization, and prompt masking
- **Version note:** TRL v0.29 renamed `max_seq_length` to `max_length` in SFTConfig

### PEFT: LoRA (Low-Rank Adaptation)

```python
from peft import LoraConfig, PeftModel
```

- **What:** Parameter-Efficient Fine-Tuning library that adds small trainable adapters to frozen models
- **Docs:** https://huggingface.co/docs/peft
- **Source:** https://github.com/huggingface/peft
- **LoRA conceptual guide:** https://huggingface.co/docs/peft/conceptual_guides/lora
- **LoRA paper:** Hu, E., et al. (2021). "LoRA: Low-Rank Adaptation of Large Language Models."
arXiv:2106.09685
- **Why we use it:** Trains only ~1-5% of parameters, which makes fine-tuning possible on a laptop

### LoRA Configuration (from Liquid AI Cookbook)

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "w1", "w2", "w3", "in_proj"],
)
```

- **Source:** https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb
- **Target modules explained:**
  - `q_proj, k_proj, v_proj, out_proj`: Multi-Head Attention layers
  - `w1, w2, w3`: GLU (Gated Linear Unit) feed-forward layers
  - `in_proj`: Conv block input projection (unique to Liquid AI architecture)
- **Why these modules:** Liquid AI's architecture is not a standard transformer: it has additional conv and GLU layers. Adapting all layer types gives better results than attention-only LoRA.
- **Note:** Standard transformer LoRA typically only targets `q_proj` and `v_proj`. The expanded target list is specific to LFM2 models.

---

## 3. PyTorch & Apple Silicon

### PyTorch MPS Backend

```python
import torch

torch.backends.mps.is_available()  # True on Apple Silicon
```

- **What:** Metal Performance Shaders, PyTorch's backend for Apple Silicon GPU acceleration
- **Docs:** https://pytorch.org/docs/stable/notes/mps.html
- **Why we use it:** Enables GPU-accelerated training on Mac without NVIDIA hardware
- **Key finding:** MPS saturates at batch size 4 for this model. Batch size 8 showed no speed improvement (steps halved but each step took 2x longer)

### HuggingFace Accelerate

```python
# device_map="auto" uses accelerate under the hood
model = AutoModelForCausalLM.from_pretrained(..., device_map="auto")
```

- **What:** Automatic device placement library
- **Docs:** https://huggingface.co/docs/accelerate
- **Why we use it:** Automatically places model on MPS (Mac), CUDA (NVIDIA), or CPU

---

## 4. HuggingFace Transformers

### AutoModelForCausalLM / AutoTokenizer

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
```

- **What:** Auto-classes that load any causal language model from HuggingFace Hub
- **Docs:** https://huggingface.co/docs/transformers
- **Source:** https://github.com/huggingface/transformers
- **Chat templates:** https://huggingface.co/docs/transformers/en/chat_templating
- **Why we use it:** Standard interface for loading and running Liquid AI models

### HuggingFace Datasets

```python
from datasets import Dataset

dataset = Dataset.from_list(examples)
```

- **What:** Library for loading and processing datasets
- **Docs:** https://huggingface.co/docs/datasets
- **Why we use it:** SFTTrainer expects HuggingFace Dataset objects with a "messages" column

---

## 5. Training Data

### Dataset Source

- **Origin:** Generated by the MLX sibling project using Qwen3-VL-32B
- **HuggingFace dataset:** `FaroukMoc2/email_spam-qwen3-vl-32b`
  - Source: https://huggingface.co/datasets/FaroukMoc2/email_spam-qwen3-vl-32b
- **Size:** 3,200 training + 800 test examples
- **Format:** JSONL with chat-style messages (`system`, `user`, `assistant` roles)
- **Why reused:** The JSONL chat format is model-agnostic and works with any model that supports chat templates

### Original Email Dataset

- **Source:** Kaggle spam email dataset (193,852 emails)
- **CSV path:** `data/spam_Emails_data.csv` (symlinked from spam-xai-project)

---

## 6. Gradio Web Interface

### Gradio

```python
import gradio as gr

with gr.Blocks() as demo:
    ...

demo.launch()
```

- **What:** Python library for building ML web interfaces
- **Docs:** https://www.gradio.app/docs
- **Source:** https://github.com/gradio-app/gradio
- **Why we use it:** Quick web UI for email classification, same as the MLX version for consistency

---

## 7. Performance Findings (Empirical)

These findings were observed during development on a MacBook Pro M4 Pro with 24 GB unified memory:

| Finding | Details |
|---------|---------|
| MPS batch size sweet spot | Batch size 4 is optimal. Batch size 8 halved the number of steps but doubled the time per step (GPU saturated). |
| Memory usage | ~7-8 GB during training (1.2B model bf16 + LoRA + optimizer + activations) |
| Training speed | ~0.34 it/s at batch size 4 on MPS |
| Model load time | 30-60 seconds for initial model loading into memory |
| MLX vs PyTorch MPS | MLX (used in sibling project) is significantly faster on Apple Silicon: purpose-built vs compatibility layer |
| No orphaned ports | Unlike the MLX version (which spawns llama-server), PyTorch loads in-process, so shutdown is clean |
| TRL v0.29 breaking change | `max_seq_length` renamed to `max_length` in SFTConfig |
| LFM2 layer names | Uses `out_proj` (not `o_proj` like standard transformers) |

---

## 8. Comparison with MLX Version

| Aspect | MLX Version | Liquid AI Version |
|--------|-------------|-------------------|
| Model | Qwen3.5-0.8B (4-bit quantized) | LFM2.5-1.2B-Instruct (bf16) |
| Architecture | Transformer | Liquid Neural Network (state-space + attention + conv) |
| Framework | Apple MLX + mlx-lm | PyTorch + HuggingFace Transformers + TRL + PEFT |
| Fine-tuning tool | mlx-lm LoRA CLI | TRL SFTTrainer + PEFT LoRA |
| Training time | ~10-20 min | ~37 min (1 epoch), ~2 hrs (3 epochs) |
| Memory usage | ~3-4 GB | ~7-8 GB |
| Platform | Apple Silicon only | Any platform (Mac MPS, NVIDIA CUDA, CPU) |
| Model serving | Spawns llama-server (can leak ports) | In-process PyTorch (clean shutdown) |
| LoRA targets | Attention layers only | Attention + GLU + Conv (8 module types) |
| Training data | Same (model-agnostic JSONL format) | Same (copied from MLX project) |
| Gradio UI | Identical | Identical |

---

## Academic Citations (for Paper)

```
Hu, E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.

Liquid AI. (2025). LFM2: Liquid Foundation Models 2. arXiv:2511.23404.

Liquid AI. (2026). Liquid AI Cookbook: Fine-tuning notebooks. https://github.com/Liquid4All/cookbook

Liquid AI. (2026). LFM2.5-1.2B-Instruct model card. https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct

von Werra, L., et al. (2020). TRL: Transformer Reinforcement Learning. https://github.com/huggingface/trl

Mangrulkar, S., et al. (2022). PEFT: Parameter-Efficient Fine-Tuning. https://github.com/huggingface/peft

Wolf, T., et al. (2020). Transformers: State-of-the-Art Natural Language Processing. Proceedings of EMNLP 2020 (Systems Demonstrations), pp. 38-45. https://github.com/huggingface/transformers

Paszke, A., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32, pp. 8024-8035.
```
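
---

## Appendix: Cross-Checking the Quoted Figures

The memory and timing figures quoted in Sections 1, 7, and 8 can be cross-checked with simple arithmetic. The sketch below is a rough consistency check, not a measurement. It assumes 2 bytes per bf16 parameter, nominal parameter counts read off the model names, decimal gigabytes (1 GB = 1e9 bytes), and no gradient accumulation. The helper names `weight_memory_gb` and `epoch_minutes` are hypothetical and not part of the project code.

```python
import math

BYTES_PER_BF16_PARAM = 2  # bfloat16 stores each parameter in 2 bytes


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_BF16_PARAM) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def epoch_minutes(num_examples: int, batch_size: int, steps_per_second: float) -> float:
    """Wall-clock minutes for one epoch, assuming no gradient accumulation."""
    steps = math.ceil(num_examples / batch_size)
    return steps / steps_per_second / 60


# Nominal parameter counts taken from the model names (assumed, not measured).
for name, params in [("LFM2.5-1.2B", 1.2e9), ("LFM2-2.6B", 2.6e9), ("LFM2-8B-A1B", 8e9)]:
    print(f"{name}: {weight_memory_gb(params):.1f} GB in bf16")

# 3,200 training examples at the reported ~0.34 it/s with batch size 4.
one_epoch = epoch_minutes(3200, batch_size=4, steps_per_second=0.34)
print(f"1 epoch at batch size 4: {one_epoch:.0f} min")
print(f"3 epochs at batch size 4: {3 * one_epoch / 60:.1f} h")

# Batch size 8 halves the step count, but each step took about twice as long,
# so the step rate halves too and the epoch time stays the same.
same_epoch = epoch_minutes(3200, batch_size=8, steps_per_second=0.34 / 2)
print(f"1 epoch at batch size 8: {same_epoch:.0f} min")
```

Under these assumptions the estimate for one epoch comes out at about 39 minutes, in the same range as the ~37 min reported in Section 8, and the batch size 8 case gives the same epoch time as batch size 4, which matches the saturation finding in Sections 3 and 7.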