
Building Purple Squirrel R1: Fine-Tuning DeepSeek-R1 for Distributed Systems on Apple Silicon

Author: Matthew Karsten / Purple Squirrel Media | Published: February 2026


Introduction

Purple Squirrel R1 is a domain-specialized language model fine-tuned from DeepSeek-R1-Distill-Llama-8B, targeting distributed systems, blockchain infrastructure, decentralized GPU compute, and cross-chain DeFi protocol design. Where the base model offers broad general reasoning, Purple Squirrel R1 has been shaped by real-world conference data, on-chain research, and applied cryptographic hardware work.

The motivation was straightforward: existing general-purpose LLMs perform poorly on niche infrastructure questions — particularly those involving decentralized compute economics, air-gapped hardware wallets, and blockchain event data. By fine-tuning on a curated, domain-specific dataset, we achieved measurable improvements in validation loss and qualitatively sharper responses across our target domains — all trained locally on Apple Silicon without a single GPU cloud dollar spent.


Training Setup

Fine-tuning was performed using MLX-LM with LoRA (Low-Rank Adaptation) on Apple Silicon (M-series), making this an entirely on-device training run.

LoRA Configuration:

  • Rank: 8
  • Scale (alpha/rank): 20
  • Layers adapted: 4
  • Base model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  • Framework: MLX-LM (Apple MLX)
  • Precision: bfloat16

Apple's MLX framework provided efficient unified memory utilization, allowing the entire LoRA run to stay resident on-device without offloading. The LoRA adapter targets the attention projection layers, keeping the parameter footprint small while achieving meaningful domain adaptation. Training on Apple Silicon is not just viable — for small-to-medium LoRA runs, it is genuinely competitive with cloud GPU setups when accounting for iteration speed, cost, and privacy.
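For reference, a run with this configuration can be launched from the MLX-LM command line roughly as follows. This is a minimal sketch rather than the exact command used for this release: the data path, batch size, iteration count, and learning rate are illustrative assumptions, flag names vary slightly between mlx-lm versions, and the LoRA rank and scale are typically supplied through a YAML config passed with --config rather than dedicated flags.

```bash
# Minimal sketch of an on-device LoRA run with MLX-LM.
# Hyperparameters here are placeholders, not the values used for this release.
# Expects train.jsonl and valid.jsonl under ./data in an mlx_lm.lora-compatible format.
python -m mlx_lm.lora \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
  --train \
  --data ./data \
  --num-layers 4 \
  --batch-size 2 \
  --iters 1000 \
  --learning-rate 1e-5 \
  --adapter-path ./adapters
```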


Training Data

The dataset was assembled from two primary sources:

Wrapped Events Conference Dataset

  • 1,133 Q&A pairs derived from transcripts and session data across 58 Wrapped Events conferences
  • Covered topics: cross-chain protocol design, DeFi architecture, tokenomics, Layer 2 scaling, institutional crypto adoption
  • 7 distinct prompt/response formats: technical explanation, comparison, step-by-step, pros/cons, definition, use-case, and opinion-style reasoning

Blockchain Research Dataset

  • 135 on-chain records covering decentralized infrastructure research
  • Sources include internal AIDP (AI Distributed Processing) system benchmarks and Coldstar cryptographic hardware research
  • Structured as instruction-following pairs aligned with the DeepSeek-R1 chat template

The combination of real-world conference knowledge and applied research data gives the model grounded, specific answers rather than surface-level summaries drawn from generic web crawls.
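To make the record structure concrete, a single line of the training JSONL in a chat-style format looks roughly like the example below. The question and answer are invented placeholders, not actual dataset content, and the exact field layout depends on the mlx-lm data format used.

```bash
# Inspect one training record. The content shown here is a hypothetical
# placeholder; the real dataset pairs conference-derived questions with
# R1-style reasoning answers.
head -n 1 data/train.jsonl
# {"messages": [
#   {"role": "user", "content": "Why can decentralized GPU networks undercut centralized clouds on inference pricing?"},
#   {"role": "assistant", "content": "<think>Consider hardware utilization and supply elasticity...</think> Decentralized networks aggregate otherwise idle GPUs, so ..."}
# ]}
```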


Results

Fine-tuning produced a consistent and meaningful improvement in validation loss:

| Metric | Value |
|---|---|
| Initial Validation Loss | 3.799 |
| Final Validation Loss | 3.091 |
| Improvement | -18.6% |

An 18.6% reduction in validation loss on a domain-specific held-out set indicates the model has successfully internalized the target knowledge. Qualitative evaluation confirms sharper, more specific responses on distributed systems and blockchain topics, with no apparent catastrophic forgetting of the base model's reasoning capabilities.
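The improvement figure is simply the relative change between the two loss values:

```bash
# Relative change in validation loss: (final - initial) / initial
python3 -c 'print(f"{(3.091 - 3.799) / 3.799:.1%}")'
# -18.6%
```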


GGUF Quantizations

To maximize accessibility across hardware configurations, the model is published in four GGUF quantization levels:

| Format | Size | Best For |
|---|---|---|
| F16 | 16.1 GB | Maximum fidelity, high-RAM systems |
| Q8_0 | 8.5 GB | Near-lossless, recommended for 16GB RAM |
| Q5_K_M | 5.7 GB | Balanced quality/size |
| Q4_K_M | 4.9 GB | Minimum viable, low-RAM deployment |

Q8_0 is the recommended default for Apple Silicon machines with 16GB unified memory. Q4_K_M enables deployment on constrained edge hardware.
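For anyone reproducing the exports, the path from LoRA adapters to these GGUF files looks roughly like the pipeline below. It is a sketch rather than the exact commands used for this release: directory and file names are placeholders, and the conversion scripts assume a local llama.cpp checkout.

```bash
# 1. Fuse the LoRA adapters back into the base weights with MLX-LM.
python -m mlx_lm.fuse \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
  --adapter-path ./adapters \
  --save-path ./purple-squirrel-r1-fused

# 2. Convert the fused Hugging Face model to a GGUF with llama.cpp.
python convert_hf_to_gguf.py ./purple-squirrel-r1-fused \
  --outfile purple-squirrel-r1-f16.gguf --outtype f16

# 3. Quantize down to the smaller variants.
./llama-quantize purple-squirrel-r1-f16.gguf purple-squirrel-r1-q8_0.gguf Q8_0
./llama-quantize purple-squirrel-r1-f16.gguf purple-squirrel-r1-q5_k_m.gguf Q5_K_M
./llama-quantize purple-squirrel-r1-f16.gguf purple-squirrel-r1-q4_k_m.gguf Q4_K_M
```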


Research Papers

Purple Squirrel R1's training data draws from three research workstreams produced by Purple Squirrel Media:

AIDP Neural Cloud — Decentralized LLM Inference

Benchmarked decentralized GPU network inference against centralized cloud providers. Key findings: 47% cost reduction at scale, 180ms p50 latency on standard LLM workloads. The dataset captures the architectural tradeoffs, pricing models, and operational considerations from this research.

AIDP Video Forge — Distributed Video Processing

Evaluated distributed GPU pipelines for AI video generation and processing. Results showed a 16x GPU throughput improvement versus single-node processing, with VMAF quality scores averaging 95.8 — broadcast-grade output from a distributed architecture.

Coldstar — Air-Gapped Solana Cold Wallet

A hardware and software framework for air-gapped Solana transaction signing, combining Rust secure memory handling, macOS USB device enumeration, and offline key management. Submitted to the Colosseum Agent Hackathon as Project #62. The dataset includes detailed Q&A on cold wallet architecture, key derivation, and transaction serialization for offline signing.


How to Use

Ollama Quick Start

Create a Modelfile:

FROM ./purple-squirrel-r1-q8_0.gguf

SYSTEM """
You are an expert in distributed systems, blockchain infrastructure, decentralized GPU compute, and cross-chain DeFi protocol design. Provide specific, technically accurate answers drawing on real-world deployment experience.
"""

PARAMETER temperature 0.6
PARAMETER top_p 0.95

Then run:

ollama create purple-squirrel-r1 -f Modelfile
ollama run purple-squirrel-r1

llama.cpp

./llama-cli \
  -m purple-squirrel-r1-q8_0.gguf \
  -p "Compare the cost structure of centralized vs decentralized LLM inference at scale." \
  -n 512 \
  --temp 0.6

LM Studio

Download the Q5_K_M or Q8_0 GGUF from the collection, load it in LM Studio, and set the system prompt to focus on distributed systems and blockchain topics for best results.
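LM Studio can also serve the loaded model through its local OpenAI-compatible API, which makes the fine-tune easy to script against. The request below is a sketch: port 1234 is LM Studio's default, and the model identifier must match whatever name the loaded GGUF is given in your setup.

```bash
# Query the model through LM Studio's local OpenAI-compatible endpoint.
# Assumes the server is running on the default port and the GGUF is loaded.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "purple-squirrel-r1",
    "messages": [
      {"role": "user", "content": "Compare centralized and decentralized LLM inference costs at scale."}
    ],
    "temperature": 0.6
  }'
```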


Links


About

Built by Purple Squirrel Media — a distributed systems and Web3 infrastructure studio.

Matthew Karsten | @expertvagabond | MatthewKarstenConnects@gmail.com

All training was performed locally on Apple Silicon. No cloud GPUs were used. The full training dataset will be published alongside this release.