Instructions for using Abhinav-Tyagi/synapse_v2 with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- llama-cpp-python
How to use Abhinav-Tyagi/synapse_v2 with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Abhinav-Tyagi/synapse_v2",
    filename="synapse-full-f16.gguf",
)

# No chat example is defined on the model card; supply your own messages, e.g.:
llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a Transformer model?"}]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Abhinav-Tyagi/synapse_v2 with llama.cpp:
Install via Homebrew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abhinav-Tyagi/synapse_v2:F16

# Run inference directly in the terminal:
llama-cli -hf Abhinav-Tyagi/synapse_v2:F16
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abhinav-Tyagi/synapse_v2:F16

# Run inference directly in the terminal:
llama-cli -hf Abhinav-Tyagi/synapse_v2:F16
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Abhinav-Tyagi/synapse_v2:F16

# Run inference directly in the terminal:
./llama-cli -hf Abhinav-Tyagi/synapse_v2:F16
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Abhinav-Tyagi/synapse_v2:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Abhinav-Tyagi/synapse_v2:F16
```
Use Docker
docker model run hf.co/Abhinav-Tyagi/synapse_v2:F16
- LM Studio
- Jan
- Ollama
How to use Abhinav-Tyagi/synapse_v2 with Ollama:
ollama run hf.co/Abhinav-Tyagi/synapse_v2:F16
- Unsloth Studio
How to use Abhinav-Tyagi/synapse_v2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Abhinav-Tyagi/synapse_v2 to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Abhinav-Tyagi/synapse_v2 to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Abhinav-Tyagi/synapse_v2 to start chatting
```
- Pi
How to use Abhinav-Tyagi/synapse_v2 with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Abhinav-Tyagi/synapse_v2:F16
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Abhinav-Tyagi/synapse_v2:F16" }
      ]
    }
  }
}
```

Run Pi

```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use Abhinav-Tyagi/synapse_v2 with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Abhinav-Tyagi/synapse_v2:F16
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Abhinav-Tyagi/synapse_v2:F16
```
Run Hermes
hermes
- Docker Model Runner
How to use Abhinav-Tyagi/synapse_v2 with Docker Model Runner:
docker model run hf.co/Abhinav-Tyagi/synapse_v2:F16
- Lemonade
How to use Abhinav-Tyagi/synapse_v2 with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Abhinav-Tyagi/synapse_v2:F16
```
Run and chat with the model
lemonade run user.synapse_v2-F16
List all available models
lemonade list
---
language:
- en
license: mit
tags:
- transformer
- decoder-only
- from-scratch
- bpe-tokenizer
- instruction-tuning
- educational
- nlp
- pytorch
- small-language-model
- custom-architecture
pipeline_tag: text-generation
---
Synapse v2: Decoder-Only Transformer Built from First Principles
Built by Abhinav Tyagi
GitHub • LinkedIn • Research Paper (PDF)
Overview
Synapse v2 is a 3.67M parameter decoder-only Transformer built entirely from scratch: no HuggingFace Trainer, no pre-built architecture. Every component is implemented manually: the attention mechanism, BPE tokenizer, positional embeddings, training loop, and inference pipeline.
This is the second generation of the Synapse series, representing a 4.6× parameter scale-up and 3× depth increase over Synapse v1, with a core focus on transitioning from memorization to generalization.
"This work prioritizes understanding over performance. Building from first principles reveals what production models abstract away." – Abhinav Tyagi
Evolution: v1 → v2
| Metric | Synapse v1 | Synapse v2 | Factor |
|---|---|---|---|
| Parameters | 800K | 3.67M | 4.6× |
| Layers | 4 | 12 | 3× |
| Vocabulary | 1,500 (basic BPE) | 1,037 (Turbo BPE) | Professional |
| Context Window | 128 tokens | 64 tokens | Optimized |
| Regularization | None | Dropout 0.1 | Added |
| Training Loss | 0.05 (memorization) | 2.04 | Better generalization |
| Validation Loss | Not measured | 3.26 | Tracked |
| Perplexity | 1.05 (overfit) | 7.7 train / 26.1 val | Learned patterns |
| Capability | Text continuation | Instruction following | Functional |
The core lesson: systematic scaling + regularization + quality data = generalization.
Architecture
```
Raw Text
   ↓
Turbo BPE Tokenization (GPT-2 Regex Pattern)
   ↓
Token Embeddings (1037 → 128)
   +
Positional Embeddings (64 → 128)
   ↓
12× Transformer Blocks
   [LayerNorm → Multi-Head Attention (4 heads) → Residual]
   [LayerNorm → Feed-Forward (128 → 512 → 128) → Residual]
   ↓
Final LayerNorm
   ↓
LM Head (128 → 1037)
   ↓
Next Token
```
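For readers who prefer code to diagrams, here is a minimal PyTorch sketch of this pre-norm decoder-only layout. The module and argument names are illustrative, not the repository's actual `model.py`, and details such as weight initialization are omitted:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm Transformer block: LN -> attention -> residual, LN -> FFN -> residual."""
    def __init__(self, d=128, n_head=4, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.ffn = nn.Sequential(
            nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d), nn.Dropout(dropout)
        )

    def forward(self, x, causal_mask):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out                  # residual around attention
        x = x + self.ffn(self.ln2(x))     # residual around feed-forward
        return x

class DecoderLM(nn.Module):
    def __init__(self, vocab_size=1037, d=128, n_layer=12, block_size=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d)
        self.pos_emb = nn.Embedding(block_size, d)
        self.blocks = nn.ModuleList([Block(d) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(d)
        self.lm_head = nn.Linear(d, vocab_size, bias=False)

    def forward(self, idx):
        T = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        # Causal mask: True marks future positions that may not be attended to
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=idx.device), diagonal=1)
        for block in self.blocks:
            x = block(x, mask)
        return self.lm_head(self.ln_f(x))  # logits over the 1,037-token vocabulary
```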
Model Specs
| Component | Value | Rationale |
|---|---|---|
| Model Dimension | 128 | Sweet spot for the 3–5M parameter range |
| Attention Heads | 4 (32-dim each) | Optimal for 128-dim model |
| Transformer Layers | 12 | Enables hierarchical feature learning |
| Context Window | 64 tokens | Optimized for dialogue efficiency |
| Vocabulary Size | 1,037 (Turbo BPE) | Professional compression |
| FFN Hidden Size | 512 (4× expansion) | Standard Transformer ratio |
| Dropout | 0.1 | Regularization without over-damping |
| Total Parameters | 3,672,832 | CPU-trainable, interpretable |
Parameter Breakdown
| Component | Parameters | % |
|---|---|---|
| Token Embedding [1037×128] | 132,736 | 3.6% |
| Position Embedding [64×128] | 8,192 | 0.2% |
| 12× Transformer Blocks | ~3,480,000 | 94.7% |
| └ Self-Attention per block | ~49,000 | – |
| └ Feed-Forward per block | ~131,000 | – |
| └ LayerNorms per block | 512 | – |
| Final LayerNorm | 256 | 0.0% |
| LM Head [128×1037] | 132,736 | 3.6% |
| Total | 3,672,832 | 100% |
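To sanity-check a loaded checkpoint against this table, the standard PyTorch count can be used (assuming `model` is the loaded `TinyGPT` instance from the Usage section below):

```python
# Count all parameters of the loaded model; the card reports 3,672,832 in total.
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")
```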
Tokenizer: Turbo BPE
Built from scratch, with no dependency on HuggingFace tokenizers or tiktoken.
Key innovation: the GPT-2 regex pre-tokenization pattern (introduced with GPT-2 and reused by GPT-3; later GPT-style tokenizers build on the same idea):

```
's|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+
```

This ensures correct handling of contractions, numbers, punctuation, and whitespace, preventing cross-boundary merges that degrade tokenization quality.
Optimizations implemented:
- Doubly-linked list for O(1) merge updates (vs O(n) list rebuilding)
- Hash map for O(1) pair lookup
- Position sets to track all occurrences efficiently
- Rank-based encoding (deterministic, matches GPT-2/GPT-4 behavior)
Compression achieved: 3–4× sequence length reduction.
Example:

```
"don't" → Basic BPE: ['d', 'o', 'n', "'", 't']   # 5 tokens
"don't" → Turbo BPE: ['don', "'t"]               # 2 tokens ✓
```
Training
Dataset
Hybrid instruction-tuned corpus (~18–20K pairs, ~200–500K tokens):
| Component | Size | Purpose |
|---|---|---|
| Dolly-15k | ~15K instructions | Reasoning, QA, summarization |
| Creator Profile | ~500 examples | Identity grounding |
| GenAI Knowledge | ~2K examples | Technical expertise |
| Domain Facts | ~1K examples | Grounded world knowledge |
Training Configuration
| Hyperparameter | Value |
|---|---|
| Optimizer | Adam |
| Learning Rate | 1e-3 |
| Batch Size | 4 |
| Training Steps | 5,000 |
| Dropout | 0.1 |
| Context Length | 64 tokens |
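A minimal sketch of one optimization step with these hyperparameters is shown below. It assumes a `DecoderLM`-style model as sketched in the Architecture section and a pre-encoded corpus tensor; the project's actual training loop may differ:

```python
import torch
import torch.nn.functional as F

# Assumptions: `model` is a DecoderLM-style module, and `train_ids` is a
# 1-D LongTensor holding the BPE-encoded training corpus.

def get_batch(data, batch_size=4, block_size=64):
    """Sample random (input, next-token target) windows from the corpus."""
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])
    return x, y

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5000):
    xb, yb = get_batch(train_ids)
    logits = model(xb)                                            # (batch, 64, 1037)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```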
Training Results
| Metric | Value |
|---|---|
| Initial Loss | 8.54 |
| Final Training Loss | 2.04 |
| Final Validation Loss | 3.26 |
| Train Perplexity | 7.7 |
| Validation Perplexity | 26.1 |
Starting from random initialization (initial loss 8.54, above the uniform-prediction baseline of ln(1037) ≈ 6.94), the model converged to 2.04, demonstrating genuine learning rather than memorization. The train/val gap (2.04 vs 3.26) confirms effective regularization via dropout.
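The perplexity figures follow directly from the losses, since perplexity is simply the exponentiated cross-entropy:

```python
import math

print(math.exp(2.04))  # ≈ 7.7, the reported train perplexity
print(math.exp(3.26))  # ≈ 26.0, consistent with the reported validation perplexity of 26.1
```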
Key Design Decisions Explained
Why 12 layers? Enables hierarchical learning: syntax (lower layers) → semantics (middle) → reasoning (upper). Empirically validated as the sweet spot for ~4M-parameter models.
Why pre-norm (LayerNorm before attention)? More stable training: gradients don't explode or vanish as easily. Standard in modern architectures (GPT-3, Claude). Allows higher learning rates.
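In code, the difference is only where the LayerNorm sits relative to the residual addition (illustrative stand-in sub-layer, not the repository's code):

```python
import torch
import torch.nn as nn

d = 128
norm, sublayer = nn.LayerNorm(d), nn.Linear(d, d)  # stand-in for attention or the FFN
x = torch.randn(1, 64, d)

pre_norm_out = x + sublayer(norm(x))    # pre-norm (Synapse v2): normalize the sub-layer input
post_norm_out = norm(x + sublayer(x))   # post-norm (original Transformer): normalize after the residual
```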
Why dropout 0.1? The v1 model had a perplexity of 1.05: pure memorization, useless for generalization. Dropout forced the model to learn robust patterns. The validation gap (3.26 vs 2.04) shows it worked.
Why a 64-token context (smaller than v1's 128)? Most conversational turns fit in 50–80 tokens. Halving the window reduces the O(T²) attention cost by 4×: faster training, same capability for dialogue tasks.
Usage
```python
import torch

from model import TinyGPT
from tokenizer import TurboBPE

# Load model
model = TinyGPT(
    vocab_size=1037,
    n_embd=128,
    n_head=4,
    n_layer=12,
    block_size=64,
    dropout=0.0,  # disable dropout at inference
)
model.load_state_dict(torch.load("synapse_v2.pt", map_location="cpu"))
model.eval()

# Load tokenizer
tokenizer = TurboBPE.load("tokenizer_state.json")

# Generate greedily, one token at a time
prompt = "Instruction: What is a Transformer model?\nResponse:"
tokens = tokenizer.encode(prompt)
input_ids = torch.tensor([tokens])

with torch.no_grad():
    for _ in range(100):
        logits, _ = model(input_ids[:, -64:])  # crop to the 64-token context window
        next_token = torch.argmax(logits[:, -1, :], dim=-1)
        input_ids = torch.cat([input_ids, next_token.unsqueeze(0)], dim=1)

print(tokenizer.decode(input_ids[0].tolist()))
```
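The loop above decodes greedily with argmax. If outputs become repetitive, temperature sampling is a common drop-in replacement (same model and tokenizer assumptions as above):

```python
temperature = 0.8

with torch.no_grad():
    for _ in range(100):
        logits, _ = model(input_ids[:, -64:])                   # keep only the last 64 tokens
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)    # sample instead of argmax
        input_ids = torch.cat([input_ids, next_token], dim=1)

print(tokenizer.decode(input_ids[0].tolist()))
```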
Lessons Learned
Building Synapse v2 from scratch produced insights that pre-built frameworks hide:
- Tokenization quality is foundational: bad tokenization forces the model to waste capacity learning token boundaries instead of meaning
- Dropout is not optional at small scale: without it, small models memorize the training data completely (perplexity 1.05 in v1)
- Pre-norm vs post-norm matters: pre-norm enabled stable training at 12 layers; post-norm would have required careful LR tuning
- Validation loss is the only truth: training loss is meaningless without a validation signal
- Instruction formatting teaches task structure: the model learns what a "question" and an "answer" look like before it learns the content
Synapse Series
| Model | Description | Params |
|---|---|---|
| Synapse v1 | First principles transformer | 800K |
| Synapse v2 (this) | Scaled + instruction-tuned | 3.67M |
| Synapse SLM | QLoRA fine-tuned Llama-3.2-3B | 3B |
| Synapse-124M | Custom LLM with GQA, MoE, NTK-RoPE | 124M |
About the Author
Abhinav Tyagi is an LLM Engineer who builds AI systems from the ground up, from custom tokenizers and transformer architectures to production RAG pipelines and agentic systems.
Other work:
- Synapse-124M: 124M parameter transformer from scratch with GQA, MoE, Sliding Window attention, NTK-RoPE, SwiGLU, and a custom BPE tokenizer
- Synapse Wingman: agentic AI desktop assistant (Telegram → PC control, vision, WhatsApp automation)
- Smart RAG Chatbot: hybrid RAG with Chain of Verification (CoVe), multi-query generation, FAISS
- Psywarp: published research on a multimodal cognitive AI framework (DOI: 10.5281/zenodo.18182199)
Email: abhinavtyagi5418@gmail.com
GitHub
LinkedIn
Citation
```bibtex
@misc{tyagi2026synapsev2,
  author = {Tyagi, Abhinav},
  title  = {Synapse v2: A Decoder-Only Transformer Language Model Built from First Principles},
  year   = {2026},
  url    = {https://huggingface.co/Abhinav-Tyagi/synapse_v2}
}
```
License
MIT: free to use, modify, and distribute with attribution.
"Understanding requires building. Building requires breaking things. Breaking things requires documentation."
– Abhinav Tyagi