Use from the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Abhinav-Tyagi/synapse_v2",
	filename="synapse-full-f16.gguf",
)
llm.create_chat_completion(
	messages = [
		{"role": "user", "content": "What is a Transformer model?"}
	]
)

---
language:
- en
license: mit
tags:
- transformer
- decoder-only
- from-scratch
- bpe-tokenizer
- instruction-tuning
- educational
- nlp
- pytorch
- small-language-model
- custom-architecture
pipeline_tag: text-generation
---

Synapse v2: Decoder-Only Transformer Built from First Principles

Built by Abhinav Tyagi
📄 GitHub • 💼 LinkedIn • 📜 Research Paper (PDF)


Overview

Synapse v2 is a 3.67M parameter decoder-only Transformer built entirely from scratch: no HuggingFace Trainer, no pre-built architecture. Every component is implemented manually: the attention mechanism, BPE tokenizer, positional embeddings, training loop, and inference pipeline.

This is the second generation of the Synapse series, representing a 4.6× parameter scale-up and 3× depth increase from Synapse v1, with a core focus on transitioning from memorization to generalization.

"This work prioritizes understanding over performance. Building from first principles reveals what production models abstract away." β€” Abhinav Tyagi


Evolution: v1 → v2

| Metric | Synapse v1 | Synapse v2 | Change |
|---|---|---|---|
| Parameters | 800K | 3.67M | 4.6× |
| Layers | 4 | 12 | 3× |
| Vocabulary | 1,500 (basic BPE) | 1,037 (Turbo BPE) | Professional |
| Context Window | 128 tokens | 64 tokens | Optimized |
| Regularization | None | Dropout 0.1 | Added |
| Training Loss | 0.05 (memorization) | 2.04 | Better generalization |
| Validation Loss | Not measured | 3.26 | Tracked |
| Perplexity | 1.05 (overfit) | 7.7 train / 26.1 val | Learned patterns |
| Capability | Text continuation | Instruction following | Functional |

The core lesson: systematic scaling + regularization + quality data = generalization.


Architecture

Raw Text
  ↓
Turbo BPE Tokenization (GPT-2 Regex Pattern)
  ↓
Token Embeddings (1037 → 128)
  +
Positional Embeddings (64 → 128)
  ↓
12× Transformer Blocks
  [LayerNorm → Multi-Head Attention (4 heads) → Residual]
  [LayerNorm → Feed-Forward (128 → 512 → 128) → Residual]
  ↓
Final LayerNorm
  ↓
LM Head (128 → 1037)
  ↓
Next Token
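
The block structure above maps directly onto code. Below is a minimal PyTorch sketch of one pre-norm block, assuming standard torch.nn modules and a ReLU activation; the class and layer names are illustrative, not the actual TinyGPT source.

import torch.nn as nn

class Block(nn.Module):
    """One pre-norm Transformer block: LN → attention → residual, then LN → FFN → residual."""
    def __init__(self, n_embd=128, n_head=4, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffn = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),  # 128 -> 512
            nn.ReLU(),                      # activation assumed; the original may differ
            nn.Linear(4 * n_embd, n_embd),  # 512 -> 128
            nn.Dropout(dropout),
        )

    def forward(self, x, attn_mask=None):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out               # residual around attention
        x = x + self.ffn(self.ln2(x))  # residual around feed-forward
        return x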

Model Specs

| Component | Value | Rationale |
|---|---|---|
| Model Dimension | 128 | Sweet spot for 3–5M parameter range |
| Attention Heads | 4 (32-dim each) | Optimal for 128-dim model |
| Transformer Layers | 12 | Enables hierarchical feature learning |
| Context Window | 64 tokens | Optimized for dialogue efficiency |
| Vocabulary Size | 1,037 (Turbo BPE) | Professional compression |
| FFN Hidden Size | 512 (4× expansion) | Standard Transformer ratio |
| Dropout | 0.1 | Regularization without over-damping |
| Total Parameters | 3,672,832 | CPU-trainable, interpretable |

Parameter Breakdown

| Component | Parameters | % |
|---|---|---|
| Token Embedding [1037×128] | 132,736 | 3.6% |
| Position Embedding [64×128] | 8,192 | 0.2% |
| 12× Transformer Blocks | ~3,480,000 | 94.7% |
| – Self-Attention per block | ~49,000 | |
| – Feed-Forward per block | ~131,000 | |
| – LayerNorms per block | 512 | |
| Final LayerNorm | 256 | 0.0% |
| LM Head [128×1037] | 132,736 | 3.6% |
| Total | 3,672,832 | 100% |
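
The exact entries in the table can be reproduced with a few lines of arithmetic. The per-block attention figure is my inference (the ~49K count matches three bias-free 128×128 Q/K/V projections; the exact layout isn't spelled out here), so treat that line as an assumption:

d_model, vocab, ctx = 128, 1037, 64
print(vocab * d_model)        # 132,736 -- token embedding (and LM head)
print(ctx * d_model)          # 8,192   -- position embedding
print(2 * d_model)            # 256     -- one LayerNorm (scale + shift)
print(3 * d_model * d_model)  # 49,152  -- assumed Q/K/V projections, the ~49K per block
print(2 * d_model * 512 + 512 + d_model)  # 131,712 -- FFN with biases, the ~131K per block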

Tokenizer: Turbo BPE

Built from scratch: no dependency on HuggingFace or tiktoken.

Key innovation: the GPT-2 regex pre-tokenization pattern (introduced with GPT-2, reused by GPT-3, and the basis for the variants used in later tokenizers such as GPT-4's and Llama 3's):

's|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+

This ensures correct handling of contractions, numbers, punctuation, and whitespace, preventing cross-boundary merges that degrade tokenization quality.
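
You can try the pattern directly with the third-party regex module (the standard library re does not support \p{…} classes); the sample sentence is mine:

import regex

GPT2_PATTERN = r"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+"

print(regex.findall(GPT2_PATTERN, "I don't have 99 problems!"))
# ['I', ' don', "'t", ' have', ' 99', ' problems', '!']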

Optimizations implemented:

  • Doubly-linked list for O(1) merge updates (vs O(n) list rebuilding)
  • Hash map for O(1) pair lookup
  • Position sets to track all occurrences efficiently
  • Rank-based encoding (deterministic, matches GPT-2/GPT-4 behavior)

Compression achieved: 3–4× sequence length reduction.

Example:

"don't" β†’ Basic BPE: ['d','o','n',"'",'t']  # 5 tokens
"don't" β†’ Turbo BPE: ['don', "'t"]           # 2 tokens βœ…

Training

Dataset

Hybrid instruction-tuned corpus (~18–20K pairs, ~200–500K tokens):

| Component | Size | Purpose |
|---|---|---|
| Dolly-15k | ~15K instructions | Reasoning, QA, summarization |
| Creator Profile | ~500 examples | Identity grounding |
| GenAI Knowledge | ~2K examples | Technical expertise |
| Domain Facts | ~1K examples | Grounded world knowledge |
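
A hedged sketch of how such records might be flattened into training text, assuming the Instruction:/Response: template that appears in the Usage section below (the field names follow Dolly's schema; the exact template used in training is not documented here):

def format_example(record):
    # record: {"instruction": ..., "context": ..., "response": ...} -- Dolly-style fields
    context = f"\nContext: {record['context']}" if record.get("context") else ""
    return f"Instruction: {record['instruction']}{context}\nResponse: {record['response']}"

print(format_example({
    "instruction": "What is a Transformer model?",
    "response": "A neural network architecture built on self-attention.",
}))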

Training Configuration

| Hyperparameter | Value |
|---|---|
| Optimizer | Adam |
| Learning Rate | 1e-3 |
| Batch Size | 4 |
| Training Steps | 5,000 |
| Dropout | 0.1 |
| Context Length | 64 tokens |
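
Put together, the configuration corresponds to a loop along these lines; a sketch that assumes random batch sampling and the (logits, loss) return signature implied by the Usage section (get_batch and train_data are hypothetical):

import torch
from model import TinyGPT

model = TinyGPT(vocab_size=1037, n_embd=128, n_head=4, n_layer=12,
                block_size=64, dropout=0.1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def get_batch(data, block_size=64, batch_size=4):
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])  # shifted next-token targets
    return x, y

for step in range(5000):
    xb, yb = get_batch(train_data)  # train_data: 1-D tensor of token ids (hypothetical)
    logits, loss = model(xb, yb)    # assumes the model computes cross-entropy when targets are given
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()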

Training Results

| Metric | Value |
|---|---|
| Initial Loss | 8.54 |
| Final Training Loss | 2.04 |
| Final Validation Loss | 3.26 |
| Train Perplexity | 7.7 |
| Validation Perplexity | 26.1 |

Starting from random initialization at a loss of 8.54 (for comparison, uniform guessing over the 1,037-token vocabulary scores ln(1037) ≈ 6.94), the model converged to 2.04, demonstrating genuine learning rather than memorization. The moderate train/val gap (2.04 vs 3.26) indicates the dropout regularization kept overfitting in check.
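
The reported perplexities follow directly from the losses, since perplexity = exp(cross-entropy):

import math
print(math.exp(2.04))  # 7.69  -> reported train perplexity 7.7
print(math.exp(3.26))  # 26.05 -> reported validation perplexity 26.1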


Key Design Decisions Explained

Why 12 layers? Enables hierarchical learning: syntax (lower layers) → semantics (middle) → reasoning (upper). Empirically validated as the sweet spot for ~4M parameter models.

Why pre-norm (LayerNorm before attention)? More stable training: gradients don't explode or vanish as easily. Standard in modern architectures (GPT-2, GPT-3, and most recent LLMs). Allows higher learning rates.

Why dropout 0.1? The v1 model had a perplexity of 1.05: pure memorization, useless for generalization. Dropout forced the model to learn robust patterns; the validation gap (3.26 vs 2.04) shows it worked.

Why a 64-token context (smaller than v1's 128)? Most conversational turns fit in 50–80 tokens, and halving T cuts the O(T²) attention cost by 4×: faster training, same capability for dialogue tasks.
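
The 4× saving is just the quadratic ratio of the two context lengths:

print(128**2 / 64**2)  # 4.0 -- attention score entries scale as T^2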


Usage

import torch
from model import TinyGPT
from tokenizer import TurboBPE

# Load model
model = TinyGPT(
    vocab_size=1037,
    n_embd=128,
    n_head=4,
    n_layer=12,
    block_size=64,
    dropout=0.0  # Disable dropout at inference
)
model.load_state_dict(torch.load("synapse_v2.pt", map_location="cpu"))
model.eval()

# Load tokenizer
tokenizer = TurboBPE.load("tokenizer_state.json")

# Generate
prompt = "Instruction: What is a Transformer model?\nResponse:"
tokens = tokenizer.encode(prompt)
input_ids = torch.tensor([tokens])

with torch.no_grad():
    for _ in range(100):
        context = input_ids[:, -64:]  # crop to the 64-token context window
        logits, _ = model(context)
        next_token = torch.argmax(logits[:, -1, :], dim=-1)  # greedy decoding
        input_ids = torch.cat([input_ids, next_token.unsqueeze(0)], dim=1)

print(tokenizer.decode(input_ids[0].tolist()))
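
Greedy argmax decoding tends to loop on small models. A temperature-sampling variant of the same loop (the 0.8 temperature is my choice, not from the original):

with torch.no_grad():
    for _ in range(100):
        logits, _ = model(input_ids[:, -64:])                   # crop to the context window
        probs = torch.softmax(logits[:, -1, :] / 0.8, dim=-1)   # temperature-scaled distribution
        next_token = torch.multinomial(probs, num_samples=1)    # shape (1, 1)
        input_ids = torch.cat([input_ids, next_token], dim=1)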

Lessons Learned

Building Synapse v2 from scratch surfaced insights that pre-built frameworks hide:

  1. Tokenization quality is foundational: bad tokenization forces the model to waste capacity learning token boundaries instead of meaning
  2. Dropout is not optional at small scale: without it, small models memorize training data completely (perplexity 1.05 in v1)
  3. Pre-norm vs post-norm matters: pre-norm enabled stable training at 12 layers; post-norm would have required careful LR tuning
  4. Validation loss is the only truth: training loss is meaningless without a validation signal
  5. Instruction formatting teaches task structure: the model learns what a "question" and an "answer" look like before it learns the content

Synapse Series

| Model | Description | Params |
|---|---|---|
| Synapse v1 | First-principles transformer | 800K |
| Synapse v2 (this model) | Scaled + instruction-tuned | 3.67M |
| Synapse SLM | QLoRA fine-tuned Llama-3.2-3B | 3B |
| Synapse-124M | Custom LLM with GQA, MoE, NTK-RoPE | 124M |

About the Author

Abhinav Tyagi is an LLM Engineer who builds AI systems from the ground up, from custom tokenizers and transformer architectures to production RAG pipelines and agentic systems.

Other work:

  • Synapse-124M β€” 124M parameter transformer from scratch: GQA, MoE, Sliding Window, NTK-RoPE, SwiGLU, custom BPE
  • Synapse Wingman β€” Agentic AI desktop assistant (Telegram β†’ PC control, vision, WhatsApp automation)
  • Smart RAG Chatbot β€” Hybrid RAG with Chain of Verification (CoVe), multi-query generation, FAISS
  • Psywarp β€” Published research on multimodal cognitive AI framework (DOI: 10.5281/zenodo.18182199)

📧 abhinavtyagi5418@gmail.com
🐙 GitHub
💼 LinkedIn


Citation

@misc{tyagi2026synapsev2,
  author = {Tyagi, Abhinav},
  title  = {Synapse v2: A Decoder-Only Transformer Language Model Built from First Principles},
  year   = {2026},
  url    = {https://huggingface.co/Abhinav-Tyagi/synapse_v2}
}

License

MIT: free to use, modify, and distribute with attribution.


"Understanding requires building. Building requires breaking things. Breaking things requires documentation."
– Abhinav Tyagi
