# Vini Pico - 25M BitNet b1.58 Language Model
A small language model built on the BitNet b1.58 architecture, which constrains weights to the ternary values {-1, 0, +1}. It is designed for mobile deployment and supports tool calling.
## Model Details
| Property | Value |
|---|---|
| Parameters | 24.9M (unique) |
| Architecture | BitNet b1.58 |
| Model Dimension (d_model) | 384 |
| Layers | 8 |
| Attention Heads | 6 query, 2 key/value (GQA) |
| FFN Hidden Dim | 1024 (SwiGLU) |
| Max Sequence Length | 2,048 tokens |
| Vocab Size | 32,000 |
| Weight Precision | 1.58-bit (ternary) |
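As a sanity check on the table, the sketch below rebuilds the configuration and estimates the unique parameter count. It assumes tied input/output embeddings, bias-free projections, and one RMSNorm weight vector per norm; none of these is stated above, but under those assumptions the total comes to about 24.9M, consistent with the table. `ViniPicoConfig` and `count_params` are hypothetical names, not part of the released code. The "1.58-bit" figure is log2(3), the information content of one ternary weight.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ViniPicoConfig:
    # Values copied from the Model Details table
    dim: int = 384
    n_layers: int = 8
    n_heads: int = 6
    n_kv_heads: int = 2
    hidden_dim: int = 1024
    max_seq_len: int = 2048
    vocab_size: int = 32_000

    @property
    def head_dim(self) -> int:
        return self.dim // self.n_heads  # 384 / 6 = 64

    @property
    def kv_group_size(self) -> int:
        # Query heads sharing each key/value head under GQA: 6 / 2 = 3
        return self.n_heads // self.n_kv_heads


def count_params(cfg: ViniPicoConfig) -> int:
    """Estimate unique parameters, ASSUMING tied input/output embeddings,
    bias-free linear layers and one RMSNorm weight vector per norm."""
    kv_dim = cfg.n_kv_heads * cfg.head_dim
    attn = 2 * cfg.dim * cfg.dim + 2 * cfg.dim * kv_dim  # Wq, Wo + Wk, Wv
    ffn = 3 * cfg.dim * cfg.hidden_dim  # SwiGLU: gate, up and down projections
    norms = 2 * cfg.dim  # pre-attention and pre-FFN RMSNorm weights
    per_layer = attn + ffn + norms
    embedding = cfg.vocab_size * cfg.dim  # shared with the output head
    return embedding + cfg.n_layers * per_layer + cfg.dim  # + final RMSNorm


cfg = ViniPicoConfig()
print(count_params(cfg))  # 24877440, i.e. ~24.9M
print(round(math.log2(3), 3))  # 1.585 bits per ternary weight
```

Roughly half of the budget sits in the 32,000 x 384 embedding matrix, which is why tying it to the output head matters so much at this scale.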
## Training
- Pre-training: ~1B tokens from FineWeb-Edu, StarCoderData (TypeScript/Dart), ToolACE, Cosmopedia, OpenHermes
- Supervised fine-tuning (SFT): ToolACE, Glaive Function Calling, OpenHermes 2.5
- Architecture: custom BitLinear layers with straight-through estimator (STE) quantization, RoPE, grouped-query attention (GQA), SwiGLU, and RMSNorm
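A BitLinear layer with STE quantization can be sketched along the lines of the recipe described for BitNet b1.58: weights are scaled by their mean absolute value and rounded to {-1, 0, +1}, activations are quantized per token to the 8-bit range with absmax scaling, and the straight-through estimator passes gradients to full-precision latent weights. This is an illustrative sketch, not the training script's implementation; BitNet-style layers typically also normalize inputs before activation quantization, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """Sketch of a BitNet b1.58-style linear layer (not the author's code).

    Full-precision latent weights are trained; the forward pass uses ternary
    weights times a per-tensor absmean scale and 8-bit absmax-quantized
    activations. The STE treats quantization as identity in the backward pass.
    """

    def __init__(self, in_features: int, out_features: int, eps: float = 1e-5):
        super().__init__(in_features, out_features, bias=False)
        self.eps = eps

    def quantize_weight(self, w: torch.Tensor) -> torch.Tensor:
        # Absmean scale, then round and clip to {-1, 0, +1}, then rescale
        scale = w.abs().mean().clamp(min=self.eps)
        return (w / scale).round().clamp(-1, 1) * scale

    def quantize_activation(self, x: torch.Tensor) -> torch.Tensor:
        # Per-token absmax scaling to the int8 range [-128, 127]
        scale = 127.0 / x.abs().amax(dim=-1, keepdim=True).clamp(min=self.eps)
        return (x * scale).round().clamp(-128, 127) / scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # STE: the forward pass sees quantized values, the backward pass the identity
        w_q = w + (self.quantize_weight(w) - w).detach()
        x_q = x + (self.quantize_activation(x) - x).detach()
        return F.linear(x_q, w_q)


layer = BitLinear(384, 1024)
x = torch.randn(2, 16, 384)
y = layer(x)
y.sum().backward()  # gradients reach layer.weight through the STE
print(y.shape)  # torch.Size([2, 16, 1024])
```

Keeping latent full-precision weights is what makes training possible: rounding has zero gradient almost everywhere, so without the STE the weights would never update. At inference time only the ternary values and one scale per matrix need to be stored.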
## Capabilities
- English text generation
- TypeScript/Dart code completion
- Tool/function calling in XML format
- General instruction following
## Usage
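```python
import torch

# Load the raw checkpoint onto the CPU; no GPU is required.
ckpt = torch.load("model.pt", map_location="cpu")

# The model class is not included here; see the training script for the
# architecture definition (BitLinear, RoPE, GQA, SwiGLU, RMSNorm).
```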
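Before wiring the weights into a model class, it can help to inspect what the checkpoint contains. The helper below is a hypothetical sketch: the layout of `model.pt` is not documented, so it assumes the file is either a plain state dict or a dict nesting one under a key such as `"model"` or `"state_dict"` (both key names are guesses). Tensors that share memory, as tied embeddings would, are counted once.

```python
import torch


def summarize_checkpoint(ckpt) -> dict:
    """Count tensors and parameters in a loaded checkpoint (hypothetical helper).

    ASSUMPTION: ckpt is a state dict, or a dict nesting one under "model"
    or "state_dict"; the real layout of model.pt is not documented.
    """
    state = ckpt
    if isinstance(ckpt, dict):
        for key in ("model", "state_dict"):  # hypothetical key names
            if isinstance(ckpt.get(key), dict):
                state = ckpt[key]
                break
    tensors = {k: v for k, v in state.items() if torch.is_tensor(v)}
    total = sum(v.numel() for v in tensors.values())
    # Tied weights are saved once and share memory after loading, so
    # deduplicate by (data pointer, shape) to count unique parameters.
    seen = {}
    for v in tensors.values():
        seen[(v.data_ptr(), tuple(v.shape))] = v.numel()
    return {"tensors": len(tensors), "params": total, "unique_params": sum(seen.values())}
```

For example, `summarize_checkpoint(torch.load("model.pt", map_location="cpu"))` would be expected to report roughly 24.9M unique parameters if the file stores only the model weights, though that depends on how the training script saved it.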
## Author
Jay Patel - GitHub