Vini Pico - 25M BitNet b1.58 Language Model

A tiny but capable language model built on the BitNet b1.58 architecture (ternary weights). It is designed for mobile deployment and supports tool calling.

Model Details

Property           Value
-----------------  ------------------
Parameters         24.9M (unique)
Architecture       BitNet b1.58
Dimensions         384
Layers             8
Attention Heads    6 (2 KV)
Hidden Dim         1024 (SwiGLU)
Max Seq Length     2048
Vocab Size         32,000
Weight Precision   1.58-bit (ternary)
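
As a sanity check on the figures above, the parameter count can be estimated from the listed dimensions. This is a rough sketch, not taken from the training script: it assumes the input embedding and output head share one matrix (one reading of "unique"), that linear layers carry no biases, and that each block has two RMSNorm weight vectors plus one final norm.

```python
def count_parameters(vocab=32_000, dim=384, layers=8, heads=6, kv_heads=2, hidden=1024):
    """Estimate the parameter count under the assumptions stated above."""
    head_dim = dim // heads                        # 384 / 6 = 64
    kv_dim = kv_heads * head_dim                   # 2 KV heads -> 128 (GQA)
    embedding = vocab * dim                        # assumed tied with the output head
    attention = 2 * dim * dim + 2 * dim * kv_dim   # Q and O, plus K and V projections
    mlp = 3 * dim * hidden                         # SwiGLU: gate, up, and down projections
    norms = 2 * dim                                # assumed two RMSNorm vectors per block
    final_norm = dim                               # assumed final RMSNorm
    return embedding + layers * (attention + mlp + norms) + final_norm


total = count_parameters()
print(f"{total:,}")  # 24,877,440
```

Under these assumptions the estimate rounds to 24.9M, which is consistent with the table. Roughly half of that is the embedding matrix.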

Training

  • Pre-training: ~1B tokens from FineWeb-Edu, StarCoderData (TypeScript/Dart), ToolACE, Cosmopedia, OpenHermes
  • SFT: ToolACE, Glaive Function Calling, OpenHermes 2.5
  • Architecture: Custom BitLinear layers trained with straight-through estimator (STE) quantization, rotary position embeddings (RoPE), grouped-query attention (GQA), SwiGLU, RMSNorm
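
For readers unfamiliar with BitLinear, the following is a minimal sketch of a ternary linear layer trained with a straight-through estimator. It is an illustration, not the author's implementation: the names `ternary_quantize` and `BitLinear` are chosen here, weights use the absmean scheme described for BitNet b1.58, and the 8-bit activation quantization that BitNet b1.58 also applies is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def ternary_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Absmean quantization: scale by mean |w|, round, clip to {-1, 0, +1}, rescale."""
    scale = w.abs().mean().clamp(min=eps)
    return (w / scale).round().clamp(-1, 1) * scale


class BitLinear(nn.Linear):
    """Linear layer whose forward pass uses ternary weights (illustrative sketch)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Straight-through estimator: the forward value equals the quantized
        # weight, but the detached difference carries no gradient, so the
        # backward pass treats quantization as the identity and updates the
        # latent full-precision weights.
        w_q = w + (ternary_quantize(w) - w).detach()
        return F.linear(x, w_q, self.bias)


# Shapes borrowed from the model table: 384 -> 1024.
layer = BitLinear(384, 1024, bias=False)
out = layer(torch.randn(2, 384))
out.sum().backward()  # gradients reach layer.weight despite the rounding step
```

At inference time only the ternary values and one scale per weight matrix need to be stored, which is what makes the format attractive for mobile deployment.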

Capabilities

  • English text generation
  • TypeScript/Dart code completion
  • Tool/function calling in XML format
  • General instruction following
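
This card does not document the exact XML schema used for tool calls. The sketch below shows how an application might extract a call from generated text, assuming a hypothetical format with a `<tool_call>` element containing `<name>` and JSON-encoded `<arguments>`. The tag names and the `get_weather` tool are illustrative assumptions; check real model output and adjust accordingly.

```python
import json
import xml.etree.ElementTree as ET

OPEN_TAG, CLOSE_TAG = "<tool_call>", "</tool_call>"  # hypothetical tag names


def parse_tool_call(text: str) -> dict:
    """Extract the first tool call from model output in the assumed format."""
    start = text.find(OPEN_TAG)
    end = text.find(CLOSE_TAG, start)
    if start == -1 or end == -1:
        raise ValueError("no tool call found")
    element = ET.fromstring(text[start:end + len(CLOSE_TAG)])
    name = element.findtext("name")
    # Arguments are assumed to be a JSON object; an empty element means no arguments.
    arguments = json.loads(element.findtext("arguments") or "{}")
    return {"name": name, "arguments": arguments}


sample = 'Sure. <tool_call><name>get_weather</name><arguments>{"city": "Paris"}</arguments></tool_call>'
call = parse_tool_call(sample)
print(call)
```

A small model can emit malformed XML or JSON, so callers should be prepared to catch `ValueError`, `ET.ParseError`, and `json.JSONDecodeError`.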

Usage

import torch
# Load checkpoint
ckpt = torch.load("model.pt", map_location="cpu")
# See training script for model architecture
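
The layout of the checkpoint is not described here, so it can help to inspect it before wiring it into the model class. The helpers below are a hedged sketch: the wrapper keys `"model"`, `"state_dict"`, and `"model_state_dict"` are common conventions assumed for illustration, not confirmed for this file.

```python
def extract_state_dict(ckpt):
    """Return the tensor mapping from a checkpoint whose layout is unknown."""
    if isinstance(ckpt, dict):
        # Assumed wrapper keys; a bare state dict falls through unchanged.
        for key in ("model", "state_dict", "model_state_dict"):
            if isinstance(ckpt.get(key), dict):
                return ckpt[key]
    return ckpt


def count_elements(state_dict):
    """Sum the element counts of every tensor-like value in the mapping."""
    return sum(t.numel() for t in state_dict.values() if hasattr(t, "numel"))
```

With the `ckpt` loaded above, `state = extract_state_dict(ckpt)` followed by `print(len(state), count_elements(state))` reports how many tensors the file holds and their total size. Printing `list(state)[:10]` shows the parameter names that the model class in the training script needs to match.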

Author

Jay Patel - GitHub
