jayptl-rq committed on
Commit 066f37b · verified · 1 Parent(s): 594e51e

Add model card

Files changed (1)
  1. README.md +60 -0
README.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - bitnet
+ - code
+ - tool-calling
+ - typescript
+ - dart
+ - efficient
+ - mobile
+ library_name: pytorch
+ pipeline_tag: text-generation
+ ---
+
+ # Vini Pico - 25M BitNet b1.58 Language Model
+
+ A small language model built on the BitNet b1.58 architecture, which constrains weights to the ternary values {-1, 0, +1}.
+ It is designed for mobile deployment and supports tool calling.
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | Parameters | 24.9M (unique) |
+ | Architecture | BitNet b1.58 |
+ | Model Dim | 384 |
+ | Layers | 8 |
+ | Attention Heads | 6 query (2 KV) |
+ | FFN Hidden Dim | 1024 (SwiGLU) |
+ | Max Seq Length | 2048 tokens |
+ | Vocab Size | 32,000 |
+ | Weight Precision | 1.58-bit (ternary) |
+
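The table's figures can be cross-checked with a little arithmetic. The sketch below is not taken from the training script. It assumes several things the card does not state: the embedding matrix is shared with the output head (one possible reading of "unique"), there are no bias terms, the head size is 384 / 6 = 64, each layer has two RMSNorm gain vectors plus one final norm, and the KV cache is stored in 16-bit values.

```python
# Shapes from the Model Details table.
VOCAB, DIM, LAYERS = 32_000, 384, 8
N_HEADS, N_KV_HEADS, FFN_DIM, MAX_SEQ = 6, 2, 1024, 2048

HEAD_DIM = DIM // N_HEADS        # assumed: model dim split evenly across query heads
KV_DIM = N_KV_HEADS * HEAD_DIM   # width of the K and V projections under GQA


def count_parameters() -> int:
    """Count unique weights, assuming tied embeddings and no bias terms."""
    embedding = VOCAB * DIM                       # assumed shared with the output head
    attention = 2 * DIM * DIM + 2 * DIM * KV_DIM  # Q and O, plus the narrower K and V
    ffn = 3 * DIM * FFN_DIM                       # SwiGLU: gate, up, and down projections
    norms = 2 * DIM                               # assumed: two RMSNorm gains per layer
    final_norm = DIM
    return embedding + LAYERS * (attention + ffn + norms) + final_norm


def kv_cache_bytes(n_kv_heads: int, bytes_per_value: int = 2) -> int:
    """Size of a full-context KV cache (keys and values) at the given precision."""
    return 2 * LAYERS * MAX_SEQ * n_kv_heads * HEAD_DIM * bytes_per_value


total = count_parameters()
print(f"parameters: {total:,} (~{total / 1e6:.1f}M)")
print(f"KV cache, 2 KV heads: {kv_cache_bytes(N_KV_HEADS) / 2**20:.0f} MiB")
print(f"KV cache, 6 KV heads: {kv_cache_bytes(N_HEADS) / 2**20:.0f} MiB")
```

Under these assumptions the count comes to 24,877,440, which is consistent with the 24.9M in the table. Using 2 KV heads instead of 6 cuts the full-context cache from 24 MiB to 8 MiB.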
+ ## Training
+
+ - **Pre-training**: ~1B tokens from FineWeb-Edu, StarCoderData (TypeScript/Dart), ToolACE, Cosmopedia, and OpenHermes
+ - **SFT**: ToolACE, Glaive Function Calling, and OpenHermes 2.5
+ - **Architecture**: Custom BitLinear layers trained with straight-through estimator (STE) quantization, plus RoPE, GQA, SwiGLU, and RMSNorm
+
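To make "ternary weights" concrete, here is a minimal pure-Python sketch of absmean quantization, the weight scheme usually associated with BitNet b1.58. The card does not say that the custom BitLinear layers use exactly this recipe. The function names and the sample weights are illustrative only.

```python
def ternary_quantize(weights: list[float], eps: float = 1e-5) -> tuple[list[int], float]:
    """Absmean quantization: divide by the mean |w|, round, and clip to {-1, 0, +1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Map ternary codes back to floats so they can be compared with the originals."""
    return [c * scale for c in codes]


# Illustrative weights, not taken from the model.
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
codes, scale = ternary_quantize(weights)
print(codes)
print(dequantize(codes, scale))
```

Rounding has zero gradient almost everywhere, so training needs a workaround. With a straight-through estimator the forward pass uses the quantized weights while the backward pass treats the quantizer as the identity. In PyTorch this is commonly written as `w + (w_q - w).detach()`.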
+ ## Capabilities
+
+ - English text generation
+ - TypeScript/Dart code completion
+ - Tool/function calling in XML format
+ - General instruction following
+
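The card does not specify the XML schema used for tool calls. The sketch below shows how a caller might parse such output with Python's standard library. The `tool_call`, `name`, and `arguments` tags, the JSON-encoded arguments, and the `get_weather` tool are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def parse_tool_call(xml_text: str) -> tuple[str, dict]:
    """Extract the tool name and JSON arguments from one hypothetical <tool_call> element."""
    root = ET.fromstring(xml_text)
    if root.tag != "tool_call":
        raise ValueError(f"expected <tool_call>, got <{root.tag}>")
    name = root.findtext("name")
    if name is None:
        raise ValueError("missing <name> element")
    raw_arguments = root.findtext("arguments") or "{}"  # treat absent or empty as no arguments
    return name.strip(), json.loads(raw_arguments)


# Hypothetical model output; the real format may differ.
output = '<tool_call><name>get_weather</name><arguments>{"city": "Pune"}</arguments></tool_call>'
name, args = parse_tool_call(output)
print(name, args)
```

A small model can emit malformed XML, so a caller would likely want to catch `ET.ParseError` and `json.JSONDecodeError` and retry or fall back to plain text.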
+ ## Usage
+
+ ```python
+ import torch
+ # Load the checkpoint onto the CPU
+ ckpt = torch.load("model.pt", map_location="cpu")
+ # See the training script for the model architecture
+ ```
+
+ ## Author
+
+ Jay Patel - [GitHub](https://github.com/jayptl-me)