tzervas committed on
Commit 41d3e72 · verified · 1 Parent(s): 10d5855

Final 500M model (loss=11.2343)

Files changed (1)
  1. README.md +28 -0
README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: mit
+ tags:
+ - tritter
+ - bitnet
+ - code
+ - 500m
+ ---
+
+ # Tritter 500M BitNet
+
+ A 500M parameter BitNet b1.58 ternary-quantized model trained for code generation.
+
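For readers unfamiliar with the term, "ternary-quantized" means each weight is restricted to one of -1, 0, or +1, with a shared floating-point scale. The following is a minimal sketch of the absmean scheme described for BitNet b1.58, written in plain Python for illustration. The function name `absmean_ternary` and the sample weights are hypothetical, and whether Tritter applies exactly this scheme is an assumption, not something this README states.

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize a flat list of floats to {-1, 0, +1} with one shared scale.

    Illustrative sketch only: the scale is the mean absolute weight, and each
    weight is divided by it, rounded to the nearest integer, then clipped.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


# Hypothetical example weights, not taken from the model.
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
q, scale = absmean_ternary(weights)
print(q, scale)
```

Multiplying each quantized value by `scale` gives the coarse approximation of the original weights that a layer would use at inference time.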
+ ## Training Details
+
+ - **Parameters**: 524,372,480
+ - **Training tokens**: 118,111,072
+ - **Final loss**: 11.2343
+ - **Min loss**: 11.0722
+ - **Tokens/sec**: 23679.4
+ - **Training duration**: 1:23:07.915359
+ - **GPU**: NVIDIA GeForce RTX 5080
+
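The figures above can be cross-checked against each other: training tokens divided by tokens/sec should roughly equal the training duration. The short sketch below does that arithmetic and also derives the tokens-per-parameter ratio (about 0.23). All inputs are copied from the list above. The variable names are illustrative, and reading the throughput as an average over the whole run is an assumption.

```python
from datetime import timedelta

# Values copied from the Training Details list.
parameters = 524_372_480
training_tokens = 118_111_072
tokens_per_sec = 23679.4
duration = timedelta(hours=1, minutes=23, seconds=7, microseconds=915359)

# Time implied by the token count and throughput (assumes an average rate).
implied_seconds = training_tokens / tokens_per_sec
reported_seconds = duration.total_seconds()
relative_gap = abs(implied_seconds - reported_seconds) / reported_seconds

# How many training tokens were seen per model parameter.
tokens_per_parameter = training_tokens / parameters

print(f"implied duration: {timedelta(seconds=implied_seconds)}")
print(f"relative gap vs. reported duration: {relative_gap:.2e}")
print(f"tokens per parameter: {tokens_per_parameter:.3f}")
```

A small relative gap indicates that the token count, throughput, and duration are mutually consistent.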
+ ## Checkpoints
+
+ Intermediate checkpoints available at 10%, 20%, ..., 90% progress.
+
+ Generated with [Tritter](https://github.com/tzervas/tritter)