Upload 12 files
Model:
Tiny_MoE_Config(
name='Tiny_MoE',
dtype=jax.numpy.bfloat16,
param_dtype=jax.numpy.float32,
block_size=2048,
vocab_size=49152,
n_layer=30,
n_head=9,
n_kv_head=3,
n_embed=576,
n_experts=8,
mesh=None,
top_k=2,
load_factor=10.0,
expert_weight_priority=False,
load_balance_loss_coeff=0.01,
z_loss_coeff=0.0001,
n_mlp_hidden=1536,
mlp_bias=False,
attention_bias=False,
moe_bias=False,
ln_epsilon=1e-05,
glu_activation='silu',
sdpa_implementation='slow',
rope_theta=0.0001,
init_stddev=0.02,
use_cache=False,
glu_fc_kernel_sharding=(None,),
glu_fc_bias_sharding=(None,),
glu_gate_kernel_sharding=(None,),
glu_gate_bias_sharding=(None,),
glu_proj_kernel_sharding=(None,),
glu_proj_bias_sharding=(None,),
attn_wq_kernel_sharding=(None,),
attn_wq_bias_sharding=(None,),
attn_wkv_kernel_sharding=(None,),
attn_wkv_bias_sharding=(None,),
attn_wproj_kernel_sharding=(None,),
attn_wproj_bias_sharding=(None,),
embed_partition_spec=(None,),
rmsnorm_partition_spec=(None,),
)
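The routing fields above (top_k, load_balance_loss_coeff, z_loss_coeff) are conventionally used as in the following minimal pure-Python sketch. It implements top-k routing with a Switch-Transformer-style load-balancing loss and an ST-MoE-style router z-loss, with the config's values as defaults. This is an assumption about how the fields are used, not this repository's code. The function names are hypothetical, renormalizing gate weights over the selected experts is an assumed choice, and load_factor and expert_weight_priority are not modeled.

```python
import math
from typing import List, Tuple

# Hypothetical helper names; a sketch of conventional top-k MoE routing.


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logsumexp(logits: List[float]) -> float:
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def route(
    router_logits: List[List[float]],  # shape [n_tokens][n_experts]
    top_k: int = 2,
    load_balance_loss_coeff: float = 0.01,
    z_loss_coeff: float = 1e-4,
) -> Tuple[List[List[Tuple[int, float]]], float, float, float]:
    n_tokens = len(router_logits)
    n_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    assignments = []
    counts = [0] * n_experts
    for p in probs:
        # Highest-probability experts first; sorted() stays stable with
        # reverse=True, so ties go to the lower expert index.
        top = sorted(range(n_experts), key=p.__getitem__, reverse=True)[:top_k]
        norm = sum(p[i] for i in top)
        # ASSUMPTION: gate weights are renormalized over the selected experts.
        assignments.append([(i, p[i] / norm) for i in top])
        for i in top:
            counts[i] += 1

    # f_i: fraction of the n_tokens * top_k routing slots sent to expert i.
    f = [c / (n_tokens * top_k) for c in counts]
    # P_i: mean router probability assigned to expert i.
    P = [sum(p[i] for p in probs) / n_tokens for i in range(n_experts)]
    # Switch-style balance loss: equals 1.0 when the router probabilities are uniform.
    lb_loss = n_experts * sum(fi * pi for fi, pi in zip(f, P))
    # Router z-loss: mean squared log-partition function, discourages large logits.
    z_loss = sum(logsumexp(row) ** 2 for row in router_logits) / n_tokens
    aux_loss = load_balance_loss_coeff * lb_loss + z_loss_coeff * z_loss
    return assignments, aux_loss, lb_loss, z_loss
```

The auxiliary loss returned here would be added to the language-modeling loss; the two coefficients only scale the two terms.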
Parameter Count: 413,275,968
MoE (Sharded) Parameter Count: 318,504,960
Replicated Parameter Count: 94,771,008
Active Parameter Count: 174,397,248
% Active Parameters: 42.20
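The reported counts are mutually consistent: the sharded and replicated counts sum to the total, and the active count is the replicated parameters plus top_k/n_experts = 2/8 of the MoE parameters (94,771,008 + 318,504,960 × 2/8 = 174,397,248). The sketch below reproduces every number from the config, but only under assumptions the source does not state: MoE blocks in 15 of the 30 layers (for example every other layer) with dense GLU MLPs in the rest, a bias-free router of shape n_embed × n_experts, tied input/output embeddings, and two RMSNorm weight vectors per block plus a final norm. This is one layout consistent with the numbers, not a description of the actual model code; TinyMoESizes and count_params are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyMoESizes:
    """Hypothetical subset of Tiny_MoE_Config fields that set the parameter count."""
    vocab_size: int = 49152
    n_layer: int = 30
    n_head: int = 9
    n_kv_head: int = 3
    n_embed: int = 576
    n_experts: int = 8
    top_k: int = 2
    n_mlp_hidden: int = 1536


def count_params(cfg: TinyMoESizes, moe_every: int = 2) -> dict:
    d = cfg.n_embed
    head_dim = d // cfg.n_head                    # 576 / 9 = 64
    # Attention: Q and output projections are d x d; K and V each have
    # n_kv_head heads of size head_dim (grouped-query attention). No biases.
    attn = 2 * d * d + 2 * d * cfg.n_kv_head * head_dim
    glu = 3 * d * cfg.n_mlp_hidden                # fc, gate and proj kernels
    n_moe = cfg.n_layer // moe_every              # ASSUMPTION: MoE in half the blocks
    n_dense = cfg.n_layer - n_moe
    router = d * cfg.n_experts                    # ASSUMPTION: bias-free linear router
    norms = 2 * d * cfg.n_layer + d               # two RMSNorms per block + final norm
    embed = cfg.vocab_size * d                    # ASSUMPTION: tied input/output embedding

    moe = n_moe * cfg.n_experts * glu
    replicated = embed + cfg.n_layer * attn + n_dense * glu + n_moe * router + norms
    active = replicated + moe * cfg.top_k // cfg.n_experts
    total = moe + replicated
    return {
        "total": total,
        "moe": moe,
        "replicated": replicated,
        "active": active,
        "pct_active": 100.0 * active / total,
    }


if __name__ == "__main__":
    for name, value in count_params(TinyMoESizes()).items():
        print(f"{name}: {value:.2f}" if name == "pct_active" else f"{name}: {value:,}")
```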
Dataset:
https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus
Files (12 added, each a 3-line Git LFS pointer, version https://git-lfs.github.com/spec/v1):
- run_20250826_isander_dingo/checkpoint-61796.pt/_CHECKPOINT_METADATA
  oid sha256:faa231a4b7f2bfde63b1bdf6b6407a97eaf178ada149cc5913f276c65058d41d
  size 262
- run_20250826_isander_dingo/checkpoint-61796.pt/_METADATA
  oid sha256:98aebf8504f85b582606151da8cd48ad62a66f3297649fb0ed755b85d9941005
  size 90611
- run_20250826_isander_dingo/checkpoint-61796.pt/_sharding
  oid sha256:0791ae7bc69eb4a767aeebb1afa4b5c9a444d972a39459de906df07c19848b60
  size 74440
- run_20250826_isander_dingo/checkpoint-61796.pt/array_metadatas/process_0
  oid sha256:d5edad4ecd6176cb98a973dabf0aa6ea404656cd5ddadeaf56d46b2e3a45dc46
  size 36120
- run_20250826_isander_dingo/checkpoint-61796.pt/d/c56ed386b7a53157c0dc7189c0af1e35
  oid sha256:5353971c881a5fc7b3c3a219bbe78880c55750bbc34670f19b5b8bc33ac49178
  size 139227
- run_20250826_isander_dingo/checkpoint-61796.pt/manifest.ocdbt
  oid sha256:0f5e131205c37e0fd4ee235ce1ec4b118397153599f49e66f5cfcaa6b05b5600
  size 120
- run_20250826_isander_dingo/checkpoint-61796.pt/ocdbt.process_0/d/8498c3a155842619dd4839be55ccbec0
  oid sha256:98167ddeb42c81a1f5ca27e24a28b775b1cbf9d360bb7240fbca4b02eabd56ec
  size 774043074
- run_20250826_isander_dingo/checkpoint-61796.pt/ocdbt.process_0/d/95cf84baf91161a195594ef8368ff61b
  oid sha256:3cbe80ea3e43f138222cf8f78630f3304e28b4aaf9fa98f3f51cd3df1c9881c1
  size 791
- run_20250826_isander_dingo/checkpoint-61796.pt/ocdbt.process_0/d/c58e0b3b76eb0acd14c081b50d7cda41
  oid sha256:5fcf0be9a8a51755d1975c2ef265dc447a879ed7b82ba54427ce86b519226f11
  size 754954129
- run_20250826_isander_dingo/checkpoint-61796.pt/ocdbt.process_0/d/c5aa49e6919e24cd3565479bb5c7dfa4
  oid sha256:88097147aee865f0b17530f3203e1db7f8c1577494ea7943b9bb9c6413f5fab7
  size 29155
- run_20250826_isander_dingo/checkpoint-61796.pt/ocdbt.process_0/d/f331034c9eefdb605002a7430f33878a
  oid sha256:90f96f9b66dc77141bebeb6b277d8e8bd5ddc16bde0400dc28251d8dd8e86e8b
  size 200
- run_20250826_isander_dingo/checkpoint-61796.pt/ocdbt.process_0/manifest.ocdbt
  oid sha256:bb538ed60d39f06271ca07625fceae9544c8549a0596448e6a8c67f7e976ce50
  size 323
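Each uploaded file is a Git LFS pointer rather than the checkpoint data itself: the repository records only the spec version, the SHA-256 of the real file, and its size in bytes. (The directory layout, with _CHECKPOINT_METADATA, manifest.ocdbt and ocdbt.process_0/, appears to be an Orbax checkpoint in OCDBT format, despite the .pt suffix.) A minimal sketch for parsing such a pointer and checking a downloaded blob against it follows; LfsPointer, parse_lfs_pointer and verify_blob are hypothetical names, and keys other than version, oid and size are ignored.

```python
import hashlib
from dataclasses import dataclass

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


@dataclass(frozen=True)
class LfsPointer:
    version: str
    oid: str   # hex SHA-256 of the real file contents
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS pointer file: one "key value" pair per line."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError(f"unsupported or missing version: {fields.get('version')!r}")
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError(f"unexpected oid: {fields.get('oid')!r}")
    if "size" not in fields:
        raise ValueError("missing size")
    return LfsPointer(version=fields["version"], oid=digest, size=int(fields["size"]))


def verify_blob(pointer: LfsPointer, data: bytes) -> bool:
    """Check downloaded contents against the pointer's size and SHA-256."""
    return len(data) == pointer.size and hashlib.sha256(data).hexdigest() == pointer.oid
```

For example, parsing the _CHECKPOINT_METADATA pointer yields size 262, so a correctly downloaded copy of that file should be exactly 262 bytes and hash to the listed oid.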