Upload folder using huggingface_hub

Files changed (4)

README.md ADDED

+---
+license: mit
+library_name: oxidizr
+tags:
+- oxidizr
+- llm
+- mamba
+pipeline_tag: text-generation
+---
+# nano-start_64_26m_f32
+Trained with [oxidizr](https://github.com/farhan-syah/oxidizr).
+## Model Details
+| Property | Value |
+|----------|-------|
+| Parameters | 26.73M |
+| Architecture | 3 Mamba2 + 1 MLA + MoE (2 experts, top-1) |
+| Vocab Size | 100315 |
+| Max Seq Length | 64 |
+| Hidden Size | 128 |
+| Layers | 4 |
+## Training Details
+| Property | Value |
+|----------|-------|
+| Checkpoint | final |
+| Final Loss | 0.0738 |
+| Total Steps | 241 |
+| Learning Rate | 2.00e-3 |
+## Usage
+### With blazr (recommended)
+```bash
+# Generate text
+blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Hello, world!"
+# Start inference server
+blazr serve --model fs90/nano-start_64_26m_f32 --port 8080
+```
+### Download
+```bash
+# Clone the model
+git clone https://huggingface.co/fs90/nano-start_64_26m_f32
+# Or use huggingface-cli
+huggingface-cli download fs90/nano-start_64_26m_f32 --local-dir ./model
+```
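
The download step in the README can also be done from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; `download_model` is a hypothetical helper name and is not part of this repository.

```python
def download_model(repo_id: str = "fs90/nano-start_64_26m_f32",
                   local_dir: str = "./model") -> str:
    """Download every file in the model repo and return the local folder path."""
    # Imported lazily so this module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    # Requires network access to huggingface.co.
    print(download_model())
```

This mirrors the `huggingface-cli download ... --local-dir ./model` command shown in the README.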

config.json ADDED

+{
+  "hidden_size": 128,
+  "num_layers": 4,
+  "vocab_size": 100315,
+  "mamba2_num_heads": 16,
+  "mamba2_head_dim": 16,
+  "mamba2_state_size": 32,
+  "mamba2_chunk_size": 32,
+  "mamba2_expand": 2,
+  "mamba2_conv_kernel": 4,
+  "num_attention_heads": 4,
+  "kv_latent_dim": 64,
+  "q_latent_dim": 64,
+  "d_rope": 8,
+  "num_experts": 2,
+  "experts_per_tok": 1,
+  "shared_expert_enabled": true,
+  "intermediate_size": 512,
+  "mamba_layers": [
+    0,
+    1,
+    2
+  ],
+  "rms_norm_eps": 0.00001,
+  "max_seq_len": 64
+}
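
The values in `config.json` can be cross-checked against each other and against the README. The sketch below assumes the usual Mamba2 convention that the inner width is `expand * hidden_size` and is split evenly across heads; labelling the non-Mamba layer as `"mla"` is an assumption taken from the README's architecture row, not from the config itself. The helper names are hypothetical.

```python
import json

# Copy of the relevant fields from config.json in this commit.
CONFIG_JSON = """
{
  "hidden_size": 128,
  "num_layers": 4,
  "mamba2_num_heads": 16,
  "mamba2_head_dim": 16,
  "mamba2_expand": 2,
  "mamba_layers": [0, 1, 2]
}
"""
config = json.loads(CONFIG_JSON)


def mamba_inner_dim(cfg: dict) -> int:
    """Inner width of a Mamba2 block, assuming d_inner = expand * hidden_size."""
    return cfg["mamba2_expand"] * cfg["hidden_size"]


def layer_kinds(cfg: dict) -> list[str]:
    """Label each layer index as Mamba2 or (by assumption) MLA."""
    mamba = set(cfg["mamba_layers"])
    return ["mamba2" if i in mamba else "mla" for i in range(cfg["num_layers"])]


heads_times_dim = config["mamba2_num_heads"] * config["mamba2_head_dim"]
print(mamba_inner_dim(config), heads_times_dim)  # both are 256
print(layer_kinds(config))  # three Mamba2 layers followed by one other layer
```

Under these assumptions the head count and head dimension multiply out to the inner width, and the layer split matches the "3 Mamba2 + 1 MLA" description.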

model.safetensors ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:f21d406137fc3d107bb53d6167d7b84a1414f97607ddc5f741d75c5cccd8da4f
+size 106677352
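
The LFS pointer size gives a rough cross-check on the parameter count in the README. This sketch assumes every tensor is stored as `f32` (4 bytes), as the `dtype` in `training_config.json` suggests. Because the file also contains a metadata header, the result is only an upper bound on stored elements. It does not explain why the README figure is slightly different; the README may count parameters in another way.

```python
SAFETENSORS_BYTES = 106_677_352  # "size" from the LFS pointer
BYTES_PER_F32 = 4                # assumption: all tensors are f32
README_PARAMS = 26.73e6          # "Parameters" row in the README


def max_f32_elements(file_bytes: int) -> int:
    """Upper bound on f32 values in the file (the header is counted as data)."""
    return file_bytes // BYTES_PER_F32


upper = max_f32_elements(SAFETENSORS_BYTES)
relative_gap = abs(README_PARAMS - upper) / README_PARAMS

print(f"at most {upper:,} f32 values (~{upper / 1e6:.2f}M)")
print(f"gap to README figure: {relative_gap:.2%}")
```

The two figures agree to within about 0.3%, which is consistent with a 26M-parameter model stored in full precision.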

training_config.json ADDED

+{
+  "config": {
+    "model": {
+      "dtype": "f32",
+      "experts_per_tok": 1,
+      "hidden_size": 128,
+      "max_seq_len": 64,
+      "name": "nano-start",
+      "num_experts": 2,
+      "num_heads": 4,
+      "num_layers": 4,
+      "vocab_size": 100315
+    },
+    "trainer": {
+      "batch_size": 4,
+      "effective_batch_size": 8,
+      "gradient_accumulation": 2,
+      "learning_rate": 0.002,
+      "max_steps": 0,
+      "num_epochs": 20,
+      "seq_len": 64,
+      "total_steps": 260
+    }
+  },
+  "dataset_size": 6379,
+  "device": "Cuda(CudaDevice(DeviceId(1)))",
+  "error": null,
+  "run_dir": "./runs/20251205_144741",
+  "status": "completed"
+}
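
The trainer fields are related by simple arithmetic, sketched below. That `effective_batch_size` equals `batch_size * gradient_accumulation` follows directly from the values. The step estimate rests on assumptions: that `dataset_size` counts tokens, that tokens are packed into `seq_len`-long sequences, and that each epoch rounds up to whole optimizer steps. The helper names are hypothetical and this is not oxidizr's actual scheduling code.

```python
import math

# Values copied from training_config.json in this commit.
trainer = {
    "batch_size": 4,
    "gradient_accumulation": 2,
    "num_epochs": 20,
    "seq_len": 64,
}
dataset_size = 6379  # assumption: measured in tokens


def effective_batch(t: dict) -> int:
    """Sequences consumed per optimizer step."""
    return t["batch_size"] * t["gradient_accumulation"]


def estimated_total_steps(t: dict, dataset_tokens: int) -> int:
    """Optimizer steps over all epochs, rounding up at each stage."""
    sequences = math.ceil(dataset_tokens / t["seq_len"])
    steps_per_epoch = math.ceil(sequences / effective_batch(t))
    return steps_per_epoch * t["num_epochs"]


print(effective_batch(trainer))
print(estimated_total_steps(trainer, dataset_size))
```

Under these assumptions the estimate reproduces the `total_steps` value of 260 recorded in the config. The README reports 241 total steps, and this sketch does not account for that difference.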