Commit 7555cfa by rootxhacker · verified · 1 parent: 1a13f4f

Upload README.md with huggingface_hub

  1. README.md +57 -0
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ license: apache-2.0
+ language: [en]
+ library_name: safetensors
+ pipeline_tag: text-generation
+ tags: [hobbylm, mixture-of-experts, moe, sparse-moe]
+ ---
+
+ # HobbyLM-Diffusion (500M MoE, text diffusion / LLaDA-style)
+
+ A masked-diffusion (LLaDA-style) variant of HobbyLM, intended for bidirectional, parallel decoding.
+
+ Part of the **HobbyLM** family: a from-scratch 500M sparse-MoE model trained on consumer-scale budgets.
+
+ ## Architecture
+
+ HobbyLM is a **sparse Mixture-of-Experts (MoE)** transformer (DeepSeek-V3 / Ling-style):
+
+ | Component | Value |
+ |---|---|
+ | Total parameters | ~500M (only a fraction active per token) |
+ | Hidden size / layers | 768 / 16 (1 dense FFN layer, 15 MoE) |
+ | Routed experts / active | 36 / top-6 (+ 1 always-on shared expert) |
+ | Attention | GQA, 12 query / 3 KV heads, head-dim 128, per-head QK-norm |
+ | Router | sigmoid gating, aux-loss-free balancing bias, no top-k renorm |
+ | Positional | RoPE |
+ | Tokenizer | GPT-2 byte-level BPE (50,304 vocab, sentinel-padded) |
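
The Router row can be read as the following per-token routine. This is a minimal, dependency-free sketch and not the checkpoint's actual code: the function name `route_token` and its arguments are hypothetical, and it assumes the DeepSeek-V3 convention in which the balancing bias affects only which experts are selected, while the gate weights stay the raw sigmoid scores with no renormalisation over the chosen six.

```python
import math


def route_token(logits, balance_bias, top_k=6):
    """Pick `top_k` routed experts for one token (illustrative only).

    logits:       raw router scores, one per routed expert.
    balance_bias: per-expert bias used for selection only
                  (assumed reading of "aux-loss-free balancing bias").
    Returns a list of (expert_index, gate_weight) pairs.
    """
    # Sigmoid gating: each expert is scored independently, no softmax.
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Rank experts by biased score; the bias only steers selection.
    ranked = sorted(
        range(len(scores)),
        key=lambda i: scores[i] + balance_bias[i],
        reverse=True,
    )
    chosen = sorted(ranked[:top_k])
    # Gate weights are the unbiased sigmoid scores, left un-normalised
    # ("no top-k renorm"), so they need not sum to 1.
    return [(i, scores[i]) for i in chosen]


# Toy usage with 36 routed experts, matching the table above.
logits = [math.sin(i) for i in range(36)]
selected = route_token(logits, balance_bias=[0.0] * 36)
```

The always-on shared expert is not part of this selection: under the same assumptions its output would simply be added to the weighted sum of the six routed experts.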
+
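
A few shapes follow directly from the table, and the arithmetic is worth spelling out because the attention width is not equal to the hidden size. The variable names below are illustrative, and the reading of "sentinel-padded" as extra unused rows beyond the standard 50,257-entry GPT-2 vocabulary is an assumption.

```python
# Values copied from the architecture table.
hidden_size = 768
n_q_heads, n_kv_heads, head_dim = 12, 3, 128
routed_experts, top_k, shared_experts = 36, 6, 1
padded_vocab = 50_304

# Grouped-query attention: each KV head serves a group of query heads.
q_heads_per_kv = n_q_heads // n_kv_heads      # 4 query heads per KV head
q_width = n_q_heads * head_dim                # 1536, twice the hidden size
kv_width = n_kv_heads * head_dim              # 384 each for K and V

# Experts evaluated per token in an MoE layer: routed top-k plus shared.
experts_per_token = top_k + shared_experts    # 7 of 37

# Standard GPT-2 BPE has 50,257 entries; the rest is padding (assumed).
gpt2_vocab = 50_257
pad_rows = padded_vocab - gpt2_vocab          # 47
vocab_is_multiple_of_64 = padded_vocab % 64 == 0
```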
+
+ ## Decoding
+
+ This is a **masked-diffusion** checkpoint (LLaDA-style): generation proceeds by iteratively denoising `[MASK]` tokens with bidirectional context, not by left-to-right autoregressive (AR) decoding. The GGUF builds carry `diffusion.*` metadata (mask token id, block size) for a diffusion-aware runtime.
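
To make the decoding loop concrete, here is a minimal sketch of confidence-based iterative unmasking. Everything in it is an assumption for illustration: `toy_denoiser` is a stub standing in for the model, `MASK` is a placeholder id (the real one is stored in the checkpoint metadata), and the schedule of committing an equal share of the remaining masks per step is one simple choice, not necessarily what a HobbyLM runtime does. Block-wise decoding, which the block-size metadata suggests, is not modelled.

```python
import math

MASK = -1  # placeholder mask id, hypothetical


def toy_denoiser(tokens):
    """Stub model: return (predicted_token, confidence) for every position.

    A real denoiser would attend over the whole sequence, masked or not.
    """
    return [(100 + i, 1.0 / (1 + i)) for i in range(len(tokens))]


def diffusion_decode(prompt, gen_len, steps, denoiser, mask_id=MASK):
    """Fill `gen_len` masked slots after `prompt` in at most `steps` passes."""
    seq = list(prompt) + [mask_id] * gen_len
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == mask_id]
        if not masked:
            break
        preds = denoiser(seq)
        # Spread the remaining masks evenly over the remaining steps.
        n_commit = math.ceil(len(masked) / (steps - step))
        # Commit the most confident predictions; the rest stay masked
        # and are predicted again on the next pass.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in best[:n_commit]:
            seq[i] = preds[i][0]
    return seq


out = diffusion_decode(prompt=[1, 2, 3], gen_len=8, steps=4,
                       denoiser=toy_denoiser)
```

Unlike AR decoding, each pass can commit several tokens at once and in any order, which is where the parallelism comes from.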
+
+ ## Files
+
+ - `model.safetensors`: the model weights (fp32).
+ - `config.json`: architecture and hyperparameters.
+ - GGUF builds (arch `hobbylm`) live in [`rootxhacker/HobbyLM-gguf`](https://huggingface.co/rootxhacker/HobbyLM-gguf).
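+
+ ## Loading (safetensors)
+
+ ```python
+ import json
+
+ import torch  # needed to rebuild the nn.Module; safetensors.torch depends on it
+ from safetensors.torch import load_file
+
+ sd = load_file("model.safetensors")  # dict: tensor name -> torch.Tensor
+ with open("config.json") as f:
+     cfg = json.load(f)  # architecture / hyperparameters
+ # Rebuild the HobbyLM nn.Module from `cfg`, then call `model.load_state_dict(sd)`.
+ ```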
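
Before rebuilding the module, it can help to check that the loaded tensors add up to roughly the expected parameter count. The helper below is a hypothetical sketch that works on a plain name-to-shape mapping, so it runs without the weights; the tensor names in `example` are made up for illustration and are not the checkpoint's real keys.

```python
from collections import defaultdict
from math import prod


def params_by_prefix(shapes, depth=1):
    """Sum parameter counts per name prefix.

    shapes: mapping of tensor name -> shape tuple.
    depth:  number of dot-separated name components to group by.
    """
    totals = defaultdict(int)
    for name, shape in shapes.items():
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += prod(shape)
    return dict(totals)


# Illustrative names and shapes only (hypothetical, not the real keys).
example = {
    "embed.weight": (50304, 768),
    "layers.0.attn.q_proj.weight": (1536, 768),
    "layers.0.attn.k_proj.weight": (384, 768),
}
summary = params_by_prefix(example)

# With the real state dict `sd` from load_file, the call would be:
# params_by_prefix({k: tuple(v.shape) for k, v in sd.items()})
```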
49
+
50
+ ## Notes & limitations
51
+
52
+ - Research model at the ~500M scale: fluent but with the capability ceiling of a small model.
53
+ - The GGUF uses a custom `hobbylm` architecture (see the GGUF repo) and needs `moe-rs` or a patched llama.cpp.
54
+
55
+ ## License
56
+
57
+ Apache-2.0.