Upload folder using huggingface_hub

Files changed (4)

README.md +10 -10
config.json +1 -1
model.safetensors +2 -2
tokenizer.json +0 -0

README.md CHANGED

@@ -15,7 +15,7 @@ tags:
 - rope
 - rmsnorm
 model-index:
-- name: Mythos-172M
   results: []
 widget:
 - text: "The history of artificial intelligence begins with"
@@ -31,7 +31,7 @@ inference:
 <div align="center">
-# Mythos-172M
 **A decoder-only language model built from scratch — LLaMA-compatible weights.**
@@ -44,7 +44,7 @@ inference:
 ---
-> ⚠️ **Research preview.** Debug checkpoint — trained on ~21 M tokens with vocab 3 252 for 5 000 steps. Intended to verify the architecture, not for downstream use. A production 500 M checkpoint will supersede it.
 ## Model Summary
@@ -71,7 +71,7 @@ toolchains — no custom code or `trust_remote_code` required.
 | Component | Choice | Value |
 |---|---|---:|
-| Parameters | — | **172 M** |
 | Hidden layers | Pre-norm decoder blocks | 24 |
 | Hidden size | `d_model` | 768 |
 | Intermediate size | SwiGLU hidden | 2048 |
@@ -82,7 +82,7 @@ toolchains — no custom code or `trust_remote_code` required.
 | Normalization | **RMSNorm** (pre-norm) | ε = 1e-05 |
 | Activation | **SwiGLU** | — |
 | Tied embeddings | Embedding ↔ LM head | ✅ |
-| Vocabulary | ByteLevel BPE | 3,252 |
 | Context length | Max sequence | 2,048 |
 ## Quickstart
@@ -118,27 +118,27 @@ python llama.cpp/convert_hf_to_gguf.py mythos
 ### Data
-- **Corpus:** Wikipedia (English 20231101 snapshot) — 5 000 articles, ~21 M tokens
-- **Tokenizer:** ByteLevel BPE trained from scratch, vocab size **3,252**
 - **Training context:** 512 tokens
 ### Hyperparameters
 | | |
 |---|---:|
-| Steps | 5,000 |
 | Optimizer | AdamW (β₁=0.9, β₂=0.95, wd=0.1) |
 | LR schedule | Cosine decay, 2 000-step warmup |
 | Peak learning rate | 3 × 10⁻⁴ |
 | Precision | bfloat16 mixed |
-| Hardware | Apple M2 (MPS) |
 ## Limitations and Intended Use
 - **Base model only** — no instruction tuning, no RLHF, no safety alignment.
 - English-only; non-English performance is poor.
 - May reproduce biases and factual errors from the training distribution.
-- Tiny vocabulary (3 252 tokens) severely caps fluency — intended as an architecture demo.
 - Not suitable for medical, legal, financial, or other high-stakes applications.
 ## Citation

 - rope
 - rmsnorm
 model-index:
+- name: Mythos-194M
   results: []
 widget:
 - text: "The history of artificial intelligence begins with"
 <div align="center">
+# Mythos-194M
 **A decoder-only language model built from scratch — LLaMA-compatible weights.**
 ---
+> **Production release.** Full pre-training run.
 ## Model Summary
 | Component | Choice | Value |
 |---|---|---:|
+| Parameters | — | **194 M** |
 | Hidden layers | Pre-norm decoder blocks | 24 |
 | Hidden size | `d_model` | 768 |
 | Intermediate size | SwiGLU hidden | 2048 |
 | Normalization | **RMSNorm** (pre-norm) | ε = 1e-05 |
 | Activation | **SwiGLU** | — |
 | Tied embeddings | Embedding ↔ LM head | ✅ |
+| Vocabulary | ByteLevel BPE | 31,021 |
 | Context length | Max sequence | 2,048 |
 ## Quickstart
 ### Data
+- **Corpus:** mixed web + code (details in the GitHub repo)
+- **Tokenizer:** ByteLevel BPE trained from scratch, vocab size **31,021**
 - **Training context:** 512 tokens
 ### Hyperparameters
 | | |
 |---|---:|
+| Steps | 16,000 |
 | Optimizer | AdamW (β₁=0.9, β₂=0.95, wd=0.1) |
 | LR schedule | Cosine decay, 2 000-step warmup |
 | Peak learning rate | 3 × 10⁻⁴ |
 | Precision | bfloat16 mixed |
+| Hardware | A100 40 GB |
 ## Limitations and Intended Use
 - **Base model only** — no instruction tuning, no RLHF, no safety alignment.
 - English-only; non-English performance is poor.
 - May reproduce biases and factual errors from the training distribution.
 - Not suitable for medical, legal, financial, or other high-stakes applications.
 ## Citation
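
The hyperparameter table in the updated README lists a cosine decay with a 2 000-step warmup, a peak learning rate of 3 × 10⁻⁴, and 16,000 steps. A minimal sketch of such a schedule follows. It assumes a linear warmup from zero and a decay to a floor of zero, neither of which is stated in the diff, and `lr_at` is a hypothetical helper rather than the author's training code.

```python
import math

PEAK_LR = 3e-4        # from the README table
WARMUP_STEPS = 2_000  # from the README table
TOTAL_STEPS = 16_000  # from the README table
MIN_LR = 0.0          # assumption: the README does not state a floor


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Assumption: linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to MIN_LR over the remaining steps.
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))
```

With these assumptions the rate peaks at step 2 000 and reaches the floor at step 16 000; a non-zero floor (a common choice) only requires changing `MIN_LR`.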

config.json CHANGED

@@ -9,7 +9,7 @@
   "num_attention_heads": 12,
   "num_key_value_heads": 4,
   "head_dim": 64,
-  "vocab_size": 3252,
   "max_position_embeddings": 2048,
   "rms_norm_eps": 1e-05,
   "rope_theta": 10000.0,

   "num_attention_heads": 12,
   "num_key_value_heads": 4,
   "head_dim": 64,
+  "vocab_size": 31021,
   "max_position_embeddings": 2048,
   "rms_norm_eps": 1e-05,
   "rope_theta": 10000.0,
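
The only change to `config.json` is `vocab_size`, which affects the parameter count through the embedding matrix. The sketch below estimates the count from the values shown in the diff. It assumes a standard LLaMA layout (no bias terms, two RMSNorm weights per block plus a final norm) and counts the tied embedding once; `llama_param_count` is a hypothetical helper, not part of any library.

```python
def llama_param_count(
    vocab_size: int,
    hidden: int = 768,         # README table: hidden size
    layers: int = 24,          # README table: hidden layers
    intermediate: int = 2048,  # README table: SwiGLU hidden
    n_heads: int = 12,         # config.json: num_attention_heads
    n_kv_heads: int = 4,       # config.json: num_key_value_heads
    head_dim: int = 64,        # config.json: head_dim
    tied: bool = True,         # README table: tied embeddings
) -> int:
    """Estimate parameters, assuming a bias-free LLaMA-style block layout."""
    # Q and O projections use all heads; K and V use the grouped KV heads.
    attn = 2 * hidden * n_heads * head_dim + 2 * hidden * n_kv_heads * head_dim
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per block.
    norms = 2 * hidden
    embed = vocab_size * hidden
    total = layers * (attn + mlp + norms) + hidden + embed  # + final norm
    if not tied:
        total += embed  # separate LM head
    return total


old = llama_param_count(3252)
new = llama_param_count(31021)
print(f"old={old:,} new={new:,} delta={new - old:,}")
```

Under these assumptions the vocabulary change alone adds (31,021 − 3,252) × 768 = 21,326,592 parameters. The totals this sketch produces are lower than the 172 M and 194 M quoted in the README, which may follow a different counting convention.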

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4ff47b104ec6bed36b144db50483986275af0c4bf946d1edc824bff50270a653
-size 614144704

 version https://git-lfs.github.com/spec/v1
+oid sha256:e53a1840fddf1373dac13b2c3745b50a4a3ca5fcba7e668984081f5a7a5c4e0a
+size 699451136
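
The `model.safetensors` diff shows Git LFS pointers, which record the SHA-256 digest and byte size of the real file. A downloaded copy can be checked against the new pointer with the sketch below; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is up to the reader.

```python
import hashlib
from pathlib import Path

# New pointer contents, copied from the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e53a1840fddf1373dac13b2c3745b50a4a3ca5fcba7e668984081f5a7a5c4e0a
size 699451136
"""


def parse_lfs_pointer(text: str) -> tuple[str, str, int]:
    """Return (hash algorithm, hex digest, size in bytes) from a pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return algo, digest, int(fields["size"])


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against an LFS pointer."""
    algo, digest, size = parse_lfs_pointer(pointer_text)
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    file = Path(path)
    if file.stat().st_size != size:
        return False  # cheap check before hashing the whole file
    hasher = hashlib.sha256()
    with file.open("rb") as handle:
        # Hash in chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest
```

The file grows by 699,451,136 − 614,144,704 = 85,306,432 bytes between the two pointers. A float32 embedding matrix would grow by (31,021 − 3,252) × 768 × 4 = 85,306,368 bytes, so the size change is consistent with the vocabulary increase if the weights are stored in float32 (an assumption; the diff does not state the stored dtype).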

tokenizer.json CHANGED

The diff for this file is too large to render.