Update README.md

Minor fixes in data accuracy.

README.md CHANGED
@@ -18,7 +18,7 @@ spaces:
 
 # PicoChat
 
-**PicoChat** is a 335M parameter language model trained entirely from scratch on a MacBook Air M2 (16GB RAM) in approximately
+**PicoChat** is a 335M parameter language model trained entirely from scratch on a MacBook Air M2 (16GB RAM) in approximately 6 days.
 It serves as a "lab notebook" proof-of-concept for training capable small language models (SLMs) on consumer hardware using pure PyTorch and MPS (Metal Performance Shaders).
 
 > **Links:**
@@ -47,10 +47,10 @@ It serves as a "lab notebook" proof-of-concept for training capable small langua
 
 The model was trained in three phases using the [nanochat](https://github.com/karpathy/nanochat) framework, adapted for macOS:
 
-1. **Base Pretraining (~
+1. **Base Pretraining (~5 days):**
    - **Data:** [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (100B subset, shuffled)
-   - **Steps:** ~
-   - **Tokens:** ~
+   - **Steps:** ~60,000
+   - **Tokens:** ~442M
    - **Objective:** Next token prediction
 
 2. **Midtraining (~16 hours):**
@@ -75,26 +75,9 @@ The model was trained in three phases using the [nanochat](https://github.com/ka
 - **Context Window:** Limited to 1024 tokens.
 - **Safety:** The model has not gone through extensive safety alignment or RLHF. It generally behaves like a base model with some instruction following capabilities.
 
-## Evaluation
-
-| Metric | Score | Note |
-| :--- | :--- | :--- |
-| **MMLU** | 26.8% | Near random baseline (25%) |
-| **ARC-Easy** | 25.2% | Near random baseline (25%) |
-
-*Note: This is a small ~300M model trained on <1B tokens. It is not expected to achieve high benchmarks but demonstrates end-to-end coherence.*
-
-## Compute & Efficiency
-
-- **Hardware:** MacBook Air M2 (2022)
-- **RAM:** 16 GB Unified Memory
-- **Power Consumption:** ~35W peak
-- **Total Energy:** ~5 kWh
-- **Throughput:** ~1500-2000 tokens/sec (varying with thermal throttling)
-
 ## Usage
 
-This model requires the
+This model requires the [nanochat](https://github.com/MichalGow/PicoChat) library to run, as it uses a custom architecture implementation optimized for educational clarity and hackability.
 
 ## License
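As a quick sanity check on the figures this change introduces, the training-budget arithmetic can be reproduced from the README's own numbers (~60,000 steps, ~442M tokens, 1024-token context, ~35W, ~6 days); the derived tokens-per-step and sequences-per-step values below are back-of-envelope approximations, not measurements from the repository:

```python
# Sanity-check the training-budget arithmetic stated in the README.
# Inputs come from the README text; derived values are approximations.

steps = 60_000            # "~60,000" optimizer steps
tokens = 442_000_000      # "~442M" pretraining tokens
context = 1024            # context window, in tokens

tokens_per_step = tokens / steps           # ~7,367 tokens consumed per step
seqs_per_step = tokens_per_step / context  # ~7.2 sequences of 1024 tokens

power_w = 35              # "~35W peak" draw
days = 6                  # "approximately 6 days" wall clock
energy_kwh = power_w * days * 24 / 1000    # 35 W for 6 days ≈ 5.04 kWh

print(round(tokens_per_step))   # → 7367
print(round(seqs_per_step, 1))  # → 7.2
print(round(energy_kwh, 2))     # → 5.04
```

The ~5.04 kWh result matches the "~5 kWh" total energy that the old README stated, so the corrected step and token counts are at least internally consistent with the reported power and runtime.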