Update README.md

Minor fixes in data accuracy.

README.md CHANGED
@@ -18,7 +18,7 @@ spaces:
 
 # PicoChat
 
-**PicoChat** is a 335M parameter language model trained entirely from scratch on a MacBook Air M2 (16GB RAM) in approximately
+**PicoChat** is a 335M parameter language model trained entirely from scratch on a MacBook Air M2 (16GB RAM) in approximately 6 days.
 It serves as a "lab notebook" proof-of-concept for training capable small language models (SLMs) on consumer hardware using pure PyTorch and MPS (Metal Performance Shaders).
 
 > **Links:**
@@ -47,10 +47,10 @@ It serves as a "lab notebook" proof-of-concept for training capable small langua
 
 The model was trained in three phases using the [nanochat](https://github.com/karpathy/nanochat) framework, adapted for macOS:
 
-1. **Base Pretraining (~
+1. **Base Pretraining (~5 days):**
    - **Data:** [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (100B subset, shuffled)
-   - **Steps:** ~
-   - **Tokens:** ~
+   - **Steps:** ~60,000
+   - **Tokens:** ~442M
    - **Objective:** Next token prediction
 
 2. **Midtraining (~16 hours):**
@@ -75,26 +75,9 @@ The model was trained in three phases using the [nanochat](https://github.com/ka
 - **Context Window:** Limited to 1024 tokens.
 - **Safety:** The model has not gone through extensive safety alignment or RLHF. It generally behaves like a base model with some instruction following capabilities.
 
-## Evaluation
-
-| Metric | Score | Note |
-| :--- | :--- | :--- |
-| **MMLU** | 26.8% | Near random baseline (25%) |
-| **ARC-Easy** | 25.2% | Near random baseline (25%) |
-
-*Note: This is a small ~300M model trained on <1B tokens. It is not expected to achieve high benchmarks but demonstrates end-to-end coherence.*
-
-## Compute & Efficiency
-
-- **Hardware:** MacBook Air M2 (2022)
-- **RAM:** 16 GB Unified Memory
-- **Power Consumption:** ~35W peak
-- **Total Energy:** ~5 kWh
-- **Throughput:** ~1500-2000 tokens/sec (varying with thermal throttling)
-
 ## Usage
 
-This model requires the
+This model requires the [nanochat](https://github.com/MichalGow/PicoChat) library to run, as it uses a custom architecture implementation optimized for educational clarity and hackability.
 
 ## License
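As a quick sanity check on the figures this change introduces, the training-budget arithmetic can be reproduced from the README's own numbers (~60,000 steps, ~442M tokens, 1024-token context, ~35W, ~6 days); the derived tokens-per-step and sequences-per-step values below are back-of-envelope approximations, not measurements from the repository:

```python
# Sanity-check the training-budget arithmetic stated in the README.
# Inputs come from the README text; derived values are approximations.

steps = 60_000            # "~60,000" optimizer steps
tokens = 442_000_000      # "~442M" pretraining tokens
context = 1024            # context window, in tokens

tokens_per_step = tokens / steps           # ~7,367 tokens consumed per step
seqs_per_step = tokens_per_step / context  # ~7.2 sequences of 1024 tokens

power_w = 35              # "~35W peak" draw
days = 6                  # "approximately 6 days" wall clock
energy_kwh = power_w * days * 24 / 1000    # 35 W for 6 days ≈ 5.04 kWh

print(round(tokens_per_step))   # → 7367
print(round(seqs_per_step, 1))  # → 7.2
print(round(energy_kwh, 2))     # → 5.04
```

The ~5.04 kWh result matches the "~5 kWh" total energy that the old README stated, so the corrected step and token counts are at least internally consistent with the reported power and runtime.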