MGow committed aade015 (verified) · Parent: 6140b37

Update README.md

Minor fixes in data accuracy.

Files changed (1): README.md (+5 −22)
README.md CHANGED
@@ -18,7 +18,7 @@ spaces:
 
 # PicoChat
 
-**PicoChat** is a 335M parameter language model trained entirely from scratch on a MacBook Air M2 (16GB RAM) in approximately 7 days.
+**PicoChat** is a 335M parameter language model trained entirely from scratch on a MacBook Air M2 (16GB RAM) in approximately 6 days.
 It serves as a "lab notebook" proof-of-concept for training capable small language models (SLMs) on consumer hardware using pure PyTorch and MPS (Metal Performance Shaders).
 
 > **Links:**
@@ -47,10 +47,10 @@ It serves as a "lab notebook" proof-of-concept for training capable small langua
 
 The model was trained in three phases using the [nanochat](https://github.com/karpathy/nanochat) framework, adapted for macOS:
 
-1. **Base Pretraining (~6 days):**
+1. **Base Pretraining (~5 days):**
    - **Data:** [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (100B subset, shuffled)
-   - **Steps:** ~48,000
-   - **Tokens:** ~344M
+   - **Steps:** ~60,000
+   - **Tokens:** ~442M
    - **Objective:** Next token prediction
 
 2. **Midtraining (~16 hours):**
@@ -75,26 +75,9 @@ The model was trained in three phases using the [nanochat](https://github.com/ka
 - **Context Window:** Limited to 1024 tokens.
 - **Safety:** The model has not gone through extensive safety alignment or RLHF. It generally behaves like a base model with some instruction following capabilities.
 
-## Evaluation
-
-| Metric | Score | Note |
-| :--- | :--- | :--- |
-| **MMLU** | 26.8% | Near random baseline (25%) |
-| **ARC-Easy** | 25.2% | Near random baseline (25%) |
-
-*Note: This is a small ~300M model trained on <1B tokens. It is not expected to achieve high benchmarks but demonstrates end-to-end coherence.*
-
-## Compute & Efficiency
-
-- **Hardware:** MacBook Air M2 (2022)
-- **RAM:** 16 GB Unified Memory
-- **Power Consumption:** ~35W peak
-- **Total Energy:** ~5 kWh
-- **Throughput:** ~1500-2000 tokens/sec (varying with thermal throttling)
-
 ## Usage
 
-This model requires the `nanochat` library to run, as it uses a custom architecture implementation optimized for educational clarity and hackability.
+This model requires the [nanochat](https://github.com/MichalGow/PicoChat) library to run, as it uses a custom architecture implementation optimized for educational clarity and hackability.
 
 ## License
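
As a rough consistency check on the corrected pretraining figures (~60,000 steps, ~442M tokens, ~5 days), the implied tokens per optimizer step and average throughput can be derived by simple division. All inputs are the README's own "~" approximations, so the outputs are indicative only, not measured values:

```python
# Derived quantities from the commit's corrected pretraining figures.
# These are the README's approximate numbers, not measurements.
STEPS = 60_000           # "~60,000" optimizer steps
TOKENS = 442_000_000     # "~442M" training tokens
DAYS = 5                 # "~5 days" of base pretraining

tokens_per_step = TOKENS / STEPS            # tokens consumed per step
avg_tok_per_sec = TOKENS / (DAYS * 86_400)  # average throughput over the run

print(f"~{tokens_per_step:,.0f} tokens/step, ~{avg_tok_per_sec:,.0f} tokens/sec average")
```

At roughly 7,400 tokens per step against a 1024-token context window, this suggests an effective batch of about 7 sequences per step; the ~1,000 tokens/sec wall-clock average is plausible for MPS training on an M2 once thermal throttling and the non-pretraining phases are accounted for.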