appleroll committed on
Commit 4268c51 · verified · 1 Parent(s): 9b7dc6b

Update README.md

Files changed (1)
  1. README.md +5 -33
README.md CHANGED
@@ -16,41 +16,13 @@ datasets:
  library_name: transformers
  ---
 
- # CoBERTa: 24M Parameter Academic Language Model
-
- <div align="center">
- <img src="https://img.shields.io/badge/Parameters-24.5M-blue" alt="24.5M Parameters">
- <img src="https://img.shields.io/badge/License-MIT-yellow" alt="MIT License">
- </div>
+ # CoBERTa: Consumer-friendly Models that Punch Above Weight
 
  ## Model Description
 
- CoBERTa is a **24.5 million parameter** RoBERTa-style masked language model specifically trained on synthetic academic data. It demonstrates that **high-quality synthetic data** can compensate for model scale, enabling domain specialization on consumer hardware.
-
- It achieves academic language proficiency with 5× fewer parameters than comparable models, trained in around 6 hours on a MacBook Pro.
-
- ### Model Architecture
- - **Type**: Encoder-only transformer
- - **Layers**: 12
- - **Hidden Size**: 192
- - **Attention Heads**: 6
- - **Parameters**: 24,506,224
- - **Vocabulary**: 35,000 tokens
- - **Maximum Sequence Length**: 512 tokens
-
- ## Training Details
-
- ### Training Data
- - **Source**: 50,000 filtered samples from [Cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia)
- - **License**: MIT
-
- ### Training Procedure
- - **Framework**: Apple MLX
- - **Hardware**: MacBook Pro M4 (16GB unified memory)
- - **Training Time**: 6 hours
- - **Batch Size**: 32
- - **Learning Rate**: 9e-4 with linear warmup
- - **Objective**: Masked language modeling (15% token masking)
+ CoBERTa is a **24.5 million parameter** RoBERTa-style masked language model specifically trained on synthetic academic data. It demonstrates that **high-quality synthetic data** can compensate for model scale, enabling domain specialization, formal logic and reasoning on consumer hardware.
+ It achieves academic language proficiency with 5× fewer parameters than comparable models on certain tasks, trained in around 6 hours on consumer hardware.
+ It was trained on HuggingfaceTB's Cosmopedia dataset.
 
  ## Intended uses & limitations
 
@@ -87,7 +59,7 @@ for token in top_5_tokens:
  If you use CoBERTa in your research, please cite:
  ```bibtex
  @misc{coberta,
- title = {CoBERTa: Training Domain-Expert Language Models on Consumer Hardware},
+ title = {CoBERTa: Consumer-friendly Models that Punch Above Weight},
  url = {https://huggingface.co/appleroll/coberta-base},
  author = {Zhang, Ethan},
  year = {2025}
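
For context, the card's usage section (unchanged by this commit, visible in the second hunk header's `for token in top_5_tokens:` line) exercises the model as a fill-mask predictor. Below is a minimal sketch of that kind of call, assuming the checkpoint loads with the standard `transformers` fill-mask pipeline and uses RoBERTa's `<mask>` token; the model id is taken from the citation URL and the example sentence is illustrative.

```python
# Minimal fill-mask sketch; pipeline usage and <mask> token are assumptions,
# model id comes from the citation URL in the card.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="appleroll/coberta-base")

top_5_tokens = fill_mask("Photosynthesis converts light energy into <mask> energy.", top_k=5)
for token in top_5_tokens:
    print(f"{token['token_str']}: {token['score']:.4f}")
```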
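The removed Training Procedure section lists masked language modeling with 15% token masking as the training objective. The snippet below is a rough NumPy illustration of that masking step, not the author's Apple MLX training code; the function name, mask token id, and label convention are assumptions.

```python
# Illustrative 15% MLM masking; -100 follows the common "ignore this position" label convention.
import numpy as np

def mask_tokens(token_ids, mask_token_id, mask_prob=0.15, seed=0):
    rng = np.random.default_rng(seed)
    token_ids = np.asarray(token_ids)
    masked = rng.random(token_ids.shape) < mask_prob      # pick ~15% of positions
    labels = np.where(masked, token_ids, -100)            # keep original ids only where masked
    inputs = np.where(masked, mask_token_id, token_ids)   # replace picked positions with the mask id
    return inputs, labels

# Toy usage with a hypothetical mask id of 4
inputs, labels = mask_tokens([11, 42, 7, 98, 5, 23], mask_token_id=4)
```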