library_name: transformers
---

# CoBERTa: Consumer-friendly Models that Punch Above Weight

<div align="center">
<img src="https://img.shields.io/badge/Parameters-24.5M-blue" alt="24.5M Parameters">
<img src="https://img.shields.io/badge/License-MIT-yellow" alt="MIT License">
</div>

## Model Description

CoBERTa is a **24.5 million parameter** RoBERTa-style masked language model trained specifically on synthetic academic data. It demonstrates that **high-quality synthetic data** can compensate for model scale, enabling domain specialization, formal logic, and reasoning on consumer hardware.

It achieves academic language proficiency on certain tasks with 5× fewer parameters than comparable models, and was trained in around 6 hours on consumer hardware.

It was trained on HuggingFaceTB's Cosmopedia dataset.

### Model Architecture
- **Type**: Encoder-only transformer
- **Layers**: 12
- **Hidden Size**: 192
- **Attention Heads**: 6
- **Parameters**: 24,506,224
- **Vocabulary**: 35,000 tokens
- **Maximum Sequence Length**: 512 tokens
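
The architecture listed above can be sketched with the `transformers` library. This is a rough illustration, not the released config: the feed-forward (intermediate) size is not stated in this card, so the value below is an assumption and the parameter count it yields may not match 24,506,224 exactly.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Sketch only: maps the architecture listed above onto a RobertaConfig.
config = RobertaConfig(
    vocab_size=35_000,            # 35,000-token vocabulary
    hidden_size=192,              # hidden size 192
    num_hidden_layers=12,         # 12 encoder layers
    num_attention_heads=6,        # 6 attention heads
    intermediate_size=768,        # assumption (4x hidden); not stated in this card
    max_position_embeddings=514,  # RoBERTa convention: 512 usable positions + offset
)

model = RobertaForMaskedLM(config)
# Matches the card's 24,506,224 parameters only if the assumed sizes are right.
print(f"{model.num_parameters():,} parameters")
```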

## Training Details

### Training Data
- **Source**: 50,000 filtered samples from [Cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia)
- **License**: MIT
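
The filtering criteria are not documented in this card, so the snippet below is only a hedged sketch of how a 50,000-sample subset could be drawn from Cosmopedia with the `datasets` library; the subset name and length threshold are illustrative assumptions.

```python
from datasets import load_dataset

# Sketch only: the card does not say which Cosmopedia subset or filters were used.
stream = load_dataset("HuggingFaceTB/cosmopedia", "openstax", split="train", streaming=True)

def keep(example):
    # Hypothetical quality filter; the real criteria are not documented here.
    return len(example["text"]) > 500

samples = []
for example in filter(keep, stream):
    samples.append(example["text"])
    if len(samples) >= 50_000:
        break
```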

### Training Procedure
- **Framework**: Apple MLX
- **Hardware**: MacBook Pro M4 (16GB unified memory)
- **Training Time**: 6 hours
- **Batch Size**: 32
- **Learning Rate**: 9e-4 with linear warmup
- **Objective**: Masked language modeling (15% token masking)
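
For concreteness, the masking objective can be reproduced with the standard `transformers` data collator (15% of tokens selected, with the usual 80/10/10 mask/random/keep split). This only illustrates the objective; the actual training ran on Apple MLX, not through this collator.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Checkpoint name taken from the citation URL below; used here only for its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("appleroll/coberta-base")

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # mask 15% of tokens, as stated above
)

batch = collator([tokenizer("Synthetic textbooks teach small models.")])
# `labels` is -100 except at masked positions, where it keeps the original token ids.
print(batch["input_ids"][0])
print(batch["labels"][0])
```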
## Intended uses & limitations
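
A minimal fill-mask sketch is shown below; the checkpoint name comes from the citation URL further down, and printing the top five predictions mirrors the card's own `top_5_tokens` example.

```python
from transformers import pipeline

# Sketch of fill-mask usage with the hub checkpoint named in the citation below.
fill_mask = pipeline("fill-mask", model="appleroll/coberta-base")

results = fill_mask("Photosynthesis converts sunlight into chemical <mask>.")
for token in results[:5]:  # top-5 predictions, like the card's top_5_tokens loop
    print(f"{token['token_str']!r}: {token['score']:.3f}")
```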
If you use CoBERTa in your research, please cite:
```bibtex
@misc{coberta,
  title = {CoBERTa: Consumer-friendly Models that Punch Above Weight},
  url = {https://huggingface.co/appleroll/coberta-base},
  author = {Zhang, Ethan},
  year = {2025}
}
```