Update README.md
README.md CHANGED
@@ -14,9 +14,9 @@ tags:
 - custom_code
 ---
 
-#
+# QMoE-400
 
-**
+**QMoE-400** is a 400 million parameter Sparse Mixture of Experts (MoE) model trained on the OpenWebText dataset using JAX/Flax on 8 TPU v3 chips.
 
 This model serves as a research artifact for studying the compute efficiency of sparse architectures compared to dense transformers. It demonstrates how routing mechanisms can enable high-capacity models with lower inference costs.
 
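The efficiency claim in the new description comes from top-k routing: each token is processed by only a few experts, so per-token compute tracks `top_k` rather than the total expert count. As a minimal illustration (not QuarkML's actual modeling code, which ships with the repository as custom code), a top-k MoE layer might look like the sketch below; `d_model`, `d_ff`, `num_experts`, and `top_k` are illustrative assumptions, not QMoE-400's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k MoE layer: every token activates only top_k of
    num_experts feed-forward blocks, so per-token compute stays small even
    as total parameters grow. All sizes are assumptions, not QMoE-400's."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # per-token gating logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        gate_logits = self.router(x)                        # (B, S, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Per-token FLOPs scale with `top_k`, not `num_experts`, which is the high-capacity-at-low-inference-cost point the description makes.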
@@ -28,7 +28,7 @@ You can use this model directly with the Hugging Face `transformers` library. Si
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 
-path = "QuarkML/
+path = "QuarkML/QMoE-400"
 
 # Load tokenizer and model
 tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
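The hunk shows only the lines around the filled-in `path`; a self-contained version of the README's quickstart might look like the following sketch. The loading calls mirror the snippet above, while the dtype and generation settings (`max_new_tokens`, `top_p`) are illustrative assumptions rather than values taken from the README.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

path = "QuarkML/QMoE-400"

# Load tokenizer and model; trust_remote_code is required because the
# repository ships custom MoE modeling code (the `custom_code` tag above).
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # dtype is an assumption; use what your hardware supports
)
model.eval()

# Generate a short continuation (sampling parameters are illustrative).
inputs = tok("Sparse mixture-of-experts models", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```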
@@ -106,12 +106,12 @@ The following example demonstrates the model's generation capabilities after tra
 If you find this model or the associated research useful, please cite:
 
 ```bibtex
-@misc{
+@misc{qmoe-400,
 author = {Quark Machine Learning},
-title = {
+title = {QMoE-400: A Sparse Mixture of Experts Model},
 year = {2025},
 publisher = {Hugging Face},
 journal = {Hugging Face Repository},
-howpublished = {\url{https://huggingface.co/QuarkML/
+howpublished = {\url{https://huggingface.co/QuarkML/QMoE-400}}
 }
 ```