Sidharthan committed
Commit 7e0d858 · verified · 1 Parent(s): cc79560

Update README.md

Files changed (1):
  1. README.md +6 -6
README.md CHANGED
@@ -14,9 +14,9 @@ tags:
   - custom_code
 ---
 
-# Q-MoE-400
+# QMoE-400
 
-**Q-MoE-400** is a 400 million parameter Sparse Mixture of Experts (MoE) model trained on the OpenWebText dataset using JAX/Flax on 8 TPU v3 chips.
+**QMoE-400** is a 400 million parameter Sparse Mixture of Experts (MoE) model trained on the OpenWebText dataset using JAX/Flax on 8 TPU v3 chips.
 
 This model serves as a research artifact for studying the compute efficiency of sparse architectures compared to dense transformers. It demonstrates how routing mechanisms can enable high-capacity models with lower inference costs.
 
@@ -28,7 +28,7 @@ You can use this model directly with the Hugging Face `transformers` library. Si
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 
-path = "QuarkML/Q-MoE-400"
+path = "QuarkML/QMoE-400"
 
 # Load tokenizer and model
 tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
@@ -106,12 +106,12 @@ The following example demonstrates the model's generation capabilities after tra
 If you find this model or the associated research useful, please cite:
 
 ```bibtex
-@misc{q-moe-400,
+@misc{qmoe-400,
   author = {Quark Machine Learning},
-  title = {Q-MoE-400: A Sparse Mixture of Experts Model},
+  title = {QMoE-400: A Sparse Mixture of Experts Model},
   year = {2025},
   publisher = {Hugging Face},
   journal = {Hugging Face Repository},
-  howpublished = {\url{https://huggingface.co/QuarkML/Q-MoE-400}}
+  howpublished = {\url{https://huggingface.co/QuarkML/QMoE-400}}
 }
 ```
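The description in the diff says routing is what lets a high-capacity model keep inference cost low. As a rough illustration of what a top-k router does, here is a minimal sketch; the expert count, the value of k, and the tensor shapes are generic placeholders, not QMoE-400's actual configuration, which this diff does not show.

```python
import torch

def top_k_route(hidden, router_weight, k=2):
    """Illustrative top-k MoE routing: each token is dispatched to only k experts."""
    # hidden:        (tokens, d_model) token representations
    # router_weight: (d_model, n_experts) learned router projection
    logits = hidden @ router_weight                # (tokens, n_experts)
    probs = torch.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)   # keep the k highest-scoring experts per token
    # Only the selected experts run for each token, so per-token compute stays close to a
    # small dense FFN even though the total parameter count is much larger.
    return topk_probs, topk_idx
```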
 
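The usage hunk only shows the README snippet up to loading the tokenizer. A minimal end-to-end sketch of how such a snippet typically continues with the standard `transformers` causal-LM API follows; the model-loading call, the prompt, and the generation settings are assumptions for illustration, not lines taken from the README.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

path = "QuarkML/QMoE-400"

# trust_remote_code is needed because the repo carries the custom_code tag
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)  # assumed to mirror the README
model.eval()

# Hypothetical prompt; sampling parameters are illustrative, not from the model card
inputs = tok("Sparse mixture-of-experts models", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```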