Update README.md
README.md CHANGED
@@ -14,9 +14,9 @@ tags:
 - custom_code
 ---
 
-#
+# QMoE-400
 
-**
+**QMoE-400** is a 400 million parameter Sparse Mixture of Experts (MoE) model trained on the OpenWebText dataset using JAX/Flax on 8 TPU v3 chips.
 
 This model serves as a research artifact for studying the compute efficiency of sparse architectures compared to dense transformers. It demonstrates how routing mechanisms can enable high-capacity models with lower inference costs.
 
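The efficiency claim in the new description comes from top-k routing: each token is processed by only a few experts, so per-token compute tracks `top_k` rather than the total expert count. As a minimal illustration (not QuarkML's actual modeling code, which ships with the repository as custom code), a top-k MoE layer might look like the sketch below; `d_model`, `d_ff`, `num_experts`, and `top_k` are illustrative assumptions, not QMoE-400's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k MoE layer: every token activates only top_k of
    num_experts feed-forward blocks, so per-token compute stays small even
    as total parameters grow. All sizes are assumptions, not QMoE-400's."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # per-token gating logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        gate_logits = self.router(x)                        # (B, S, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Per-token FLOPs scale with `top_k`, not `num_experts`, which is the high-capacity-at-low-inference-cost point the description makes.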
@@ -28,7 +28,7 @@ You can use this model directly with the Hugging Face `transformers` library. Si
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 
-path = "QuarkML/
+path = "QuarkML/QMoE-400"
 
 # Load tokenizer and model
 tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
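The hunk shows only the lines around the filled-in `path`; a self-contained version of the README's quickstart might look like the following sketch. The loading calls mirror the snippet above, while the dtype and generation settings (`max_new_tokens`, `top_p`) are illustrative assumptions rather than values taken from the README.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

path = "QuarkML/QMoE-400"

# Load tokenizer and model; trust_remote_code is required because the
# repository ships custom MoE modeling code (the `custom_code` tag above).
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # dtype is an assumption; use what your hardware supports
)
model.eval()

# Generate a short continuation (sampling parameters are illustrative).
inputs = tok("Sparse mixture-of-experts models", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```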
@@ -106,12 +106,12 @@ The following example demonstrates the model's generation capabilities after tra
 If you find this model or the associated research useful, please cite:
 
 ```bibtex
-@misc{
+@misc{qmoe-400,
 author = {Quark Machine Learning},
-title = {
+title = {QMoE-400: A Sparse Mixture of Experts Model},
 year = {2025},
 publisher = {Hugging Face},
 journal = {Hugging Face Repository},
-howpublished = {\url{https://huggingface.co/QuarkML/
+howpublished = {\url{https://huggingface.co/QuarkML/QMoE-400}}
 }
 ```