Update README.md
README.md
CHANGED
|
@@ -116,7 +116,8 @@ Example:huggingface-cli download your-username/nexamoe-base
|
|
| 116 |
|
| 117 |
# Usage
|
| 118 |
|
| 119 |
-
Load a Model:Use the transformers library to load NexaMOE models:
|
|
|
|
| 120 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 121 |
|
| 122 |
model_name = "your-username/nexamoe-base"
|
|
@@ -154,44 +155,21 @@ from datasets import load_dataset
|
|
| 154 |
dataset = load_dataset("your-username/nexamoe-instruction-data")
|
| 155 |
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q", "v"])
|
| 156 |
model = get_peft_model(model, lora_config)
|
| 157 |
-
|
| 158 |
# Train with your preferred trainer (e.g., Hugging Face Trainer)
|
| 159 |
|
| 160 |
Run Inference via CLI or GUI:
|
| 161 |
|
| 162 |
-
Command-Line:python inference.py --model your-username/nexamoe-base --prompt "[PHYS]
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
Gradio GUI:python app.py
|
| 166 |
|
| 167 |
Opens a web interface to interact with the model.
|
| 168 |
|
| 169 |
-
|
| 170 |
-
Model Weights and Datasets
|
| 171 |
-
|
| 172 |
-
Models:
|
| 173 |
-
your-username/nexamoe-base: Baseline NexaMOE (110M parameters).
|
| 174 |
-
your-username/nexamoe-cot: NEXA-CoT (110M parameters).
|
| 175 |
-
your-username/nexamoe-ultramax: NEXA-Ultramax (2.2B parameters).
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
Datasets:
|
| 179 |
-
your-username/nexamoe-instruction-data: 300k instruction-style samples for QLoRA fine-tuning.
|
| 180 |
-
your-username/nexamoe-reasoning-data: Reasoning Curriculum Dataset for CoT training.
|
| 181 |
-
your-username/nexamoe-long-context-data: Long-Context Corpus for UltraMAX training.
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
# Requirements
|
| 185 |
-
|
| 186 |
-
Hardware: NVIDIA GPU with 16-24GB VRAM (e.g., T4, A100) for training/inference. CPU fallback supported for preprocessing.
|
| 187 |
-
Software: Python 3.10, PyTorch, Transformers, Accelerate, PEFT, Optuna, Gradio.
|
| 188 |
-
|
| 189 |
# Performance Metrics
|
| 190 |
|
| 191 |
-
Extreme
|
| 192 |
-
Distributed Training: Full hardware saturation
|
| 193 |
-
|
| 194 |
-
|
| 195 |
|
| 196 |
See the architecture document for detailed loss curves and metrics.
|
| 197 |
Similar Models
|
|
@@ -200,17 +178,17 @@ Explore related models for inspiration:
|
|
| 200 |
Grok (xAI): General-purpose conversational AI with scientific capabilities. Link
|
| 201 |
LLaMA (Meta AI): Efficient research models for NLP tasks. Link
|
| 202 |
SciBERT: BERT variant for scientific text processing. Link
|
| 203 |
-
Galactica (Meta AI): Scientific language model for paper
|
| 204 |
BioBERT: BERT variant for biomedical text. Link
|
| 205 |
|
| 206 |
For the models, cite:
|
| 207 |
-
|
| 208 |
Allanatrix. (2025). NexaMOE Family of Models. Retrieved (6/17/2025)
|
| 209 |
|
| 210 |
Acknowledgements
|
| 211 |
We thank the scientific and AI communities for advancing Mixture-of-Experts architectures and domain-specific LLMs. Special thanks to the authors of the datasets used (arXiv, PubMed, Materials Project) and the developers of tools like Transformers, PEFT, and Optuna.
|
| 212 |
-
For more information, see
|
|
|
|
| 213 |
License
|
| 214 |
-
MIT License (see LICENSE file for details).
|
| 215 |
|
| 216 |
-
Have questions or ideas? Open an issue on GitHub or join the discussion on Hugging Face. Happy researching!
|
|
|
|
| 116 |
|
| 117 |
# Usage
|
| 118 |
|
| 119 |
+
Load a Model: Use the transformers library to load NexaMOE models:
|
| 120 |
+
```
|
| 121 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 122 |
|
| 123 |
model_name = "your-username/nexamoe-base"
|
|
|
|
| 155 |
dataset = load_dataset("your-username/nexamoe-instruction-data")
|
| 156 |
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q", "v"])
|
| 157 |
model = get_peft_model(model, lora_config)
|
| 158 |
+
```
|
| 159 |
# Train with your preferred trainer (e.g., Hugging Face Trainer)
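
As a rough illustration of what the `LoraConfig(r=8, lora_alpha=16, target_modules=["q", "v"])` line above implies, the sketch below counts the extra trainable parameters LoRA adds. The hidden size and layer count are assumptions chosen only for illustration; the README does not state them for any NexaMOE model. The `lora_alpha / r` scaling is PEFT's default behaviour.

```python
def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight.

    LoRA learns two low-rank factors: A with shape (r, d_in) and
    B with shape (d_out, r), so the count is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


# Values taken from the LoraConfig in the README.
r, lora_alpha = 8, 16
targets_per_layer = 2  # the "q" and "v" projections

# ASSUMPTIONS (not stated in the README): illustrative model dimensions.
hidden_size = 768
num_layers = 12

per_matrix = lora_added_params(hidden_size, hidden_size, r)
total = per_matrix * targets_per_layer * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update
fraction = per_matrix / (hidden_size * hidden_size)

print(f"LoRA parameters per adapted matrix: {per_matrix}")
print(f"LoRA parameters in total:           {total}")
print(f"Update scaling (lora_alpha / r):    {scaling}")
print(f"Share of one full matrix:           {fraction:.2%}")
```

Under these assumed dimensions, each adapted projection trains about 2% of the parameters a full update of that matrix would touch. That saving is why LoRA fits on the 16-24GB GPUs the README targets.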
|
| 160 |
|
| 161 |
Run Inference via CLI or GUI:
|
| 162 |
|
| 163 |
+
Command-Line: python inference.py --model your-username/nexamoe-base --prompt "[PHYS] Hypothesise a new superconductor."
|
| 164 |
|
| 165 |
Opens a web interface to interact with the model.
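
The contents of `inference.py` are not shown in this diff, so the following is only a minimal sketch of how a script accepting the `--model` and `--prompt` flags from the command above might parse its input. The `split_domain_tag` helper and the idea that a leading tag such as `[PHYS]` is separated from the prompt text are assumptions made for illustration, not the project's confirmed behaviour.

```python
import argparse
import re
from typing import Optional, Tuple

# ASSUMPTION: a domain tag is an upper-case word in square brackets
# at the very start of the prompt, e.g. "[PHYS] ...".
TAG_PATTERN = re.compile(r"^\[([A-Z]+)\]\s*(.*)$", re.DOTALL)


def split_domain_tag(prompt: str) -> Tuple[Optional[str], str]:
    """Return (tag, remaining text); tag is None when no tag is present."""
    match = TAG_PATTERN.match(prompt)
    if match is None:
        return None, prompt
    return match.group(1), match.group(2)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags used in the README command."""
    parser = argparse.ArgumentParser(description="Hypothetical NexaMOE inference CLI")
    parser.add_argument("--model", required=True, help="Hugging Face model ID")
    parser.add_argument("--prompt", required=True, help="Prompt text, optionally tagged")
    return parser


# Parse an explicit argument list that mirrors the README command.
argv = [
    "--model", "your-username/nexamoe-base",
    "--prompt", "[PHYS] Hypothesise a new superconductor.",
]
args = build_parser().parse_args(argv)
tag, text = split_domain_tag(args.prompt)
print(f"model={args.model} tag={tag} text={text!r}")
```

In a real script, `parse_args()` would be called without arguments so that it reads `sys.argv`, and the parsed prompt would then be passed to the tokenizer and model.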
|
| 166 |
|
| 167 |
# Performance Metrics
|
| 168 |
|
| 169 |
+
Extreme Specialisation: Modular experts improve response fidelity and interpretability.
|
| 170 |
+
Distributed Training: Full hardware saturation stabilises runtimes and reduces crashes.
|
| 171 |
+
Generalisability: Robust across physics, biology, and materials science tasks.
|
| 172 |
+
Optimiser Efficiency: AzureSky Optimiser enhances convergence speed and precision.
|
| 173 |
|
| 174 |
See the architecture document for detailed loss curves and metrics.
|
| 175 |
Similar Models
|
|
|
|
| 178 |
Grok (xAI): General-purpose conversational AI with scientific capabilities. Link
|
| 179 |
LLaMA (Meta AI): Efficient research models for NLP tasks. Link
|
| 180 |
SciBERT: BERT variant for scientific text processing. Link
|
| 181 |
+
Galactica (Meta AI): Scientific language model for paper summarisation. Link
|
| 182 |
BioBERT: BERT variant for biomedical text. Link
|
| 183 |
|
| 184 |
For the models, cite:
|
|
|
|
| 185 |
Allanatrix. (2025). NexaMOE Family of Models. Retrieved (6/17/2025)
|
| 186 |
|
| 187 |
Acknowledgements
|
| 188 |
We thank the scientific and AI communities for advancing Mixture-of-Experts architectures and domain-specific LLMs. Special thanks to the authors of the datasets used (arXiv, PubMed, Materials Project) and the developers of tools like Transformers, PEFT, and Optuna.
|
| 189 |
+
For more information, see https://materialsproject.org/, https://arxiv.org/, https://pubmed.ncbi.nlm.nih.gov/
|
| 190 |
+
|
| 191 |
License
|
| 192 |
+
MIT License (see the LICENSE file for details).
|
| 193 |
|
| 194 |
+
Have questions or ideas? Open an issue on GitHub or join the discussion on Hugging Face. Happy researching!
|