Commit 17bee0c (verified) by Allanatrix · 1 Parent(s): 30d8664

Update README.md

  1. README.md +13 -35
README.md CHANGED
@@ -116,7 +116,8 @@ Example:huggingface-cli download your-username/nexamoe-base
 
 # Usage
 
-Load a Model:Use the transformers library to load NexaMOE models:
+Load a Model: Use the transformers library to load NexaMOE models:
+```
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 model_name = "your-username/nexamoe-base"
@@ -154,44 +155,21 @@ from datasets import load_dataset
 dataset = load_dataset("your-username/nexamoe-instruction-data")
 lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q", "v"])
 model = get_peft_model(model, lora_config)
-
+```
 # Train with your preferred trainer (e.g., Hugging Face Trainer)
 
 Run Inference via CLI or GUI:
 
-Command-Line:python inference.py --model your-username/nexamoe-base --prompt "[PHYS] Hypothesize a new superconductor."
-
-
-Gradio GUI:python app.py
+"Command-Line: python inference.py --model your-username/nexamoe-base --prompt "[PHYS] Hypothesise a new superconductor."
 
 Opens a web interface to interact with the model.
 
-
-Model Weights and Datasets
-
-Models:
-your-username/nexamoe-base: Baseline NexaMOE (110M parameters).
-your-username/nexamoe-cot: NEXA-CoT (110M parameters).
-your-username/nexamoe-ultramax: NEXA-Ultramax (2.2B parameters).
-
-
-Datasets:
-your-username/nexamoe-instruction-data: 300k instruction-style samples for QLoRA fine-tuning.
-your-username/nexamoe-reasoning-data: Reasoning Curriculum Dataset for CoT training.
-your-username/nexamoe-long-context-data: Long-Context Corpus for UltraMAX training.
-
-
-# Requirements
-
-Hardware: NVIDIA GPU with 16-24GB VRAM (e.g., T4, A100) for training/inference. CPU fallback supported for preprocessing.
-Software: Python 3.10, PyTorch, Transformers, Accelerate, PEFT, Optuna, Gradio.
-
 # Performance Metrics
 
-Extreme Specialization: Modular experts improve response fidelity and interpretability.
-Distributed Training: Full hardware saturation stabilizes runtimes and reduces crashes.
-Generalizability: Robust across physics, biology, and materials science tasks.
-Optimizer Efficiency: AzureSky Optimizer enhances convergence speed and precision.
+Extreme Specialisation: Modular experts improve response fidelity and interpretability.
+Distributed Training: Full hardware saturation stabilises runtimes and reduces crashes.
+Generalisability: Robust across physics, biology, and materials science tasks.
+Optimiser Efficiency: AzureSky Optimiser enhances convergence speed and precision.
 
 See the architecture document for detailed loss curves and metrics.
 Similar Models
@@ -200,17 +178,17 @@ Explore related models for inspiration:
 Grok (xAI): General-purpose conversational AI with scientific capabilities. Link
 LLaMA (Meta AI): Efficient research models for NLP tasks. Link
 SciBERT: BERT variant for scientific text processing. Link
-Galactica (Meta AI): Scientific language model for paper summarization. Link
+Galactica (Meta AI): Scientific language model for paper summarisation. Link
 BioBERT: BERT variant for biomedical text. Link
 
 For the models, cite:
-
 Allanatrix. (2025). NexaMOE Family of Models. Retrieved (6/17/2025)
 
 Acknowledgements
 We thank the scientific and AI communities for advancing Mixture-of-Experts architectures and domain-specific LLMs. Special thanks to the authors of the datasets used (arXiv, PubMed, Materials Project) and the developers of tools like Transformers, PEFT, and Optuna.
-For more information, see: https://materialsproject.org/, https://arxiv.org/, https://pubmed.ncbi.nlm.nih.gov/
+For more information, see https://materialsproject.org/, https://arxiv.org/, https://pubmed.ncbi.nlm.nih.gov/
+
 License
-MIT License (see LICENSE file for details).
+MIT License (see the LICENSE file for details).
 
-Have questions or ideas? Open an issue on GitHub or join the discussion on Hugging Face. Happy researching!```
+Have questions or ideas? Open an issue on GitHub or join the discussion on Hugging Face. Happy researching!
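
Note on the command-line example: the README line touched by this commit calls `inference.py` with `--model` and `--prompt`, but the script itself is not part of this diff. As a rough sketch only, a script accepting those two flags could look something like the code below. The function names `build_parser`, `generate`, and `main`, and the `--max-new-tokens` flag with its default of 128, are assumptions made for illustration and are not taken from the repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the README example."""
    parser = argparse.ArgumentParser(
        description="Run NexaMOE inference (illustrative sketch)."
    )
    parser.add_argument("--model", required=True, help="Hub repo id or local path")
    parser.add_argument("--prompt", required=True, help="Prompt text, e.g. tagged with [PHYS]")
    # Hypothetical extra flag; the README does not mention it.
    parser.add_argument("--max-new-tokens", type=int, default=128)
    return parser


def generate(model_name: str, prompt: str, max_new_tokens: int) -> str:
    """Load the tokenizer and model, then return the decoded completion."""
    # Imported lazily so that argument parsing works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def main(argv=None) -> None:
    """Parse the command line (or an explicit argument list) and print the result."""
    args = build_parser().parse_args(argv)
    print(generate(args.model, args.prompt, args.max_new_tokens))
```

To use this as a script, one would add the usual `if __name__ == "__main__": main()` guard at the bottom. Keeping the `transformers` import inside `generate` is a design choice of this sketch: it lets `--help` and argument validation run quickly even when the model libraries are slow to import or missing.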