AethronPhantom/NexaSci

@@ -116,7 +116,8 @@ Example:huggingface-cli download your-username/nexamoe-base
 # Usage
-Load a Model:Use the transformers library to load NexaMOE models:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "your-username/nexamoe-base"
@@ -154,44 +155,21 @@ from datasets import load_dataset
 dataset = load_dataset("your-username/nexamoe-instruction-data")
 lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q", "v"])
 model = get_peft_model(model, lora_config)
 # Train with your preferred trainer (e.g., Hugging Face Trainer)
 Run Inference via CLI or GUI:
-Command-Line:python inference.py --model your-username/nexamoe-base --prompt "[PHYS] Hypothesize a new superconductor."
-Gradio GUI:python app.py
 Opens a web interface to interact with the model.
-Model Weights and Datasets
-Models:
-your-username/nexamoe-base: Baseline NexaMOE (110M parameters).
-your-username/nexamoe-cot: NEXA-CoT (110M parameters).
-your-username/nexamoe-ultramax: NEXA-Ultramax (2.2B parameters).
-Datasets:
-your-username/nexamoe-instruction-data: 300k instruction-style samples for QLoRA fine-tuning.
-your-username/nexamoe-reasoning-data: Reasoning Curriculum Dataset for CoT training.
-your-username/nexamoe-long-context-data: Long-Context Corpus for UltraMAX training.
-# Requirements
-Hardware: NVIDIA GPU with 16-24GB VRAM (e.g., T4, A100) for training/inference. CPU fallback supported for preprocessing.
-Software: Python 3.10, PyTorch, Transformers, Accelerate, PEFT, Optuna, Gradio.
 # Performance Metrics
-Extreme Specialization: Modular experts improve response fidelity and interpretability.
-Distributed Training: Full hardware saturation stabilizes runtimes and reduces crashes.
-Generalizability: Robust across physics, biology, and materials science tasks.
-Optimizer Efficiency: AzureSky Optimizer enhances convergence speed and precision.
 See the architecture document for detailed loss curves and metrics.
 Similar Models
@@ -200,17 +178,17 @@ Explore related models for inspiration:
 Grok (xAI): General-purpose conversational AI with scientific capabilities. Link
 LLaMA (Meta AI): Efficient research models for NLP tasks. Link
 SciBERT: BERT variant for scientific text processing. Link
-Galactica (Meta AI): Scientific language model for paper summarization. Link
 BioBERT: BERT variant for biomedical text. Link
 For the models, cite:
 Allanatrix. (2025). NexaMOE Family of Models. Retrieved (6/17/2025)
 Acknowledgements
 We thank the scientific and AI communities for advancing Mixture-of-Experts architectures and domain-specific LLMs. Special thanks to the authors of the datasets used (arXiv, PubMed, Materials Project) and the developers of tools like Transformers, PEFT, and Optuna.
-For more information, see: https://materialsproject.org/, https://arxiv.org/, https://pubmed.ncbi.nlm.nih.gov/
 License
-MIT License (see LICENSE file for details).
-Have questions or ideas? Open an issue on GitHub or join the discussion on Hugging Face. Happy researching!```

 # Usage
+Load a Model: Use the transformers library to load NexaMOE models:
+```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "your-username/nexamoe-base"
 dataset = load_dataset("your-username/nexamoe-instruction-data")
 lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q", "v"])
 model = get_peft_model(model, lora_config)
 # Train with your preferred trainer (e.g., Hugging Face Trainer)
+```
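The `LoraConfig(r=8, lora_alpha=16, target_modules=["q", "v"])` line in the usage snippet sets up a LoRA adapter. To illustrate what those two numbers mean, the sketch below works through standard LoRA arithmetic in plain Python. Each targeted projection W gets a low-rank update B·A, scaled by lora_alpha / r (PEFT's default scaling when `use_rslora` is off), so r=8 with lora_alpha=16 gives a scale of 2. The 768×768 layer size is a hypothetical example, not a documented NexaMOE dimension, and the helper names are illustrative only.

```python
# Sketch of standard LoRA arithmetic for LoraConfig(r=8, lora_alpha=16).
# Assumption: a hypothetical 768 x 768 projection; real NexaMOE shapes are not given here.

def lora_scaling(r: int, lora_alpha: int) -> float:
    """Scale applied to the low-rank update B @ A (standard LoRA: alpha / r)."""
    return lora_alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r, so the adapter adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


def lora_forward(x, W, A, B, scaling):
    """Compute y = W x + scaling * B (A x) using nested lists (no frameworks)."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)               # frozen base projection
    update = matvec(B, matvec(A, x))  # low-rank path: down-project with A, up-project with B
    return [b + scaling * u for b, u in zip(base, update)]


if __name__ == "__main__":
    r, alpha = 8, 16
    print(lora_scaling(r, alpha))
    print(lora_trainable_params(768, 768, r))
    print(768 * 768)  # frozen weights in the same hypothetical layer
```

With these numbers the adapter trains 12,288 weights per 768×768 projection, against 589,824 frozen weights in the base layer. Because PEFT initialises B to zeros by default, the update starts at zero and the adapted model initially behaves exactly like the base model.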
 Run Inference via CLI or GUI:
+Command-Line: `python inference.py --model your-username/nexamoe-base --prompt "[PHYS] Hypothesise a new superconductor."`
 Opens a web interface to interact with the model.
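The command-line example passes a prompt containing spaces and a bracketed domain tag, which is easy to mis-quote in a shell. One way to avoid that is to build the invocation as an argument list and let Python handle quoting. The sketch below assumes `inference.py` accepts exactly the `--model` and `--prompt` flags shown (its source is not part of this diff); `build_inference_argv` and `run_inference` are hypothetical helpers.

```python
# Sketch: build the inference.py call as an argv list instead of a hand-quoted string.
# Only --model and --prompt come from the README; inference.py's behaviour is assumed.
import shlex
import subprocess
import sys


def build_inference_argv(model: str, prompt: str, python: str = "python") -> list[str]:
    """Return argv for inference.py; each element is passed verbatim, so no shell quoting is needed."""
    return [python, "inference.py", "--model", model, "--prompt", prompt]


def run_inference(model: str, prompt: str) -> int:
    """Hypothetical helper: run inference.py with the current interpreter and return its exit code."""
    argv = build_inference_argv(model, prompt, python=sys.executable)
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    argv = build_inference_argv(
        "your-username/nexamoe-base",
        "[PHYS] Hypothesise a new superconductor.",
    )
    # shlex.join renders the equivalent, correctly quoted shell command.
    print(shlex.join(argv))
```

Here `shlex.join(argv)` prints `python inference.py --model your-username/nexamoe-base --prompt '[PHYS] Hypothesise a new superconductor.'`, a correctly quoted equivalent of the README command. `run_inference` only works from a checkout that contains `inference.py`.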
 # Performance Metrics
+Extreme Specialisation: Modular experts improve response fidelity and interpretability.
+Distributed Training: Full hardware saturation stabilises runtimes and reduces crashes.
+Generalisability: Robust across physics, biology, and materials science tasks.
+Optimiser Efficiency: AzureSky Optimiser enhances convergence speed and precision.
 See the architecture document for detailed loss curves and metrics.
 Similar Models
 Grok (xAI): General-purpose conversational AI with scientific capabilities. Link
 LLaMA (Meta AI): Efficient research models for NLP tasks. Link
 SciBERT: BERT variant for scientific text processing. Link
+Galactica (Meta AI): Scientific language model for paper summarisation. Link
 BioBERT: BERT variant for biomedical text. Link
 For the models, cite:
 Allanatrix. (2025). NexaMOE Family of Models. Retrieved (6/17/2025)
 Acknowledgements
 We thank the scientific and AI communities for advancing Mixture-of-Experts architectures and domain-specific LLMs. Special thanks to the authors of the datasets used (arXiv, PubMed, Materials Project) and the developers of tools like Transformers, PEFT, and Optuna.
+For more information, see https://materialsproject.org/, https://arxiv.org/, https://pubmed.ncbi.nlm.nih.gov/
 License
+MIT License (see the LICENSE file for details).
+Have questions or ideas? Open an issue on GitHub or join the discussion on Hugging Face. Happy researching!