Nepalaya-R
Nepalaya-R is a large language model project that ships full source code, configuration files, and deployment tooling for local use and for Hugging Face.
About This Model
This repository contains the Nepalaya-R model implementation with:
- ✅ Full source code and inference implementations
- ✅ Tokenizer configuration adapted for Nepalaya-R
- ✅ Easy-to-use inference scripts
- ✅ Documentation and setup guides
Quick Start
Installation
pip install -r requirements.txt
Download & Setup
Option 1: Download from Hugging Face
export HF_TOKEN=your_token
python download_model.py --model-id your-username/Nepalaya-R --local-dir ./model_weights
Option 2: Run Quick Inference
python quick_inference.py --prompt "Your prompt here"
Mirror Setup
To create your own Nepalaya-R repo mirror:
export HF_TOKEN=your_token
python mirror_to_hf.py \
--source source-org/source-model \
--dest your-username/Nepalaya-R
Documentation
- SETUP.md - Detailed setup and configuration guide
- GITHUB_DEPLOY.md - Deployment instructions
- inference/README.md - Inference code documentation
Model Architecture
Nepalaya-R architecture summary:
- Parameters: 671B
- Context Length: Extended via sparse attention
- Training: Sparse-attention-based training pipeline
- Architecture: Optimized transformer with mixture-of-experts
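The mixture-of-experts layer mentioned above routes each token to a small subset of expert networks rather than running every expert. The repository's actual router lives in inference/model.py; as an illustration only, here is a minimal top-k softmax router in NumPy (the expert count, k, and dimensions are arbitrary assumptions, not Nepalaya-R's):

```python
import numpy as np

def topk_moe(x, w_gate, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:       (tokens, d_model) token activations
    w_gate:  (d_model, n_experts) gating weights
    experts: list of callables, each mapping (d_model,) -> (d_model,)
    """
    logits = x @ w_gate                        # (tokens, n_experts) gating scores
    top = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        gates = np.exp(sel - sel.max())
        gates /= gates.sum()                   # softmax over the selected experts only
        for g, e in zip(gates, top[t]):
            out[t] += g * experts[e](x[t])
    return out

# Toy demo: 4 scaling "experts", 3 tokens, d_model=8
rng = np.random.default_rng(0)
experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0, 4.0)]
x = rng.standard_normal((3, 8))
w_gate = rng.standard_normal((8, 4))
y = topk_moe(x, w_gate, experts, k=2)
print(y.shape)
```

Because only k of the experts run per token, compute per token stays roughly constant as total parameter count grows, which is how a 671B-parameter model can remain practical to serve.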
Key Features
- Multi-expert routing for efficient inference
- Sparse attention for long-context processing
- Chat template support
- Distributed inference capabilities
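Sparse attention for long contexts typically restricts each query to a subset of keys instead of the full quadratic pattern. The exact pattern Nepalaya-R uses is defined in the inference code; the causal sliding-window mask below is just one common scheme, shown for intuition:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean mask: query i may attend to keys j with i - window < j <= i (causal)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.sum())  # each row attends to at most `window` positions
```

With such a mask, attention cost grows linearly in sequence length (seq_len × window) rather than quadratically, which is what makes extended contexts affordable.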
System Requirements
- GPU Memory: 48GB+ VRAM recommended
- RAM: 64GB+ system memory
- Storage: ~300GB of fast SSD storage recommended for the full model weights
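As a rough sanity check on the storage figure above: on-disk weight size is parameter count times bytes per parameter, so ~300GB for 671B parameters implies weights stored at around 4 bits each (the precision here is an assumption; the repo's configs are authoritative):

```python
def weight_gigabytes(n_params, bits_per_param):
    """Approximate on-disk size of the weights in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

print(weight_gigabytes(671e9, 4))   # ~335 GB at 4-bit
print(weight_gigabytes(671e9, 16))  # ~1342 GB at bf16
```

Note this counts weights only; KV cache and activations at inference time add to the GPU memory requirement on top of whatever portion of the weights is resident.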
Usage Examples
Basic Generation
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"your-username/Nepalaya-R",
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("your-username/Nepalaya-R")
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
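The generate() call above uses default decoding; in practice, sampling parameters such as temperature and top_p are often passed as well. For intuition, nucleus (top-p) sampling keeps the smallest set of tokens whose cumulative probability reaches p and renormalizes over them (a generic sketch, not Nepalaya-R-specific code):

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Zero out tokens outside the nucleus, then renormalize."""
    order = np.argsort(probs)[::-1]              # tokens by descending probability
    cdf = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cdf, p) + 1]  # smallest prefix with mass >= p
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

probs = np.array([0.5, 0.3, 0.15, 0.05])
filtered = top_p_filter(probs, p=0.8)
print(filtered)
```

In the transformers API the equivalent would be `model.generate(**inputs, do_sample=True, top_p=0.8, ...)`; the filtering itself happens inside the library.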
Chat Mode
messages = [
{"role": "user", "content": "What is machine learning?"}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
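apply_chat_template renders the message list into the model's prompt format using the template shipped with the tokenizer (see assets/). As an illustration only, with made-up role markers that will differ from Nepalaya-R's real template, a hand-rolled renderer might look like:

```python
def render_chat(messages):
    """Toy chat-template renderer; the actual markers are model-specific."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}")
    parts.append("<|assistant|>\n")  # open the assistant turn so the model replies
    return "\n".join(parts)

prompt = render_chat([{"role": "user", "content": "What is machine learning?"}])
print(prompt)
```

The trailing assistant marker is what `add_generation_prompt=True` adds in the real API: it cues the model to produce the assistant's reply rather than continue the user turn.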
Repository Structure
Nepalaya-R/
├── README.md # This file
├── SETUP.md # Setup guide
├── GITHUB_DEPLOY.md # Deployment guide
├── requirements.txt # Python dependencies
├── config.json # Model configuration
├── tokenizer.json # Tokenizer
├── quick_inference.py # Quick inference script
├── download_model.py # Model downloader
├── mirror_to_hf.py # HF mirroring tool
├── inference/ # Inference code
│ ├── generate.py # Generation script
│ ├── model.py # Model implementation
│ ├── convert.py # Weight converter
│ └── config_671B_nepalaya.json # Inference config
└── assets/ # Chat templates
Files Included
- Source Code: Full inference implementation
- Configuration: Model and generation configs
- Tokenizer: Complete tokenizer setup
- Documentation: Setup and usage guides
- Utilities: Download and mirror scripts
License
MIT License - See LICENSE file
Support
For setup and configuration, see SETUP.md. For deployment, see GITHUB_DEPLOY.md.
Nepalaya-R model card and repository maintained by the Nepalaya-R project.