---
license: mit
library_name: transformers
---
# Nepalaya-R
Nepalaya-R is a large language model project with full source, configs, and deployment tooling for local and Hugging Face usage.
## About This Model
This repository contains the Nepalaya-R model implementation with:
- ✅ Full source code and inference implementations
- ✅ Tokenizer configuration adapted for Nepalaya-R
- ✅ Easy-to-use inference scripts
- ✅ Documentation and setup guides
## Quick Start
### Installation
```bash
pip install -r requirements.txt
```
### Download & Setup
**Option 1: Download from Hugging Face**
```bash
export HF_TOKEN=your_token
python download_model.py --model-id your-username/Nepalaya-R --local-dir ./model_weights
```
**Option 2: Run Quick Inference**
```bash
python quick_inference.py --prompt "Your prompt here"
```
### Mirror Setup
To create your own Nepalaya-R repo mirror:
```bash
export HF_TOKEN=your_token
python mirror_to_hf.py \
    --source source-org/source-model \
    --dest your-username/Nepalaya-R
```
## Documentation
- **[SETUP.md](SETUP.md)** - Detailed setup and configuration guide
- **[GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)** - Deployment instructions
- **[inference/README.md](inference/README.md)** - Inference code documentation
## Model Architecture
Nepalaya-R architecture summary:
- **Parameters:** 671B
- **Context Length:** Extended via sparse attention
- **Training:** Sparse-attention-based training pipeline
- **Architecture:** Optimized transformer with mixture-of-experts (MoE) routing
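To illustrate the sparse-attention idea behind the extended context length, the sketch below builds a causal local-window mask in which each token attends only to its most recent neighbors. This is a generic illustration, not Nepalaya-R's actual attention pattern:

```python
import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    # Causal mask: token i may attend to token j only if j <= i
    # and j is within the last `window` positions.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

mask = local_attention_mask(seq_len=8, window=3)
```

Because each row of the mask has at most `window` true entries, attention cost grows linearly with sequence length instead of quadratically.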
## Key Features
- Multi-expert routing for efficient inference
- Sparse attention for long-context processing
- Chat template support
- Distributed inference capabilities
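Multi-expert routing of the kind listed above is typically implemented as top-k gating: a router scores every expert for each token, and only the k best-scoring experts are executed, with their outputs combined by renormalized softmax weights. A minimal NumPy sketch of that generic mechanism (not Nepalaya-R's actual router):

```python
import numpy as np

def top_k_route(logits: np.ndarray, k: int):
    # logits: (num_tokens, num_experts) router scores.
    # Select the k highest-scoring experts per token...
    idx = np.argsort(logits, axis=-1)[:, -k:]
    scores = np.take_along_axis(logits, idx, axis=-1)
    # ...and softmax-renormalize their scores into combination weights.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return idx, w

idx, w = top_k_route(np.array([[1.0, 3.0, 2.0, 0.0]]), k=2)
```

Only the selected experts' feed-forward layers run per token, which is why a very large total parameter count can still yield efficient inference.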
## System Requirements
- **GPU Memory:** 48GB+ VRAM recommended
- **RAM:** 64GB+ system memory
- **Storage:** ~300GB for full model weights (fast NVMe SSD recommended)
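As a rough sanity check on the storage figure (an assumption on our part: the ~300GB of weights would correspond to low-bit quantization of the 671B parameters, since full-precision weights would be far larger):

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    # Raw weight size: parameters * bits each, converted to gigabytes.
    return params * bits_per_param / 8 / 1e9

print(weight_gb(671e9, 4))  # 4-bit quantization -> 335.5
```

At 4 bits per parameter this gives ~335GB, in the same ballpark as the quoted ~300GB; at 8 or 16 bits the weights alone would need roughly 671GB or 1.3TB.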
## Usage Examples
### Basic Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "your-username/Nepalaya-R",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("your-username/Nepalaya-R")

# Move the inputs to the model's device before generating
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Chat Mode
```python
messages = [
    {"role": "user", "content": "What is machine learning?"}
]
# apply_chat_template returns a tensor of input IDs (not a dict), so pass
# it positionally; add_generation_prompt cues the model to start its reply
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
## Repository Structure
```
Nepalaya-R/
├── README.md # This file
├── SETUP.md # Setup guide
├── GITHUB_DEPLOY.md # Deployment guide
├── requirements.txt # Python dependencies
├── config.json # Model configuration
├── tokenizer.json # Tokenizer
├── quick_inference.py # Quick inference script
├── download_model.py # Model downloader
├── mirror_to_hf.py # HF mirroring tool
├── inference/ # Inference code
│ ├── generate.py # Generation script
│ ├── model.py # Model implementation
│ ├── convert.py # Weight converter
│ └── config_671B_nepalaya.json # Inference config
└── assets/ # Chat templates
```
## Files Included
- **Source Code:** Full inference implementation
- **Configuration:** Model and generation configs
- **Tokenizer:** Complete tokenizer setup
- **Documentation:** Setup and usage guides
- **Utilities:** Download and mirror scripts
## License
MIT License - See [LICENSE](LICENSE) file
## Support
- For setup and configuration, see [SETUP.md](SETUP.md)
- For deployment, see [GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)
---
Nepalaya-R model card and repository maintained by the Nepalaya-R project.