---
license: mit
library_name: transformers
---

# Nepalaya-R

Nepalaya-R is a large language model project with full source code, configuration files, and deployment tooling for local and Hugging Face usage.

## About This Model

This repository contains the Nepalaya-R model implementation with:

- ✅ Full source code and inference implementations
- ✅ Tokenizer configuration adapted for Nepalaya-R
- ✅ Easy-to-use inference scripts
- ✅ Documentation and setup guides

## Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Download & Setup

Option 1: Download from Hugging Face

```bash
export HF_TOKEN=your_token
python download_model.py --model-id your-username/Nepalaya-R --local-dir ./model_weights
```

Option 2: Run Quick Inference

```bash
python quick_inference.py --prompt "Your prompt here"
```

### Mirror Setup

To create your own mirror of the Nepalaya-R repository:

```bash
export HF_TOKEN=your_token
python mirror_to_hf.py \
    --source source-org/source-model \
    --dest your-username/Nepalaya-R
```

## Documentation

- **[SETUP.md](SETUP.md)** - Detailed setup and configuration guide
- **[GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)** - Deployment instructions
- **[inference/README.md](inference/README.md)** - Inference code documentation

## Model Architecture

Nepalaya-R architecture summary:

- **Parameters:** 671B
- **Context Length:** Extended via sparse attention
- **Training:** Sparse-attention-based training pipeline
- **Architecture:** Optimized transformer with mixture-of-experts

## Key Features

- Multi-expert routing for efficient inference
- Sparse attention for long-context processing
- Chat template support
- Distributed inference capabilities

## System Requirements

- **GPU Memory:** 48GB+ VRAM recommended
- **RAM:** 64GB+ system memory
- **Storage:** ~300GB for full model weights (fast SSD recommended)

## Usage Examples

### Basic Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
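    # torch_dtype="auto" loads weights in the checkpoint's stored precision;
    # device_map="auto" shards the model across available GPUs and CPU RAM
    # (this requires the `accelerate` package to be installed).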
    "your-username/Nepalaya-R",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("your-username/Nepalaya-R")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Chat Mode

```python
messages = [
    {"role": "user", "content": "What is machine learning?"}
]
# apply_chat_template tokenizes by default and returns a tensor of input ids,
# which is passed to generate() positionally
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## Repository Structure

```
Nepalaya-R/
├── README.md                      # This file
├── SETUP.md                       # Setup guide
├── GITHUB_DEPLOY.md               # Deployment guide
├── requirements.txt               # Python dependencies
├── config.json                    # Model configuration
├── tokenizer.json                 # Tokenizer
├── quick_inference.py             # Quick inference script
├── download_model.py              # Model downloader
├── mirror_to_hf.py                # HF mirroring tool
├── inference/                     # Inference code
│   ├── generate.py                # Generation script
│   ├── model.py                   # Model implementation
│   ├── convert.py                 # Weight converter
│   └── config_671B_nepalaya.json  # Inference config
└── assets/                        # Chat templates
```

## Files Included

- **Source Code:** Full inference implementation
- **Configuration:** Model and generation configs
- **Tokenizer:** Complete tokenizer setup
- **Documentation:** Setup and usage guides
- **Utilities:** Download and mirror scripts

## License

MIT License - see the [LICENSE](LICENSE) file.

## Support

- For setup and configuration, see [SETUP.md](SETUP.md).
- For deployment, see [GITHUB_DEPLOY.md](GITHUB_DEPLOY.md).

---

Nepalaya-R model card and repository maintained by the Nepalaya-R project.
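The System Requirements figures above can be sanity-checked with back-of-envelope arithmetic: raw weight storage is roughly parameter count times bytes per parameter. The function and dtype choices below are an illustrative sketch, not part of the repository.

```python
def weight_footprint_gib(n_params: float, bytes_per_param: float) -> float:
    """Raw weight storage in GiB: parameter count times bytes per parameter."""
    return n_params * bytes_per_param / 1024**3

PARAMS = 671e9  # parameter count from the Model Architecture section

# bf16/fp16 weights: 2 bytes per parameter
print(f"bf16:  {weight_footprint_gib(PARAMS, 2):.0f} GiB")    # 1250 GiB
# 4-bit quantized weights: 0.5 bytes per parameter, in the ballpark of
# the ~300GB storage figure quoted above
print(f"4-bit: {weight_footprint_gib(PARAMS, 0.5):.0f} GiB")  # 312 GiB
```

Note that this counts weights only; activations, KV cache, and framework overhead add to the actual VRAM requirement at inference time.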