---
license: mit
library_name: transformers
---

# Nepalaya-R

Nepalaya-R is a large language model project distributed with its full source code, configuration files, and deployment tooling for both local use and Hugging Face hosting.

## About This Model

This repository contains the Nepalaya-R model implementation with:

- ✅ Full source code and inference implementations
- ✅ Tokenizer configuration adapted for Nepalaya-R
- ✅ Easy-to-use inference scripts
- ✅ Documentation and setup guides

## Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Download & Setup

Option 1: Download from Hugging Face
```bash
export HF_TOKEN=your_token
python download_model.py --model-id your-username/Nepalaya-R --local-dir ./model_weights
```

Option 2: Run Quick Inference
```bash
python quick_inference.py --prompt "Your prompt here"
```

### Mirror Setup

To create your own Nepalaya-R repo mirror:
```bash
export HF_TOKEN=your_token
python mirror_to_hf.py \
  --source source-org/source-model \
  --dest your-username/Nepalaya-R
```

## Documentation

- **[SETUP.md](SETUP.md)** - Detailed setup and configuration guide
- **[GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)** - Deployment instructions
- **[inference/README.md](inference/README.md)** - Inference code documentation

## Model Architecture

Nepalaya-R architecture summary:
- **Parameters:** 671B
- **Context Length:** Extended via sparse attention
- **Training:** Sparse-attention-based training pipeline
- **Architecture:** Optimized transformer with mixture-of-experts
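The mixture-of-experts design means each token is processed by only a few of the model's experts, selected by a learned router. The sketch below illustrates the general top-k routing idea with NumPy; it is a conceptual example only, not Nepalaya-R's actual router (the real implementation lives in `inference/model.py`).

```python
import numpy as np

def topk_route(x, gate_w, k=2):
    """Route one token to its top-k experts via a softmax gate.

    x:      (d,)            token hidden state
    gate_w: (n_experts, d)  router weight matrix
    Returns (expert indices, normalized routing weights).
    """
    logits = gate_w @ x                      # score every expert
    top = np.argsort(logits)[-k:][::-1]      # keep the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()                  # softmax over the chosen experts only

rng = np.random.default_rng(0)
idx, weights = topk_route(rng.normal(size=64),
                          rng.normal(size=(8, 64)), k=2)
print(idx, weights)  # two expert ids; weights sum to 1
```

Because only `k` of the `n_experts` expert networks run per token, compute per token stays far below what the full parameter count suggests.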

## Key Features

- Multi-expert routing for efficient inference
- Sparse attention for long-context processing
- Chat template support
- Distributed inference capabilities
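Sparse attention for long contexts typically restricts each query to a local window of recent keys instead of the full sequence. As a rough illustration of the concept (not the specific sparsity pattern Nepalaya-R uses), a causal sliding-window mask can be built like this:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean attention mask: each query position attends only to the
    `window` most recent key positions (itself included), combining a
    causal constraint with local sparsity."""
    q = np.arange(seq_len)[:, None]   # query positions, as a column
    k = np.arange(seq_len)[None, :]   # key positions, as a row
    return (k <= q) & (k > q - window)

mask = sliding_window_mask(6, window=3)
print(mask.astype(int))
```

With such a mask, attention cost grows linearly with sequence length rather than quadratically, which is what makes extended contexts tractable.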

## System Requirements

- **GPU Memory:** 48GB+ VRAM recommended
- **RAM:** 64GB+ system memory
- **Storage:** ~300GB of fast (SSD) storage for the full model weights
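The ~300GB storage figure is roughly what 671B parameters occupy at around 4 bits per weight. A quick back-of-envelope check (assuming uniform quantization, which a real checkpoint may not use):

```python
# Rough storage estimate for a 671B-parameter model at different precisions.
# Illustrative figures only; the actual checkpoint size depends on the exact
# quantization scheme and which parts of the model are quantized.
params = 671e9

for bits, name in [(16, "fp16/bf16"), (8, "fp8/int8"), (4, "4-bit")]:
    gb = params * bits / 8 / 1e9   # bits -> bytes -> gigabytes
    print(f"{name:>9}: ~{gb:,.0f} GB")
```

At full 16-bit precision the weights alone would exceed a terabyte, which is why quantized checkpoints are the practical default for this model class.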

## Usage Examples

### Basic Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "your-username/Nepalaya-R",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("your-username/Nepalaya-R")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

### Chat Mode
```python
messages = [
    {"role": "user", "content": "What is machine learning?"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_dict=True,            # needed so **inputs unpacks into generate()
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
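Under the hood, `apply_chat_template` just renders the message list into a single string wrapped in the model's role markers before tokenizing. The plain-Python sketch below illustrates the idea; the `<|user|>`-style tokens are hypothetical placeholders, and the real markers come from this repo's tokenizer and chat-template assets.

```python
def render_chat(messages, add_generation_prompt=True):
    """Illustrative chat-template renderer. The role markers below are
    hypothetical; the actual tokens are defined by the model's tokenizer."""
    out = []
    for m in messages:
        out.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    if add_generation_prompt:
        out.append("<|assistant|>\n")  # cue the model to produce the reply
    return "".join(out)

prompt = render_chat([{"role": "user", "content": "What is machine learning?"}])
print(prompt)
```

Seeing the rendered string is a useful debugging step: if generation quality is poor, a mismatched or missing chat template is a common cause.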

## Repository Structure

```
Nepalaya-R/
├── README.md                          # This file
├── SETUP.md                           # Setup guide
├── GITHUB_DEPLOY.md                   # Deployment guide
├── requirements.txt                   # Python dependencies
├── config.json                        # Model configuration
├── tokenizer.json                     # Tokenizer
├── quick_inference.py                 # Quick inference script
├── download_model.py                  # Model downloader
├── mirror_to_hf.py                    # HF mirroring tool
├── inference/                         # Inference code
│   ├── generate.py                    # Generation script
│   ├── model.py                       # Model implementation
│   ├── convert.py                     # Weight converter
│   └── config_671B_nepalaya.json      # Inference config
└── assets/                            # Chat templates
```

## Files Included

- **Source Code:** Full inference implementation
- **Configuration:** Model and generation configs
- **Tokenizer:** Complete tokenizer setup
- **Documentation:** Setup and usage guides
- **Utilities:** Download and mirror scripts

## License

MIT License. See the [LICENSE](LICENSE) file.

## Support

For setup and configuration, see [SETUP.md](SETUP.md).
For deployment, see [GITHUB_DEPLOY.md](GITHUB_DEPLOY.md).

---

Nepalaya-R model card and repository maintained by the Nepalaya-R project.