---
license: mit
library_name: transformers
---

# Nepalaya-R

Nepalaya-R is a large language model project with full source code, configuration files, and deployment tooling for local and Hugging Face usage.

## About This Model

This repository contains the Nepalaya-R model implementation with:

- ✅ Full source code and inference implementations
- ✅ Tokenizer configuration adapted for Nepalaya-R
- ✅ Easy-to-use inference scripts
- ✅ Documentation and setup guides

## Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Download & Setup

Option 1: Download from Hugging Face

```bash
export HF_TOKEN=your_token
python download_model.py --model-id your-username/Nepalaya-R --local-dir ./model_weights
```

Option 2: Run Quick Inference

```bash
python quick_inference.py --prompt "Your prompt here"
```

### Mirror Setup

To create your own mirror of the Nepalaya-R repository:

```bash
export HF_TOKEN=your_token
python mirror_to_hf.py \
    --source source-org/source-model \
    --dest your-username/Nepalaya-R
```

## Documentation

- **[SETUP.md](SETUP.md)** - Detailed setup and configuration guide
- **[GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)** - Deployment instructions
- **[inference/README.md](inference/README.md)** - Inference code documentation

## Model Architecture

Nepalaya-R architecture summary:

- **Parameters:** 671B
- **Context Length:** Extended via sparse attention
- **Training:** Sparse-attention-based training pipeline
- **Architecture:** Optimized transformer with mixture-of-experts

## Key Features

- Multi-expert routing for efficient inference
- Sparse attention for long-context processing
- Chat template support
- Distributed inference capabilities

## System Requirements

- **GPU Memory:** 48GB+ VRAM recommended
- **RAM:** 64GB+ system memory
- **Storage:** ~300GB for full model weights (fast SSD recommended)

## Usage Examples

### Basic Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
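    # torch_dtype="auto" loads weights in the checkpoint's stored precision;
    # device_map="auto" shards the model across available GPUs and CPU RAM
    # (this requires the `accelerate` package to be installed).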
    "your-username/Nepalaya-R",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("your-username/Nepalaya-R")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Chat Mode

```python
messages = [
    {"role": "user", "content": "What is machine learning?"}
]
# apply_chat_template tokenizes by default and returns a tensor of input ids,
# which is passed to generate() positionally
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## Repository Structure

```
Nepalaya-R/
├── README.md                      # This file
├── SETUP.md                       # Setup guide
├── GITHUB_DEPLOY.md               # Deployment guide
├── requirements.txt               # Python dependencies
├── config.json                    # Model configuration
├── tokenizer.json                 # Tokenizer
├── quick_inference.py             # Quick inference script
├── download_model.py              # Model downloader
├── mirror_to_hf.py                # HF mirroring tool
├── inference/                     # Inference code
│   ├── generate.py                # Generation script
│   ├── model.py                   # Model implementation
│   ├── convert.py                 # Weight converter
│   └── config_671B_nepalaya.json  # Inference config
└── assets/                        # Chat templates
```

## Files Included

- **Source Code:** Full inference implementation
- **Configuration:** Model and generation configs
- **Tokenizer:** Complete tokenizer setup
- **Documentation:** Setup and usage guides
- **Utilities:** Download and mirror scripts

## License

MIT License - see the [LICENSE](LICENSE) file.

## Support

- For setup and configuration, see [SETUP.md](SETUP.md).
- For deployment, see [GITHUB_DEPLOY.md](GITHUB_DEPLOY.md).

---

Nepalaya-R model card and repository maintained by the Nepalaya-R project.
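The System Requirements figures above can be sanity-checked with back-of-envelope arithmetic: raw weight storage is roughly parameter count times bytes per parameter. The function and dtype choices below are an illustrative sketch, not part of the repository.

```python
def weight_footprint_gib(n_params: float, bytes_per_param: float) -> float:
    """Raw weight storage in GiB: parameter count times bytes per parameter."""
    return n_params * bytes_per_param / 1024**3

PARAMS = 671e9  # parameter count from the Model Architecture section

# bf16/fp16 weights: 2 bytes per parameter
print(f"bf16:  {weight_footprint_gib(PARAMS, 2):.0f} GiB")    # 1250 GiB
# 4-bit quantized weights: 0.5 bytes per parameter, in the ballpark of
# the ~300GB storage figure quoted above
print(f"4-bit: {weight_footprint_gib(PARAMS, 0.5):.0f} GiB")  # 312 GiB
```

Note that this counts weights only; activations, KV cache, and framework overhead add to the actual VRAM requirement at inference time.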