---
license: mit
library_name: transformers
---
# Nepalaya-R
Nepalaya-R is a large language model project with full source, configs, and deployment tooling for local and Hugging Face usage.
## About This Model
This repository contains the Nepalaya-R model implementation with:
- ✅ Full source code and inference implementations
- ✅ Tokenizer configuration adapted for Nepalaya-R
- ✅ Easy-to-use inference scripts
- ✅ Documentation and setup guides
## Quick Start
### Installation
```bash
pip install -r requirements.txt
```
### Download & Setup
**Option 1: Download from Hugging Face**
```bash
export HF_TOKEN=your_token
python download_model.py --model-id your-username/Nepalaya-R --local-dir ./model_weights
```
**Option 2: Run Quick Inference**
```bash
python quick_inference.py --prompt "Your prompt here"
```
### Mirror Setup
To create your own Nepalaya-R repo mirror:
```bash
export HF_TOKEN=your_token
python mirror_to_hf.py \
    --source source-org/source-model \
    --dest your-username/Nepalaya-R
```
## Documentation
- **[SETUP.md](SETUP.md)** - Detailed setup and configuration guide
- **[GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)** - Deployment instructions
- **[inference/README.md](inference/README.md)** - Inference code documentation
## Model Architecture
Nepalaya-R architecture summary:
- **Parameters:** 671B
- **Context Length:** Extended via sparse attention
- **Training:** Sparse-attention-based training pipeline
- **Architecture:** Optimized transformer with mixture-of-experts (MoE) routing
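To illustrate the sparse-attention idea behind the extended context length, the sketch below builds a causal local-window mask in which each token attends only to its most recent neighbors. This is a generic illustration, not Nepalaya-R's actual attention pattern:

```python
import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    # Causal mask: token i may attend to token j only if j <= i
    # and j is within the last `window` positions.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

mask = local_attention_mask(seq_len=8, window=3)
```

Because each row of the mask has at most `window` true entries, attention cost grows linearly with sequence length instead of quadratically.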
## Key Features
- Multi-expert routing for efficient inference
- Sparse attention for long-context processing
- Chat template support
- Distributed inference capabilities
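Multi-expert routing of the kind listed above is typically implemented as top-k gating: a router scores every expert for each token, and only the k best-scoring experts are executed, with their outputs combined by renormalized softmax weights. A minimal NumPy sketch of that generic mechanism (not Nepalaya-R's actual router):

```python
import numpy as np

def top_k_route(logits: np.ndarray, k: int):
    # logits: (num_tokens, num_experts) router scores.
    # Select the k highest-scoring experts per token...
    idx = np.argsort(logits, axis=-1)[:, -k:]
    scores = np.take_along_axis(logits, idx, axis=-1)
    # ...and softmax-renormalize their scores into combination weights.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return idx, w

idx, w = top_k_route(np.array([[1.0, 3.0, 2.0, 0.0]]), k=2)
```

Only the selected experts' feed-forward layers run per token, which is why a very large total parameter count can still yield efficient inference.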
## System Requirements
- **GPU Memory:** 48GB+ VRAM recommended
- **RAM:** 64GB+ system memory
- **Storage:** ~300GB for full model weights (fast NVMe SSD recommended)
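As a rough sanity check on the storage figure (an assumption on our part: the ~300GB of weights would correspond to low-bit quantization of the 671B parameters, since full-precision weights would be far larger):

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    # Raw weight size: parameters * bits each, converted to gigabytes.
    return params * bits_per_param / 8 / 1e9

print(weight_gb(671e9, 4))  # 4-bit quantization -> 335.5
```

At 4 bits per parameter this gives ~335GB, in the same ballpark as the quoted ~300GB; at 8 or 16 bits the weights alone would need roughly 671GB or 1.3TB.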
## Usage Examples
### Basic Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "your-username/Nepalaya-R",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("your-username/Nepalaya-R")

# Move the inputs to the model's device before generating
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Chat Mode
```python
messages = [
    {"role": "user", "content": "What is machine learning?"}
]
# apply_chat_template returns a tensor of input IDs (not a dict), so pass
# it positionally; add_generation_prompt cues the model to start its reply
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
## Repository Structure
```
Nepalaya-R/
├── README.md # This file
├── SETUP.md # Setup guide
├── GITHUB_DEPLOY.md # Deployment guide
├── requirements.txt # Python dependencies
├── config.json # Model configuration
├── tokenizer.json # Tokenizer
├── quick_inference.py # Quick inference script
├── download_model.py # Model downloader
├── mirror_to_hf.py # HF mirroring tool
├── inference/ # Inference code
│ ├── generate.py # Generation script
│ ├── model.py # Model implementation
│ ├── convert.py # Weight converter
│ └── config_671B_nepalaya.json # Inference config
└── assets/ # Chat templates
```
## Files Included
- **Source Code:** Full inference implementation
- **Configuration:** Model and generation configs
- **Tokenizer:** Complete tokenizer setup
- **Documentation:** Setup and usage guides
- **Utilities:** Download and mirror scripts
## License
MIT License - See [LICENSE](LICENSE) file
## Support
- For setup and configuration, see [SETUP.md](SETUP.md)
- For deployment, see [GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)
---
Nepalaya-R model card and repository maintained by the Nepalaya-R project.