# Rish AI
## Model Description
Rish AI is a sparse Mixture of Experts (MoE) transformer model designed for efficient, scalable multilingual language understanding and generation. It features sparse routing that activates 5 of its 7 experts per token, rotary position embeddings with dynamic scaling, and grouped-query attention.
## Key Features
- **Sparse Mixture of Experts**: 7 experts with 5 experts activated per token for optimal efficiency
- **Rotary Position Embeddings**: Dynamic RoPE scaling for better long-context handling
- **Grouped Query Attention**: Efficient attention with reduced key/value heads
- **RMSNorm**: Improved normalization for stable training
- **Load Balancing**: Automatic expert load balancing during training
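To make the rotary-embedding feature above concrete, here is a minimal NumPy sketch of the standard RoPE mechanism: consecutive channel pairs are rotated by position-dependent angles, so attention scores depend on relative position. The base of 10000 and the head dimension are illustrative assumptions, not this model's exact values.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0):
    """Per-position rotation angle for each channel pair."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(positions, inv_freq)  # (seq_len, head_dim // 2)

def apply_rope(x, positions, base=10000.0):
    """Rotate consecutive channel pairs of x by position-dependent angles."""
    seq_len, head_dim = x.shape
    theta = rope_angles(positions, head_dim, base)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.random.randn(8, 64)            # 8 positions, head_dim = 64
rotated = apply_rope(x, np.arange(8))
```

Because RoPE is a pure rotation, it preserves vector norms; dynamic scaling variants additionally adjust the frequency base as the sequence grows past the training context length.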
## Usage
### Installation
```bash
pip install transformers
```
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load model and tokenizer
model_name = "your-org/RishAI-1B-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Prepare input
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
# Generate response
outputs = model.generate(**inputs, max_length=50, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Advanced Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load model with specific configuration
model = AutoModelForCausalLM.from_pretrained(
    "your-org/RishAI-1B-7B",
    torch_dtype=torch.bfloat16,  # For memory efficiency
    device_map="auto",           # Automatic device placement
)
tokenizer = AutoTokenizer.from_pretrained("your-org/RishAI-1B-7B")
# Multi-turn conversation
conversation = [
    {"role": "user", "content": "What is machine learning?"},
    {"role": "assistant", "content": "Machine learning is a subset of AI..."},
    {"role": "user", "content": "Can you give a practical example?"},
]
# Format conversation
formatted_input = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,  # Append the assistant turn header so the model replies
)
inputs = tokenizer(formatted_input, return_tensors="pt")
# Generate with controlled parameters
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Model Configuration
```python
from transformers import RishAIConfig
# Create custom configuration
config = RishAIConfig(
    vocab_size=100352,
    hidden_size=4096,
    num_hidden_layers=32,
    num_attention_heads=32,
    num_experts=7,                 # Number of experts
    num_experts_per_tok=5,         # Experts activated per token
    max_position_embeddings=4096,
    rope_scaling={"rope_type": "dynamic", "factor": 1.0},
)
# Initialize model with config
from transformers import RishAIModel
model = RishAIModel(config)
```
## Model Architecture
### Sparse Mixture of Experts (MoE)
- **Experts**: 7 specialized sub-networks
- **Routing**: Top-5 expert selection per token
- **Load Balancing**: Automatic expert utilization optimization
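A minimal sketch of the top-5-of-7 routing described above, assuming a standard softmax router whose selected gate weights are renormalized to sum to 1; the actual router implementation in the model may differ in detail.

```python
import numpy as np

def route_top_k(router_logits, k=5):
    """Select the top-k experts per token and renormalize their gate weights.

    router_logits: (num_tokens, num_experts) scores from a learned router.
    Returns expert indices (num_tokens, k) and mixing weights that sum to 1.
    """
    # Numerically stable softmax over the expert dimension
    probs = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    idx = np.argsort(-probs, axis=-1)[:, :k]         # top-k expert ids per token
    top = np.take_along_axis(probs, idx, axis=-1)
    weights = top / top.sum(axis=-1, keepdims=True)  # renormalize over selected experts
    return idx, weights

logits = np.random.randn(4, 7)          # 4 tokens, 7 experts
idx, weights = route_top_k(logits, k=5)
```

Each token's output is then the weighted sum of its 5 selected experts' outputs; the load-balancing objective encourages the router to spread tokens evenly across all 7 experts.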
### Attention Mechanism
- **Grouped Query Attention**: Efficient key/value head reduction
- **Rotary Embeddings**: Position-aware attention with dynamic scaling
- **RMSNorm**: Stable layer normalization
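RMSNorm, mentioned above, differs from LayerNorm in that it only rescales by the root-mean-square of the features without subtracting the mean. A minimal NumPy sketch (the epsilon value is an illustrative assumption):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of the last axis, then scale."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

hidden = np.random.randn(2, 8)                  # (batch, hidden_size)
normed = rms_norm(hidden, weight=np.ones(8))    # learned scale, initialized to 1
```

Skipping the mean-centering and bias terms makes RMSNorm slightly cheaper than LayerNorm while remaining stable in practice.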
### Training Features
- **Gradient Checkpointing**: Memory-efficient training
- **Flash Attention**: Optimized attention computation
- **Expert Parallelism**: Distributed expert training
## Performance
### Speed
- **Inference**: Optimized for fast generation
- **Training**: Efficient MoE routing and load balancing
- **Memory**: Sparse activation reduces memory footprint
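The memory point can be made concrete with back-of-the-envelope arithmetic: with 5-of-7 routing, only 5/7 of the expert FFN parameters participate in any one token's forward pass. The parameter counts below are purely illustrative assumptions, not this model's published sizes.

```python
# Illustrative figures only: assume 1B shared (attention/embedding) parameters
# and 7 experts of 0.9B parameters each.
shared = 1.0e9
per_expert = 0.9e9
num_experts, active_experts = 7, 5

total_params = shared + num_experts * per_expert     # parameters stored
active_params = shared + active_experts * per_expert # parameters used per token
fraction_active = active_params / total_params
```

Note that all experts must still be resident in memory; sparsity reduces per-token compute and activation memory, not the size of the stored weights.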
### Quality
- **Perplexity**: Competitive with state-of-the-art models
- **Long Context**: Effective handling of 4K+ token sequences
- **Multitask**: Strong performance across diverse tasks
## Limitations
- Requires significant computational resources for training
- Memory usage scales with number of active experts
- Best performance on modern GPUs with ample VRAM
## Citation
```bibtex
@misc{rishailabs_2026,
  author    = {RishAILabs},
  title     = {RLLM-Base (Revision 552ee30)},
  year      = 2026,
  url       = {https://huggingface.co/RishAILabs/RLLM-Base},
  doi       = {10.57967/hf/7560},
  publisher = {Hugging Face}
}
```
## License
This model is released under the Apache 2.0 license.