---
tags:
- transformer
- pytorch
- causal-lm
- moe
- mixture-of-experts
- rish-ai-labs
---

# RLLM (Base Model)

## Model Description

RLLM is a base language model developed by **Rish AI Labs**, an applied artificial-intelligence lab focused on LLMs, generative AI, AI consulting, and research.

The model uses a **Mixture of Experts (MoE)** architecture with 16 experts and top-2 routing, which scales capacity efficiently while letting individual experts specialize. It was trained with identity-focused pretraining to establish a strong foundation for downstream tasks.

## Key Features

- **Architecture**: Transformer with MoE (16 experts, top-2 routing)
- **Parameters**: ~275M total
- **Training**: Identity-focused pretraining
- **Precision**: FP32 training; optimized for inference
- **Framework**: PyTorch + Transformers

## Intended Use

This base model serves as a foundation for:

- Fine-tuning on specific domains
- Research into efficient language-model architectures
- Development of specialized AI applications
- Studying MoE routing dynamics and scaling

## About Rish AI Labs

**Rish AI Labs** is an applied artificial-intelligence lab based in Bangalore, India, working on:

- **Applied AI Solutions**: Enterprise-grade AI implementations
- **Research**: Cutting-edge AI research and publications
- **LLM Development**: Large language model research and deployment
- **AI Consulting**: Expert guidance for AI transformation

### Mission

"Pioneering the future of Enterprise AI through research, applied solutions, and LLM-driven innovation."

### Contact

- Website: [rishailabs.com](https://rishailabs.com)
- Location: Bangalore, India
- Focus: Enterprise AI, LLMs, Generative AI, AI Research

## Model Architecture Details

- **Layers**: 12 transformer layers
- **Attention Heads**: 12
- **Hidden Size**: 768
- **Experts**: 16 (MoE)
- **Top-K Routing**: 2
- **Vocabulary**: 50,304 tokens
- **Sequence Length**: Configurable (trained on various lengths)
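
The configuration above (16 experts, top-2 routing, hidden size 768) can be illustrated with a minimal routing layer. This is a generic sketch of top-k expert routing in PyTorch, not RLLM's actual implementation; the expert FFN width of 3072 is an assumption for illustration only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Illustrative top-2 MoE feed-forward layer (generic sketch, not RLLM's code)."""

    def __init__(self, hidden_size=768, ffn_size=3072, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, hidden)
        logits = self.router(x)                         # (batch, seq, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Each token runs through only 2 of the 16 expert FFNs, which is why MoE layers add parameters without a proportional increase in per-token compute.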

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RishAILabs/RLLM-Base")
model = AutoModelForCausalLM.from_pretrained("RishAILabs/RLLM-Base")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# max_new_tokens bounds generated tokens only, excluding the prompt
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training Details

- **Dataset**: Identity-focused dataset for stable pretraining
- **Precision**: FP32 for training stability
- **Optimization**: AdamW optimizer
- **Framework**: Custom Rish-Core training framework
- **Hardware**: Optimized for both CPU and GPU inference
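
Since the model is intended for fine-tuning and was trained with AdamW in FP32, a single update step might look like the sketch below. A stand-in linear module replaces the loaded RLLM model so the snippet runs anywhere; the hidden size 768 and vocabulary 50,304 come from the architecture table, while the learning rate and weight decay are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Stand-in for the loaded RLLM model; any nn.Module producing logits works the same way.
model = nn.Linear(768, 50304)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

hidden = torch.randn(4, 768)                # dummy hidden states (batch of 4)
targets = torch.randint(0, 50304, (4,))     # dummy next-token targets

logits = model(hidden)
loss = nn.functional.cross_entropy(logits, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()                            # one AdamW update, in FP32 by default
```

For the real model, the forward pass would instead come from `model(input_ids, labels=input_ids).loss` on tokenized text.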

## Limitations

- As a base model, it may require fine-tuning for specific tasks
- Primarily focused on English
- Generated content should be reviewed for appropriateness

## Citation

If you use this model in your research, please cite:

---

*Developed by Rish AI Labs - Applied Artificial Intelligence & Research*