---
tags:
- transformer
- pytorch
- causal-lm
- moe
- mixture-of-experts
- rish-ai-labs
---

# RLLM (Base Model)

## Model Description

RLLM is a base language model developed by **Rish AI Labs**, an applied artificial-intelligence lab focused on LLMs, generative AI, AI consulting, and research.

The model uses a **Mixture of Experts (MoE)** architecture with 16 experts and top-2 routing, which scales capacity efficiently while letting individual experts specialize. It was trained with identity-focused pretraining to establish a strong foundation for downstream tasks.

## Key Features

- **Architecture**: Transformer with MoE (16 experts, top-2 routing)
- **Parameters**: ~275M total
- **Training**: Identity-focused pretraining
- **Precision**: FP32 training; optimized for inference
- **Framework**: PyTorch + Transformers

## Intended Use

This base model serves as a foundation for:

- Fine-tuning on specific domains
- Research into efficient language-model architectures
- Development of specialized AI applications
- Studying MoE routing dynamics and scaling

## About Rish AI Labs

**Rish AI Labs** is an applied artificial-intelligence lab based in Bangalore, India, working on:

- **Applied AI Solutions**: Enterprise-grade AI implementations
- **Research**: Cutting-edge AI research and publications
- **LLM Development**: Large language model research and deployment
- **AI Consulting**: Expert guidance for AI transformation

### Mission

"Pioneering the future of Enterprise AI through research, applied solutions, and LLM-driven innovation."

### Contact

- Website: [rishailabs.com](https://rishailabs.com)
- Location: Bangalore, India
- Focus: Enterprise AI, LLMs, Generative AI, AI Research

## Model Architecture Details

- **Layers**: 12 transformer layers
- **Attention Heads**: 12
- **Hidden Size**: 768
- **Experts**: 16 (MoE)
- **Top-K Routing**: 2
- **Vocabulary**: 50,304 tokens
- **Sequence Length**: Configurable (trained on various lengths)
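
The configuration above (16 experts, top-2 routing, hidden size 768) can be illustrated with a minimal routing layer. This is a generic sketch of top-k expert routing in PyTorch, not RLLM's actual implementation; the expert FFN width of 3072 is an assumption for illustration only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Illustrative top-2 MoE feed-forward layer (generic sketch, not RLLM's code)."""

    def __init__(self, hidden_size=768, ffn_size=3072, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, hidden)
        logits = self.router(x)                         # (batch, seq, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Each token runs through only 2 of the 16 expert FFNs, which is why MoE layers add parameters without a proportional increase in per-token compute.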

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RishAILabs/RLLM-Base")
model = AutoModelForCausalLM.from_pretrained("RishAILabs/RLLM-Base")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# max_new_tokens bounds generated tokens only, excluding the prompt
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training Details

- **Dataset**: Identity-focused dataset for stable pretraining
- **Precision**: FP32 for training stability
- **Optimization**: AdamW optimizer
- **Framework**: Custom Rish-Core training framework
- **Hardware**: Optimized for both CPU and GPU inference
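
Since the model is intended for fine-tuning and was trained with AdamW in FP32, a single update step might look like the sketch below. A stand-in linear module replaces the loaded RLLM model so the snippet runs anywhere; the hidden size 768 and vocabulary 50,304 come from the architecture table, while the learning rate and weight decay are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Stand-in for the loaded RLLM model; any nn.Module producing logits works the same way.
model = nn.Linear(768, 50304)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

hidden = torch.randn(4, 768)                # dummy hidden states (batch of 4)
targets = torch.randint(0, 50304, (4,))     # dummy next-token targets

logits = model(hidden)
loss = nn.functional.cross_entropy(logits, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()                            # one AdamW update, in FP32 by default
```

For the real model, the forward pass would instead come from `model(input_ids, labels=input_ids).loss` on tokenized text.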

## Limitations

- As a base model, it may require fine-tuning for specific tasks
- Primarily focused on English
- Generated content should be reviewed for appropriateness

## Citation

If you use this model in your research, please cite:

---

*Developed by Rish AI Labs - Applied Artificial Intelligence & Research*