---
language: en
license: apache-2.0
tags:
- transformer
- pytorch
- causal-lm
- moe
- mixture-of-experts
- rish-ai-labs
---

# RLLM (Base Model)

## Model Description

RLLM is a base language model developed by **Rish AI Labs**, an applied artificial intelligence lab focused on LLMs, generative AI, AI consulting, and research.

The model uses a **Mixture of Experts (MoE)** architecture with 16 experts and top-2 routing, so only a fraction of the network's parameters is active for each token. It was trained with identity-focused pretraining to establish a foundation for downstream tasks.

## Key Features

- **Architecture**: Transformer with MoE (16 experts, top-2 routing)
- **Parameters**: ~275M total
- **Training**: Identity-focused pretraining
- **Precision**: FP32 training, optimized for inference
- **Framework**: PyTorch + Transformers

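Top-2 routing means that, for each token, a small router scores all 16 experts and only the two highest-scoring ones run; their outputs are mixed by softmax-renormalized gate weights. A minimal sketch of that selection step (illustrative only, not the model's actual router code):

```python
import numpy as np

def top2_route(logits):
    """Pick the 2 highest-scoring experts and renormalize their gate weights.

    `logits` is one token's router score per expert. Returns the chosen
    expert indices (best first) and their mixing weights, which sum to 1.
    """
    top2 = np.argsort(logits)[-2:][::-1]            # indices of the two best experts
    gates = np.exp(logits[top2] - logits[top2].max())
    gates = gates / gates.sum()                      # softmax over the selected pair
    return top2, gates

# One token's router scores over 16 experts:
rng = np.random.default_rng(0)
logits = rng.normal(size=16)
experts, weights = top2_route(logits)
# Only 2 of the 16 expert FFNs run for this token; their outputs are
# combined as weights[0] * expert_out[experts[0]] + weights[1] * expert_out[experts[1]].
```

This is what makes MoE scaling efficient: total parameter count grows with the number of experts, while per-token compute grows only with the top-k.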
## Intended Use

This base model serves as a foundation for:

- Fine-tuning on specific domains
- Research in efficient language model architectures
- Development of specialized AI applications
- Understanding MoE dynamics and scaling

## About Rish AI Labs

**Rish AI Labs** is an applied artificial intelligence lab based in Bangalore, India. We focus on:

- **Applied AI Solutions**: Enterprise-grade AI implementations
- **Research**: Cutting-edge AI research and publications
- **LLM Development**: Large language model research and deployment
- **AI Consulting**: Expert guidance for AI transformation

### Mission

"Pioneering the future of Enterprise AI through research, applied solutions, and LLM-driven innovation."

### Contact

- Website: [rishailabs.com](https://rishailabs.com)
- Location: Bangalore, India
- Focus: Enterprise AI, LLMs, Generative AI, AI Research

## Model Architecture Details

- **Layers**: 12 transformer layers
- **Attention Heads**: 12
- **Hidden Size**: 768
- **Experts**: 16 (MoE)
- **Top-K Routing**: 2
- **Vocabulary**: 50,304 tokens
- **Sequence Length**: Configurable (trained on various lengths)

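Two quantities follow directly from the numbers above: the per-head attention dimension and the fraction of experts a token activates. A quick sketch (variable names are ours, not the model's config keys):

```python
# Dimensions as listed in the model card.
hidden_size = 768
num_heads = 12
num_experts = 16
top_k = 2

# Standard multi-head attention splits the hidden size evenly across heads:
assert hidden_size % num_heads == 0
head_dim = hidden_size // num_heads  # 64

# Under top-2 routing over 16 experts, each token runs only a small
# fraction of the expert FFNs in every MoE layer:
active_fraction = top_k / num_experts  # 0.125
```
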
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RishAILabs/RLLM-Base")
model = AutoModelForCausalLM.from_pretrained("RishAILabs/RLLM-Base")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# max_new_tokens bounds only the generated continuation, not prompt + output
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training Details

- **Dataset**: Identity-focused dataset for stable pretraining
- **Precision**: FP32 for training stability
- **Optimization**: AdamW optimizer
- **Framework**: Custom Rish-Core training framework
- **Inference**: Runs on both CPU and GPU

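The card lists AdamW, whose distinguishing feature is decoupled weight decay: the decay is applied directly to the parameter rather than folded into the gradient. A single-scalar sketch of the textbook update rule (illustrative only, not Rish-Core's actual training loop):

```python
import math

def adamw_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter."""
    state["t"] += 1
    # Exponential moving averages of the gradient and squared gradient:
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized moments:
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # Decoupled weight decay: applied to theta itself, not to the gradient.
    return theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)

state = {"t": 0, "m": 0.0, "v": 0.0}
theta = adamw_step(1.0, 0.1, state)  # parameter moves against the gradient
```
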
## Limitations

- Base model: typically requires fine-tuning for specific tasks
- Primarily English-language
- Generated content should be reviewed for appropriateness

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{rishailabs_2026,
  author    = {RishAILabs},
  title     = {RLLM-Base (Revision 552ee30)},
  year      = 2026,
  url       = {https://huggingface.co/RishAILabs/RLLM-Base},
  doi       = {10.57967/hf/7560},
  publisher = {Hugging Face}
}
```

---

*Developed by Rish AI Labs - Applied Artificial Intelligence & Research*