# Sheikh-2.5-Coder

A lightweight 3B-parameter code-focused language model inspired by the MiniMax-M2 architecture, optimized for efficient on-device deployment.
## Model Description

Sheikh-2.5-Coder is a 3-billion-parameter transformer model designed for code generation and programming assistance. Inspired by the efficient architecture of MiniMax-M2, it is designed to deliver strong code-generation performance while remaining small enough for on-device deployment.
## Key Features

- **3B Parameters**: Optimized for a balance of efficiency and performance
- **Code-Focused Training**: Trained on diverse programming languages and code patterns
- **On-Device Ready**: Quantized variants available for mobile and edge deployment
- **Multi-Language Support**: Handles multiple programming languages
- **Chat Capabilities**: Instruction-tuned for conversational coding assistance
- **Efficient Architecture**: Inspired by MiniMax-M2's efficiency principles
## Performance Highlights

- Targets performance competitive with models up to 2.5x its size (benchmarks pending)
- Optimized memory usage for mobile deployment
- Fast inference suitable for real-time applications
- Designed for strong results on code generation benchmarks
## Model Variants

- **Base Model**: Full precision for research and development
- **8-bit Quantized**: Balanced performance and memory usage
- **4-bit Quantized**: Maximum efficiency for edge devices
## Usage

### Installation

```bash
pip install transformers torch accelerate
```

`accelerate` is required for `device_map="auto"` in the examples below.
### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
model_name = "your-username/sheikh-2.5-coder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate code (do_sample=True is required for temperature to take effect)
prompt = "Write a function to calculate the factorial of a number:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Chat Usage

```python
# For conversational interaction (reuses the model and tokenizer loaded above)
messages = [
    {"role": "user", "content": "Help me write a Python function to sort a list"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
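Both examples pass `temperature=0.1`. Temperature divides the logits before softmax, so low values concentrate probability on the most likely token, which keeps code generation nearly deterministic. A minimal standalone illustration:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
p_default = softmax_with_temperature(logits, 1.0)
p_low = softmax_with_temperature(logits, 0.1)
# At temperature 0.1, nearly all probability mass lands on the top logit
```

With `temperature=0.1` the top token's probability exceeds 0.99 here, versus roughly 0.6 at the default temperature of 1.0.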
## Technical Specifications

- **Parameters**: 3.09B (2.77B non-embedding)
- **Context Length**: 32,768 tokens
- **Architecture**: Transformer with attention optimizations
- **Training Data**: Diverse programming languages and code-comment pairs
- **Optimization**: Quantization-ready for on-device deployment
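"Quantization-ready" refers to post-training quantization: mapping float weights to low-bit integers plus a scale factor. A minimal absmax int8 round-trip (an illustrative sketch, not the model's actual quantization scheme) shows the core idea:

```python
def quantize_int8(weights):
    """Absmax int8 quantization: map floats to [-127, 127] and keep one scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 values and the stored scale."""
    return [x * scale for x in q]

w = [0.52, -1.10, 0.03, 0.88]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-trip error is bounded by half the quantization step (scale / 2)
```

Real schemes quantize per-channel or per-block rather than per-tensor, which keeps the scale (and thus the error) small for each group of weights.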
## Benchmarks

Performance metrics will be added after training completion.
## Deployment

### CPU Inference

```python
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-username/sheikh-2.5-coder",
    torch_dtype=torch.float32,
    device_map="cpu",
)
```
### Mobile Deployment

For mobile deployment, use the quantized variants:

- **8-bit quantized**: balances speed and accuracy
- **4-bit quantized**: maximum efficiency
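As a back-of-the-envelope guide, approximate checkpoint size follows from parameter count times bytes per parameter (actual quantized files are somewhat larger because of scale factors and embedding layers that are often kept at higher precision):

```python
# Approximate checkpoint sizes for the 3.09B-parameter model at each precision
params = 3.09e9
bytes_per_param = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}
sizes_gb = {k: params * b / 1024**3 for k, b in bytes_per_param.items()}
# fp32 ~11.5 GB, bf16 ~5.8 GB, int8 ~2.9 GB, int4 ~1.4 GB
```

The roughly 1.4 GB 4-bit footprint is what makes phone-class deployment plausible, while the 8-bit variant at under 3 GB suits devices with more headroom.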
## License

[License information to be added]
## Contributing

We welcome contributions! Please see our contributing guidelines for details.
## Citation

```bibtex
@misc{sheikh2024sheikh25coder,
  title={Sheikh-2.5-Coder: Efficient On-Device Code Generation Model},
  author={Author Name},
  year={2024}
}
```
## Acknowledgments

- Inspired by the MiniMax-M2 architecture
- Trained on diverse code datasets
- Built with modern transformer optimizations
**Note**: This is a research model. For production use, please thoroughly test performance and consider safety implications.