# Sheikh-2.5-Coder

A lightweight 3B-parameter code-focused language model inspired by the MiniMax-M2 architecture, optimized for efficient on-device deployment.
## Model Description

Sheikh-2.5-Coder is a 3-billion-parameter transformer model designed for code generation and programming assistance. Inspired by the efficient architecture of MiniMax-M2, it is designed to deliver strong code-generation performance while remaining small enough for on-device deployment.
## Key Features

- **3B Parameters**: Optimized for a balance of efficiency and performance
- **Code-Focused Training**: Trained on diverse programming languages and code patterns
- **On-Device Ready**: Quantized variants available for mobile and edge deployment
- **Multi-Language Support**: Handles multiple programming languages
- **Chat Capabilities**: Instruction-tuned for conversational coding assistance
- **Efficient Architecture**: Inspired by MiniMax-M2's efficiency principles
## Performance Highlights

- Targets performance competitive with models up to 2.5x its size (benchmarks pending)
- Optimized memory usage for mobile deployment
- Fast inference suitable for real-time applications
- Designed for strong results on code generation benchmarks
## Model Variants

- **Base Model**: Full precision for research and development
- **8-bit Quantized**: Balanced performance and memory usage
- **4-bit Quantized**: Maximum efficiency for edge devices
## Usage

### Installation

```bash
pip install transformers torch accelerate
```

`accelerate` is required for `device_map="auto"` in the examples below.
### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
model_name = "your-username/sheikh-2.5-coder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate code (do_sample=True is required for temperature to take effect)
prompt = "Write a function to calculate the factorial of a number:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Chat Usage

```python
# For conversational interaction (reuses the model and tokenizer loaded above)
messages = [
    {"role": "user", "content": "Help me write a Python function to sort a list"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
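Both examples pass `temperature=0.1`. Temperature divides the logits before softmax, so low values concentrate probability on the most likely token, which keeps code generation nearly deterministic. A minimal standalone illustration:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
p_default = softmax_with_temperature(logits, 1.0)
p_low = softmax_with_temperature(logits, 0.1)
# At temperature 0.1, nearly all probability mass lands on the top logit
```

With `temperature=0.1` the top token's probability exceeds 0.99 here, versus roughly 0.6 at the default temperature of 1.0.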
## Technical Specifications

- **Parameters**: 3.09B (2.77B non-embedding)
- **Context Length**: 32,768 tokens
- **Architecture**: Transformer with attention optimizations
- **Training Data**: Diverse programming languages and code-comment pairs
- **Optimization**: Quantization-ready for on-device deployment
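"Quantization-ready" refers to post-training quantization: mapping float weights to low-bit integers plus a scale factor. A minimal absmax int8 round-trip (an illustrative sketch, not the model's actual quantization scheme) shows the core idea:

```python
def quantize_int8(weights):
    """Absmax int8 quantization: map floats to [-127, 127] and keep one scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 values and the stored scale."""
    return [x * scale for x in q]

w = [0.52, -1.10, 0.03, 0.88]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-trip error is bounded by half the quantization step (scale / 2)
```

Real schemes quantize per-channel or per-block rather than per-tensor, which keeps the scale (and thus the error) small for each group of weights.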
## Benchmarks

Performance metrics will be added after training completion.
## Deployment

### CPU Inference

```python
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-username/sheikh-2.5-coder",
    torch_dtype=torch.float32,
    device_map="cpu",
)
```
### Mobile Deployment

For mobile deployment, use the quantized variants:

- **8-bit quantized**: balances speed and accuracy
- **4-bit quantized**: maximum efficiency
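As a back-of-the-envelope guide, approximate checkpoint size follows from parameter count times bytes per parameter (actual quantized files are somewhat larger because of scale factors and embedding layers that are often kept at higher precision):

```python
# Approximate checkpoint sizes for the 3.09B-parameter model at each precision
params = 3.09e9
bytes_per_param = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}
sizes_gb = {k: params * b / 1024**3 for k, b in bytes_per_param.items()}
# fp32 ~11.5 GB, bf16 ~5.8 GB, int8 ~2.9 GB, int4 ~1.4 GB
```

The roughly 1.4 GB 4-bit footprint is what makes phone-class deployment plausible, while the 8-bit variant at under 3 GB suits devices with more headroom.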
## License

[License information to be added]
## Contributing

We welcome contributions! Please see our contributing guidelines for details.
## Citation

```bibtex
@misc{sheikh2024sheikh25coder,
  title={Sheikh-2.5-Coder: Efficient On-Device Code Generation Model},
  author={Author Name},
  year={2024}
}
```
## Acknowledgments

- Inspired by the MiniMax-M2 architecture
- Trained on diverse code datasets
- Built with modern transformer optimizations
**Note**: This is a research model. For production use, please thoroughly test performance and consider safety implications.