---
language:
- en
- code
tags:
- code-generation
- code-completion
- programming-assistant
- on-device
- lightweight
- instruction-following
- transformer
- efficient
- 3b-parameters
license: apache-2.0
datasets:
- the-stack
- code-paradis
- github-code
- synthetic-code-data
metrics:
- humaneval
- mbpp
- multipl-e
model-index:
- name: Sheikh-2.5-Coder
  results:
  - task:
      type: code-generation
      name: HumanEval
    dataset:
      name: HumanEval
      type: humaneval
    metrics:
    - type: pass_at_1
      value: 0.51
      verified: false
  - task:
      type: code-generation
      name: MBPP
    dataset:
      name: MBPP
      type: mbpp
    metrics:
    - type: pass_at_1
      value: 0.57
      verified: false
widget:
- text: "Write a function to calculate the nth Fibonacci number:"
- text: "Help me create a Python class for a Bank Account:"
- text: "Write a React component that displays a todo list:"
---
|
|
|
|
|
# Sheikh-2.5-Coder

**Sheikh-2.5-Coder** is a 3.09B-parameter transformer model optimized for code generation and programming assistance. Built with efficiency in mind, it is designed for on-device deployment while remaining competitive with larger models.
|
|
|
|
|
## Model Details

### Model Architecture

- **Parameters**: 3.09B total (2.77B non-embedding)
- **Architecture**: Transformer decoder with Grouped Query Attention (GQA)
- **Context Length**: 32,768 tokens
- **Hidden Size**: 3072
- **Attention Heads**: 16 (query) / 2 (key-value)
- **Hidden Layers**: 36
- **Intermediate Size**: 8192
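The GQA configuration above is what keeps long-context inference cheap: with 2 key-value heads instead of 16, the KV cache shrinks 8x. A back-of-the-envelope sketch (assuming a bf16 cache and `head_dim = hidden_size / num_query_heads`, which is an assumption, not a published detail):

```python
# Rough KV-cache size for the configuration listed above.
# Assumes bf16 cache (2 bytes/element) and head_dim = hidden_size / num_query_heads.
hidden_size = 3072
num_query_heads = 16
num_kv_heads = 2
num_layers = 36
context_len = 32_768
bytes_per_elem = 2  # bf16

head_dim = hidden_size // num_query_heads  # 192

def kv_cache_bytes(kv_heads: int) -> int:
    # Factor of 2 covers both keys and values
    return 2 * num_layers * kv_heads * head_dim * context_len * bytes_per_elem

gqa = kv_cache_bytes(num_kv_heads)      # ~1.7 GiB at full 32K context
mha = kv_cache_bytes(num_query_heads)   # ~13.5 GiB with full multi-head attention
print(f"GQA cache: {gqa / 2**30:.1f} GiB ({mha // gqa}x smaller than MHA)")
```

At the full 32K context the cache stays under 2 GiB, which is what makes the edge-deployment targets below plausible.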
|
|
|
|
|
### Training Details

- **Training Tokens**: ~5.5 trillion
- **Data Composition**:
  - High-quality code across multiple programming languages
  - Code-comment pairs for better code understanding
  - Synthetic data for enhanced reasoning
  - Natural language for general capabilities
- **Training Objectives**:
  - Causal language modeling
  - Instruction tuning
  - Code generation
|
|
|
|
|
### Supported Languages

The model supports 17+ programming languages, including:

Python, JavaScript, TypeScript, Java, C++, C, Go, Rust, PHP, Ruby, Swift, Kotlin, Scala, R, SQL, HTML, and CSS.
|
|
|
|
|
## Usage

### Installation

```bash
pip install transformers torch
```
|
|
|
|
|
### Basic Code Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "your-username/sheikh-2.5-coder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Write a function to sort an array using quicksort:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.1,
    do_sample=True,
    top_p=0.95,
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
|
|
|
|
|
### Chat Interface

```python
messages = [
    {"role": "user", "content": "Create a Python class for managing a student database:"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=300,
    temperature=0.1,
    do_sample=True,
    top_p=0.95,
)

# Slice off the prompt tokens so only the model's reply is decoded
response = tokenizer.decode(
    outputs[0][inputs.shape[1]:],
    skip_special_tokens=True,
)
print(response)
```
|
|
|
|
|
### Quantized Inference

Quantized loading requires the `bitsandbytes` package (`pip install bitsandbytes`). Recent versions of `transformers` expect the quantization flags to be passed through a `BitsAndBytesConfig` rather than as bare keyword arguments.

#### 8-bit Quantization

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```

#### 4-bit Quantization

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```
|
|
|
|
|
## Performance

### Benchmarks

The model achieves strong performance on code generation benchmarks:

- **HumanEval**: 51% pass@1
- **MBPP**: 57% pass@1
- **MultiPL-E**: Competitive performance across languages
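The pass@1 numbers above follow the standard unbiased pass@k estimator used by these benchmarks: with `n` samples per problem of which `c` pass the tests, pass@k = 1 - C(n-c, k)/C(n, k). A minimal sketch (the sample counts in the example are illustrative, not the actual evaluation settings):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every draw of k samples contains at least one pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to the fraction of passing samples,
# e.g. 102 passing out of 200 hypothetical samples gives ~0.51:
print(round(pass_at_k(n=200, c=102, k=1), 2))
```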
|
|
|
|
|
### Efficiency Metrics

- **Memory Usage**: ~10.8GB (full precision), ~2GB (4-bit quantized)
- **Inference Speed**: ~1.7 seconds per generation
- **Throughput**: Optimized for real-time applications
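For intuition, a weights-only estimate per precision can be sketched from the parameter count alone. Real footprints differ from this sketch (runtime buffers, KV cache, and quantization scale factors all add overhead, which is why the measured figures above do not match exactly):

```python
# Weights-only memory estimate for 3.09B parameters.
# Activations, KV cache, and framework overhead are NOT counted,
# and quantized formats store extra scale/zero-point metadata.
params = 3.09e9

def weight_gb(bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 1e9

print(f"fp32 : {weight_gb(32):.1f} GB")  # ~12.4 GB
print(f"bf16 : {weight_gb(16):.1f} GB")  # ~6.2 GB
print(f"int8 : {weight_gb(8):.1f} GB")   # ~3.1 GB
print(f"4-bit: {weight_gb(4):.1f} GB")   # ~1.5 GB
```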
|
|
|
|
|
## Deployment

### On-Device Deployment

The model is optimized for mobile and edge deployment:

1. **CPU-only**: Full functionality on modern CPUs
2. **4-bit Quantized**: Maximum efficiency for edge devices
3. **8-bit Quantized**: Balance of performance and memory usage

### Hardware Requirements

- **Minimum RAM**: 4GB (4-bit), 8GB (8-bit), 16GB (full precision)
- **CPU**: Modern multi-core processor
- **GPU**: Optional, for faster inference
|
|
|
|
|
## Limitations

1. **Context Window**: 32K tokens (sufficient for most coding tasks)
2. **Training Data**: Performance varies by programming language
3. **Code Quality**: Generated code may require review and testing
4. **Deployment**: Requires proper quantization for optimal mobile performance
|
|
|
|
|
## Ethical Considerations

- Generated code should be reviewed before use in production
- The model may produce code with security vulnerabilities
- Users are responsible for ensuring generated code complies with their standards
- Consider safety implications when using the model for automated code generation
|
|
|
|
|
## Citation

```bibtex
@article{sheikh2024sheikh25coder,
  title={Sheikh-2.5-Coder: Efficient On-Device Code Generation Model},
  author={Sheikh Research Team},
  journal={arXiv preprint arXiv:YYYY.NNNNN},
  year={2024}
}
```
|
|
|
|
|
## License

This model is released under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.
|
|
|
|
|
## Contributing

We welcome contributions! Please see our contributing guidelines for more information on how to participate in this project.
|
|
|
|
|
## Acknowledgments

- Inspired by MiniMax-M2's efficient architecture
- Trained on diverse, high-quality code datasets
- Built with modern transformer optimizations
- Community feedback and testing
|
|
|
|
|
---

*For questions or support, please open an issue on our GitHub repository.*