---
language:
- en
- code
tags:
- code-generation
- code-completion
- programming-assistant
- on-device
- lightweight
- instruction-following
- transformer
- efficient
- 3b-parameters
license: apache-2.0
datasets:
- the-stack
- code-paradis
- github-code
- synthetic-code-data
metrics:
- humaneval
- mbpp
- multipl-e
model-index:
- name: Sheikh-2.5-Coder
results:
- task:
type: code-generation
name: HumanEval
dataset:
name: HumanEval
type: humaneval
metrics:
- type: pass_at_1
value: 0.51
verified: false
- task:
type: code-generation
name: MBPP
dataset:
name: MBPP
type: mbpp
metrics:
- type: pass_at_1
value: 0.57
verified: false
widget:
- text: "Write a function to calculate the nth Fibonacci number:"
- text: "Help me create a Python class for a Bank Account:"
- text: "Write a React component that displays a todo list:"
---
# Sheikh-2.5-Coder
**Sheikh-2.5-Coder** is a 3.09B-parameter transformer model optimized for code generation and programming assistance. It is designed for efficient on-device deployment while remaining competitive with larger models.
## Model Details
### Model Architecture
- **Parameters**: 3.09B total (2.77B non-embedding)
- **Architecture**: Transformer decoder with Grouped Query Attention
- **Context Length**: 32,768 tokens
- **Hidden Size**: 3072
- **Attention Heads**: 16 (Q) / 2 (KV)
- **Hidden Layers**: 36
- **Intermediate Size**: 8192
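These values can be checked directly from the published config. A minimal sketch, assuming Llama/Qwen-style attribute names (`num_key_value_heads`, etc.), which may differ in the actual config:
```python
from transformers import AutoConfig

# Attribute names below are an assumption; adjust to the real config class.
config = AutoConfig.from_pretrained("your-username/sheikh-2.5-coder")
print(config.hidden_size)              # 3072
print(config.num_hidden_layers)        # 36
print(config.num_attention_heads)      # 16 query heads
print(config.num_key_value_heads)      # 2 KV heads -> Grouped Query Attention
print(config.intermediate_size)        # 8192
print(config.max_position_embeddings)  # 32768-token context
```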
### Training Details
- **Training Tokens**: ~5.5 trillion
- **Data Composition**:
- High-quality code from multiple programming languages
- Code-comment pairs for better understanding
- Synthetic data for enhanced reasoning
- Natural language for general capabilities
- **Training Objectives**:
- Causal Language Modeling
- Instruction Tuning
- Code Generation
### Supported Languages
The model supports 17+ programming languages including:
Python, JavaScript, TypeScript, Java, C++, C, Go, Rust, PHP, Ruby, Swift, Kotlin, Scala, R, SQL, HTML, CSS
## Usage
### Installation
```bash
pip install transformers torch accelerate
```
Note: `accelerate` is required for `device_map="auto"` in the examples below.
### Basic Code Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "your-username/sheikh-2.5-coder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
    device_map="auto"            # requires accelerate; places layers automatically
)

prompt = "Write a function to sort an array using quicksort:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# A low temperature keeps code generation focused and near-deterministic.
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.1,
    do_sample=True,
    top_p=0.95
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
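If fully reproducible output is preferred (e.g. for testing), sampling can be disabled:
```python
# Greedy decoding: deterministic; temperature and top_p are ignored.
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
```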
### Chat Interface
```python
messages = [
    {"role": "user", "content": "Create a Python class for managing a student database:"}
]

# apply_chat_template formats the conversation with the model's chat markup
# and returns token IDs directly when return_tensors="pt" is set.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=300,
    temperature=0.1,
    do_sample=True,
    top_p=0.95
)

# Slice off the prompt tokens so only the newly generated reply is decoded.
response = tokenizer.decode(
    outputs[0][len(inputs[0]):],
    skip_special_tokens=True
)
print(response)
```
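To continue the conversation, append the assistant's reply and the next user turn, then re-apply the chat template. A minimal sketch (the follow-up prompt here is illustrative):
```python
# Multi-turn chat: carry the full message history forward on each request.
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Add a method to look up a student by ID."})

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=300, temperature=0.1,
                         do_sample=True, top_p=0.95)
```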
### Quantized Inference
Both modes require the `bitsandbytes` package (`pip install bitsandbytes`).
#### 8-bit Quantization
```python
from transformers import BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)
```
#### 4-bit Quantization
```python
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto"
)
```
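To verify the savings, transformers exposes a memory-footprint helper on the loaded model:
```python
# get_memory_footprint() reports the model's parameter memory in bytes.
print(f"Model memory: {model.get_memory_footprint() / 1e9:.2f} GB")
```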
## Performance
### Benchmarks
The model achieves strong performance on code generation benchmarks:
- **HumanEval**: 51% pass@1
- **MBPP**: 57% pass@1
- **MultiPL-E**: Competitive performance across languages
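pass@1 is the fraction of problems solved by a single sampled completion. For reference, a sketch of the standard unbiased pass@k estimator from the HumanEval paper, where `n` samples are drawn per problem and `c` of them pass the tests:
```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator: 1 - C(n - c, k) / C(n, k).
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=200, c=102, k=1))  # 0.51, i.e. 51% pass@1
```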
### Efficiency Metrics
- **Memory Usage**: ~10.8 GB (full precision), ~2 GB (4-bit quantized)
- **Inference Speed**: ~1.7 seconds per typical generation (hardware-dependent)
- **Throughput**: suitable for real-time, interactive applications
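A simple wall-clock check, reusing `model` and the tokenized `inputs` from the Basic Code Generation example; actual latency depends on hardware, quantization, and output length:
```python
import time

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
elapsed = time.perf_counter() - start
print(f"Generated {outputs.shape[1] - inputs['input_ids'].shape[1]} tokens "
      f"in {elapsed:.2f}s")
```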
## Deployment
### On-Device Deployment
The model is optimized for mobile and edge deployment:
1. **CPU-only**: Full functionality on modern CPUs
2. **4-bit Quantized**: Maximum efficiency for edge devices
3. **8-bit Quantized**: Balance of performance and memory usage
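For the CPU-only path, a minimal sketch: omit `device_map` so the weights load on the CPU by default, and use float32 for compatibility with CPUs lacking bfloat16 support:
```python
# CPU-only load: no device_map, so weights and inputs stay on the CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
)
inputs = tokenizer(prompt, return_tensors="pt")  # CPU tensors by default
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
```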
### Hardware Requirements
- **Minimum RAM**: 4GB (4-bit), 8GB (8-bit), 16GB (full precision)
- **CPU**: Modern multi-core processor
- **GPU**: Optional, for faster inference
## Limitations
1. **Context Window**: limited to 32K tokens; longer inputs must be truncated or split
2. **Training Data**: Performance varies by programming language
3. **Code Quality**: Generated code may require review and testing
4. **Deployment**: Requires proper quantization for optimal mobile performance
## Ethical Considerations
- Generated code should be reviewed before use in production
- The model may produce code with security vulnerabilities
- Users are responsible for ensuring code compliance with their standards
- Consider safety implications when using for automated code generation
## Citation
```bibtex
@article{sheikh2024sheikh25coder,
  title={Sheikh-2.5-Coder: Efficient On-Device Code Generation Model},
  author={Sheikh Research Team},
  journal={arXiv preprint arXiv:YYYY.NNNNN},
  year={2024}
}
```
## License
This model is released under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.
## Contributing
We welcome contributions! Please see our contributing guidelines for more information on how to participate in this project.
## Acknowledgments
- Inspired by MiniMax-M2's efficient architecture
- Trained on diverse, high-quality code datasets
- Built with modern transformer optimizations
- Community feedback and testing
---
*For questions or support, please open an issue on our GitHub repository.*