--- language: - en - code tags: - code-generation - code-completion - programming-assistant - on-device - lightweight - instruction-following - transformer - efficient - 3b-parameters license: apache-2.0 datasets: - the-stack - code-paradis - github-code - synthetic-code-data metrics: - humaneval - mbpp - multipl-eval model-index: - name: Sheikh-2.5-Coder results: - task: type: code-generation name: HumanEval dataset: name: HumanEval type: humaneval metrics: - type: pass_at_1 value: 0.51 verified: false - task: type: code-generation name: MBPP dataset: name: MBPP type: mbpp metrics: - type: pass_at_1 value: 0.57 verified: false widget: - text: "Write a function to calculate the nth Fibonacci number:" - text: "Help me create a Python class for a Bank Account:" - text: "Write a React component that displays a todo list:" --- # Sheikh-2.5-Coder **Sheikh-2.5-Coder** is a 3.09B parameter transformer model optimized for code generation and programming assistance. Built with efficiency in mind, this model is designed for on-device deployment while maintaining competitive performance with larger models. ## Model Details ### Model Architecture - **Parameters**: 3.09B total (2.77B non-embedding) - **Architecture**: Transformer decoder with Grouped Query Attention - **Context Length**: 32,768 tokens - **Hidden Size**: 3072 - **Attention Heads**: 16 (Q) / 2 (KV) - **Hidden Layers**: 36 - **Intermediate Size**: 8192 ### Training Details - **Training Tokens**: ~5.5 trillion tokens - **Data Composition**: - High-quality code from multiple programming languages - Code-comment pairs for better understanding - Synthetic data for enhanced reasoning - Natural language for general capabilities - **Training Objectives**: - Causal Language Modeling - Instruction Tuning - Code Generation ### Supported Languages The model supports 17+ programming languages including: Python, JavaScript, TypeScript, Java, C++, C, Go, Rust, PHP, Ruby, Swift, Kotlin, Scala, R, SQL, HTML, CSS ## Usage ### Installation ```bash pip install transformers torch ``` ### Basic Code Generation ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch model_name = "your-username/sheikh-2.5-coder" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype=torch.bfloat16, device_map="auto" ) prompt = "Write a function to sort an array using quicksort:" inputs = tokenizer(prompt, return_tensors="pt").to(model.device) outputs = model.generate( **inputs, max_new_tokens=200, temperature=0.1, do_sample=True, top_p=0.95 ) result = tokenizer.decode(outputs[0], skip_special_tokens=True) print(result) ``` ### Chat Interface ```python messages = [ {"role": "user", "content": "Create a Python class for managing a student database:"} ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, return_tensors="pt" ).to(model.device) outputs = model.generate( inputs, max_new_tokens=300, temperature=0.1, do_sample=True, top_p=0.95 ) response = tokenizer.decode( outputs[0][len(inputs[0]):], skip_special_tokens=True ) print(response) ``` ### Quantized Inference #### 8-bit Quantization ```python model = AutoModelForCausalLM.from_pretrained( model_name, load_in_8bit=True, device_map="auto" ) ``` #### 4-bit Quantization ```python model = AutoModelForCausalLM.from_pretrained( model_name, load_in_4bit=True, device_map="auto" ) ``` ## Performance ### Benchmarks The model achieves strong performance on code generation benchmarks: - **HumanEval**: 51% pass@1 - **MBPP**: 57% pass@1 - **MultiPL-E**: Competitive performance across languages ### Efficiency Metrics - **Memory Usage**: ~10.8GB (full precision), ~2GB (4-bit quantized) - **Inference Speed**: ~1.7 seconds per generation - **Throughput**: Optimized for real-time applications ## Deployment ### On-Device Deployment The model is optimized for mobile and edge deployment: 1. **CPU-only**: Full functionality on modern CPUs 2. **4-bit Quantized**: Maximum efficiency for edge devices 3. **8-bit Quantized**: Balance of performance and memory usage ### Hardware Requirements - **Minimum RAM**: 4GB (4-bit), 8GB (8-bit), 16GB (full precision) - **CPU**: Modern multi-core processor - **GPU**: Optional, for faster inference ## Limitations 1. **Context Window**: 32K tokens (sufficient for most coding tasks) 2. **Training Data**: Performance varies by programming language 3. **Code Quality**: Generated code may require review and testing 4. **Deployment**: Requires proper quantization for optimal mobile performance ## Ethical Considerations - Generated code should be reviewed before use in production - The model may produce code with security vulnerabilities - Users are responsible for ensuring code compliance with their standards - Consider safety implications when using for automated code generation ## Citation ```bibtex @article{sheikh2024sheikh25coder, title={Sheikh-2.5-Coder: Efficient On-Device Code Generation Model}, author={Sheikh Research Team}, journal={arXiv preprint arXiv:YYYY.NNNNN}, year={2024} } ``` ## License This model is released under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details. ## Contributing We welcome contributions! Please see our contributing guidelines for more information on how to participate in this project. ## Acknowledgments - Inspired by MiniMax-M2's efficient architecture - Trained on diverse, high-quality code datasets - Built with modern transformer optimizations - Community feedback and testing --- *For questions or support, please open an issue on our GitHub repository.*