---
language:
- en
- code
tags:
- code-generation
- code-completion
- programming-assistant
- on-device
- lightweight
- instruction-following
- transformer
- efficient
- 3b-parameters
license: apache-2.0
datasets:
- the-stack
- code-paradis
- github-code
- synthetic-code-data
metrics:
- humaneval
- mbpp
- multipl-e
model-index:
- name: Sheikh-2.5-Coder
  results:
  - task:
      type: code-generation
      name: HumanEval
    dataset:
      name: HumanEval
      type: humaneval
    metrics:
    - type: pass_at_1
      value: 0.51
      verified: false
  - task:
      type: code-generation
      name: MBPP
    dataset:
      name: MBPP
      type: mbpp
    metrics:
    - type: pass_at_1
      value: 0.57
      verified: false
widget:
- text: "Write a function to calculate the nth Fibonacci number:"
- text: "Help me create a Python class for a Bank Account:"
- text: "Write a React component that displays a todo list:"
---
# Sheikh-2.5-Coder
**Sheikh-2.5-Coder** is a 3.09B-parameter transformer model optimized for code generation and programming assistance. Built with efficiency in mind, it targets on-device deployment while remaining competitive with larger models.
## Model Details
### Model Architecture
- **Parameters**: 3.09B total (2.77B non-embedding)
- **Architecture**: Transformer decoder with Grouped Query Attention
- **Context Length**: 32,768 tokens
- **Hidden Size**: 3072
- **Attention Heads**: 16 (Q) / 2 (KV)
- **Hidden Layers**: 36
- **Intermediate Size**: 8192
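For illustration, the hyperparameters above correspond to a standard GQA decoder configuration. The sketch below expresses them with the `Qwen2Config` class from `transformers` as a stand-in; the actual configuration class for this model is an assumption and may differ.
```python
from transformers import Qwen2Config

# Hypothetical config mirroring the table above. The Qwen2-style
# architecture family is an assumption, not stated by this card.
config = Qwen2Config(
    hidden_size=3072,
    num_hidden_layers=36,
    num_attention_heads=16,    # query heads
    num_key_value_heads=2,     # KV heads -> Grouped Query Attention
    intermediate_size=8192,
    max_position_embeddings=32768,  # 32K context window
)
```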
### Training Details
- **Training Tokens**: ~5.5 trillion tokens
- **Data Composition**:
  - High-quality code across multiple programming languages
  - Code-comment pairs for better code understanding
  - Synthetic data for enhanced reasoning
  - Natural language for general capabilities
- **Training Objectives**:
  - Causal language modeling
  - Instruction tuning
  - Code generation
### Supported Languages
The model supports 17+ programming languages including:
Python, JavaScript, TypeScript, Java, C++, C, Go, Rust, PHP, Ruby, Swift, Kotlin, Scala, R, SQL, HTML, CSS
## Usage
### Installation
```bash
pip install transformers torch accelerate  # accelerate enables device_map="auto"
```
### Basic Code Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "your-username/sheikh-2.5-coder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Write a function to sort an array using quicksort:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.1,
    do_sample=True,
    top_p=0.95,
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
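With `temperature=0.1` and `top_p=0.95`, sampling is close to greedy; for fully deterministic completions, set `do_sample=False` and omit the sampling parameters.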
### Chat Interface
```python
messages = [
    {"role": "user", "content": "Create a Python class for managing a student database:"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=300,
    temperature=0.1,
    do_sample=True,
    top_p=0.95,
)

# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(
    outputs[0][len(inputs[0]):],
    skip_special_tokens=True,
)
print(response)
```
### Quantized Inference
#### 8-bit Quantization
Quantized loading uses the `bitsandbytes` backend (`pip install bitsandbytes`), which generally requires a CUDA-capable GPU:
```python
from transformers import BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```
#### 4-bit Quantization
```python
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```
## Performance
### Benchmarks
The model achieves strong performance on code generation benchmarks:
- **HumanEval**: 51% pass@1
- **MBPP**: 57% pass@1
- **MultiPL-E**: Competitive performance across languages
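For reference, pass@1 on these benchmarks is conventionally computed with the unbiased pass@k estimator introduced alongside HumanEval (Chen et al., 2021); a minimal implementation:
```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n = samples generated per problem, c = samples that pass the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# With a single sample per problem, pass@1 reduces to the plain pass rate:
assert pass_at_k(n=1, c=1, k=1) == 1.0
assert pass_at_k(n=1, c=0, k=1) == 0.0
```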
### Efficiency Metrics
- **Memory Usage**: ~10.8 GB (full precision), ~2 GB (4-bit quantized)
- **Inference Speed**: ~1.7 seconds per typical generation (hardware- and length-dependent)
- **Throughput**: tuned for real-time, interactive use
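As a back-of-envelope check, weight memory is roughly parameter count times bytes per parameter (activations and KV cache add further overhead):
```python
params = 3.09e9  # total parameter count from the architecture table

# Approximate weight-only memory by precision; runtime overhead comes on top.
for precision, bytes_per_param in {"fp32": 4, "bf16/fp16": 2, "int8": 1, "int4": 0.5}.items():
    print(f"{precision}: ~{params * bytes_per_param / 1e9:.1f} GB")
```
The ~2 GB 4-bit figure above is consistent with ~1.5 GB of int4 weights plus runtime overhead.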
## Deployment
### On-Device Deployment
The model is optimized for mobile and edge deployment:
1. **CPU-only**: Full functionality on modern CPUs (see the loading sketch after this list)
2. **4-bit Quantized**: Maximum efficiency for edge devices
3. **8-bit Quantized**: Balance of performance and memory usage
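A minimal CPU-only loading sketch (model id is the placeholder from the usage examples; fp32 is assumed as the safe CPU dtype):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-username/sheikh-2.5-coder"  # placeholder id from the usage section
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Without device_map, the weights load on CPU; fp32 is the conservative default there.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
model.eval()
```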
### Hardware Requirements
- **Minimum RAM**: 4GB (4-bit), 8GB (8-bit), 16GB (full precision)
- **CPU**: Modern multi-core processor
- **GPU**: Optional, for faster inference
## Limitations
1. **Context Window**: 32K tokens (sufficient for most coding tasks)
2. **Training Data**: Performance varies by programming language
3. **Code Quality**: Generated code may require review and testing
4. **Deployment**: Requires proper quantization for optimal mobile performance
## Ethical Considerations
- Generated code should be reviewed before use in production
- The model may produce code with security vulnerabilities
- Users are responsible for ensuring code compliance with their standards
- Consider safety implications when using for automated code generation
## Citation
```bibtex
@article{sheikh2024sheikh25coder,
  title   = {Sheikh-2.5-Coder: Efficient On-Device Code Generation Model},
  author  = {Sheikh Research Team},
  journal = {arXiv preprint arXiv:YYYY.NNNNN},
  year    = {2024}
}
```
## License
This model is released under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.
## Contributing
We welcome contributions! Please see our contributing guidelines for more information on how to participate in this project.
## Acknowledgments
- Inspired by MiniMax-M2's efficient architecture
- Trained on diverse, high-quality code datasets
- Built with modern transformer optimizations
- Community feedback and testing
---
*For questions or support, please open an issue on our GitHub repository.*