---
license: mit
datasets:
- sahil2801/CodeAlpaca-20k
- TokenBender/code_instructions_122k_alpaca_style
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
tags:
- code
- python
- sql
- data-science
---

# Code Specialist 7B

<p align="left">
<a href="https://huggingface.co/Ricardouchub/code-specialist-7b">
<img src="https://img.shields.io/badge/HuggingFace-Code_Specialist_7B-FFD21E?style=flat-square&logo=huggingface&logoColor=black" alt="Hugging Face"/>
</a>
<a href="https://www.python.org/">
<img src="https://img.shields.io/badge/Python-3.10+-3776AB?style=flat-square&logo=python&logoColor=white" alt="Python"/>
</a>
<a href="https://huggingface.co/docs/transformers">
<img src="https://img.shields.io/badge/Transformers-4.56+-purple?style=flat-square&logo=huggingface&logoColor=white" alt="Transformers"/>
</a>
<a href="https://github.com/Ricardouchub">
<img src="https://img.shields.io/badge/Author-Ricardo_Urdaneta-000000?style=flat-square&logo=github&logoColor=white" alt="Author"/>
</a>
</p>

---

## Description

**Code Specialist 7B** is a fine-tuned version of **Mistral-7B-Instruct-v0.3**, trained with **Supervised Fine-Tuning (SFT)** on datasets focused on **Python and SQL**.
The goal of this training was to improve the model's performance in **data analysis, programming problem-solving, and technical reasoning**.

The model keeps the **7B-parameter, decoder-only Transformer** architecture of the base model. The code-oriented fine-tuning aims to make it more robust at function generation, SQL queries, and technical answers.

---

## Base Model

- [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
- Architecture: Transformer (decoder-only)
- Parameters: ~7B

---

## Datasets Used for SFT

- [CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)
- [Code Instructions 122k (Alpaca-style)](https://huggingface.co/datasets/TokenBender/code_instructions_122k_alpaca_style)

Both datasets were **filtered to include only Python and SQL examples**, and the remaining samples follow **Alpaca/Mistral-style** instruction formatting.
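
The card does not say how the Python/SQL filtering was implemented. As a rough sketch only, one could keep Alpaca-style records (assumed here to carry `instruction`, `input`, and `output` fields) whose text matches a few keyword patterns. The patterns, the `is_python_or_sql` helper, and the sample records are all hypothetical; with the `datasets` library, the same predicate could be passed to `Dataset.filter`.

```python
import re

# Hypothetical keyword heuristics; the actual filtering rules are not documented.
PYTHON_HINTS = re.compile(r"\bpython\b|\bpandas\b|\bnumpy\b|\bdef \w+\(", re.IGNORECASE)
SQL_HINTS = re.compile(r"\bsql\b|\bselect\b.+\bfrom\b", re.IGNORECASE | re.DOTALL)


def is_python_or_sql(example: dict) -> bool:
    """Return True if any field of an Alpaca-style record looks like Python or SQL."""
    text = " ".join(example.get(key) or "" for key in ("instruction", "input", "output"))
    return bool(PYTHON_HINTS.search(text) or SQL_HINTS.search(text))


# Made-up records for illustration, not taken from the datasets.
examples = [
    {"instruction": "Write a Python function that adds two numbers.",
     "input": "", "output": "def add(a, b):\n    return a + b"},
    {"instruction": "Write a SQL query to list all customers.",
     "input": "", "output": "SELECT * FROM customers;"},
    {"instruction": "Create a JavaScript function to reverse a string.",
     "input": "", "output": "const rev = s => s.split('').reverse().join('');"},
]

kept = [ex for ex in examples if is_python_or_sql(ex)]
print(len(kept), "of", len(examples), "examples kept")
```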

Example prompt format:

```
[INST] Write a Python function that adds two numbers. [/INST]
def add(a, b):
    return a + b
```
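
To build prompts in this shape programmatically, a small helper along these lines may be convenient. `build_prompt` is a hypothetical name, and appending the optional Alpaca `input` field after a blank line is an assumption, not something the card specifies.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Wrap an instruction (plus optional context) in [INST] ... [/INST] tags."""
    body = instruction.strip()
    if context.strip():
        # Assumption: extra input is appended after a blank line.
        body = f"{body}\n\n{context.strip()}"
    return f"[INST] {body} [/INST]"


prompt = build_prompt("Write a Python function that adds two numbers.")
print(prompt)
```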

---

## Training Details

| **Aspect**         | **Detail**  |
|--------------------|-------------|
| **Method**         | QLoRA with final weight merge |
| **Frameworks**     | `transformers`, `trl`, `peft`, `bitsandbytes` |
| **Hardware**       | GPU with 12 GB VRAM (4-bit quantization for training) |
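
A back-of-the-envelope estimate suggests why 4-bit quantization matters on a 12 GB card: at roughly 7 billion parameters, 16-bit weights alone need about 13 GiB, while 4-bit weights need about 3.3 GiB. The sketch below counts weights only. It ignores quantization metadata, LoRA adapters, activations, and optimizer state, and the exact parameter count is an assumption taken from the "~7B" figure.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


params = 7e9  # assumption: "~7B" rounded to 7 billion
four_bit = weight_memory_gib(params, 4)
sixteen_bit = weight_memory_gib(params, 16)

print(f"4-bit weights:  ~{four_bit:.1f} GiB")
print(f"16-bit weights: ~{sixteen_bit:.1f} GiB")
```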

### Main Hyperparameters

| **Parameter**  | **Value** |
|----------------|-----------|
| `per_device_train_batch_size` | 2 |
| `gradient_accumulation_steps` | 4 |
| `learning_rate` | 2e-4 |
| `num_train_epochs` | 1 |
| `max_seq_length` | 1024 |
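
Taken together, the batch size and gradient accumulation settings imply an effective batch of 8 sequences per optimizer step, or at most 8192 tokens at the 1024-token sequence limit. The arithmetic below assumes a single GPU, which the hardware row suggests but does not state outright.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_seq_length = 1024
num_gpus = 1  # assumption: training ran on one GPU

# Sequences contributing to each optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per update (reached only if every sequence is full length).
max_tokens_per_step = effective_batch_size * max_seq_length

print(effective_batch_size, max_tokens_per_step)
```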

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hub.
model_id = "Ricardouchub/Code-Specialist-7B"
tok = AutoTokenizer.from_pretrained(model_id)
mdl = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Wrap the instruction in [INST] ... [/INST] tags, as in training.
prompt = "[INST] Write a Python function that calculates the average of a list. [/INST]"
inputs = tok(prompt, return_tensors="pt").to(mdl.device)

# Generate up to 256 new tokens; the decoded text also contains the prompt.
out = mdl.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```
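
With a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded text repeats the prompt. One common way to keep only the completion is to slice off the prompt by token count. The sketch below shows the idea on plain lists with made-up token IDs; `completion_token_ids` is a hypothetical helper, not part of `transformers`.

```python
def completion_token_ids(output_ids, prompt_length):
    """Return only the token IDs generated after the prompt."""
    if prompt_length > len(output_ids):
        raise ValueError("prompt_length exceeds the number of output tokens")
    return list(output_ids[prompt_length:])


# Stand-in token IDs for illustration. With the tensors from the usage snippet,
# the equivalent would be:
#   tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
prompt_ids = [1, 3, 100, 200, 4]
generated = prompt_ids + [500, 600, 2]

new_ids = completion_token_ids(generated, len(prompt_ids))
print(new_ids)
```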

---

## Initial Benchmarks

- **Simple evaluation (Python and SQL tasks):** In a simple evaluation, the model improved on the base model for small programming and data-related tasks, including **data analysis, SQL query generation, and Python snippets**. No quantitative scores are reported here.
- Further evaluation on **HumanEval** or **MBPP** is recommended for reproducible metrics.
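
HumanEval and MBPP results are usually reported as pass@k: the probability that at least one of k sampled completions passes the unit tests. A minimal sketch of the standard unbiased estimator, 1 - C(n-c, k) / C(n, k) for n samples of which c pass, is given below. The numbers in the example call are illustrative only and are not measurements of this model.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples per problem, c of which pass."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative inputs only: 10 samples for one problem, 3 of them correct.
print(f"pass@1 = {pass_at_k(n=10, c=3, k=1):.2f}")
```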

---

## Author

**Ricardo Urdaneta**
- [LinkedIn](https://www.linkedin.com/in/ricardourdanetacastro/)
- [GitHub](https://github.com/Ricardouchub)

---

## Limitations

- The model does **not guarantee 100% accuracy** on complex programming tasks.
- It may produce inconsistent results for ambiguous or incomplete prompts.

---

## License

This model is released under the **MIT License**. Note that the base model, **Mistral-7B-Instruct-v0.3**, is distributed under the Apache 2.0 license.