	---
	license: mit
	datasets:
	- sahil2801/CodeAlpaca-20k
	- TokenBender/code_instructions_122k_alpaca_style
	base_model:
	- mistralai/Mistral-7B-Instruct-v0.3
	tags:
	- code
	- python
	- sql
	- data-science
	---

	# Code Specialist 7B

	<p align="left">
	<a href="https://huggingface.co/Ricardouchub/code-specialist-7b">
	<img src="https://img.shields.io/badge/HuggingFace-Code_Specialist_7B-FFD21E?style=flat-square&logo=huggingface&logoColor=black" alt="Hugging Face"/>
	</a>
	<a href="https://www.python.org/">
	<img src="https://img.shields.io/badge/Python-3.10+-3776AB?style=flat-square&logo=python&logoColor=white" alt="Python"/>
	</a>
	<a href="https://huggingface.co/docs/transformers">
	<img src="https://img.shields.io/badge/Transformers-4.56+-purple?style=flat-square&logo=huggingface&logoColor=white" alt="Transformers"/>
	</a>
	<a href="https://github.com/Ricardouchub">
	<img src="https://img.shields.io/badge/Author-Ricardo_Urdaneta-000000?style=flat-square&logo=github&logoColor=white" alt="Author"/>
	</a>
	</p>

	---

	## Description

	Code Specialist 7B is a fine-tuned version of Mistral-7B-Instruct-v0.3, trained with Supervised Fine-Tuning (SFT) on instruction datasets filtered to Python and SQL.
	The goal of the fine-tuning was to improve the model's performance on data analysis, programming problem-solving, and technical reasoning.

	The model keeps the base model's ~7B-parameter decoder-only Transformer architecture; only the weights change. The code-oriented fine-tuning is intended to make function generation, SQL queries, and technical answers more robust.

	---

	## Base Model

	- [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
	- Architecture: Transformer (decoder-only)
	- Parameters: ~7B

	---

	## Datasets Used for SFT

	- [CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)
	- [Code Instructions 122k (Alpaca-style)](https://huggingface.co/datasets/TokenBender/code_instructions_122k_alpaca_style)

	Both datasets were filtered to keep only Python and SQL examples, and the Alpaca-style records were rendered in the Mistral instruction format.

	Example prompt format:

	```
	[INST] Write a Python function that adds two numbers. [/INST]
	def add(a, b):
	    return a + b
	```
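
The conversion from an Alpaca-style record to the prompt format above can be sketched as follows. This is a minimal illustration, not the author's preprocessing script: the function name `to_mistral_prompt` is hypothetical, and the field names `instruction`, `input`, and `output` are assumed to follow the usual Alpaca layout.

```python
def to_mistral_prompt(record: dict) -> str:
    """Render one Alpaca-style record as a Mistral [INST] training string."""
    instruction = record["instruction"].strip()
    # Alpaca records may carry extra context in an optional "input" field.
    context = (record.get("input") or "").strip()
    user_turn = f"{instruction}\n\n{context}" if context else instruction
    # The expected answer follows the closing [/INST] tag on a new line.
    return f"[INST] {user_turn} [/INST]\n{record['output'].strip()}"


example = {
    "instruction": "Write a Python function that adds two numbers.",
    "input": "",
    "output": "def add(a, b):\n    return a + b",
}
print(to_mistral_prompt(example))
```

How the optional `input` field is joined to the instruction is a design choice of this sketch; the card does not specify it.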

	---

	## Training Details

	| Aspect | Detail |
	|--------------------|-------------|
	| Method | QLoRA with final weight merge |
	| Frameworks | `transformers`, `trl`, `peft`, `bitsandbytes` |
	| Hardware | GPU with 12 GB VRAM (4-bit quantization for training) |
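
A rough back-of-the-envelope estimate shows why 4-bit quantization matters on a 12 GB card. The sketch below counts only the memory for the base weights, assuming exactly 7e9 parameters (the card says "~7B"); it ignores activations, LoRA adapter weights, optimizer state, and quantization constants, so real usage is higher. The helper `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7e9  # assumption: "~7B" from this card, rounded

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # half-precision weights
int4_gib = weight_memory_gib(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {int4_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights alone (about 13 GiB) would not fit in 12 GB of VRAM, while the 4-bit weights (about 3.3 GiB) leave room for adapters and activations.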

	### Main Hyperparameters

	| Parameter | Value |
	|----------------|-----------|
	| `per_device_train_batch_size` | 2 |
	| `gradient_accumulation_steps` | 4 |
	| `learning_rate` | 2e-4 |
	| `num_train_epochs` | 1 |
	| `max_seq_length` | 1024 |
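
The batch-related values above combine into an effective batch size per optimizer step. The arithmetic below is a sketch that assumes a single GPU (`num_devices = 1`), which matches the hardware row but is not stated explicitly. The dataset size passed to `steps_per_epoch` is a made-up number for illustration, since the card does not report how many examples remained after filtering.

```python
import math

per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_seq_length = 1024
num_devices = 1  # assumption: training ran on one GPU

# Gradients are accumulated over several micro-batches before each update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Upper bound on tokens per optimizer step (sequences are often shorter).
max_tokens_per_step = effective_batch_size * max_seq_length


def steps_per_epoch(num_examples: int) -> int:
    """Optimizer steps needed to see every example once."""
    return math.ceil(num_examples / effective_batch_size)


print(effective_batch_size)     # 8
print(max_tokens_per_step)      # 8192
print(steps_per_epoch(10_000))  # 1250, for a hypothetical 10,000 examples
```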

	---

	## Usage

	```python
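	from transformers import AutoTokenizer, AutoModelForCausalLM

	# Load the tokenizer and the merged model weights from the Hugging Face Hub.
	model_id = "Ricardouchub/Code-Specialist-7B"
	tok = AutoTokenizer.from_pretrained(model_id)
	# device_map="auto" places the weights on the available device(s).
	mdl = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

	# Wrap the request in the Mistral instruction tags used during fine-tuning.
	prompt = "[INST] Write a Python function that calculates the average of a list. [/INST]"
	inputs = tok(prompt, return_tensors="pt").to(mdl.device)

	# Generate up to 256 new tokens; the decoded text includes the prompt.
	out = mdl.generate(**inputs, max_new_tokens=256)
	print(tok.decode(out[0], skip_special_tokens=True))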
	```

	---

	## Initial Benchmarks

	- Informal evaluation: compared with the base model, results improved on small programming and data-related tasks, including data analysis, SQL query generation, and Python snippets.
	- Evaluation on HumanEval or MBPP is recommended to obtain reproducible metrics.
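
For reference, HumanEval-style results are usually reported as pass@k. The sketch below implements the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. It is not part of this model's evaluation, and the numbers in the example are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n samples drawn, c of them passed the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 10 samples for one problem, 3 of which pass.
print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))
```

The benchmark score is the mean of this value over all problems.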

	---

	## Author

	Ricardo Urdaneta
	- [LinkedIn](https://www.linkedin.com/in/ricardourdanetacastro/)
	- [GitHub](https://github.com/Ricardouchub)

	---

	## Limitations

	- The model can produce incorrect code, especially on complex programming tasks; review and test its output before use.
	- It may produce inconsistent results for ambiguous or incomplete prompts.
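
Given these limitations, a cheap first check on generated Python is to confirm that it parses before running it. The helper below is a minimal sketch (the name `is_valid_python` is hypothetical); it only detects syntax errors and says nothing about whether the code is correct or safe.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python; the code is never executed."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # IndentationError is a subclass of SyntaxError, so it is covered too.
        return False
    return True


good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\nreturn a + b"  # missing indentation

print(is_valid_python(good))
print(is_valid_python(bad))
```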

	---

	## License

	This model is released under the MIT License, as declared in the model card metadata. The base model, Mistral-7B-Instruct-v0.3, is distributed under the Apache 2.0 license.