---
license: bsd-3-clause
datasets:
- pedrodev2026/microcoder-dataset-1024-tokens
base_model:
- unsloth/Qwen2.5-Coder-1.5B-Instruct
pipeline_tag: text-generation
tags:
- coder
- code
- microcoder
---
# Microcoder 1.5B

**Microcoder 1.5B** is a code-focused language model fine-tuned from [Qwen 2.5 Coder 1.5B Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) using LoRA (Low-Rank Adaptation) on curated code datasets. It is designed for code generation, code completion, and instruction-following tasks in a lightweight, efficient package.

---

## Model Details

| Property          | Value                                              |
|-------------------|----------------------------------------------------|
| **Base Model**    | Qwen 2.5 Coder 1.5B Instruct                       |
| **Fine-tuning**   | LoRA (Low-Rank Adaptation)                         |
| **Parameters**    | ~1.5B                                              |
| **License**       | BSD 3-Clause                                       |
| **Language**      | English (primary); multilingual code               |
| **Task**          | Code generation, completion, instruction following |

---

## Benchmarks

| Benchmark | Metric | Score      |
|-----------|--------|------------|
| HumanEval | pass@1 | **59.15%** |
| MBPP+     | pass@1 | **52.91%** |

> HumanEval and MBPP+ results were obtained using the model in **GGUF format** with **Q5_K_M quantization**. Results may vary slightly with other formats or quantization levels.

---

## Usage

> **Important:** You must use `apply_chat_template` when formatting inputs. Passing raw text directly to the tokenizer will produce incorrect results.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "your-org/microcoder-1.5b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place the model on GPU if one is available
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that returns the nth Fibonacci number."
    }
]

# Build the chat-formatted prompt the model was trained to expect.
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
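For reference, the example prompt above asks for a Fibonacci function; a correct response would implement something like the following (illustrative only, not an actual sampled generation, and actual outputs will vary in style):

```python
# Illustrative target output for the example prompt above; actual
# model generations will differ in naming and implementation.
def fibonacci(n: int) -> int:
    """Return the nth Fibonacci number (0-indexed: fibonacci(0) == 0)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci(10))  # 55
```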

---

## Training Details

Microcoder 1.5B was fine-tuned using LoRA on top of Qwen 2.5 Coder 1.5B Instruct. The training focused on code-heavy datasets covering multiple programming languages and problem-solving scenarios, aiming to improve instruction following and code correctness at a small model scale.
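At a high level, LoRA freezes the base weights and trains only a low-rank additive update. A minimal numerical sketch of that idea (with arbitrary example dimensions and hyperparameters, not this model's actual training configuration) looks like this:

```python
import numpy as np

# LoRA sketch: rather than updating the full weight W (d_out x d_in),
# train a low-rank delta B @ A with rank r << min(d_out, d_in).
# Dimensions and alpha here are arbitrary illustration values.
d_in, d_out, r = 64, 64, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init
alpha = 16                                  # LoRA scaling hyperparameter

def forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return (W + (alpha / r) * B @ A) @ x

x = rng.standard_normal(d_in)
# Because B starts at zero, the adapted model initially matches the base model.
assert np.allclose(forward(x), W @ x)
```

Zero-initializing `B` is the standard LoRA trick: the adapter contributes nothing at the start of training, so fine-tuning begins exactly from the base model's behavior.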

---

## Credits

- **Model credits** — see [`MODEL_CREDITS.md`](./MODEL_CREDITS.md)
- **Dataset credits** — see [`DATASET_CREDITS.md`](./DATASET_CREDITS.md)

---

## License

The Microcoder 1.5B model weights and associated code in this repository are released under the **BSD 3-Clause License**. See [`LICENSE`](./LICENSE) for details.

Note that the base model (Qwen 2.5 Coder 1.5B Instruct) and the datasets used for fine-tuning are subject to their own respective licenses, as detailed in the credit files above.

---

## Notice

The documentation files in this repository (including `README.md`, `MODEL_CREDITS.md`, `DATASET_CREDITS.md`, and other `.md` files) were generated with the assistance of an AI language model.