---
license: bsd-3-clause
datasets:
- pedrodev2026/microcoder-dataset-1024-tokens
base_model:
- unsloth/Qwen2.5-Coder-1.5B-Instruct
pipeline_tag: text-generation
tags:
- coder
- code
- microcoder
---

# Microcoder 1.5B

**Microcoder 1.5B** is a code-focused language model fine-tuned from [Qwen 2.5 Coder 1.5B Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) using LoRA (Low-Rank Adaptation) on curated code datasets. It is designed for code generation, completion, and instruction-following tasks in a lightweight, efficient package.

---

## Model Details

| Property | Value |
|------------------|--------------------------------------------|
| **Base Model** | Qwen 2.5 Coder 1.5B Instruct |
| **Fine-tuning** | LoRA |
| **Parameters** | ~1.5B |
| **License** | BSD 3-Clause |
| **Language** | English (primary), multilingual code |
| **Task** | Code generation, completion, instruction following |

---

## Benchmarks

| Benchmark | Metric | Score |
|--------------------|----------|--------------|
| HumanEval | pass@1 | **59.15%** |
| MBPP+ | pass@1 | **52.91%** |

> HumanEval and MBPP+ results were obtained using the model in **GGUF format** with **Q5_K_M quantization**. Results may vary slightly with other formats or quantization levels.

---

## Usage

> **Important:** You must use `apply_chat_template` when formatting inputs. Passing raw text directly to the tokenizer will produce incorrect results.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "your-org/microcoder-1.5b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that returns the nth Fibonacci number."
    }
]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

Microcoder 1.5B was fine-tuned using LoRA on top of Qwen 2.5 Coder 1.5B Instruct. Training focused on code-heavy datasets covering multiple programming languages and problem-solving scenarios, with the aim of improving instruction following and code correctness at a small model scale.

---

## Credits

- **Model credits** — see [`MODEL_CREDITS.md`](./MODEL_CREDITS.md)
- **Dataset credits** — see [`DATASET_CREDITS.md`](./DATASET_CREDITS.md)

---

## License

The Microcoder 1.5B model weights and associated code in this repository are released under the **BSD 3-Clause License**. See [`LICENSE`](./LICENSE) for details.

Note that the base model (Qwen 2.5 Coder 1.5B Instruct) and the datasets used for fine-tuning are subject to their own respective licenses, as detailed in the credit files above.

---

## Notice

The documentation files in this repository (including `README.md`, `MODEL_CREDITS.md`, `DATASET_CREDITS.md`, and other `.md` files) were generated with the assistance of an AI language model.
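---

## Appendix: Prompt Format

For intuition about why `apply_chat_template` matters, Qwen-family instruct models expect a ChatML-style prompt. The sketch below is an illustrative approximation only, not the authoritative template (the real template also inserts a default system turn, for example); always use the tokenizer's bundled `apply_chat_template` in practice.

```python
# Illustrative sketch of the ChatML-style format used by Qwen-family
# instruct models. Approximation for intuition only; the tokenizer's
# bundled chat template is authoritative and may differ in details.

def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} messages as a ChatML-style string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "user", "content": "Write a Python function that returns the nth Fibonacci number."}
])
print(prompt)
```

Sending raw text instead of this structured format means the model never sees the role markers it was trained on, which is why the warning above insists on `apply_chat_template`.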
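## Appendix: LoRA Configuration Sketch

This card does not publish the exact LoRA hyperparameters used for Microcoder. Purely as a hedged illustration of what a PEFT-style LoRA setup for a Qwen-family model can look like, a configuration fragment might read as follows; every value below is hypothetical and not taken from the Microcoder training run.

```python
# Hypothetical LoRA configuration fragment (illustrative values only;
# the actual Microcoder training settings are not published in this card).
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                      # low-rank dimension (assumed, not from the card)
    lora_alpha=32,             # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

LoRA trains only these small low-rank adapter matrices while the 1.5B base weights stay frozen, which is what keeps fine-tuning at this scale lightweight.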