|
|
--- |
|
|
base_model: Qwen/Qwen1.5-1.8B |
|
|
library_name: peft |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- base_model:adapter:Qwen/Qwen1.5-1.8B |
|
|
- lora |
|
|
- transformers |
|
|
- code-generation |
|
|
- python |
|
|
- reasoning |
|
|
- synthetic-data |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
--- |
|
|
|
|
|
# Qwen 1.5 1.8B - Python Code Generation with Step-by-Step Reasoning |
|
|
|
|
|
A fine-tuned version of Qwen 1.5 1.8B that generates Python code with step-by-step reasoning. The model explains its thought process before writing code, showing users how to approach programming problems.
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
This model is fine-tuned using QLoRA on a synthetic dataset of 1,000 Python programming problems enriched with step-by-step reasoning. The model learns to explain its problem-solving approach before generating code, making it ideal for educational purposes and transparent code generation. |
|
|
|
|
|
- **Developed by:** Rachit Verma
|
|
- **Model type:** Causal Language Model (Fine-tuned with LoRA adapters) |
|
|
- **Language(s):** English (code generation in Python) |
|
|
- **License:** Apache 2.0 |
|
|
- **Finetuned from model:** Qwen/Qwen1.5-1.8B |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Base Model:** [Qwen/Qwen1.5-1.8B](https://huggingface.co/Qwen/Qwen1.5-1.8B) |
|
|
- **Training Data:** Synthetic dataset generated from MBPP and CodeAlpaca using Llama 3.1 8B |
|
|
|
|
|
## Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
This model is designed for: |
|
|
- **Educational code generation**: Teaching programming concepts through explained solutions |
|
|
- **Transparent AI coding assistants**: Understanding how the model approaches problems |
|
|
- **Code explanation**: Generating step-by-step breakdowns of problem-solving strategies |
|
|
- **Learning tool**: Helping beginners understand algorithmic thinking |
|
|
|
|
|
### Example Usage |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
from peft import PeftModel |
|
|
|
|
|
# Load base model and tokenizer |
|
|
base_model = AutoModelForCausalLM.from_pretrained( |
|
|
"Qwen/Qwen1.5-1.8B", |
|
|
device_map="auto" |
|
|
) |
|
|
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B") |
|
|
|
|
|
# Load LoRA adapter |
|
|
model = PeftModel.from_pretrained(base_model, "[YOUR_MODEL_PATH]") |
|
|
|
|
|
# Generate code with reasoning |
|
|
prompt = "Write a Python function to find the longest common prefix in a list of strings." |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate(**inputs, max_new_tokens=512) |
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
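For deployment, the LoRA weights can optionally be merged into the base model so that inference no longer requires `peft` (standard PEFT functionality; the output path below is a placeholder):

```python
# Merge the LoRA adapter into the base weights for standalone inference
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen-code-reasoning-merged")  # placeholder path
```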
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
- **Production-critical systems**: This model is fine-tuned on a limited dataset and should not be used for safety-critical applications |
|
|
- **Non-Python languages**: The model is specifically trained on Python problems |
|
|
- **Complex software architecture**: Best suited for algorithm-level problems, not large-scale system design |
|
|
- **Security-sensitive code**: Should not be used for generating cryptographic or security-critical code without expert review |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
### Limitations |
|
|
|
|
|
1. **Dataset size**: Trained on only 1,000 examples, so the model may not generalize to all problem types


2. **Teacher model quality**: Synthetic data generated by Llama 3.1 8B may contain errors


3. **Small test set**: Evaluated on only 7 problems, so true generalization is unknown


4. **Potential overfitting**: High accuracy on the test set may indicate memorization rather than true learning


5. **No code validation**: Training data was not validated for correctness before fine-tuning
|
|
|
|
|
### Recommendations |
|
|
|
|
|
- Always review and test generated code before using in production |
|
|
- Use as a learning tool rather than a replacement for human expertise |
|
|
- Validate outputs against test cases and edge cases |
|
|
- Consider the model's explanations as one perspective, not absolute truth |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
- **Source datasets**: MBPP (Mostly Basic Python Problems) and CodeAlpaca
|
|
- **Dataset size**: 1,000 Python programming problems |
|
|
- **Data generation**: Synthetic step-by-step reasoning generated using Llama 3.1 8B Instant via Groq API |
|
|
- **Data structure**: Each example contains the following (a generation sketch follows this list):
|
|
- Original programming problem |
|
|
- Step-by-step reasoning (problem understanding, algorithm design, implementation strategy) |
|
|
- Python solution |
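For illustration, a minimal sketch of how such records could be produced with the Groq Python SDK. The prompt wording, record field names, and client setup are assumptions for illustration, not the exact pipeline used for this model:

```python
from groq import Groq  # Groq Python SDK; expects GROQ_API_KEY in the environment

client = Groq()

def build_record(problem: str, solution: str) -> dict:
    """Ask the teacher model for step-by-step reasoning, then assemble one training record."""
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{
            "role": "user",
            "content": (
                "Explain step by step how to solve this Python problem "
                "(problem understanding, algorithm design, implementation strategy):\n\n"
                f"{problem}\n\nReference solution:\n{solution}"
            ),
        }],
    )
    return {
        "problem": problem,                                # original MBPP/CodeAlpaca problem
        "reasoning": response.choices[0].message.content,  # synthetic step-by-step explanation
        "solution": solution,                              # Python solution
    }
```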
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
#### Fine-tuning Method |
|
|
|
|
|
- **Technique**: QLoRA (Quantized Low-Rank Adaptation) |
|
|
- **Quantization**: 4-bit quantization for memory efficiency |
|
|
- **LoRA Configuration** (a `LoraConfig` sketch follows this list):
|
|
- Rank (r): 8 |
|
|
- Alpha: 16 |
|
|
- Target modules: q_proj, k_proj, v_proj, o_proj (attention layers) |
|
|
- Dropout: 0.05 |
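A minimal `LoraConfig` sketch matching the settings above (the `task_type` value is the standard choice for decoder-only causal LMs and is assumed here):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                            # rank of the low-rank update matrices
    lora_alpha=16,                  # scaling factor (alpha / r = 2)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention layers only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",          # assumed: standard setting for causal language models
)

# model = get_peft_model(quantized_base_model, lora_config)
```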
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
|
|
- **Training epochs**: 3 |
|
|
- **Learning rate**: 2e-4 |
|
|
- **Optimizer**: paged_adamw_8bit |
|
|
- **Batch size**: [Specify if known] |
|
|
- **Training regime**: 4-bit quantized base weights with LoRA adapters trained in higher precision (a configuration sketch follows this list)
|
|
- **Hardware**: Google Colab T4 GPU (free tier) |
|
|
- **Framework**: PEFT 0.17.1, Transformers, bitsandbytes |
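A sketch of how the quantization and optimizer settings above could be wired together with `bitsandbytes` and `transformers`. The NF4 quantization type, batch size, and output path are assumptions, since they are not specified in this card:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",             # assumed: the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute suits a T4
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-1.8B",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

training_args = TrainingArguments(
    output_dir="qwen-code-reasoning",      # placeholder path
    num_train_epochs=3,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=4,         # assumed; not specified in this card
    fp16=True,
)
```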
|
|
|
|
|
#### Training Time |
|
|
|
|
|
- Approximately [X hours] on Google Colab T4 GPU |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Testing Data & Metrics |
|
|
|
|
|
#### Testing Data |
|
|
|
|
|
- **Test set size**: 7 diverse Python programming problems |
|
|
- **Problem types**: Mix of algorithmic challenges from the training distribution |
|
|
|
|
|
#### Metrics |
|
|
|
|
|
- **Primary metric**: Pass@1, i.e. whether the first generated solution passes its test cases (a simplified scoring sketch follows this list)
|
|
- **Secondary metric**: Reasoning structure presence (does output include step-by-step explanation?) |
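As a rough illustration, Pass@1 on a test set of this size amounts to executing each first-sample solution against its test cases. This simplified sketch uses `exec` directly and hypothetical record fields; a real harness would sandbox execution:

```python
def pass_at_1(examples: list[dict]) -> float:
    """Fraction of problems whose first generated solution passes all of its asserts."""
    passed = 0
    for ex in examples:
        namespace: dict = {}
        try:
            exec(ex["generated_code"], namespace)  # define the candidate function
            exec(ex["test_code"], namespace)       # run assert-based test cases
            passed += 1
        except Exception:
            pass                                   # any error or failed assert counts as a fail
    return passed / len(examples)
```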
|
|
|
|
|
### Results |
|
|
|
|
|
| Metric | Base Model (Qwen 1.5 1.8B) | Fine-tuned Model | |
|
|
|--------|---------------------------|------------------| |
|
|
| Pass@1 | 75% | 100% | |
|
|
| Reasoning Structure | Inconsistent | 100% | |
|
|
|
|
|
**Key Findings**: |
|
|
- **+25 percentage point improvement** in functional correctness |
|
|
- **100% of outputs** now include structured step-by-step reasoning |
|
|
- All 7 test cases passed successfully |
|
|
|
|
|
**Important Note**: Results are based on a small test set (7 examples). Larger-scale evaluation needed to confirm generalization. |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
- **Hardware Type**: NVIDIA T4 GPU (Google Colab) |
|
|
- **Hours used**: ~[X hours for fine-tuning] |
|
|
- **Cloud Provider**: Google Cloud Platform |
|
|
- **Compute Region**: [Specify if known] |
|
|
- **Carbon Emitted**: Minimal, owing to short QLoRA training on a single T4 GPU
|
|
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute). |
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
### Model Architecture |
|
|
|
|
|
- **Base architecture**: Qwen 1.5 1.8B (Transformer decoder) |
|
|
- **Fine-tuning method**: LoRA adapters on attention layers |
|
|
- **Total parameters**: 1.8B (base) + ~4.7M (LoRA adapters) |
|
|
- **Trainable parameters**: ~4.7M (0.26% of total; verifiable with the snippet below)
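Once the adapter is attached, the trainable-parameter count can be checked directly with PEFT:

```python
from peft import get_peft_model

# base_model and lora_config as in the sketches above
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # reports trainable vs. total parameters (~4.7M of 1.8B)
```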
|
|
|
|
|
### Compute Infrastructure |
|
|
|
|
|
#### Hardware |
|
|
|
|
|
- GPU: NVIDIA T4 (16GB VRAM) |
|
|
- Platform: Google Colab (free tier) |
|
|
|
|
|
#### Software |
|
|
|
|
|
- PEFT 0.17.1 |
|
|
- Transformers |
|
|
- bitsandbytes (for 4-bit quantization) |
|
|
- PyTorch |
|
|
- Groq API (for synthetic data generation) |
|
|
|
|
|
## Project Insights |
|
|
|
|
|
### What Worked Well |
|
|
|
|
|
- Cross-model knowledge distillation (8B teacher → 1.8B student) |
|
|
- QLoRA enabled fine-tuning on free-tier GPU |
|
|
- Structured prompts for synthetic data generation |
|
|
- Teaching reasoning process alongside code generation |
|
|
|
|
|
### Future Improvements |
|
|
|
|
|
1. **Better teacher model**: Use Llama 3.1 70B for higher-quality synthetic data |
|
|
2. **Data validation**: Verify all generated code executes correctly before training |
|
|
3. **Larger dataset**: Scale to 5,000-10,000 examples |
|
|
4. **Robust evaluation**: Test on 50-100 problems from benchmarks like HumanEval |
|
|
5. **Higher LoRA rank**: Experiment with rank 16 or 32 for more capacity |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{qwen15-code-reasoning, |
|
|
author = {Rachit Verma},
|
|
title = {Qwen 1.5 1.8B Fine-tuned for Python Code Generation with Reasoning}, |
|
|
year = {2025}, |
|
|
publisher = {HuggingFace}, |
|
|
} |
|
|
``` |
|
|
|
|
|
## Model Card Authors |
|
|
|
|
|
Rachit Verma
|
|
|
|
|
|