---
language:
- en
license: llama2
tags:
- code
- llama2
- full-fine-tuning
- mask-fine-tuning
- coding
datasets:
- tulu3_persona_python
- evol_code
- code_alpaca
base_model: meta-llama/Llama-2-7b-hf
---

# llama2-7b-coding-fft

This model is a **Full Fine-Tuned (FFT)** version of LLaMA-2-7B on coding datasets, trained as part of replicating the [Mask Fine-Tuning (MFT) paper](https://arxiv.org/abs/2503.22764v1).
|
## Model Details

- **Base Model:** meta-llama/Llama-2-7b-hf
- **Training Type:** Full Fine-Tuning (FFT)
- **Domain:** Coding
- **Hardware:** TPU v4-8
- **Training Framework:** PyTorch + torch_xla
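A minimal usage sketch with Hugging Face `transformers`. The hub id below is assumed from this card's name and may differ; loading downloads the full 7B checkpoint, and `device_map="auto"` requires `accelerate`:

```python
# Usage sketch for this checkpoint; MODEL_ID is an assumption based on the
# card name, not a verified hub id.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "chrisfrancisque/llama2-7b-coding-fft"  # assumed hub id

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Greedy-decode a completion for the given prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="bfloat16", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```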
|
## Training Data

The model was trained on 30,000 samples drawn from three coding datasets, matching the paper's setup:

- **Tulu 3 Persona Python:** 10,000 samples
- **Evol CodeAlpaca:** 10,000 samples
- **Code-Alpaca:** 10,000 samples
|
|
## Training Configuration

- **Epochs:** 2
- **Sequence Length:** 4096
- **Learning Rate:** 2e-5
- **Batch Size:** 8 (effective)
- **Optimizer:** AdamW
- **LR Scheduler:** Linear with warmup
- **Mixed Precision:** bfloat16
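The schedule above can be sketched in plain Python. The warmup length is an assumption (this card does not state it); the shape mirrors what e.g. `transformers`' `get_linear_schedule_with_warmup` computes for the run's peak LR and step count:

```python
# Linear warmup then linear decay, using the card's peak LR and total steps.
PEAK_LR = 2e-5
TOTAL_STEPS = 7500     # from the training run
WARMUP_STEPS = 100     # assumption: the actual warmup length is not stated

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to the peak LR.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Then decay linearly from the peak LR down to 0 at the final step.
    remaining = max(0.0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)
```

Under these assumptions the LR reaches 2e-5 at step 100 and returns to 0 at step 7500.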
|
|
## Training Results

- **Final Loss:** 0.1535
- **Final Perplexity:** 1.1673
- **Training Time:** ~7 hours on TPU v4-8
- **Total Steps:** 7500

### Loss Progression

- Epoch 0: 0.4259
- Epoch 1: 0.1535
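As a sanity check, perplexity is conventionally `exp(mean loss)`. Applied to the epoch-1 loss this gives ≈1.1659, slightly below the reported 1.1673, which would be consistent with the reported perplexity being averaged per step rather than derived from the epoch-mean loss (a mean of exponentials is at least the exponential of the mean):

```python
import math

# Relationship between the reported loss and perplexity values.
epoch1_loss = 0.15353151041666666  # final (epoch 1) mean loss

perplexity = math.exp(epoch1_loss)  # perplexity = exp(mean cross-entropy loss)
print(f"{perplexity:.4f}")  # ≈ 1.1659
```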
|
## Intended Use

This model serves as the **FFT baseline** for the Mask Fine-Tuning paper replication. It will be evaluated on:

- **HumanEval** (code generation benchmark)
- Target: match the paper's FFT baseline of 29.3%
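HumanEval scores such as the 29.3% target are conventionally pass@k estimates. A minimal sketch of the unbiased estimator from the HumanEval paper (Chen et al., 2021), where each task gets n samples of which c pass the unit tests; with one greedy sample per task (n = k = 1), pass@1 reduces to the fraction of solved tasks:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn from n (of which c are correct) passes the tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. solving 48 of HumanEval's 164 tasks with one greedy sample each
# gives pass@1 = 48 / 164 ≈ 0.293
```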
|
## Evaluation

Evaluation on HumanEval is pending. Results will be updated here once available.
|
## Citation

If you use this model, please cite the original MFT paper:

```bibtex
@article{mft2025,
  title={Mask Fine-Tuning},
  author={[Authors from paper]},
  journal={arXiv preprint arXiv:2503.22764v1},
  year={2025}
}
```
|
## Reproducibility

Training configuration and code are available in the [GitHub repository](https://github.com/chrisfrancisque/mft-tpu).

## License

This model inherits the LLaMA 2 Community License from the base model.
|