# CodeLlama Fine-Tuning for RTL Code Generation

This repository contains scripts, datasets, and documentation for fine-tuning the CodeLlama-7B-Instruct model for Verilog/SystemVerilog RTL code generation.
## Overview

This project fine-tunes CodeLlama-7B-Instruct to generate synthesizable Verilog/SystemVerilog code for hardware design tasks, focusing specifically on FIFO implementations.
| ## π― Features | |
| - **CodeLlama-7B-Instruct Fine-tuning** with LoRA | |
| - **Chat Template Format** support | |
| - **Dataset Processing** and validation scripts | |
| - **Training Scripts** with checkpoint resume capability | |
| - **Inference Scripts** for testing fine-tuned models | |
| - **Comprehensive Documentation** and guides | |
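The chat-template format referred to above is, for CodeLlama-7B-Instruct, the Llama-2-style `[INST]` wrapping. A minimal formatting sketch (the helper name and system-prompt text are illustrative, not taken from this repository's scripts):

```python
def format_chat_prompt(instruction, system=""):
    """Wrap an instruction in the Llama-2 / CodeLlama-Instruct chat template."""
    if system:
        # The system prompt is embedded inside the first [INST] block.
        instruction = f"<<SYS>>\n{system}\n<</SYS>>\n\n{instruction}"
    return f"<s>[INST] {instruction} [/INST]"

prompt = format_chat_prompt(
    "Write a synthesizable Verilog module for a synchronous FIFO.",
    system="You are an RTL design assistant.",
)
```

The model's answer is then generated after the closing `[/INST]` token.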
## Repository Structure

```
codellama-migration/
├── datasets/                # Training datasets
│   ├── raw/                 # Original datasets
│   ├── processed/           # Processed and formatted datasets
│   ├── split/               # Train/val/test splits (original format)
│   └── split_chat_format/   # Train/val/test splits (chat format)
├── scripts/
│   ├── training/            # Training scripts
│   ├── inference/           # Inference scripts
│   └── dataset_split.py     # Dataset splitting utility
├── Documentation/           # All .md documentation files
└── Scripts/                 # Utility scripts
```
## Quick Start

### Prerequisites

- Python 3.8+
- CUDA-capable GPU (recommended)
- HuggingFace transformers library
- PyTorch

### Installation

```bash
pip install transformers torch peft accelerate bitsandbytes
```

### Training

```bash
bash start_training_chat_format.sh
```

### Inference

```bash
python3 scripts/inference/inference_codellama.py \
    --mode local \
    --model-path training-outputs/codellama-fifo-v2-chat \
    --base-model-path models/base-models/CodeLlama-7B-Instruct \
    --prompt "Your prompt here"
```
## Dataset

The dataset contains 94 samples of FIFO implementations in Verilog. It is split into:

- Training: 70 samples (~74%)
- Validation: 9 samples (~10%)
- Test: 15 samples (~16%)
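The split is produced by `scripts/dataset_split.py`. As an illustration only (this is not that script), the same 70/9/15 partition can be sketched as a seeded shuffle followed by slicing:

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle and partition into 70/9/15 train/val/test (seed is illustrative)."""
    rng = random.Random(seed)
    samples = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(samples)
    return samples[:70], samples[70:79], samples[79:]

data = [f"fifo_sample_{i}" for i in range(94)]
train, val, test = split_dataset(data)
# len(train), len(val), len(test) -> 70, 9, 15
```

Fixing the seed keeps the three splits reproducible across runs, which matters when comparing checkpoints trained at different times.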
## Documentation

- **MIGRATION_PROGRESS.md** - Overall migration tracking
- **TRAINING_COMPLETE.md** - Training completion details
- **COMPARISON_REPORT.md** - Expected vs. generated comparison
- **FILE_INVENTORY.md** - Complete file listing
## Model Information

- **Base Model**: CodeLlama-7B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 48
- **LoRA Alpha**: 96
- **Trainable Parameters**: ~120M (3.31% of total)
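With rank 48, the ~120M trainable-parameter figure is consistent with applying LoRA to all attention and MLP projection matrices of a LLaMA-7B-shaped model (hidden size 4096, intermediate size 11008, 32 layers). A back-of-the-envelope check (the target-module list is an assumption, not read from this repository's training config):

```python
def lora_params(d_in, d_out, r=48):
    # LoRA adds two low-rank factors per matrix: A (r x d_in) and B (d_out x r).
    return r * (d_in + d_out)

HIDDEN, INTERMEDIATE, LAYERS = 4096, 11008, 32
per_layer = (
    4 * lora_params(HIDDEN, HIDDEN)          # q/k/v/o attention projections
    + 2 * lora_params(HIDDEN, INTERMEDIATE)  # gate and up MLP projections
    + lora_params(INTERMEDIATE, HIDDEN)      # down MLP projection
)
total = per_layer * LAYERS
print(f"{total / 1e6:.1f}M trainable LoRA parameters")  # -> 119.9M
```

Targeting only the attention projections would give roughly 50M trainable parameters, so the ~120M figure suggests the MLP projections were adapted as well.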
## License

This project is for internal use by Elinnos Systems Pvt Limited.

## Contributors

Elinnos Systems Pvt Limited

## Links

- Organization: https://huggingface.co/Elinnos
- Base Model: https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf

---

**Note**: Model weights are not included in this repository. Fine-tuned models are stored separately.