# CodeLlama Fine-Tuning for RTL Code Generation

This repository contains scripts, datasets, and documentation for fine-tuning the CodeLlama-7B-Instruct model for Verilog/SystemVerilog RTL code generation.

## 📋 Overview

This project fine-tunes CodeLlama-7B-Instruct to generate synthesizable Verilog/SystemVerilog code for hardware design tasks, focusing specifically on FIFO implementations.

## 🎯 Features

- **CodeLlama-7B-Instruct Fine-tuning** with LoRA
- **Chat Template Format** support
- **Dataset Processing** and validation scripts
- **Training Scripts** with checkpoint-resume capability
- **Inference Scripts** for testing fine-tuned models
- **Comprehensive Documentation** and guides

## 📁 Repository Structure

```
codellama-migration/
├── datasets/                     # Training datasets
│   ├── raw/                      # Original datasets
│   └── processed/                # Processed and formatted datasets
│       ├── split/                # Train/val/test splits (original format)
│       └── split_chat_format/    # Train/val/test splits (chat format)
├── scripts/
│   ├── training/                 # Training scripts
│   ├── inference/                # Inference scripts
│   └── dataset_split.py          # Dataset splitting utility
├── Documentation/                # All .md documentation files
└── Scripts/                      # Utility scripts
```

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- CUDA-capable GPU (recommended)
- HuggingFace `transformers` library
- PyTorch

### Installation

```bash
pip install transformers torch peft accelerate bitsandbytes
```

### Training

```bash
bash start_training_chat_format.sh
```

### Inference

```bash
python3 scripts/inference/inference_codellama.py \
    --mode local \
    --model-path training-outputs/codellama-fifo-v2-chat \
    --base-model-path models/base-models/CodeLlama-7B-Instruct \
    --prompt "Your prompt here"
```

## 📊 Dataset

The dataset contains 94 samples of FIFO implementations in Verilog.
It is split as follows:

- Training: 70 samples (~75%)
- Validation: 9 samples (~10%)
- Test: 15 samples (~16%)

## 📚 Documentation

- **MIGRATION_PROGRESS.md** - Overall migration tracking
- **TRAINING_COMPLETE.md** - Training completion details
- **COMPARISON_REPORT.md** - Comparison of expected vs. generated output
- **FILE_INVENTORY.md** - Complete file listing

## 🤖 Model Information

- **Base Model**: CodeLlama-7B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 48
- **LoRA Alpha**: 96
- **Trainable Parameters**: ~120M (3.31% of total)

## 📝 License

This project is for internal use by Elinnos Systems Pvt Limited.

## 👥 Contributors

Elinnos Systems Pvt Limited

## 🔗 Links

- Organization: https://huggingface.co/Elinnos
- Base Model: https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf

---

**Note**: Model weights are not included in this repository. Fine-tuned models are stored separately.
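## 🧩 Example: Chat Template Format

The exact formatting logic used by the processing scripts is not reproduced here, but CodeLlama-Instruct follows the Llama-2 chat template, wrapping the user turn in `[INST] ... [/INST]` followed by the target completion. A minimal sketch (the function name and exact whitespace handling are illustrative assumptions, not taken from this repo's scripts):

```python
def to_chat_format(instruction: str, response: str) -> str:
    """Wrap one instruction/response training pair in the Llama-2 /
    CodeLlama-Instruct chat template: the prompt goes inside
    [INST] ... [/INST], and the target completion follows it.
    NOTE: a sketch; the repo's split_chat_format/ files may differ
    in whitespace or special-token placement."""
    return f"<s>[INST] {instruction.strip()} [/INST] {response.strip()} </s>"

sample = to_chat_format(
    "Write a synthesizable Verilog module for a synchronous FIFO.",
    "module sync_fifo(/* ports */); /* ... */ endmodule",
)
print(sample)
```

When training with `transformers`, the tokenizer's own `apply_chat_template` method is generally preferable to hand-rolled string formatting, since it stays in sync with the model's special tokens.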
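## ✂️ Example: Deterministic Dataset Split

The implementation of `scripts/dataset_split.py` is not shown here; the sketch below produces the 70/9/15 split described in the Dataset section. The seed value and list-based interface are assumptions for illustration:

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle deterministically, then carve out train/val/test using
    the fixed counts from this README: 70 / 9 / 15 of 94 samples.
    NOTE: seed and function signature are illustrative, not the
    actual dataset_split.py interface."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:70], shuffled[70:79], shuffled[79:94]

train, val, test = split_dataset(list(range(94)))
```

Using a fixed seed keeps the split reproducible across runs, which matters when comparing checkpoints against the same held-out test set.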