# CodeLlama Fine-Tuning for RTL Code Generation
This repository contains scripts, datasets, and documentation for fine-tuning the CodeLlama-7B-Instruct model for Verilog/SystemVerilog RTL code generation.
## Overview
This project fine-tunes CodeLlama-7B-Instruct to generate synthesizable Verilog/SystemVerilog code for hardware design tasks, specifically focusing on FIFO implementations.
## Features
- **CodeLlama-7B-Instruct Fine-tuning** with LoRA
- **Chat Template Format** support
- **Dataset Processing** and validation scripts
- **Training Scripts** with checkpoint resume capability
- **Inference Scripts** for testing fine-tuned models
- **Comprehensive Documentation** and guides
## Repository Structure
```
codellama-migration/
├── datasets/                # Training datasets
│   ├── raw/                 # Original datasets
│   ├── processed/           # Processed and formatted datasets
│   ├── split/               # Train/val/test splits (original format)
│   └── split_chat_format/   # Train/val/test splits (chat format)
├── scripts/
│   ├── training/            # Training scripts
│   ├── inference/           # Inference scripts
│   └── dataset_split.py     # Dataset splitting utility
├── Documentation/           # All .md documentation files
└── Scripts/                 # Utility scripts
```
## Quick Start
### Prerequisites
- Python 3.8+
- CUDA-capable GPU (recommended)
- HuggingFace transformers library
- PyTorch
### Installation
```bash
pip install transformers torch peft accelerate bitsandbytes
```
### Training
```bash
bash start_training_chat_format.sh
```
### Inference
```bash
python3 scripts/inference/inference_codellama.py \
--mode local \
--model-path training-outputs/codellama-fifo-v2-chat \
--base-model-path models/base-models/CodeLlama-7B-Instruct \
--prompt "Your prompt here"
```
## Dataset
The dataset contains 94 samples of FIFO implementations in Verilog. It is split into:
- Training: 70 samples (~75%)
- Validation: 9 samples (~10%)
- Test: 15 samples (~16%)
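The actual splitting logic lives in `scripts/dataset_split.py`; as a minimal sketch of how a deterministic 70/9/15 split of 94 samples might be produced (the real script's seed and method may differ):

```python
import random

def split_dataset(samples, train_n=70, val_n=9, seed=42):
    """Shuffle deterministically, then carve out train/val/test slices."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    train = items[:train_n]
    val = items[train_n:train_n + val_n]
    test = items[train_n + val_n:]
    return train, val, test

train, val, test = split_dataset(range(94))
print(len(train), len(val), len(test))  # 70 9 15
```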
## Documentation
- **MIGRATION_PROGRESS.md** - Overall migration tracking
- **TRAINING_COMPLETE.md** - Training completion details
- **COMPARISON_REPORT.md** - Expected vs Generated comparison
- **FILE_INVENTORY.md** - Complete file listing
## Model Information
- **Base Model**: CodeLlama-7B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 48
- **LoRA Alpha**: 96
- **Trainable Parameters**: ~120M (3.31% of total)
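The ~120M figure can be sanity-checked with back-of-the-envelope arithmetic. The exact target modules used by this repo's training script aren't stated here; the sketch below assumes LoRA rank 48 applied to every linear projection in CodeLlama-7B (hidden size 4096, MLP intermediate size 11008, 32 layers), which reproduces roughly 120M:

```python
# Back-of-the-envelope check of the ~120M trainable-parameter figure.
# Assumption: LoRA (r=48) on all attention and MLP projections of CodeLlama-7B.
HIDDEN, INTERMEDIATE, LAYERS, RANK = 4096, 11008, 32, 48

def lora_params(d_in, d_out, r):
    # Each adapted weight W (d_out x d_in) gains A (r x d_in) and B (d_out x r).
    return r * (d_in + d_out)

attn = 4 * lora_params(HIDDEN, HIDDEN, RANK)           # q/k/v/o projections
mlp = (2 * lora_params(HIDDEN, INTERMEDIATE, RANK)     # gate/up projections
       + lora_params(INTERMEDIATE, HIDDEN, RANK))      # down projection
total = LAYERS * (attn + mlp)
print(f"{total:,}")  # 119,930,880, i.e. ~120M
```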
## License
This project is for internal use by Elinnos Systems Pvt Limited.
## Contributors
Elinnos Systems Pvt Limited
## Links
- Organization: https://huggingface.co/Elinnos
- Base Model: https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf
---
**Note**: Model weights are not included in this repository. Fine-tuned models are stored separately.