# CodeLlama Fine-Tuning for RTL Code Generation
This repository contains scripts, datasets, and documentation for fine-tuning the CodeLlama-7B-Instruct model for Verilog/SystemVerilog RTL code generation.
## πŸ“‹ Overview
This project fine-tunes CodeLlama-7B-Instruct to generate synthesizable Verilog/SystemVerilog code for hardware design tasks, specifically focusing on FIFO implementations.
## 🎯 Features
- **CodeLlama-7B-Instruct Fine-tuning** with LoRA
- **Chat Template Format** support
- **Dataset Processing** and validation scripts
- **Training Scripts** with checkpoint resume capability
- **Inference Scripts** for testing fine-tuned models
- **Comprehensive Documentation** and guides
## πŸ“ Repository Structure
```
codellama-migration/
β”œβ”€β”€ datasets/                      # Training datasets
β”‚   β”œβ”€β”€ raw/                       # Original datasets
β”‚   └── processed/                 # Processed and formatted datasets
β”‚       β”œβ”€β”€ split/                 # Train/val/test splits (original format)
β”‚       └── split_chat_format/     # Train/val/test splits (chat format)
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ training/                  # Training scripts
β”‚   β”œβ”€β”€ inference/                 # Inference scripts
β”‚   └── dataset_split.py           # Dataset splitting utility
β”œβ”€β”€ Documentation/                 # All .md documentation files
└── Scripts/                       # Utility scripts
```
## πŸš€ Quick Start
### Prerequisites
- Python 3.8+
- CUDA-capable GPU (recommended)
- HuggingFace transformers library
- PyTorch
### Installation
```bash
pip install transformers torch peft accelerate bitsandbytes
```
### Training
```bash
bash start_training_chat_format.sh
```
### Inference
```bash
python3 scripts/inference/inference_codellama.py \
--mode local \
--model-path training-outputs/codellama-fifo-v2-chat \
--base-model-path models/base-models/CodeLlama-7B-Instruct \
--prompt "Your prompt here"
```
## πŸ“Š Dataset
The dataset contains 94 samples of FIFO implementations in Verilog format. It's split into:
- Training: 70 samples (75%)
- Validation: 9 samples (10%)
- Test: 15 samples (15%)
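The 70/9/15 split above can be reproduced with a seeded shuffle. A sketch of the idea (the repo's `scripts/dataset_split.py` may use different logic or a different seed; the seed value here is an assumption):

```python
import random


def split_dataset(samples, n_train=70, n_val=9, seed=42):
    """Deterministically shuffle, then slice into train/val/test.

    Sizes mirror the 70/9/15 split documented above; seed=42 is an
    assumption, not taken from the repo's dataset_split.py.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(94))
print(len(train), len(val), len(test))  # 70 9 15
```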
## πŸ“š Documentation
- **MIGRATION_PROGRESS.md** - Overall migration tracking
- **TRAINING_COMPLETE.md** - Training completion details
- **COMPARISON_REPORT.md** - Expected vs Generated comparison
- **FILE_INVENTORY.md** - Complete file listing
## πŸ€– Model Information
- **Base Model**: CodeLlama-7B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 48
- **LoRA Alpha**: 96
- **Trainable Parameters**: ~120M (3.31% of total)
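The hyperparameters above map onto a `peft` `LoraConfig` roughly as follows. This is a sketch only: `target_modules` and `lora_dropout` are assumptions (typical LLaMA-family attention projections), not confirmed by this repo's training script.

```python
from peft import LoraConfig

# Rank and alpha come from the table above; target_modules and dropout
# are assumptions, not taken from this repo's training script.
lora_config = LoraConfig(
    r=48,
    lora_alpha=96,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```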
## πŸ“ License
This project is for internal use by Elinnos Systems Pvt Limited.
## πŸ‘₯ Contributors
Elinnos Systems Pvt Limited
## πŸ”— Links
- Organization: https://huggingface.co/Elinnos
- Base Model: https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf
---
**Note**: Model weights are not included in this repository. Fine-tuned models are stored separately.