# CodeLlama Fine-Tuning for RTL Code Generation

This repository contains scripts, datasets, and documentation for fine-tuning the CodeLlama-7B-Instruct model to generate Verilog/SystemVerilog RTL code.

## πŸ“‹ Overview

This project fine-tunes CodeLlama-7B-Instruct to generate synthesizable Verilog/SystemVerilog code for hardware design tasks, specifically focusing on FIFO implementations.

## 🎯 Features

- **CodeLlama-7B-Instruct Fine-tuning** with LoRA
- **Chat Template Format** support
- **Dataset Processing** and validation scripts
- **Training Scripts** with checkpoint resume capability
- **Inference Scripts** for testing fine-tuned models
- **Comprehensive Documentation** and guides
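
The chat-format splits follow the Llama-2 instruction template that CodeLlama-7B-Instruct was trained on. A minimal sketch of that formatting is shown below; the system prompt and example strings are illustrative assumptions, not the repository's actual values:

```python
# Sketch of the Llama-2 / CodeLlama-Instruct chat template.
# The system prompt here is an assumption; see the files under
# datasets/processed/split_chat_format/ for the real layout.
def to_chat_format(instruction: str, response: str,
                   system: str = "You are an expert Verilog designer.") -> str:
    return (f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
            f"{instruction} [/INST] {response} </s>")

sample = to_chat_format(
    "Write a synthesizable synchronous FIFO with parameterizable depth.",
    "module sync_fifo #(parameter DEPTH = 16) (/* ports */); endmodule",
)
print(sample)
```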

## πŸ“ Repository Structure

```
codellama-migration/
β”œβ”€β”€ datasets/                    # Training datasets
β”‚   β”œβ”€β”€ raw/                    # Original datasets
β”‚   └── processed/              # Processed and formatted datasets
β”‚       β”œβ”€β”€ split/              # Train/val/test splits (original format)
β”‚       └── split_chat_format/  # Train/val/test splits (chat format)
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ training/               # Training scripts
β”‚   β”œβ”€β”€ inference/              # Inference scripts
β”‚   └── dataset_split.py        # Dataset splitting utility
β”œβ”€β”€ Documentation/              # All .md documentation files
└── Scripts/                    # Utility scripts
```

## πŸš€ Quick Start

### Prerequisites

- Python 3.8+
- CUDA-capable GPU (recommended)
- HuggingFace transformers library
- PyTorch

### Installation

```bash
pip install transformers torch peft accelerate bitsandbytes
```

### Training

```bash
bash start_training_chat_format.sh
```

### Inference

```bash
python3 scripts/inference/inference_codellama.py \
    --mode local \
    --model-path training-outputs/codellama-fifo-v2-chat \
    --base-model-path models/base-models/CodeLlama-7B-Instruct \
    --prompt "Your prompt here"
```

## πŸ“Š Dataset

The dataset contains 94 FIFO implementation samples written in Verilog. It is split into:
- Training: 70 samples (~74%)
- Validation: 9 samples (~10%)
- Test: 15 samples (~16%)
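
A deterministic split along these lines can be sketched as follows. The actual `scripts/dataset_split.py` may work differently; the fractions and seed here are assumptions chosen to reproduce the 70/9/15 breakdown:

```python
import random

# Hypothetical deterministic splitter; fractions and seed are assumptions.
def split_dataset(samples, train_frac=0.75, val_frac=0.10, seed=42):
    samples = list(samples)
    random.Random(seed).shuffle(samples)  # reproducible shuffle
    n_train = round(len(samples) * train_frac)
    n_val = round(len(samples) * val_frac)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]      # remainder goes to test
    return train, val, test

train, val, test = split_dataset(range(94))
print(len(train), len(val), len(test))
```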

## πŸ“š Documentation

- **MIGRATION_PROGRESS.md** - Overall migration tracking
- **TRAINING_COMPLETE.md** - Training completion details
- **COMPARISON_REPORT.md** - Expected vs Generated comparison
- **FILE_INVENTORY.md** - Complete file listing

## πŸ€– Model Information

**Base Model**: CodeLlama-7B-Instruct  
**Fine-tuning Method**: LoRA (Low-Rank Adaptation)  
**LoRA Rank**: 48  
**LoRA Alpha**: 96  
**Trainable Parameters**: ~120M (3.31% of total)
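
As a sanity check, the ~120M figure is consistent with rank-48 adapters attached to every attention and MLP projection of CodeLlama-7B (32 layers, hidden size 4096, MLP intermediate size 11008, per the public model config). The choice of target modules below is an assumption, not taken from the training scripts:

```python
# Estimate LoRA trainable parameters for CodeLlama-7B at rank 48.
# Assumed target modules: q/k/v/o attention projections plus
# gate/up/down MLP projections in every layer.
HIDDEN, INTERMEDIATE, LAYERS, RANK = 4096, 11008, 32, 48

def lora_params(d_in: int, d_out: int, r: int = RANK) -> int:
    """LoRA on a d_in x d_out matrix adds A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

attn = 4 * lora_params(HIDDEN, HIDDEN)            # q, k, v, o projections
mlp = (2 * lora_params(HIDDEN, INTERMEDIATE)      # gate, up projections
       + lora_params(INTERMEDIATE, HIDDEN))       # down projection
total = LAYERS * (attn + mlp)
print(f"{total:,}")  # roughly 120M, consistent with the figure above
```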

## πŸ“ License

This project is for internal use by Elinnos Systems Pvt Limited.

## πŸ‘₯ Contributors

Elinnos Systems Pvt Limited

## πŸ”— Links

- Organization: https://huggingface.co/Elinnos
- Base Model: https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf

---

**Note**: Model weights are not included in this repository. Fine-tuned models are stored separately.