Prithvik-1 committed · verified · Commit ca4d1fa · Parent(s): 51c7198

Upload README.md with huggingface_hub

Files changed (1): README.md (+104, −0)

# CodeLlama Fine-Tuning for RTL Code Generation

This repository contains scripts, datasets, and documentation for fine-tuning the CodeLlama-7B-Instruct model for Verilog/SystemVerilog RTL code generation.

## 📋 Overview

This project fine-tunes CodeLlama-7B-Instruct to generate synthesizable Verilog/SystemVerilog code for hardware design tasks, with a specific focus on FIFO implementations.

## 🎯 Features

- **CodeLlama-7B-Instruct Fine-tuning** with LoRA
- **Chat Template Format** support
- **Dataset Processing** and validation scripts
- **Training Scripts** with checkpoint-resume capability
- **Inference Scripts** for testing fine-tuned models
- **Comprehensive Documentation** and guides

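The chat-template format listed above can be illustrated with a small formatter. This is a sketch assuming the Llama-2-style `[INST] … [/INST]` template that CodeLlama-Instruct models use; the system prompt and the exact layout produced by the project's actual processing scripts may differ:

```python
def to_chat_format(instruction: str, response: str,
                   system: str = "You are an expert Verilog/SystemVerilog designer.") -> str:
    """Render one training pair in a CodeLlama-Instruct-style chat template.

    The system prompt and template layout here are illustrative assumptions;
    the real formatting lives in the dataset processing scripts.
    """
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{instruction} [/INST] {response} </s>"
    )

sample = to_chat_format(
    "Write a synchronous FIFO with parameterizable depth.",
    "module sync_fifo #(parameter DEPTH = 16) (/* ports */); endmodule",
)
```

A formatter like this is applied to every instruction/response pair before tokenization, so the fine-tuned model sees the same markers at inference time.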
## 📁 Repository Structure

```
codellama-migration/
├── datasets/                    # Training datasets
│   ├── raw/                     # Original datasets
│   └── processed/               # Processed and formatted datasets
│       ├── split/               # Train/val/test splits (original format)
│       └── split_chat_format/   # Train/val/test splits (chat format)
├── scripts/
│   ├── training/                # Training scripts
│   ├── inference/               # Inference scripts
│   └── dataset_split.py         # Dataset splitting utility
├── Documentation/               # All .md documentation files
└── Scripts/                     # Utility scripts
```

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- CUDA-capable GPU (recommended)
- Hugging Face `transformers` library
- PyTorch

### Installation

```bash
pip install transformers torch peft accelerate bitsandbytes
```

### Training

```bash
bash start_training_chat_format.sh
```

### Inference

```bash
python3 scripts/inference/inference_codellama.py \
    --mode local \
    --model-path training-outputs/codellama-fifo-v2-chat \
    --base-model-path models/base-models/CodeLlama-7B-Instruct \
    --prompt "Your prompt here"
```

## 📊 Dataset

The dataset contains 94 samples of FIFO implementations in Verilog. It is split roughly 75/10/15 into:
- Training: 70 samples (~74%)
- Validation: 9 samples (~10%)
- Test: 15 samples (~16%)

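A 70/9/15 split of 94 samples falls out of a simple shuffle-and-slice. The sketch below is illustrative (the function name and seed are assumptions; the project's actual logic lives in `scripts/dataset_split.py`):

```python
import random

def split_dataset(samples, train_frac=0.75, val_frac=0.10, seed=42):
    """Shuffle samples and slice them into train/val/test partitions."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to test
    return train, val, test

train, val, test = split_dataset(list(range(94)))
print(len(train), len(val), len(test))  # 70 9 15
```

Taking the test set as the remainder guarantees every sample lands in exactly one partition, which is why the test share (~16%) is slightly above the nominal 15%.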
## 📚 Documentation

- **MIGRATION_PROGRESS.md** - Overall migration tracking
- **TRAINING_COMPLETE.md** - Training completion details
- **COMPARISON_REPORT.md** - Expected-vs-generated output comparison
- **FILE_INVENTORY.md** - Complete file listing

## 🤖 Model Information

- **Base Model**: CodeLlama-7B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 48
- **LoRA Alpha**: 96
- **Trainable Parameters**: ~120M (3.31% of total)

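The ~120M trainable-parameter figure is consistent with rank-48 LoRA adapters on every attention and MLP projection of a 7B Llama-style model. A back-of-the-envelope check (the target-module list is an assumption, not taken from the training config):

```python
# Llama/CodeLlama-7B architecture constants
HIDDEN = 4096         # hidden size
INTERMEDIATE = 11008  # MLP intermediate size
LAYERS = 32           # transformer layers
R = 48                # LoRA rank

def lora_params(d_in: int, d_out: int, r: int = R) -> int:
    """A LoRA adapter on a (d_out x d_in) linear layer adds r*(d_in + d_out) weights."""
    return r * (d_in + d_out)

per_layer = (
    4 * lora_params(HIDDEN, HIDDEN)          # q_proj, k_proj, v_proj, o_proj
    + 2 * lora_params(HIDDEN, INTERMEDIATE)  # gate_proj, up_proj
    + lora_params(INTERMEDIATE, HIDDEN)      # down_proj
)
total = per_layer * LAYERS
print(f"{total / 1e6:.1f}M trainable LoRA parameters")  # ~119.9M
```

If fewer modules were actually targeted (e.g. attention projections only), the count drops proportionally, so this estimate is an upper bound under the stated assumption.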
## 📝 License

This project is for internal use by Elinnos Systems Pvt Limited.

## 👥 Contributors

Elinnos Systems Pvt Limited

## 🔗 Links

- Organization: https://huggingface.co/Elinnos
- Base Model: https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf

---

**Note**: Model weights are not included in this repository; fine-tuned models are stored separately.