Update README.md
README.md (changed)

---
license: apache-2.0
language: c++
tags:
- code-generation
- codellama
- peft
- unit-tests
- causal-lm
- text-generation
- embedded-systems
base_model: codellama/CodeLlama-7b-hf
model_type: llama
pipeline_tag: text-generation
---

# 🧪 CodeLlama Optimized Unit Test Generator (v10)

This repository hosts a **merged, instruction-tuned CodeLlama-7B model** that generates robust, production-grade unit tests for C/C++ functions, especially in embedded systems. The model merges the base [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) with a LoRA adapter trained on a cleaned dataset of embedded code tests.

---

## 🔖 Prompt Schema

<|system|>
Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
- Testing all functions and edge cases
- Avoiding redundant headers
- Covering boundary conditions and error scenarios
- Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros.

<|user|>
Generate unit tests for:
{your C/C++ function here}

<|assistant|>

## 🚀 Quick Inference Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver10"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = """<|system|>
Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
- Testing all functions and edge cases
- Avoiding redundant headers
- Covering boundary conditions and error scenarios
- Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros.

<|user|>
Generate unit tests for:
int add(int a, int b) { return a + b; }

<|assistant|>
"""

inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=8192,
).to(model.device)

# Sampling must be enabled for temperature/top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("<|assistant|>")[-1].strip())
```

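The prompt assembly and answer extraction in the example are plain string handling, so they can be factored into small reusable helpers. This is a sketch; the names `build_prompt` and `extract_tests` are illustrative, not part of the model's API:

```python
def build_prompt(code: str) -> str:
    """Assemble a prompt following the card's <|system|>/<|user|>/<|assistant|> schema."""
    system = (
        "Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:\n"
        "- Testing all functions and edge cases\n"
        "- Avoiding redundant headers\n"
        "- Covering boundary conditions and error scenarios\n"
        "- Using clear test names without repetitions\n"
        "Generate ONLY test logic without framework-specific macros."
    )
    return f"<|system|>\n{system}\n\n<|user|>\nGenerate unit tests for:\n{code}\n\n<|assistant|>\n"


def extract_tests(decoded: str) -> str:
    """Keep only the text generated after the final assistant tag."""
    return decoded.split("<|assistant|>")[-1].strip()
```

With these, the decoding step reduces to `print(extract_tests(tokenizer.decode(outputs[0], skip_special_tokens=True)))`.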

This model is a merged adapter of CodeLlama-7B, fine-tuned with:

- 8-bit quantization
- Flash Attention 2
- Linear RoPE scaling (factor=2.0)
- Custom instruction tuning on embedded unit tests

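For intuition on the RoPE entry above: linear scaling divides positions by the factor before computing rotary angles, so with factor 2.0 an 8192-token context reuses the angle range the base model saw over 4096 tokens. A minimal sketch of that relationship, assuming the standard Llama-7B head dimension (128) and rotary base (10000), which the card does not state:

```python
def rope_angle(pos: int, pair: int, head_dim: int = 128, base: float = 10000.0,
               scaling_factor: float = 1.0) -> float:
    """Rotary angle for position `pos` and frequency pair `pair`, with linear scaling."""
    return (pos / scaling_factor) * base ** (-2 * pair / head_dim)


# With factor 2.0, position 8192 lands exactly on the unscaled angle of position 4096.
assert rope_angle(8192, 0, scaling_factor=2.0) == rope_angle(4096, 0)
```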
## Key Features

- Generates framework-agnostic test cases
- Optimized for embedded systems constraints
- Strict output formatting (no boilerplate)
- Special tokens for structured prompting
- 8192 context window support

## Technical Specifications

| **Component** | **Configuration** |
|-------------------------|-------------------------------------------|
| **Base Model** | CodeLlama-7B-HF |
| … | … |
| **Optimizer** | Paged AdamW 8-bit |

## Training Details

### Dataset

- **Source**: `athrv/Embedded_Unittest2`
- **Processing**:
### LoRA Configuration

LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",  # all linear layers
    ],
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

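As a back-of-the-envelope check on this configuration: each adapted linear layer of shape d_in×d_out gains r·(d_in+d_out) trainable parameters. The sketch below assumes the standard Llama-7B shapes (32 layers, hidden size 4096, MLP intermediate size 11008), which are not stated in this card:

```python
HIDDEN, INTERMEDIATE, LAYERS, R = 4096, 11008, 32, 64

def lora_param_count(d_in: int, d_out: int, r: int = R) -> int:
    # LoRA adds a (d_in x r) down-projection and an (r x d_out) up-projection.
    return r * (d_in + d_out)

per_layer = (
    4 * lora_param_count(HIDDEN, HIDDEN)          # q_proj, k_proj, v_proj, o_proj
    + 2 * lora_param_count(HIDDEN, INTERMEDIATE)  # gate_proj, up_proj
    + lora_param_count(INTERMEDIATE, HIDDEN)      # down_proj
)
total = LAYERS * per_layer
print(total)  # 159907840 -> roughly 160M trainable parameters
```

So r=64 over all linear layers trains on the order of 2% of the 7B base weights.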
## Optimization Tips

1. **Hardware**: Use GPUs with >24 GB VRAM (A10/A100 recommended)
2. **Inference**:
   - Temperature: 0.2-0.4
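The inference settings above can be collected into one set of keyword arguments for `model.generate()` (the values here follow these tips; 512 new tokens matches the earlier example and is an assumption, not a fixed requirement):

```python
# Sampling must be enabled for temperature/top_p to have any effect;
# 0.3 sits in the middle of the recommended 0.2-0.4 band.
gen_kwargs = {
    "max_new_tokens": 512,
    "do_sample": True,
    "temperature": 0.3,
    "top_p": 0.9,
}
# usage: outputs = model.generate(**inputs, **gen_kwargs)
```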