Utkarsh524 committed on
Commit 73a076e · verified · 1 Parent(s): 6d3e173

Update README.md

Files changed (1)
  1. README.md +14 -94
README.md CHANGED
@@ -14,101 +14,26 @@ base_model: codellama/CodeLLaMA-7b-hf
  model_type: llama
  pipeline_tag: text-generation
  ---
- # 🧪 CodeLLaMA Optimized Unit Test Generator (v10)
 
- This repository hosts a **merged, instruction-tuned CodeLLaMA-7B model** that generates robust, production-grade unit tests
- for C/C++ functions, especially in embedded systems. The model merges the base
- [codellama/CodeLLaMA-7b-hf](https://huggingface.co/codellama/CodeLLaMA-7b-hf)
- with a LoRA adapter trained on a cleaned dataset of embedded code tests.
 
- ---
-
- ## 🔖 Prompt Schema
-
- <|system|>
- Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
- - Testing all functions and edge cases
- - Avoiding redundant headers
- - Covering boundary conditions and error scenarios
- - Using clear test names without repetitions
- Generate ONLY test logic without framework-specific macros.
-
- <|user|>
- Generate unit tests for:
- {your C/C++ function here}
 
- <|assistant|>
 
- ---
-
- ## 🚀 Quick Inference Example
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
- import torch
-
- model_id = "Utkarsh524/codellama_utests_full_new_ver10"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
-
- prompt = f"""<|system|>
- Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
- - Testing all functions and edge cases
- - Avoiding redundant headers
- - Covering boundary conditions and error scenarios
- - Using clear test names without repetitions
- Generate ONLY test logic without framework-specific macros.
-
- <|user|>
- Generate unit tests for:
- int add(int a, int b) {{ return a + b; }}
-
- <|assistant|>
- """
-
- inputs = tokenizer(
-     prompt,
-     return_tensors="pt",
-     padding=True,
-     truncation=True,
-     max_length=8192
- ).to(model.device)
-
- outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.9)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("<|assistant|>")[-1].strip())
- ```
80
-
81
- license: apache-2.0
82
- language: c++
83
- tags:
84
- - code-generation
85
- - codellama
86
- - peft
87
- - unit-tests
88
- - causal-lm
89
- - text-generation
90
- - embedded-systems
91
- base_model: codellama/CodeLlama-7b-hf
92
- model_type: llama
93
- pipeline_tag: text-generation
94
- ---
95
-
96
- # 🚀 CodeLlama Embedded Test Generator (v10)
97
-
98
- This model generates **production-grade unit tests for embedded C/C++ code**. It's a merged adapter of CodeLlama-7B fine-tuned with:
99
  - 8-bit quantization
100
  - Flash Attention 2
101
  - Linear RoPE scaling (factor=2.0)
102
  - Custom instruction tuning on embedded unit tests
103
 
104
- ## 🧠 Key Features
105
  - Generates framework-agnostic test cases
106
  - Optimized for embedded systems constraints
107
  - Strict output formatting (no boilerplate)
108
  - Special tokens for structured prompting
109
  - 8192 context window support
110
 
111
- ## ⚙️ Technical Specifications
112
  | **Component** | **Configuration** |
113
  |-------------------------|-------------------------------------------|
114
  | **Base Model** | CodeLlama-7B-HF |
@@ -122,15 +47,12 @@ This model generates **production-grade unit tests for embedded C/C++ code**. It
  | **Optimizer**           | Paged AdamW 8-bit                         |
 
  ## 🧪 Prompt Structure
 
  <|system|>
  Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
-
  Testing all functions and edge cases
-
  Avoiding redundant headers
-
  Covering boundary conditions and error scenarios
-
  Using clear test names without repetitions
  Generate ONLY test logic without framework-specific macros.
 
@@ -191,7 +113,7 @@ print(generate_tests("int add(int a, int b) { return a + b; }"))
 
  ```
 
- ## 📊 Training Details
  ### Dataset
  - **Source**: `athrv/Embedded_Unittest2`
  - **Processing**:
@@ -201,14 +123,12 @@ print(generate_tests("int add(int a, int b) { return a + b; }"))
 
  ### LoRA Configuration
  LoraConfig(
- r=64,
- lora_alpha=32,
- target_modules=[
- "q_proj", "v_proj", "k_proj", "o_proj",
- "gate_proj", "up_proj", "down_proj" # All linear layers
- ],
- lora_dropout=0.1,
- task_type="CAUSAL_LM"
  )
 
 
@@ -226,7 +146,7 @@ base_model.resize_token_embeddings(len(tokenizer))
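The hunk header above references `base_model.resize_token_embeddings(len(tokenizer))`. A minimal sketch of the step that call belongs to, assuming the prompt markers are registered as additional special tokens (the helper name `add_prompt_tokens` is illustrative, not from the repo):

```python
# Prompt markers used by the model card's prompt schema.
PROMPT_TOKENS = ["<|system|>", "<|user|>", "<|assistant|>"]

def add_prompt_tokens(base_model, tokenizer):
    # Register the markers as special tokens so the tokenizer never
    # splits them, then grow the embedding matrix so the new token ids
    # have embedding rows to look up.
    tokenizer.add_special_tokens({"additional_special_tokens": PROMPT_TOKENS})
    base_model.resize_token_embeddings(len(tokenizer))
    return base_model, tokenizer
```

Resizing must happen after the tokens are added; otherwise the new ids index past the end of the embedding table.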
 
- ## 💡 Optimization Tips
  1. **Hardware**: Use GPUs with >24GB VRAM (A10/A100 recommended)
  2. **Inference**:
     - Temperature: 0.2-0.4
 
  model_type: llama
  pipeline_tag: text-generation
  ---
 
+ # CodeLlama Embedded Test Generator (v10)
 
+ This model generates **production-grade unit tests for embedded C/C++ code**.
+ It's a merged adapter of CodeLlama-7B fine-tuned with:
  - 8-bit quantization
  - Flash Attention 2
  - Linear RoPE scaling (factor=2.0)
  - Custom instruction tuning on embedded unit tests
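The fine-tuning settings above can be restated as load-time configuration. This is a sketch of one plausible mapping onto the `transformers` API, assuming `bitsandbytes` and `flash-attn` are installed; it is not taken from the repo's training script:

```python
BASE = "codellama/CodeLlama-7b-hf"

# Linear RoPE scaling divides position ids by the factor, so a model
# pretrained on 4096 positions can cover 4096 * 2.0 = 8192 tokens.
ROPE_SCALING = {"type": "linear", "factor": 2.0}

def load_model():
    # Heavy dependencies imported lazily so the config above stays importable.
    import torch
    from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

    config = AutoConfig.from_pretrained(BASE)
    config.rope_scaling = ROPE_SCALING
    return AutoModelForCausalLM.from_pretrained(
        BASE,
        config=config,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit quantization
        attn_implementation="flash_attention_2",  # requires flash-attn
        torch_dtype=torch.float16,
        device_map="auto",
    )
```

The scaling factor is what makes the advertised 8192-token context consistent with CodeLlama's 4096-position pretraining window.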
 
+ ## Key Features
  - Generates framework-agnostic test cases
  - Optimized for embedded systems constraints
  - Strict output formatting (no boilerplate)
  - Special tokens for structured prompting
  - 8192 context window support
 
+ ## Technical Specifications
  | **Component**           | **Configuration**                         |
  |-------------------------|-------------------------------------------|
  | **Base Model**          | CodeLlama-7B-HF                           |
 
  | **Optimizer**           | Paged AdamW 8-bit                         |
 
  ## 🧪 Prompt Structure
+
  <|system|>
  Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
  Testing all functions and edge cases
  Avoiding redundant headers
  Covering boundary conditions and error scenarios
  Using clear test names without repetitions
  Generate ONLY test logic without framework-specific macros.
 
 
  ```
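The hunk header above calls a `generate_tests` helper whose body is elided by this diff. A pure-Python sketch of the prompt plumbing such a helper needs, using the schema from the Prompt Structure section (function names are illustrative; the model call itself is omitted):

```python
SYSTEM_PROMPT = """Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
Testing all functions and edge cases
Avoiding redundant headers
Covering boundary conditions and error scenarios
Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros."""

def build_prompt(code: str) -> str:
    # Assemble the three-part <|system|>/<|user|>/<|assistant|> prompt
    # in the schema the model was tuned on.
    return (
        f"<|system|>\n{SYSTEM_PROMPT}\n\n"
        f"<|user|>\nGenerate unit tests for:\n{code}\n\n"
        f"<|assistant|>\n"
    )

def extract_tests(decoded: str) -> str:
    # The decoded generation echoes the prompt; keep only the text
    # after the final <|assistant|> marker.
    return decoded.split("<|assistant|>")[-1].strip()
```

A `generate_tests` wrapper would then tokenize `build_prompt(code)`, call `model.generate`, and pass the decoded string through `extract_tests`.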
 
+ ## Training Details
  ### Dataset
  - **Source**: `athrv/Embedded_Unittest2`
  - **Processing**:
 
 
  ### LoRA Configuration
  LoraConfig(
+     r=64,
+     lora_alpha=32,
+     target_modules=[
+         "q_proj", "v_proj", "k_proj", "o_proj",
+         "gate_proj", "up_proj", "down_proj"  # all linear layers
+     ],
+     lora_dropout=0.1,
+     task_type="CAUSAL_LM"
  )
  )
133
 
134
 
 
146
 
147
 
148
 
149
+ ## Optimization Tips
150
  1. **Hardware**: Use GPUs with >24GB VRAM (A10/A100 recommended)
151
  2. **Inference**:
152
  - Temperature: 0.2-0.4