Utkarsh524 committed on
Commit 73a076e · verified · 1 Parent(s): 6d3e173

Update README.md

Files changed (1)
  1. README.md +14 -94
README.md CHANGED
@@ -14,101 +14,26 @@ base_model: codellama/CodeLLaMA-7b-hf
  model_type: llama
  pipeline_tag: text-generation
  ---
- # 🧪 CodeLLaMA Optimized Unit Test Generator (v10)
 
- This repository hosts a **merged, instruction-tuned CodeLLaMA-7B model** that generates robust, production-grade unit tests
- for C/C++ functions, especially in embedded systems. The model merges the base
- [codellama/CodeLLaMA-7b-hf](https://huggingface.co/codellama/CodeLLaMA-7b-hf)
- with a LoRA adapter trained on a cleaned dataset of embedded code tests.
 
- ---
-
- ## 🔖 Prompt Schema
-
- <|system|>
- Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
- - Testing all functions and edge cases
- - Avoiding redundant headers
- - Covering boundary conditions and error scenarios
- - Using clear test names without repetitions
- Generate ONLY test logic without framework-specific macros.
-
- <|user|>
- Generate unit tests for:
- {your C/C++ function here}
 
- <|assistant|>
 
- ---
-
- ## 🚀 Quick Inference Example
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
- import torch
-
- model_id = "Utkarsh524/codellama_utests_full_new_ver10"
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
-
- prompt = f"""<|system|>
- Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
- - Testing all functions and edge cases
- - Avoiding redundant headers
- - Covering boundary conditions and error scenarios
- - Using clear test names without repetitions
- Generate ONLY test logic without framework-specific macros.
-
- <|user|>
- Generate unit tests for:
- int add(int a, int b) {{ return a + b; }}
-
- <|assistant|>
- """
-
- inputs = tokenizer(
-     prompt,
-     return_tensors="pt",
-     padding=True,
-     truncation=True,
-     max_length=8192
- ).to(model.device)
-
- outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.9)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("<|assistant|>")[-1].strip())
- ```
80
-
81
- license: apache-2.0
82
- language: c++
83
- tags:
84
- - code-generation
85
- - codellama
86
- - peft
87
- - unit-tests
88
- - causal-lm
89
- - text-generation
90
- - embedded-systems
91
- base_model: codellama/CodeLlama-7b-hf
92
- model_type: llama
93
- pipeline_tag: text-generation
94
- ---
95
-
96
- # 🚀 CodeLlama Embedded Test Generator (v10)
97
-
98
- This model generates **production-grade unit tests for embedded C/C++ code**. It's a merged adapter of CodeLlama-7B fine-tuned with:
99
  - 8-bit quantization
100
  - Flash Attention 2
101
  - Linear RoPE scaling (factor=2.0)
102
  - Custom instruction tuning on embedded unit tests
103
 
104
- ## 🧠 Key Features
105
  - Generates framework-agnostic test cases
106
  - Optimized for embedded systems constraints
107
  - Strict output formatting (no boilerplate)
108
  - Special tokens for structured prompting
109
  - 8192 context window support
110
 
111
- ## ⚙️ Technical Specifications
112
  | **Component** | **Configuration** |
113
  |-------------------------|-------------------------------------------|
114
  | **Base Model** | CodeLlama-7B-HF |
@@ -122,15 +47,12 @@ This model generates **production-grade unit tests for embedded C/C++ code**. It
  | **Optimizer**           | Paged AdamW 8-bit                         |
 
  ## 🧪 Prompt Structure
 
  <|system|>
  Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
-
  Testing all functions and edge cases
-
  Avoiding redundant headers
-
  Covering boundary conditions and error scenarios
-
  Using clear test names without repetitions
  Generate ONLY test logic without framework-specific macros.
 
@@ -191,7 +113,7 @@ print(generate_tests("int add(int a, int b) { return a + b; }"))
 
  ```
 
- ## 📊 Training Details
  ### Dataset
  - **Source**: `athrv/Embedded_Unittest2`
  - **Processing**:
@@ -201,14 +123,12 @@ print(generate_tests("int add(int a, int b) { return a + b; }"))
 
  ### LoRA Configuration
  LoraConfig(
- r=64,
- lora_alpha=32,
- target_modules=[
- "q_proj", "v_proj", "k_proj", "o_proj",
- "gate_proj", "up_proj", "down_proj" # All linear layers
- ],
- lora_dropout=0.1,
- task_type="CAUSAL_LM"
  )
 
 
@@ -226,7 +146,7 @@ base_model.resize_token_embeddings(len(tokenizer))
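The hunk header above references `base_model.resize_token_embeddings(len(tokenizer))`. A minimal sketch of the step that call belongs to, assuming the prompt markers are registered as additional special tokens (the helper name `add_prompt_tokens` is illustrative, not from the repo):

```python
# Prompt markers used by the model card's prompt schema.
PROMPT_TOKENS = ["<|system|>", "<|user|>", "<|assistant|>"]

def add_prompt_tokens(base_model, tokenizer):
    # Register the markers as special tokens so the tokenizer never
    # splits them, then grow the embedding matrix so the new token ids
    # have embedding rows to look up.
    tokenizer.add_special_tokens({"additional_special_tokens": PROMPT_TOKENS})
    base_model.resize_token_embeddings(len(tokenizer))
    return base_model, tokenizer
```

Resizing must happen after the tokens are added; otherwise the new ids index past the end of the embedding table.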
 
- ## 💡 Optimization Tips
  1. **Hardware**: Use GPUs with >24GB VRAM (A10/A100 recommended)
  2. **Inference**:
     - Temperature: 0.2-0.4
 
  model_type: llama
  pipeline_tag: text-generation
  ---
 
+ # CodeLlama Embedded Test Generator (v10)
 
+ This model generates **production-grade unit tests for embedded C/C++ code**.
+ It's a merged adapter of CodeLlama-7B fine-tuned with:
  - 8-bit quantization
  - Flash Attention 2
  - Linear RoPE scaling (factor=2.0)
  - Custom instruction tuning on embedded unit tests
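The fine-tuning settings above can be restated as load-time configuration. This is a sketch of one plausible mapping onto the `transformers` API, assuming `bitsandbytes` and `flash-attn` are installed; it is not taken from the repo's training script:

```python
BASE = "codellama/CodeLlama-7b-hf"

# Linear RoPE scaling divides position ids by the factor, so a model
# pretrained on 4096 positions can cover 4096 * 2.0 = 8192 tokens.
ROPE_SCALING = {"type": "linear", "factor": 2.0}

def load_model():
    # Heavy dependencies imported lazily so the config above stays importable.
    import torch
    from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

    config = AutoConfig.from_pretrained(BASE)
    config.rope_scaling = ROPE_SCALING
    return AutoModelForCausalLM.from_pretrained(
        BASE,
        config=config,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit quantization
        attn_implementation="flash_attention_2",  # requires flash-attn
        torch_dtype=torch.float16,
        device_map="auto",
    )
```

The scaling factor is what makes the advertised 8192-token context consistent with CodeLlama's 4096-position pretraining window.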
 
+ ## Key Features
  - Generates framework-agnostic test cases
  - Optimized for embedded systems constraints
  - Strict output formatting (no boilerplate)
  - Special tokens for structured prompting
  - 8192 context window support
 
+ ## Technical Specifications
  | **Component**           | **Configuration**                         |
  |-------------------------|-------------------------------------------|
  | **Base Model**          | CodeLlama-7B-HF                           |
 
  | **Optimizer**           | Paged AdamW 8-bit                         |
 
  ## 🧪 Prompt Structure
+
  <|system|>
  Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
  Testing all functions and edge cases
  Avoiding redundant headers
  Covering boundary conditions and error scenarios
  Using clear test names without repetitions
  Generate ONLY test logic without framework-specific macros.
 
 
  ```
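The hunk header above calls a `generate_tests` helper whose body is elided by this diff. A pure-Python sketch of the prompt plumbing such a helper needs, using the schema from the Prompt Structure section (function names are illustrative; the model call itself is omitted):

```python
SYSTEM_PROMPT = """Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
Testing all functions and edge cases
Avoiding redundant headers
Covering boundary conditions and error scenarios
Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros."""

def build_prompt(code: str) -> str:
    # Assemble the three-part <|system|>/<|user|>/<|assistant|> prompt
    # in the schema the model was tuned on.
    return (
        f"<|system|>\n{SYSTEM_PROMPT}\n\n"
        f"<|user|>\nGenerate unit tests for:\n{code}\n\n"
        f"<|assistant|>\n"
    )

def extract_tests(decoded: str) -> str:
    # The decoded generation echoes the prompt; keep only the text
    # after the final <|assistant|> marker.
    return decoded.split("<|assistant|>")[-1].strip()
```

A `generate_tests` wrapper would then tokenize `build_prompt(code)`, call `model.generate`, and pass the decoded string through `extract_tests`.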
 
+ ## Training Details
  ### Dataset
  - **Source**: `athrv/Embedded_Unittest2`
  - **Processing**:
 
 
  ### LoRA Configuration
  LoraConfig(
+     r=64,
+     lora_alpha=32,
+     target_modules=[
+         "q_proj", "v_proj", "k_proj", "o_proj",
+         "gate_proj", "up_proj", "down_proj"  # all linear layers
+     ],
+     lora_dropout=0.1,
+     task_type="CAUSAL_LM"
  )
  )
133
 
134
 
 
146
 
147
 
148
 
149
+ ## Optimization Tips
150
  1. **Hardware**: Use GPUs with >24GB VRAM (A10/A100 recommended)
151
  2. **Inference**:
152
  - Temperature: 0.2-0.4