Elinnos/codellama-fine-tuning

Prithvik-1 committed on Nov 25, 2025

Commit 64ccb77

1 Parent(s): 4a11103

Upload EVALUATION_SUMMARY.md with huggingface_hub

EVALUATION_SUMMARY.md +68 -0

	@@ -0,0 +1,68 @@

+# 📊 CodeLlama Evaluation Summary
+**Date:** November 25, 2025
+**Model:** `codellama-fifo-v1`
+---
+## 🎯 Quick Summary
+| Metric | Value |
+|--------|-------|
+| **Training Samples Avg Similarity** | 13.30% |
+| **Test Samples Avg Similarity** | 0.93% |
+| **Overall Similarity** | 7.11% |
+| **Code Generation Rate** | 50% (training samples only; test samples produced no code) |
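
The report does not state which similarity metric produced these numbers. As a minimal sketch, assuming a character-level ratio from Python's `difflib` (an assumption, not necessarily the metric used here), the per-split averages and the overall figure could be computed as follows. The function names `similarity_pct` and `summarize` are hypothetical. Note that (13.30 + 0.93) / 2 = 7.115, which is consistent with the reported 7.11% up to rounding, so the overall value appears to be the mean of the two split averages.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity_pct(reference: str, generated: str) -> float:
    """Character-level similarity in percent (assumed metric, not confirmed by the report)."""
    return 100.0 * SequenceMatcher(None, reference, generated).ratio()


def summarize(train_scores, test_scores):
    """Average each split, then average the two split means for the overall figure."""
    train_avg = mean(train_scores)
    test_avg = mean(test_scores)
    return {
        "train_avg": train_avg,
        "test_avg": test_avg,
        # Equals the mean over all samples only when both splits have the same size.
        "overall": mean([train_avg, test_avg]),
    }
```

If the splits differ in size, averaging over all samples directly would give a different overall value, so it is worth recording which definition is used.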
+---
+## ✅ What Worked
+1. **Model Loading:** Successfully loads with LoRA adapters
+2. **Training Samples:** Partial code generation (module declarations)
+3. **Training Sample 2:** 20.70% similarity (best result)
+---
+## ❌ Critical Issues
+1. **Incomplete Code:** On training samples, the model generates only the module declarations, not complete modules
+2. **Text Instead of Code:** On test samples, the model generates repetitive text notes instead of code
+3. **Repetition:** Test sample outputs repeat the same text heavily
+4. **Premature Termination:** Code generation stops before the code is complete
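
To track the repetition issue across re-tests, it may help to put a number on it. The sketch below is one possible measure, not something taken from the evaluation itself: the fraction of whitespace-token n-grams that occur more than once in an output. The function name `repeated_ngram_fraction` and the default `n = 4` are illustrative assumptions.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 4) -> float:
    """Fraction of n-gram occurrences whose n-gram appears more than once.

    Returns 0.0 for outputs shorter than n tokens. Values near 1.0 suggest
    the output is looping over the same phrases.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)
```

A score like this can be logged next to the similarity figure for each sample, so that a change in prompt format or decoding settings shows up as a measurable drop in repetition.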
+---
+## 🔧 Immediate Actions Needed
+1. **Fix Prompt Format**
+   - Match training data format exactly
+   - Test without system prompt prefix
+   - Add explicit code generation instruction
+2. **Adjust Inference Parameters**
+   - Try a lower temperature (0.1-0.2)
+   - Increase `max_new_tokens`
+   - Test different stopping criteria
+3. **Check Training Data**
+   - Verify all samples have complete code
+   - Ensure consistent formatting
+   - Remove any text-only samples
+4. **Re-test with Adjusted Prompts**
+   - Use exact training format
+   - Test simpler prompts
+   - Verify generation doesn't stop early
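
The data check and the parameter sweep above can be sketched as follows. This is a rough illustration under stated assumptions: the samples are assumed to be Verilog (suggested by the "module declarations" wording, but not confirmed), each sample is assumed to be a dict with a hypothetical `"response"` field, and the `max_new_tokens` values and `repetition_penalty` are illustrative choices that do not come from this report. The keys in `GENERATION_SWEEP` follow the keyword names accepted by `generate` in Hugging Face `transformers`.

```python
import re


def is_complete_module(code: str) -> bool:
    """Heuristic: at least one `module` keyword and an equal number of `endmodule`.

    Assumes Verilog. Comments or strings containing the word "module"
    can confuse this check, so treat it as a first-pass filter only.
    """
    opens = len(re.findall(r"\bmodule\b", code))
    closes = len(re.findall(r"\bendmodule\b", code))
    return opens > 0 and opens == closes


def filter_samples(samples):
    """Split samples into (kept, dropped) by completeness of the `response` field."""
    kept, dropped = [], []
    for sample in samples:
        target = kept if is_complete_module(sample["response"]) else dropped
        target.append(sample)
    return kept, dropped


# Candidate decoding settings to try one at a time (values are assumptions).
GENERATION_SWEEP = [
    {
        "do_sample": True,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": 1.1,  # not in the report; added to counter repetition
    }
    for temperature in (0.1, 0.2)
    for max_new_tokens in (512, 1024)
]
```

The same `is_complete_module` check can be applied to generated outputs to verify that generation no longer stops before `endmodule`.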
+---
+## 📈 Detailed Results
+See `EVALUATION_REPORT.md` for complete analysis.
+---
+**Status:** ⚠️ **NEEDS IMPROVEMENT**
+**Next Steps:** Adjust prompts and inference parameters, then re-test.