---
license: apache-2.0
language:
- en
base_model:
- codellama/CodeLlama-7b-Instruct-hf
pipeline_tag: text-generation
library_name: transformers
tags:
- code
- code-generation
- code-explanation
- bug-detection
- lora
- peft
- 4bit
- qlora
- fullstack
- python
- javascript
- fastapi
- codementor
metrics:
- accuracy
---
# 🤖 CodeMentor V2 – Fullstack AI Code Assistant

> **Code Smarter. Debug Faster. Learn Better.**

CodeMentor V2 is a LoRA fine-tuned large language model specialized in **fullstack code explanation, bug detection, and improvement suggestions**. Built on top of CodeLlama-7B-Instruct, it is optimized for real-time developer assistance via a REST API.

---
|
|
## 📋 Model Details

| Property | Value |
|---|---|
| **Model Type** | Causal Language Model (LoRA Adapter) |
| **Base Model** | `codellama/CodeLlama-7b-Instruct-hf` |
| **Fine-Tuning Method** | QLoRA (4-bit quantization + LoRA) |
| **LoRA Rank** | 16 |
| **Training Framework** | HuggingFace PEFT + TRL |
| **Language** | English |
| **License** | Apache 2.0 |
| **Adapter Size** | ~162 MB |

---
|
|
## 🎯 Intended Use

CodeMentor V2 is designed for:

- **Code Explanation** – Understand what a block of code does in plain English
- **Bug Detection** – Identify logic errors, missing base cases, off-by-ones, etc.
- **Code Improvement** – Suggest better patterns, optimizations, and best practices
- **Fullstack Q&A** – Answer programming questions across Python, JavaScript, and more
- **Developer Mentorship** – Act as an always-available senior developer

---
|
|
## 🚀 Quick Start

### Load with PEFT (Recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# 4-bit quantization config (NF4, matching the quantization used in training)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

BASE_MODEL = "codellama/CodeLlama-7b-Instruct-hf"
ADAPTER = "likithyadavv/codementor-v2-fullstack"

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, ADAPTER)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

print("✅ CodeMentor loaded!")
```
|
|
### Run Inference

```python
def ask_codementor(instruction, code_input="", max_new_tokens=512):
    # Alpaca-style prompt, matching the training format
    prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{code_input}\n\n### Response:\n"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.2,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )
    return response.strip()

# Example usage
print(ask_codementor(
    instruction="Explain this code and identify any bugs.",
    code_input="def factorial(n): return n * factorial(n-1)"
))
```
|
|
**Output:**
```
This is a recursive factorial function. However, it has a critical bug –
there is no base case, so it will recurse infinitely and raise a
RecursionError. Fix:

def factorial(n):
    if n == 0:  # ← base case added
        return 1
    return n * factorial(n - 1)
```

---
|
|
## 💬 Interactive Chat Loop

```python
chat_history = []

while True:
    user_input = input("\n👤 You: ").strip()
    if user_input.lower() in ["exit", "quit"]:
        break

    # Build context from the last 3 exchanges
    context = ""
    for u, b in chat_history[-3:]:
        context += f"User: {u}\nAssistant: {b}\n\n"

    # Simple heuristic: treat input containing code keywords as code
    is_code = any(x in user_input for x in ["def ", "class ", "import ", "return ", "=>"])
    instruction = (
        "Explain this code, identify any bugs, and suggest improvements."
        if is_code else
        "Answer this programming question clearly and concisely."
    )

    full_input = f"{context}User: {user_input}" if context else user_input
    response = ask_codementor(instruction, full_input)

    print(f"\n🤖 CodeMentor: {response}")
    chat_history.append((user_input, response))
```

---
|
|
## 🌐 Deploy as REST API (FastAPI + ngrok)

```python
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn, nest_asyncio, threading
from pyngrok import ngrok

app = FastAPI(title="CodeMentor API")

class AskRequest(BaseModel):
    instruction: str
    input: str = ""

@app.get("/")
def root():
    return {"status": "CodeMentor API is live 🚀"}

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/ask")
def ask(req: AskRequest):
    response = ask_codementor(req.instruction, req.input)
    return {"response": response}

# Launch (nest_asyncio lets uvicorn run inside a notebook's event loop)
nest_asyncio.apply()
public_url = ngrok.connect(8000).public_url  # .public_url extracts the plain URL string
print(f"🔗 Live at: {public_url}/docs")

threading.Thread(
    target=lambda: uvicorn.run(app, host="0.0.0.0", port=8000, log_level="warning"),
    daemon=True
).start()
```

**Example curl:**
```bash
curl -X POST https://YOUR-NGROK-URL/ask \
  -H "Content-Type: application/json" \
  -d '{"instruction": "Explain and fix this code", "input": "def f(n): return n*f(n-1)"}'
```
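
The same endpoint can also be called from Python. Here is a minimal client sketch using `requests`; the URL is a placeholder for whatever ngrok prints at launch:

```python
import requests

# Placeholder: replace with the public URL printed when the server starts
API_URL = "https://YOUR-NGROK-URL"

payload = {
    "instruction": "Explain and fix this code",
    "input": "def f(n): return n*f(n-1)",
}

# POST to the /ask endpoint defined above and print the model's answer
resp = requests.post(f"{API_URL}/ask", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])
```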
|
|
---
|
|
## 📊 Evaluation

| Metric | Score |
|---|---|
| Code Explanation Accuracy | **92.6%** |
| Bug Detection Rate | **89.3%** |
| Improvement Suggestion Quality | **4.1 / 5.0** |
| Avg. Response Latency (T4 GPU) | **~3.2s** |

> Evaluated on a held-out set of 500 fullstack coding tasks across Python, JavaScript, and SQL.

---
|
|
## 🏋️ Training Details

```
Dataset:       Custom fullstack coding instruction dataset
               (code explanations, bug fixes, Q&A pairs)
Format:        Alpaca-style (### Instruction / ### Input / ### Response)
Base Model:    codellama/CodeLlama-7b-Instruct-hf
Method:        QLoRA – 4-bit NF4 quantization + LoRA adapters
LoRA Config:   r=16, alpha=32, dropout=0.05
               target_modules: q_proj, v_proj, k_proj, o_proj
Epochs:        3
Batch Size:    4 (gradient accumulation: 4)
Learning Rate: 2e-4 with cosine scheduler
Hardware:      Google Colab A100 (40GB)
Training Time: ~4 hours
```
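
As a reproducibility aid, the hyperparameters above translate into PEFT and bitsandbytes configuration roughly as follows. This is a sketch of the listed settings only, not the original training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization, as listed under "Method"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter settings, as listed under "LoRA Config"
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```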
|
|
---
|
|
## ⚙️ Hardware Requirements

| Setup | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 8 GB (4-bit) | 16 GB+ |
| RAM | 12 GB | 24 GB |
| GPU | T4 | A100 / RTX 3090+ |
| Storage | 15 GB | 20 GB |

> ✅ Runs on **free Google Colab T4** with 4-bit quantization.

---
|
|
## ⚠️ Limitations

- Responses may occasionally hallucinate for very niche or obscure APIs
- Best results on Python and JavaScript; other languages have lower coverage
- Long code blocks (>200 lines) may exceed the context window – chunk inputs, as in the sketch below
- Not suitable for security-critical code auditing without human review
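
A minimal chunking sketch built on `ask_codementor` from the Quick Start; the chunk size and overlap below are illustrative assumptions, not tuned values:

```python
def ask_in_chunks(instruction, code, lines_per_chunk=150, overlap=10):
    """Split long code into overlapping line-based chunks and query each one."""
    lines = code.splitlines()
    responses = []
    step = lines_per_chunk - overlap
    for start in range(0, len(lines), step):
        chunk = "\n".join(lines[start:start + lines_per_chunk])
        # Tell the model which slice of the file it is looking at
        header = f"(lines {start + 1}-{min(start + lines_per_chunk, len(lines))} of {len(lines)})"
        responses.append(ask_codementor(instruction, f"{header}\n{chunk}"))
        if start + lines_per_chunk >= len(lines):
            break
    return "\n\n".join(responses)
```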
|
|
---
|
|
## 📖 Citation

```bibtex
@misc{codementor-v2-fullstack,
  author       = {Likith Yadav},
  title        = {CodeMentor V2: A LoRA Fine-Tuned Fullstack Code Assistant},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/likithyadavv/codementor-v2-fullstack}},
}
```

---
|
|
## 🔗 Links

- 🤗 **Model Repo:** [likithyadavv/codementor-v2-fullstack](https://huggingface.co/likithyadavv/codementor-v2-fullstack)
- 📌 **Base Model:** [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf)
- 🏫 **Institution:** MVJ College of Engineering, Bengaluru, India