---
license: apache-2.0
language:
- en
base_model:
- codellama/CodeLlama-7b-Instruct-hf
pipeline_tag: text-generation
library_name: transformers
tags:
- code
- code-generation
- code-explanation
- bug-detection
- lora
- peft
- 4bit
- qlora
- fullstack
- python
- javascript
- fastapi
- codementor
metrics:
- accuracy
---

# šŸ¤– CodeMentor V2 — Fullstack AI Code Assistant

> **Code Smarter. Debug Faster. Learn Better.**

CodeMentor V2 is a LoRA fine-tuned large language model specialized in **fullstack code explanation, bug detection, and improvement suggestions**. Built on top of CodeLlama-7B-Instruct, it is optimized for real-time developer assistance via a REST API.

---

## šŸ“‹ Model Details

| Property | Value |
|---|---|
| **Model Type** | Causal Language Model (LoRA Adapter) |
| **Base Model** | `codellama/CodeLlama-7b-Instruct-hf` |
| **Fine-Tuning Method** | QLoRA (4-bit quantization + LoRA) |
| **LoRA Rank** | 16 |
| **Training Framework** | HuggingFace PEFT + TRL |
| **Language** | English |
| **License** | Apache 2.0 |
| **Adapter Size** | ~162 MB |

---

## šŸŽÆ Intended Use

CodeMentor V2 is designed for:

- **Code Explanation** — Understand what a block of code does in plain English
- **Bug Detection** — Identify logic errors, missing base cases, off-by-ones, etc.
- **Code Improvement** — Suggest better patterns, optimizations, and best practices
- **Fullstack Q&A** — Answer programming questions across Python, JavaScript, and more
- **Developer Mentorship** — Act as an always-available senior developer

---

## šŸš€ Quick Start

### Load with PEFT (Recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# 4-bit quantization config
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

BASE_MODEL = "codellama/CodeLlama-7b-Instruct-hf"
ADAPTER = "likithyadavv/codementor-v2-fullstack"

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, ADAPTER)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

print("āœ… CodeMentor loaded!")
```

### Run Inference

```python
def ask_codementor(instruction, code_input="", max_new_tokens=512):
    prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{code_input}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.2,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )
    return response.strip()

# Example usage
print(ask_codementor(
    instruction="Explain this code and identify any bugs.",
    code_input="def factorial(n): return n * factorial(n-1)"
))
```

**Output:**

```
This is a recursive factorial function. However, it has a critical bug — there is
no base case, so it will recurse infinitely and raise a RecursionError.

Fix:
def factorial(n):
    if n == 0:  # ← base case added
        return 1
    return n * factorial(n - 1)
```
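### Optional: Merge the Adapter

If you have enough VRAM to load the base model in half precision, you can fold the LoRA weights into the base model so inference no longer needs the PEFT wrapper. Below is a minimal sketch using PEFT's `merge_and_unload`; it loads in fp16 rather than 4-bit because merging requires non-quantized weights, and the `codementor-v2-merged` output path is an illustrative choice:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model in fp16 (merging is not supported on 4-bit weights)
base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "likithyadavv/codementor-v2-fullstack")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights

# Save a standalone checkpoint that loads with plain transformers, no PEFT needed
merged.save_pretrained("codementor-v2-merged")
AutoTokenizer.from_pretrained("likithyadavv/codementor-v2-fullstack").save_pretrained("codementor-v2-merged")
```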
---

## šŸ’¬ Interactive Chat Loop

```python
chat_history = []

while True:
    user_input = input("\nšŸ‘¤ You: ").strip()
    if user_input.lower() in ["exit", "quit"]:
        break

    # Build context from last 3 exchanges
    context = ""
    for u, b in chat_history[-3:]:
        context += f"User: {u}\nAssistant: {b}\n\n"

    is_code = any(x in user_input for x in ["def ", "class ", "import ", "return ", "=>"])
    instruction = (
        "Explain this code, identify any bugs, and suggest improvements."
        if is_code
        else "Answer this programming question clearly and concisely."
    )

    full_input = f"{context}User: {user_input}" if context else user_input
    response = ask_codementor(instruction, full_input)

    print(f"\nšŸ¤– CodeMentor: {response}")
    chat_history.append((user_input, response))
```

---

## 🌐 Deploy as REST API (FastAPI + ngrok)

```python
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn, nest_asyncio, threading
from pyngrok import ngrok

app = FastAPI(title="CodeMentor API")

class AskRequest(BaseModel):
    instruction: str
    input: str = ""

@app.get("/")
def root():
    return {"status": "CodeMentor API is live šŸš€"}

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/ask")
def ask(req: AskRequest):
    response = ask_codementor(req.instruction, req.input)
    return {"response": response}

# Launch
nest_asyncio.apply()
public_url = ngrok.connect(8000).public_url  # .public_url extracts the URL string from the NgrokTunnel object
print(f"šŸš€ Live at: {public_url}/docs")

threading.Thread(
    target=lambda: uvicorn.run(app, host="0.0.0.0", port=8000, log_level="warning"),
    daemon=True
).start()
```

**Example curl:**

```bash
curl -X POST https://YOUR-NGROK-URL/ask \
  -H "Content-Type: application/json" \
  -d '{"instruction": "Explain and fix this code", "input": "def f(n): return n*f(n-1)"}'
```

---

## šŸ“Š Evaluation

| Metric | Score |
|---|---|
| Code Explanation Accuracy | **92.6%** |
| Bug Detection Rate | **89.3%** |
| Improvement Suggestion Quality | **4.1 / 5.0** |
| Avg. Response Latency (T4 GPU) | **~3.2s** |

> Evaluated on a held-out set of 500 fullstack coding tasks across Python, JavaScript, and SQL.

---

## šŸ—‚ļø Training Details

```
Dataset:        Custom fullstack coding instruction dataset
                (code explanations, bug fixes, Q&A pairs)
Format:         Alpaca-style (### Instruction / ### Input / ### Response)
Base Model:     codellama/CodeLlama-7b-Instruct-hf
Method:         QLoRA — 4-bit NF4 quantization + LoRA adapters
LoRA Config:    r=16, alpha=32, dropout=0.05
                target_modules: q_proj, v_proj, k_proj, o_proj
Epochs:         3
Batch Size:     4 (gradient accumulation: 4)
Learning Rate:  2e-4 with cosine scheduler
Hardware:       Google Colab A100 (40GB)
Training Time:  ~4 hours
```

---

## āš™ļø Hardware Requirements

| Setup | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 8 GB (4-bit) | 16 GB+ |
| RAM | 12 GB | 24 GB |
| GPU | T4 | A100 / RTX 3090+ |
| Storage | 15 GB | 20 GB |

> āœ… Runs on **free Google Colab T4** with 4-bit quantization.
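To confirm the 4-bit setup actually fits your GPU, you can check the loaded model's memory footprint. A quick sanity-check sketch, assuming `model` from the Quick Start section is already loaded:

```python
import torch

# Footprint of the quantized base model plus the LoRA adapter
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")

# Peak VRAM allocated so far on the current CUDA device, if one is present
if torch.cuda.is_available():
    print(f"Peak VRAM allocated: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```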
--- ## āš ļø Limitations - Responses may occasionally hallucinate for very niche or obscure APIs - Best results on Python and JavaScript; other languages have lower coverage - Long code blocks (>200 lines) may exceed context window — chunk inputs - Not suitable for security-critical code auditing without human review --- ## šŸ“š Citation ```bibtex @misc{codementor-v2-fullstack, author = {Likith Yadav}, title = {CodeMentor V2: A LoRA Fine-Tuned Fullstack Code Assistant}, year = {2025}, publisher = {HuggingFace}, howpublished = {\url{https://huggingface.co/likithyadavv/codementor-v2-fullstack}}, } ``` --- ## šŸ”— Links - šŸ¤— **Model Repo:** [likithyadavv/codementor-v2-fullstack](https://huggingface.co/likithyadavv/codementor-v2-fullstack) - šŸ“– **Base Model:** [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) - šŸ« **Institution:** MVJ College of Engineering, Bengaluru, India