🤖 CodeMentor V2 - Fullstack AI Code Assistant

Code Smarter. Debug Faster. Learn Better.

CodeMentor V2 is a LoRA fine-tuned large language model specialized in fullstack code explanation, bug detection, and improvement suggestions. Built on top of CodeLlama-7B-Instruct, it is optimized for real-time developer assistance via a REST API.


📋 Model Details

Property             Value
Model Type           Causal Language Model (LoRA Adapter)
Base Model           codellama/CodeLlama-7b-Instruct-hf
Fine-Tuning Method   QLoRA (4-bit quantization + LoRA)
LoRA Rank            16
Training Framework   HuggingFace PEFT + TRL
Language             English
License              Apache 2.0
Adapter Size         ~162 MB

🎯 Intended Use

CodeMentor V2 is designed for:

  • Code Explanation - Understand what a block of code does in plain English
  • Bug Detection - Identify logic errors, missing base cases, off-by-ones, etc.
  • Code Improvement - Suggest better patterns, optimizations, and best practices
  • Fullstack Q&A - Answer programming questions across Python, JavaScript, and more
  • Developer Mentorship - Act as an always-available senior developer

🚀 Quick Start

Load with PEFT (Recommended)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# 4-bit quantization config
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # matches the NF4 quantization used during training
    bnb_4bit_compute_dtype=torch.float16
)

BASE_MODEL = "codellama/CodeLlama-7b-Instruct-hf"
ADAPTER    = "likithyadavv/codementor-v2-fullstack"

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb,
    device_map="auto"
)

# Load LoRA adapter
model     = PeftModel.from_pretrained(base_model, ADAPTER)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

print("βœ… CodeMentor loaded!")

Run Inference

def ask_codementor(instruction, code_input="", max_new_tokens=512):
    prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{code_input}\n\n### Response:\n"
    
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.2,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )
    return response.strip()

# Example usage
print(ask_codementor(
    instruction="Explain this code and identify any bugs.",
    code_input="def factorial(n): return n * factorial(n-1)"
))

Output:

This is a recursive factorial function. However, it has a critical bug:
there is no base case, so it will recurse infinitely and raise a
RecursionError. Fix:

def factorial(n):
    if n == 0:      # ← base case added
        return 1
    return n * factorial(n - 1)

💬 Interactive Chat Loop

chat_history = []

while True:
    user_input = input("\n👀 You: ").strip()
    if user_input.lower() in ["exit", "quit"]:
        break

    # Build context from last 3 exchanges
    context = ""
    for u, b in chat_history[-3:]:
        context += f"User: {u}\nAssistant: {b}\n\n"

    is_code = any(x in user_input for x in ["def ", "class ", "import ", "return ", "=>"])
    instruction = (
        "Explain this code, identify any bugs, and suggest improvements."
        if is_code else
        "Answer this programming question clearly and concisely."
    )

    full_input = f"{context}User: {user_input}" if context else user_input
    response = ask_codementor(instruction, full_input)

    print(f"\nπŸ€– CodeMentor: {response}")
    chat_history.append((user_input, response))

🌐 Deploy as REST API (FastAPI + ngrok)

from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn, nest_asyncio, threading
from pyngrok import ngrok

app = FastAPI(title="CodeMentor API")

class AskRequest(BaseModel):
    instruction: str
    input: str = ""

@app.get("/")
def root():
    return {"status": "CodeMentor API is live πŸš€"}

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/ask")
def ask(req: AskRequest):
    response = ask_codementor(req.instruction, req.input)
    return {"response": response}

# Launch
nest_asyncio.apply()
public_url = ngrok.connect(8000).public_url  # .public_url extracts the plain URL string from the tunnel object
print(f"🚀 Live at: {public_url}/docs")

threading.Thread(
    target=lambda: uvicorn.run(app, host="0.0.0.0", port=8000, log_level="warning"),
    daemon=True
).start()

Example curl:

curl -X POST https://YOUR-NGROK-URL/ask \
  -H "Content-Type: application/json" \
  -d '{"instruction": "Explain and fix this code", "input": "def f(n): return n*f(n-1)"}'
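
The same endpoint can also be called from Python with the requests library. A minimal client sketch, assuming the server above is running; YOUR-NGROK-URL is the same placeholder as in the curl example, replace it with the tunnel URL printed at launch:

import requests

API_URL = "https://YOUR-NGROK-URL"  # replace with your ngrok URL

resp = requests.post(
    f"{API_URL}/ask",
    json={
        "instruction": "Explain and fix this code",
        "input": "def f(n): return n*f(n-1)",
    },
    timeout=120,  # generation can take a few seconds on a T4
)
resp.raise_for_status()
print(resp.json()["response"])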

📊 Evaluation

Metric                            Score
Code Explanation Accuracy         92.6%
Bug Detection Rate                89.3%
Improvement Suggestion Quality    4.1 / 5.0
Avg. Response Latency (T4 GPU)    ~3.2s

Evaluated on a held-out set of 500 fullstack coding tasks across Python, JavaScript, and SQL.


🗂️ Training Details

Dataset:        Custom fullstack coding instruction dataset
                (code explanations, bug fixes, Q&A pairs)
Format:         Alpaca-style (### Instruction / ### Input / ### Response)
Base Model:     codellama/CodeLlama-7b-Instruct-hf
Method:         QLoRA - 4-bit NF4 quantization + LoRA adapters
LoRA Config:    r=16, alpha=32, dropout=0.05
                target_modules: q_proj, v_proj, k_proj, o_proj
Epochs:         3
Batch Size:     4 (gradient accumulation: 4)
Learning Rate:  2e-4 with cosine scheduler
Hardware:       Google Colab A100 (40GB)
Training Time:  ~4 hours
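
For reference, the LoRA configuration above maps onto PEFT roughly as follows. This is a minimal sketch reconstructed from the values listed, not the actual training script; bias="none" is an assumption not stated above:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                                     # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    bias="none",                                              # assumption, not listed above
    task_type="CAUSAL_LM",
)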

⚙️ Hardware Requirements

Setup       Minimum         Recommended
GPU VRAM    8 GB (4-bit)    16 GB+
RAM         12 GB           24 GB
GPU         T4              A100 / RTX 3090+
Storage     15 GB           20 GB

✅ Runs on free Google Colab T4 with 4-bit quantization.
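
To verify a machine meets the minimum before loading the model, a quick sanity check with plain PyTorch (nothing model-specific):

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB VRAM")
else:
    print("No CUDA GPU detected; 4-bit loading requires a GPU.")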


⚠️ Limitations

  • The model may occasionally hallucinate details of very niche or obscure APIs
  • Best results on Python and JavaScript; other languages have lower coverage
  • Long code blocks (>200 lines) may exceed the context window; chunk inputs first (see the sketch below)
  • Not suitable for security-critical code auditing without human review
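
A simple way to pre-chunk long inputs before passing them to ask_codementor. A minimal sketch; the 150-line chunk size and the long_source variable are illustrative, not values from this model card:

def chunk_code(source: str, max_lines: int = 150):
    """Split long source code into pieces that fit the context window."""
    lines = source.splitlines()
    for start in range(0, len(lines), max_lines):
        yield "\n".join(lines[start:start + max_lines])

# Illustrative usage: ask about each chunk separately
# for i, chunk in enumerate(chunk_code(long_source)):
#     print(f"--- chunk {i} ---")
#     print(ask_codementor("Explain this code and identify any bugs.", chunk))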

📚 Citation

@misc{codementor-v2-fullstack,
  author       = {Likith Yadav},
  title        = {CodeMentor V2: A LoRA Fine-Tuned Fullstack Code Assistant},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/likithyadavv/codementor-v2-fullstack}},
}
