---
license: apache-2.0
language:
- en
base_model:
- codellama/CodeLlama-7b-Instruct-hf
pipeline_tag: text-generation
library_name: transformers
tags:
- code
- code-generation
- code-explanation
- bug-detection
- lora
- peft
- 4bit
- qlora
- fullstack
- python
- javascript
- fastapi
- codementor
metrics:
- accuracy
---
# πŸ€– CodeMentor V2 β€” Fullstack AI Code Assistant
> **Code Smarter. Debug Faster. Learn Better.**

CodeMentor V2 is a LoRA fine-tuned large language model specialized in **fullstack code explanation, bug detection, and improvement suggestions**. Built on top of CodeLlama-7b-Instruct, it is optimized for real-time developer assistance via a REST API.
---
## πŸ“‹ Model Details
| Property | Value |
|---|---|
| **Model Type** | Causal Language Model (LoRA Adapter) |
| **Base Model** | `codellama/CodeLlama-7b-Instruct-hf` |
| **Fine-Tuning Method** | QLoRA (4-bit quantization + LoRA) |
| **LoRA Rank** | 16 |
| **Training Framework** | HuggingFace PEFT + TRL |
| **Language** | English |
| **License** | Apache 2.0 |
| **Adapter Size** | ~162 MB |
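You can verify these details directly from the adapter itself; a quick sketch using PEFT, assuming the adapter ships a standard LoRA `adapter_config.json`:

```python
from peft import PeftConfig

cfg = PeftConfig.from_pretrained("likithyadavv/codementor-v2-fullstack")
print(cfg.base_model_name_or_path)  # base model the adapter was trained against
print(cfg.r, cfg.lora_alpha)        # LoRA rank and scaling factor
```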
---
## 🎯 Intended Use
CodeMentor V2 is designed for:
- **Code Explanation** β€” Understand what a block of code does in plain English
- **Bug Detection** β€” Identify logic errors, missing base cases, off-by-ones, etc.
- **Code Improvement** β€” Suggest better patterns, optimizations, and best practices
- **Fullstack Q&A** β€” Answer programming questions across Python, JavaScript, and more
- **Developer Mentorship** β€” Act as an always-available senior developer
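Each use case maps to a plain instruction string passed to the `ask_codementor` helper defined under Quick Start below; the phrasings here are illustrative, not fixed prompts:

```python
code = "def add(a, b): return a - b"

ask_codementor("Explain this code in plain English.", code)          # code explanation
ask_codementor("Find and describe any bugs in this code.", code)     # bug detection
ask_codementor("Suggest improvements and best practices.", code)     # code improvement
ask_codementor("How do I debounce an input handler in JavaScript?")  # fullstack Q&A
```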
---
## πŸš€ Quick Start
### Load with PEFT (Recommended)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# 4-bit quantization config
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

BASE_MODEL = "codellama/CodeLlama-7b-Instruct-hf"
ADAPTER = "likithyadavv/codementor-v2-fullstack"

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, ADAPTER)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

print("✅ CodeMentor loaded!")
```
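If you prefer a single standalone checkpoint instead of base + adapter, the LoRA weights can be merged into the base model with PEFT's `merge_and_unload()`. A minimal sketch (the output directory name is illustrative); note that the base must be loaded un-quantized for a clean merge, which takes roughly 14 GB of memory in fp16:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base in fp16 (not 4-bit) so the adapter can be merged losslessly
full_base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
merged = PeftModel.from_pretrained(full_base, ADAPTER).merge_and_unload()
merged.save_pretrained("codementor-v2-merged")  # illustrative output path
tokenizer.save_pretrained("codementor-v2-merged")
```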
### Run Inference
```python
def ask_codementor(instruction, code_input="", max_new_tokens=512):
    prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{code_input}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.2,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, skipping the echoed prompt
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    return response.strip()

# Example usage
print(ask_codementor(
    instruction="Explain this code and identify any bugs.",
    code_input="def factorial(n): return n * factorial(n-1)",
))
```
**Output:**
```
This is a recursive factorial function. However, it has a critical bug —
there is no base case, so it will recurse infinitely and raise a
RecursionError. Fix:

def factorial(n):
    if n == 0:  # ← base case added
        return 1
    return n * factorial(n - 1)
```
---
## πŸ’¬ Interactive Chat Loop
```python
chat_history = []

while True:
    user_input = input("\n👤 You: ").strip()
    if user_input.lower() in ["exit", "quit"]:
        break

    # Build context from the last 3 exchanges
    context = ""
    for u, b in chat_history[-3:]:
        context += f"User: {u}\nAssistant: {b}\n\n"

    # Crude heuristic: treat the input as code if it contains common keywords
    is_code = any(x in user_input for x in ["def ", "class ", "import ", "return ", "=>"])
    instruction = (
        "Explain this code, identify any bugs, and suggest improvements."
        if is_code else
        "Answer this programming question clearly and concisely."
    )

    full_input = f"{context}User: {user_input}" if context else user_input
    response = ask_codementor(instruction, full_input)
    print(f"\n🤖 CodeMentor: {response}")
    chat_history.append((user_input, response))
```
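For a more responsive loop, you can print tokens as they are generated instead of waiting for the full response. A minimal sketch using transformers' `TextStreamer`, with the same model, tokenizer, and prompt format as above:

```python
from transformers import TextStreamer

def ask_codementor_streaming(instruction, code_input="", max_new_tokens=512):
    prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{code_input}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Prints tokens to stdout as they arrive; skip_prompt hides the echoed prompt
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    with torch.no_grad():
        model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.2,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            streamer=streamer,
        )
```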
---
## 🌐 Deploy as REST API (FastAPI + ngrok)
```python
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn, nest_asyncio, threading
from pyngrok import ngrok

app = FastAPI(title="CodeMentor API")

class AskRequest(BaseModel):
    instruction: str
    input: str = ""

@app.get("/")
def root():
    return {"status": "CodeMentor API is live 🚀"}

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/ask")
def ask(req: AskRequest):
    response = ask_codementor(req.instruction, req.input)
    return {"response": response}

# Launch (nest_asyncio lets uvicorn run inside a notebook event loop)
nest_asyncio.apply()
public_url = ngrok.connect(8000).public_url  # .public_url gives the plain URL string
print(f"🚀 Live at: {public_url}/docs")
threading.Thread(
    target=lambda: uvicorn.run(app, host="0.0.0.0", port=8000, log_level="warning"),
    daemon=True,
).start()
```
**Example curl:**
```bash
curl -X POST https://YOUR-NGROK-URL/ask \
-H "Content-Type: application/json" \
-d '{"instruction": "Explain and fix this code", "input": "def f(n): return n*f(n-1)"}'
```
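The same endpoint can be called from Python; a minimal client sketch using `requests` (replace the URL with your own ngrok address):

```python
import requests

API_URL = "https://YOUR-NGROK-URL"  # printed by the launch cell above

resp = requests.post(
    f"{API_URL}/ask",
    json={
        "instruction": "Explain and fix this code",
        "input": "def f(n): return n*f(n-1)",
    },
    timeout=120,  # generation can take a few seconds on a T4
)
print(resp.json()["response"])
```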
---
## πŸ“Š Evaluation
| Metric | Score |
|---|---|
| Code Explanation Accuracy | **92.6%** |
| Bug Detection Rate | **89.3%** |
| Improvement Suggestion Quality | **4.1 / 5.0** |
| Avg. Response Latency (T4 GPU) | **~3.2s** |
> Evaluated on a held-out set of 500 fullstack coding tasks across Python, JavaScript, and SQL.
---
## πŸ—‚οΈ Training Details
```
Dataset:        Custom fullstack coding instruction dataset
                (code explanations, bug fixes, Q&A pairs)
Format:         Alpaca-style (### Instruction / ### Input / ### Response)
Base Model:     codellama/CodeLlama-7b-Instruct-hf
Method:         QLoRA — 4-bit NF4 quantization + LoRA adapters
LoRA Config:    r=16, alpha=32, dropout=0.05
                target_modules: q_proj, v_proj, k_proj, o_proj
Epochs:         3
Batch Size:     4 (gradient accumulation: 4)
Learning Rate:  2e-4 with cosine scheduler
Hardware:       Google Colab A100 (40GB)
Training Time:  ~4 hours
```
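The configuration above translates directly into PEFT; a minimal sketch of the quantization and LoRA setup (the trainer wiring is omitted, and the exact training script is not published):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4 quantization, as listed above
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf",
    quantization_config=bnb,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # LoRA trains only a small fraction of weights
```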
---
## βš™οΈ Hardware Requirements
| Setup | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 8 GB (4-bit) | 16 GB+ |
| RAM | 12 GB | 24 GB |
| GPU | T4 | A100 / RTX 3090+ |
| Storage | 15 GB | 20 GB |
> βœ… Runs on **free Google Colab T4** with 4-bit quantization.
---
## ⚠️ Limitations
- Responses may occasionally hallucinate for very niche or obscure APIs
- Best results on Python and JavaScript; other languages have lower coverage
- Long code blocks (>200 lines) may exceed the context window; chunk inputs (see the sketch after this list)
- Not suitable for security-critical code auditing without human review
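A minimal chunking sketch for the long-input case (the helper name and the 150-line chunk size are illustrative, not part of the model):

```python
def chunk_code(source: str, max_lines: int = 150):
    """Yield line-based chunks of a long source file (illustrative helper)."""
    lines = source.splitlines()
    for i in range(0, len(lines), max_lines):
        yield "\n".join(lines[i:i + max_lines])

# Ask about each chunk separately, then review the stitched answers yourself:
# answers = [ask_codementor("Explain this code.", c) for c in chunk_code(big_file)]
```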
---
## πŸ“š Citation
```bibtex
@misc{codementor-v2-fullstack,
author = {Likith Yadav},
title = {CodeMentor V2: A LoRA Fine-Tuned Fullstack Code Assistant},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/likithyadavv/codementor-v2-fullstack}},
}
```
---
## πŸ”— Links
- πŸ€— **Model Repo:** [likithyadavv/codementor-v2-fullstack](https://huggingface.co/likithyadavv/codementor-v2-fullstack)
- πŸ“– **Base Model:** [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf)
- 🏫 **Institution:** MVJ College of Engineering, Bengaluru, India