Spaces:

ArshVerma
/

CodeLens

Sleeping

App Files Files Community

CodeLens / kaggle_Details.md

ArshVerma

Fix nested markdown blocks for Kaggle parser

be2f706 18 days ago

preview code

Raw

History Blame Contribute Delete

5.25 kB

	# Model Summary

	CodeLens-Reviewer is an AI agent fine-tuned to act as a Senior Code Reviewer. Built upon the `Qwen2.5-Coder (7B)` architecture, this model has been instruction-tuned by Arsh Verma to autonomously detect bugs, identify security vulnerabilities, and critique architectural flaws in Python Pull Requests. It interacts directly with the CodeLens Evaluation Environment, outputting perfectly structured JSON actions to simulate an automated code review loop.

	## Usage

	This model is intended to be used either standalone for code inference or plugged into the live CodeLens Evaluation Environment.

	### Inference with Unsloth
	```python
	from unsloth import FastLanguageModel

	model, tokenizer = FastLanguageModel.from_pretrained(
	model_name = "ArshVerma/CodeLens-Reviewer",
	max_seq_length = 2048,
	load_in_4bit = True,
	)
	FastLanguageModel.for_inference(model)

	prompt = """You are an expert code reviewer. Review the following code and output a JSON action indicating any bugs or issues you find.

	Code:
	def process(data):
	for i in data:
	if i == "remove": data.remove(i)
	return data
	"""

	messages = [
	{"role": "system", "content": "You are an expert code reviewer. Output only valid JSON."},
	{"role": "user", "content": prompt}
	]

	inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
	outputs = model.generate(input_ids=inputs, max_new_tokens=256)
	print(tokenizer.batch_decode(outputs)[0])
	```

	Shape of Inputs/Outputs:
	- Input: Natural language instructions and a python code snippet.
	- Output: A strict JSON object containing `action`, `issue_description`, `filename`, `line_number`, `severity`, and `category`.

	Known Failures: The model may hallucinate specific line numbers if the provided code diff is extremely long or poorly formatted.

	## System

	This model serves as the core "Agent" in the CodeLens Evaluation System. It receives system prompts, noise budgets, and task definitions from the CodeLens Python backend. Its downstream dependency is the CodeLens WebSocket dashboard (hosted on Hugging Face Spaces), which parses the JSON outputs to assign rewards (positive/negative) and update the live leaderboard.

	## Implementation requirements

	- Hardware: Trained on a single NVIDIA Tesla T4 x2 GPU via Kaggle.
	- Software: Unsloth, PyTorch, Hugging Face Transformers, TRL.
	- Compute: Fine-tuning took less than 15 minutes due to Unsloth's highly optimized 4-bit LoRA training pipeline. Inference requires < 6GB of VRAM, making it incredibly lightweight and accessible.

	---

	# Model Characteristics

	## Model initialization

	The model was fine-tuned from the pre-trained `unsloth/Qwen2.5-Coder-7B-Instruct` base model.

	## Model stats

	- Size: 7 Billion Parameters.
	- Weights: LoRA adapters applied to Q, K, V, O, gate, up, and down projections.
	- Latency: Highly optimized for real-time inference (2x faster than native Hugging Face using Unsloth).

	## Other details

	- Quantization: The model was trained and quantized in 4-bit (bitsandbytes) to severely reduce VRAM requirements.
	- Gradient Checkpointing: Unsloth's exact gradient checkpointing was used to save memory.

	---

	# Data Overview

	The model was trained on a highly specific, synthetically generated instructional dataset targeting code review tasks.

	## Training data

	The training data (`ArshVerma/CodeLens-Dataset`) contains custom prompt-completion pairs. Each row simulates a CodeLens environment state containing:
	- A `pr_title` and `pr_description`.
	- A code `diff` containing planted bugs, security flaws, or architectural issues.
	- A golden `JSON` completion representing the ideal code review action.

	## Demographic groups

	N/A. This model processes programming logic and code structure.

	## Evaluation data

	The model was evaluated iteratively against the live CodeLens engine by observing its capability to identify 3 distinct tasks: `bug_detection`, `security_audit`, and `architectural_review`.

	---

	# Evaluation Results

	## Summary

	The model successfully learned the JSON formatting constraints and correctly identifies logic bugs (e.g., modifying arrays during iteration) without hallucinating markdown wrappers.

	## Subgroup evaluation results

	- Bug Detection: High accuracy. Successfully identifies off-by-one and logical loop errors.
	- Security Audit: Capable of identifying basic input sanitization failures.
	- Architecture: Demonstrates baseline capability in flagging Single Responsibility Principle (SRP) violations.

	## Fairness

	N/A. Evaluation was strictly based on binary reward functions (detecting the objective bug = +1 reward).

	## Usage limitations

	This model is intended strictly for evaluating code within isolated sandboxes or assisting developers. It should not be used to automatically reject or approve production Pull Requests without human oversight, as false positives are possible.

	## Ethics

	No sensitive or PII data was used during the training of this model. Risks of adversarial exploitation exist if malicious code is embedded to trigger arbitrary outputs, but the model outputs strictly to JSON, minimizing execution risks.