Instructions for using md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2 with libraries and local apps.

Libraries

How to use md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2")
model = AutoModelForCausalLM.from_pretrained("md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
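
The same chat-completions call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM, and the default `base_url` assumes the server started by `vllm serve` above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, with the server running:
#   request = build_chat_request("What is the capital of France?")
#   print(send_chat_request(request))
```

Passing `base_url="http://localhost:30000"` should target an SGLang server started as shown further down, since both expose the same OpenAI-compatible route.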

Use Docker

docker model run hf.co/md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2

SGLang

How to use md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2 with Docker Model Runner:
```
docker model run hf.co/md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2
```

Model Card for `md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2`

Model Details

| Field | Value |
| --- | --- |
| Developed by | Md Ayan (`mdayan8`) |
| Model type | Causal language model fine-tuned for SQL debugging and repair |
| Language | English (SQL + natural language prompts) |
| License | Apache-2.0 |
| Shared by | `md896` |
| Pipeline tag | Text Generation |
| Model family tags | `qwen2`, `trl`, `grpo`, `conversational`, `text-generation-inference` |

Model Description

This model is part of an execution-grounded SQL debugging workflow built on OpenEnv tasks. The key idea is to optimize for runtime correctness rather than only text-level plausibility.

The training/evaluation workflow uses:

- A fast bridge phase on Qwen2.5-Coder-0.5B-Instruct to check the environment wiring.
- A baseline/evaluation track using Qwen2.5-Coder-7B-Instruct, with benchmark comparisons.
- GRPO-based optimization signals derived from SQL execution outcomes, grader feedback, and task completion behavior.

Model Sources

- Repository: https://github.com/mdayan8/sql-debug-env
- Demo / Environment: https://md896-sql-debug-env.hf.space
- Training dashboard (W&B): https://wandb.ai/mdayanbag-pesitm/sql-debug-grpo-best-budget/workspace?nw=nwusermdayanbag
- Reference paper listed for metadata context (Quantifying the Carbon Emissions of Machine Learning): https://arxiv.org/abs/1910.09700

Intended Uses

Direct Use

- SQL-repair-assistant-style prompting in controlled environments
- Runtime-evaluated SQL correction experiments
- Benchmark comparison against deterministic SQL debugging tasks

Downstream Use

- Fine-tuning initialization for enterprise SQL repair use cases
- Evaluation baseline for OpenEnv-style SQL agents

Out-of-Scope / Not Recommended

- Autonomous execution against production databases without guardrails
- High-risk environments requiring strict SQL governance without additional review controls

Training Details

Training Data

Training signals are generated from deterministic OpenEnv SQL debugging tasks using reset/step interaction loops and execution-based grading.
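
A reset/step loop with execution-based grading can be sketched as follows. This is a toy illustration and not the OpenEnv API or the project's environment: the class `ToySqlDebugEnv`, its single hard-coded task, and the 0/1 reward are all assumptions made for the example.

```python
import sqlite3


class ToySqlDebugEnv:
    """Hypothetical reset/step environment with one hard-coded task."""

    SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
    SEED = "INSERT INTO users (name) VALUES ('ada'), ('linus');"
    BROKEN_QUERY = "SELECT name FROM userss ORDER BY id;"
    EXPECTED_ROWS = [("ada",), ("linus",)]

    def __init__(self):
        self.conn = None

    def reset(self):
        """Start a fresh episode on a brand-new in-memory database."""
        if self.conn is not None:
            self.conn.close()
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(self.SCHEMA + self.SEED)
        return {"broken_query": self.BROKEN_QUERY, "error": None}

    def step(self, candidate_sql):
        """Execute a candidate fix; return (observation, reward, done)."""
        try:
            rows = self.conn.execute(candidate_sql).fetchall()
        except sqlite3.Error as exc:
            # Execution failed: surface the database error to the agent.
            return {"error": str(exc)}, 0.0, False
        # Deterministic grading: compare result rows with the expected rows.
        solved = rows == self.EXPECTED_ROWS
        return {"error": None}, (1.0 if solved else 0.0), solved
```

Calling `reset()` and then `step()` with a candidate query returns the error text for a failing candidate and a reward of 1.0 once the result rows match.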

Training Procedure

| Step | Description |
| --- | --- |
| Session isolation | Every episode runs in isolated in-memory SQLite state |
| Task iteration | Query proposals are evaluated task-by-task under deterministic graders |
| GRPO objective | Relative ranking over generated candidates using execution-grounded reward |
| Artifact capture | Run metrics, reward traces, and charts are persisted and published |
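
To make the "relative ranking over generated candidates" idea concrete, here is a minimal sketch of the group-relative advantage as GRPO is commonly described: each candidate's reward is centered on the group mean and scaled by the group standard deviation. The function name and the `eps` constant are illustrative; the project's trainer may normalize differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Center and scale the rewards of candidates sampled for one prompt."""
    baseline = mean(rewards)   # group mean acts as the baseline
    spread = pstdev(rewards)   # population standard deviation of the group
    return [(r - baseline) / (spread + eps) for r in rewards]


# Four candidate fixes for one task, rewarded by execution outcome.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Candidates that beat the group average receive a positive advantage and the rest a negative one. A group containing a single candidate always yields an advantage of zero, so at least two generations per prompt are needed for any learning signal.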

Key Training Hyperparameters (workflow-level)

| Hyperparameter area | Value / behavior |
| --- | --- |
| GRPO generations | Configured `>= 2` (runtime-safe default in launcher) |
| Reward composition | Correctness + efficiency + progress + schema bonus - penalties |
| Sampling controls | Temperature / top-p / completion length controlled in training scripts |
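
The reward composition in the table can be read as a weighted sum. The sketch below shows one way to write it; the `RewardTerms` container, the function name, and every weight value are hypothetical, since this card does not state the actual weights (see the training scripts for those).

```python
from dataclasses import dataclass


@dataclass
class RewardTerms:
    correctness: float   # e.g. 1.0 when the result rows match the expected rows
    efficiency: float
    progress: float
    schema_bonus: float
    penalties: float     # subtracted from the total


# Hypothetical weights, chosen only to make the example concrete.
DEFAULT_WEIGHTS = {
    "correctness": 1.0,
    "efficiency": 0.2,
    "progress": 0.2,
    "schema_bonus": 0.1,
    "penalties": 1.0,
}


def compose_reward(terms, weights=DEFAULT_WEIGHTS):
    """Correctness + efficiency + progress + schema bonus - penalties."""
    return (
        weights["correctness"] * terms.correctness
        + weights["efficiency"] * terms.efficiency
        + weights["progress"] * terms.progress
        + weights["schema_bonus"] * terms.schema_bonus
        - weights["penalties"] * terms.penalties
    )
```

Keeping the terms separate until the final sum makes it easier to log each component alongside the total when inspecting reward traces.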

For script-level specifics, see:

- `ultimate_sota_training.py`
- `launch_job.py`

Evaluation

Metrics Snapshot

| Metric | Value |
| --- | --- |
| Spider-style industry baseline | 48.2% |
| Qwen-7B base | 52.4% |
| RL agent headline | 78.5% |
| Performance leap view | 0.0% -> 25.0% |
| Eval artifact pass | 32-run |

Benchmark Visuals

Training / Proof Visuals

Evidence Artifacts

- Sample rewards run folder: https://huggingface.co/spaces/md896/sql-debug-env/tree/main/artifacts/runs/20260426-064318-sample-rewards-32eval
- Earlier 32-eval pass folder: https://huggingface.co/spaces/md896/sql-debug-env/tree/main/artifacts/runs/20260426-060502-final-pass-32eval

Bias, Risks, and Limitations

- SQL correctness can still degrade under unseen schemas/dialects.
- Benchmark-style gains do not guarantee equivalent production reliability.
- Model outputs should be reviewed before executing in sensitive environments.

Recommendations

- Keep SQL execution sandboxed during evaluation.
- Use schema introspection + error inspection loops.
- Add reviewer/guardrail checks for risky query classes.
- Track run artifacts and compare against deterministic graders, not only manual inspection.
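
The first three recommendations can be combined in a small harness. The sketch below assumes SQLite and uses illustrative helper names (`describe_schema`, `is_read_only`, `try_query`); the guardrail is deliberately crude, accepting only a single statement that starts with `SELECT`, and is no substitute for real access controls.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [column_name, ...]} from SQLite's catalog."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in columns]  # index 1 is the name
    return schema


def is_read_only(sql):
    """Crude guardrail: one statement only, and it must start with SELECT."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        return False  # reject anything that looks like multiple statements
    return stripped.lower().startswith("select")


def try_query(conn, sql):
    """Run a candidate query and capture the database error instead of raising."""
    if not is_read_only(sql):
        return {"ok": False, "rows": None, "error": "blocked by guardrail"}
    try:
        return {"ok": True, "rows": conn.execute(sql).fetchall(), "error": None}
    except sqlite3.Error as exc:
        return {"ok": False, "rows": None, "error": str(exc)}
```

In a repair loop, the output of `describe_schema` and the `error` field from a failed `try_query` call are the two pieces of context to feed back into the next prompt.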

How to Get Started

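from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Plain-text prompt; include the schema and the database error when you have them.
prompt = "Fix this SQL query based on schema and error context: SELECT * FROM userss;"
# Move the tokenized inputs to the same device as the model before generating.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# The decoded text contains the prompt followed by the model's continuation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))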
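
The prompt above mentions schema and error context without supplying either. A minimal sketch of packing that context into a chat message follows; the layout of the message and the helper name `build_repair_messages` are assumptions, since this card does not document the prompt format used during training.

```python
def build_repair_messages(schema_sql, broken_query, error_message):
    """Assemble a single-turn chat message carrying schema, query, and error."""
    content = (
        "Fix this SQL query based on schema and error context.\n\n"
        f"Schema:\n{schema_sql}\n\n"
        f"Query:\n{broken_query}\n\n"
        f"Error:\n{error_message}"
    )
    return [{"role": "user", "content": content}]


messages = build_repair_messages(
    schema_sql="CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);",
    broken_query="SELECT * FROM userss;",
    error_message="no such table: userss",
)
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template(...)` in the same way as in the Transformers snippet at the top of this page.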

Environmental Impact

This model was trained/evaluated across iterative cloud/local workflows. Exact carbon accounting is not yet logged in this card.

Citation

If you use this work, cite the project repository (https://github.com/mdayan8/sql-debug-env) and the model page.

Contact

GitHub: https://github.com/mdayan8

Downloads last month: 753

Safetensors

Model size: 0.5B params
Tensor type: BF16

Model tree for md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2

Base model: Qwen/Qwen2.5-0.5B
Finetuned: Qwen/Qwen2.5-Coder-0.5B
Finetuned: Qwen/Qwen2.5-Coder-0.5B-Instruct
Finetuned (91): this model

Spaces using md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2: 1

Paper for md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2

Quantifying the Carbon Emissions of Machine Learning (arXiv 1910.09700, published Oct 21, 2019)

Evaluation results

Spider-style headline on SQL Debug Environment task suite (self-reported): 78.5
