Model Card for schemaquake1-lora

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct.

It was trained with TRL / GRPO on SchemaQuake, an OpenEnv-compatible environment for training LLM agents to handle silent schema and policy drift in professional workflows.

What is SchemaQuake?

SchemaQuake is a travel-booking environment designed to test whether an agent can notice when the world changes underneath it.

The agent receives a task such as:

Book a refundable flight from BLR to DEL under 8000 rupees.

During the episode, the environment may silently change:

a field name, such as price_rupees to ticket_price
a unit, such as rupees to paise
a refundability field, such as boolean to refund tier
a policy document, such as a 24-hour cancellation window to 48 hours
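
An agent can guard against the first three drift types by reading every offer through one normalizing function. The following is a minimal sketch, assuming offers arrive as dictionaries. Only price_rupees and ticket_price come from the examples above; the refundable and refund_tier field names, the tier values, the unit argument, and the function name are hypothetical and not part of the SchemaQuake API.

```python
from typing import Any

# Hypothetical alias table: only price_rupees -> ticket_price appears in this
# card; a real agent would discover aliases by inspecting the live schema.
PRICE_ALIASES = ("price_rupees", "ticket_price")

# Hypothetical refund tiers that this sketch treats as refundable.
REFUNDABLE_TIERS = {"full", "partial"}


def normalize_offer(offer: dict[str, Any], unit: str = "rupees") -> dict[str, Any]:
    """Map a possibly drifted offer onto a stable {price_rupees, refundable} view."""
    price_key = next((k for k in PRICE_ALIASES if k in offer), None)
    if price_key is None:
        # Unknown field name: surface the drift instead of guessing.
        raise KeyError(f"no known price field in {sorted(offer)}")

    price = offer[price_key]
    if unit == "paise":
        price = price / 100  # 100 paise = 1 rupee

    # Accept either a boolean flag or a tier string (both names are assumed).
    refund = offer.get("refundable", offer.get("refund_tier"))
    if isinstance(refund, bool):
        refundable = refund
    else:
        refundable = refund in REFUNDABLE_TIERS

    return {"price_rupees": price, "refundable": refundable}
```

Raising on an unknown price field, rather than defaulting to zero or skipping the offer, keeps the drift visible so the agent can go inspect the schema.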

The model is rewarded for completing the user’s real task, detecting drift, inspecting schema or policy when needed, avoiding silent violations, and acting efficiently.
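
One way to picture how these signals could combine into a single scalar is sketched below. The field names, weights, and step budget are all illustrative assumptions; the actual SchemaQuake reward function is not described in this card and may differ.

```python
from dataclasses import dataclass


@dataclass
class EpisodeSummary:
    # All fields are hypothetical names for the signals listed above.
    task_completed: bool
    drift_occurred: bool
    drift_detected: bool
    inspected_after_drift: bool
    silent_violations: int
    steps_taken: int
    step_budget: int = 10


def episode_reward(ep: EpisodeSummary) -> float:
    """Combine the reward signals using illustrative (not actual) weights."""
    reward = 1.0 if ep.task_completed else 0.0
    if ep.drift_occurred:
        # Bonus for noticing drift and for checking schema or policy afterwards.
        reward += 0.5 if ep.drift_detected else 0.0
        reward += 0.25 if ep.inspected_after_drift else 0.0
    # Each silent violation costs as much as a completed task earns.
    reward -= 1.0 * ep.silent_violations
    # Small penalty for every step beyond the budget.
    reward -= 0.05 * max(0, ep.steps_taken - ep.step_budget)
    return reward
```

With these assumed weights, an agent that books the flight but silently violates a drifted constraint scores no better than one that does nothing.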

Intended Use

This adapter is intended for the SchemaQuake hackathon environment:

Space: realambuj2001/schemaquake1
Environment/results repo: realambuj2001/schemaquake1
Task type: drift-aware professional agent workflows
Training method: light SFT warm start + TRL GRPO with rollout rewards
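
GRPO scores each rollout relative to the other rollouts sampled for the same prompt. The sketch below shows that group-relative normalization in plain Python, independent of TRL. The function name and the eps value are assumptions for illustration, and implementations differ on details such as which standard deviation estimator they use.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Center and scale rollout rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a choice made for this sketch
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Rollouts that beat the group average get a positive advantage and are reinforced; a group with identical rewards yields all-zero advantages and contributes no learning signal.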

It is not intended as a general-purpose travel-booking assistant. The main contribution is the environment and training pipeline for safer agent behavior under changing schemas and policies.

Quick start

This repo contains the trained adapter artifacts. Use it with the SchemaQuake Space or load it as an adapter on top of the base Qwen model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-0.5B-Instruct"
adapter = "realambuj2001/schemaquake1-lora"

# Load the base model first, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

prompt = """You are a SchemaQuake travel-booking agent.
Return JSON with an actions list.
Task: Book a refundable flight from BLR to DEL under 8000 rupees.
"""

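# Move the tokenized prompt to the same device as the model before generating.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Note: the decoded text contains the prompt followed by the completion.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))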
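
Because the decoded text echoes the prompt and a small model may wrap its JSON in extra words, it helps to extract the actions list defensively. This is a minimal sketch using only the standard library. The extract_actions helper and the tool/args action shape in the usage example are hypothetical; the card only states that the model returns JSON with an actions list.

```python
import json


def extract_actions(text: str) -> list:
    """Return the "actions" list of the first JSON object in text that has one, else []."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at index `start`.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and isinstance(obj.get("actions"), list):
            return obj["actions"]
    return []


# Usage with a made-up completion (the action schema here is assumed):
sample = 'Task: Book a flight.\n{"actions": [{"tool": "search_flights", "args": {"from": "BLR", "to": "DEL"}}]}'
actions = extract_actions(sample)
```

Returning an empty list on malformed output lets the caller treat an unparseable completion as a no-op instead of crashing the rollout.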
