Indian Wedding Planner RL Agent
A Qwen2.5-7B model fine-tuned with GRPO Reinforcement Learning to autonomously plan complete 3-day Indian weddings inside a stateful, logically-constrained simulation built on the OpenEnv framework.
| Item | Details |
|---|---|
| Live Environment | shivanandh033/wedding-planner-env |
| Training Notebook | train_colab.ipynb |
| Mean Episode Reward | Improved from ~0.21 to ~0.44 (+110% over 300 GRPO steps) |
| Training Stack | Unsloth + HuggingFace TRL (GRPO), single L4 GPU, ~2.5 hrs |
Model Summary
This model was trained using Group Relative Policy Optimization (GRPO) on the custom WeddingPlannerEnv, an OpenEnv-compliant simulator of a 3-day Indian wedding.
The agent must plan the wedding by:
- Booking vendors (caterers, photographers, decorators, priests, DJs) within a strict dynamic budget
- Scheduling events inside auspicious Muhurat windows from the Hindu calendar while avoiding Rahu Kaal
- Detecting and resolving logistical conflicts (double-bookings, ritual ordering violations, missing catering)
The model outputs structured JSON actions and learns from environment rewards through multi-step sequential interaction.
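The card does not spell out the action schema, so the following is only a minimal sketch of how a client might validate one model response before sending it to the environment. It assumes each action is a JSON object with a `type` field set to one of the four action names in the system prompt under "Run the Agent". The `parse_action` helper and the `category` field are hypothetical.

```python
import json

# Action names taken from the system prompt in this card; any other
# field (e.g. "category") is a hypothetical placeholder.
VALID_ACTIONS = {"book_vendor", "resolve_conflict", "negotiate", "finalize_plan"}

def parse_action(text: str) -> dict:
    """Parse one model response into an action dict, or raise ValueError."""
    try:
        action = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    if action.get("type") not in VALID_ACTIONS:
        raise ValueError(f"unknown action type: {action.get('type')!r}")
    return action

example = parse_action('{"type": "book_vendor", "category": "caterer"}')
print(example["type"])
```

Rejecting malformed actions on the client side keeps parsing failures separate from environment penalties when debugging a rollout.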
Training Details
| Parameter | Value |
|---|---|
| Base Model | unsloth/Qwen2.5-7B-Instruct |
| LoRA rank / alpha | 16 / 32 |
| Quantization | 4-bit (Unsloth) |
| Rollouts per prompt | 4 (num_generations=4) |
| Training epochs | 3 |
| Total steps | 300 |
| Learning rate | 5e-6 |
| GPU | NVIDIA L4 (24 GB VRAM) |
| Training time | ~2.5 hours |
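GRPO scores each rollout relative to the other rollouts sampled for the same prompt, which is why the table lists 4 generations per prompt. The sketch below illustrates that group-relative normalization in plain Python. The reward values are made up, and both the `eps` value and the use of the population standard deviation are assumptions, so the TRL implementation may differ in detail.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when all rollouts score the same
    return [(r - mu) / (sigma + eps) for r in rewards]

# Illustrative rewards for 4 rollouts of one prompt (made-up numbers).
rewards = [0.20, 0.35, 0.50, 0.35]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Rollouts above the group mean receive a positive advantage and those below it a negative one. A group in which every rollout earns the same reward contributes no learning signal.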
Curriculum Stages
| Stage | Guests | Budget Multiplier | Goal |
|---|---|---|---|
| Easy | 100–150 | 2.0× | Learn JSON schema + basic booking logic |
| Medium | 200–300 | 1.3× | Learn budget discipline under pressure |
| Hard | 350–500 | 1.05× | Master tight constraints at scale |
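The card does not say how the budget multiplier is applied. The sketch below shows one way the stages could be encoded, assuming the multiplier scales a baseline cost estimate. The `Stage` class, the `sample_episode` helper, and the `cost_per_guest` figure are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    min_guests: int
    max_guests: int
    budget_multiplier: float

# Values copied from the curriculum table.
STAGES = [
    Stage("easy", 100, 150, 2.0),
    Stage("medium", 200, 300, 1.3),
    Stage("hard", 350, 500, 1.05),
]

def sample_episode(stage: Stage, cost_per_guest: float, rng: random.Random) -> dict:
    """Draw a guest count and derive a budget (hypothetical budget rule)."""
    guests = rng.randint(stage.min_guests, stage.max_guests)
    # Assumption: budget = multiplier x baseline cost estimate for the guests.
    budget = stage.budget_multiplier * cost_per_guest * guests
    return {"difficulty": stage.name, "guest_count": guests, "budget": round(budget)}

rng = random.Random(42)
for stage in STAGES:
    print(sample_episode(stage, cost_per_guest=3500.0, rng=rng))
```

Under this reading, the Hard stage leaves only 5% headroom over the baseline cost, so a single overpriced vendor can push the plan into deficit.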
Reward Function
The environment computes a weighted multi-objective terminal reward:
```python
WEIGHTS = {
    "coverage": 0.35,   # % of required vendor categories booked (5 event types)
    "budget": 0.25,     # Budget efficiency: penalizes deficit and extreme under-spend
    "muhurat": 0.20,    # Ceremony timing compliance: Rahu Kaal violation = 0.0
    "conflicts": 0.10,  # Zero active conflicts at finalization
    "guest_ux": 0.10,   # Event diversity: guests get a complete experience
}
```
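As a rough illustration of how these weights might combine, the sketch below computes a plain weighted sum. It assumes each component score lies in [0, 1]. The `terminal_reward` helper is hypothetical, and the environment may apply extra shaping not shown here.

```python
WEIGHTS = {
    "coverage": 0.35,
    "budget": 0.25,
    "muhurat": 0.20,
    "conflicts": 0.10,
    "guest_ux": 0.10,
}

def terminal_reward(scores: dict) -> float:
    """Weighted sum of per-objective scores, each assumed to be in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical episode: everything perfect except a Rahu Kaal violation,
# which zeroes the muhurat component.
scores = {"coverage": 1.0, "budget": 1.0, "muhurat": 0.0, "conflicts": 1.0, "guest_ux": 1.0}
print(round(terminal_reward(scores), 2))
```

The weights sum to 1.0, so under this reading a perfect episode scores 1.0, and one Rahu Kaal violation caps an otherwise perfect plan at 0.80.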
How to Use
Connect to the Live Environment
```python
import requests

# Initialize a new episode
obs = requests.post(
    "https://shivanandh033-wedding-planner-env.hf.space/reset",
    json={"seed": 42, "difficulty": "medium"},
).json()["observation"]
print(obs)
# {
#   "city": "Delhi", "guest_count": 237, "budget_remaining": 1079325,
#   "muhurat_windows": {"pheras": {"start": "08:30", "end": "11:00"}, ...},
#   "booked_events": [], "active_conflicts": [], "step": 0
# }
```
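The `muhurat_windows` field in the observation can be checked on the client before an event is proposed. The sketch below assumes the windows use zero-padded 24-hour "HH:MM" strings, as in the sample observation. The `parse_hhmm` and `fits_window` helpers and the event times are hypothetical.

```python
from datetime import time

def parse_hhmm(value: str) -> time:
    """Convert an 'HH:MM' string into a datetime.time."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))

def fits_window(start: str, end: str, window: dict) -> bool:
    """True if [start, end] lies entirely inside the muhurat window."""
    return (parse_hhmm(window["start"]) <= parse_hhmm(start)
            and parse_hhmm(end) <= parse_hhmm(window["end"]))

# Window copied from the sample observation; event times are made up.
pheras_window = {"start": "08:30", "end": "11:00"}
print(fits_window("09:00", "10:30", pheras_window))  # inside the window
print(fits_window("10:00", "11:30", pheras_window))  # runs past the end
```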
Run the Agent
```python
from unsloth import FastLanguageModel
import json, requests

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="shivanandh033/wedding-planner-7b",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

SYSTEM = """You are an expert Indian wedding planner.
Plan a 3-day wedding by issuing ONE JSON action per turn.
Valid actions: book_vendor, resolve_conflict, negotiate, finalize_plan.
Always output valid JSON only."""

ENV_URL = "https://shivanandh033-wedding-planner-env.hf.space"
FALLBACK = {"type": "finalize_plan"}  # used when no JSON object can be parsed

def extract_action(text):
    """Return the first JSON object in text, including nested braces."""
    start = text.find("{")
    if start == -1:
        return FALLBACK
    try:
        action, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return FALLBACK
    return action if isinstance(action, dict) else FALLBACK

obs = requests.post(f"{ENV_URL}/reset", json={"seed": 42, "difficulty": "medium"}).json()["observation"]

for step in range(20):
    prompt = f"{SYSTEM}\n\nCurrent state:\n{json.dumps(obs, indent=2)}\n\nYour action:"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    # temperature only takes effect when sampling is enabled
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    action = extract_action(response)
    result = requests.post(f"{ENV_URL}/step", json=action).json()
    obs = result["observation"]
    print(f"Step {step+1:2d}: {action.get('type', '?'):20s} | reward: {result['reward']:.4f}")
    if result["done"]:
        print(f"\nFinal Score: {result['reward']:.4f}")
        break
```
The Constraint Engine
The WeddingPlannerEnv sweeps the full itinerary for logical impossibilities after every single step:
- Ritual Ordering Violations: Pheras booked before Haldi → immediate conflict flag
- Temporal Double Booking: same vendor at the same time slot on the same date
- Missing Infrastructure: for example, a 400-guest reception with no catering vendor attached
- Rahu Kaal Scheduling: any event booked 15:00–16:30 → muhurat score collapses to 0.0
Active conflicts surface in active_conflicts of the very next observation. The agent must self-correct; conflicts are never auto-resolved.
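The environment's actual sweep is not shown in this card. The sketch below is a simplified illustration of three of the checks listed above: ritual ordering, double booking, and Rahu Kaal overlap. The `Booking` record, the conflict labels, and the `find_conflicts` helper are all hypothetical, and times are assumed to be zero-padded "HH:MM" strings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Booking:
    event: str   # e.g. "haldi", "pheras"
    vendor: str
    day: int     # 1, 2 or 3
    start: str   # "HH:MM", zero-padded so plain string comparison works
    end: str

RAHU_KAAL = ("15:00", "16:30")  # window quoted in this card

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open time intervals on the same day intersect."""
    return a_start < b_end and b_start < a_end

def find_conflicts(bookings):
    conflicts = []
    first_slot = {}  # earliest (day, start) seen for each event name
    for b in bookings:
        slot = (b.day, b.start)
        if b.event not in first_slot or slot < first_slot[b.event]:
            first_slot[b.event] = slot
        if overlaps(b.start, b.end, *RAHU_KAAL):
            conflicts.append(f"rahu_kaal:{b.event}")
    # Ritual ordering: Haldi must come before Pheras.
    if "pheras" in first_slot and "haldi" in first_slot:
        if first_slot["pheras"] < first_slot["haldi"]:
            conflicts.append("ritual_order:pheras_before_haldi")
    # Double booking: same vendor, same day, overlapping times.
    for i, a in enumerate(bookings):
        for b in bookings[i + 1:]:
            if a.vendor == b.vendor and a.day == b.day and overlaps(a.start, a.end, b.start, b.end):
                conflicts.append(f"double_booking:{a.vendor}")
    return conflicts

plan = [
    Booking("pheras", "priest_a", 1, "09:00", "10:30"),
    Booking("haldi", "decor_b", 2, "15:30", "17:00"),
    Booking("sangeet", "decor_b", 2, "16:00", "19:00"),
]
print(find_conflicts(plan))
```

Re-running a full sweep after every step, rather than validating only the newest booking, lets a later action surface a conflict with an earlier one.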
Evaluation Results
Figure: mean episode reward over 300 GRPO training steps, rising from ~0.21 to ~0.44 (+110%).
Tested across 200 unique episodes on Easy difficulty over 3 training epochs:
| Metric | Untrained Baseline | GRPO-Trained Agent |
|---|---|---|
| JSON output format | Frequent hallucinations | 100% strict JSON compliance |
| Muhurat scheduling | Events in Rahu Kaal window | Correctly targets 08:30–11:00 |
| Budget management | ~60% wasted or overdrawn | Efficient allocation within limits |
| Active conflicts at finalization | 2–4 per episode | 0–1 per episode |
| Mean episode reward | ~0.21 | ~0.44 (+110%) |
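The +110% figure is the relative gain over the baseline, not an absolute difference. A quick check of the arithmetic, using the approximate values from the table:

```python
baseline, trained = 0.21, 0.44  # approximate mean rewards reported above
relative_gain = (trained - baseline) / baseline
print(f"{relative_gain:.1%}")
```

This comes to about 109.5%, which the card rounds to +110%.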
Citation
```bibtex
@misc{weddingplannerenv2026,
  title = {WeddingPlannerEnv: GRPO Reinforcement Learning for Culturally-Constrained Event Planning},
  year  = {2026},
  note  = {AR'26 Meta OpenEnv Hackathon Submission},
  url   = {https://huggingface.co/spaces/shivanandh033/wedding-planner-env}
}
```
AR'26 Meta OpenEnv Hackathon | Theme #3.1 (World Modeling: Professional Tasks)