burtenshaw HF Staff commited on
Commit
6ab17a7
·
verified ·
1 Parent(s): 02b5f0a

Upload folder using huggingface_hub

Browse files
SKILL.md ADDED
@@ -0,0 +1,706 @@
1
+ ---
2
+ name: model-trainer
3
+ description: This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
4
+ license: Complete terms in LICENSE.txt
5
+ ---
6
+
7
+ # TRL Training on Hugging Face Jobs
8
+
9
+ ## Overview
10
+
11
+ Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.
12
+
13
+ **TRL provides multiple training methods:**
14
+ - **SFT** (Supervised Fine-Tuning) - Standard instruction tuning
15
+ - **DPO** (Direct Preference Optimization) - Alignment from preference data
16
+ - **GRPO** (Group Relative Policy Optimization) - Online RL training
17
+ - **Reward Modeling** - Train reward models for RLHF
18
+
19
+ **For detailed TRL method documentation:**
20
+ ```python
21
+ hf_doc_search("your query", product="trl")
22
+ hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer") # SFT
23
+ hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer") # DPO
24
+ # etc.
25
+ ```
26
+
27
+ **See also:** `references/training_methods.md` for method overviews and selection guidance
28
+
29
+ ## When to Use This Skill
30
+
31
+ Use this skill when users want to:
32
+ - Fine-tune language models on cloud GPUs without local infrastructure
33
+ - Train with TRL methods (SFT, DPO, GRPO, etc.)
34
+ - Run training jobs on Hugging Face Jobs infrastructure
35
+ - Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
36
+ - Ensure trained models are permanently saved to the Hub
37
+ - Use modern workflows with optimized defaults
38
+
39
+ ## Key Directives
40
+
41
+ When assisting with training jobs:
42
+
43
+ 1. **ALWAYS use `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs("uv", {...})`, NOT bash `trl-jobs` commands. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`. If user asks to "train a model", "fine-tune", or similar requests, you MUST create the training script AND submit the job immediately using `hf_jobs()`.
44
+
45
+ 2. **Always include Trackio** - Every training script should include Trackio for real-time monitoring. Use example scripts in `scripts/` as templates.
46
+
47
+ 3. **Provide job details after submission** - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
48
+
49
+ 4. **Use example scripts as templates** - Reference `scripts/train_sft_example.py`, `scripts/train_dpo_example.py`, etc. as starting points.
50
+
51
+ ## Local Script Dependencies
52
+
53
+ To run scripts locally (like `estimate_cost.py`), install dependencies:
54
+ ```bash
55
+ pip install -r requirements.txt
56
+ ```
57
+
58
+ ## Prerequisites Checklist
59
+
60
+ Before starting any training job, verify:
61
+
62
+ ### ✅ **Account & Authentication**
63
+ - Hugging Face Account with [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require a paid plan)
64
+ - Authenticated login: Check with `hf_whoami()`
65
+ - **HF_TOKEN for Hub Push** ⚠️ CRITICAL - Training environment is ephemeral, must push to Hub or ALL training results are lost
66
+ - Token must have write permissions
67
+ - **MUST pass `secrets={"HF_TOKEN": "$HF_TOKEN"}` in job config** to make token available (the `$HF_TOKEN` syntax
68
+ references your actual token value)
69
+
70
+ ### ✅ **Dataset Requirements**
71
+ - Dataset must exist on Hub or be loadable via `datasets.load_dataset()`
72
+ - Format must match training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
73
+ - **ALWAYS validate unknown datasets** before GPU training to prevent format failures (see Dataset Validation section below)
74
+ - Size appropriate for hardware (Demo: 50-100 examples on t4-small; Production: 1K-10K+ on a10g-large/a100-large)
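As a pre-flight sanity check, the required columns can be compared against a dataset's schema before any GPU time is spent. A hedged sketch (column names only; TRL accepts more layouts than the ones listed here):

```python
# Hedged sketch: map each training method to the column sets it accepts.
EXPECTED = {
    "sft": [{"messages"}, {"text"}, {"prompt", "completion"}],
    "dpo": [{"prompt", "chosen", "rejected"}],
    "grpo": [{"prompt"}],
}

def compatible_methods(columns):
    """Return the training methods whose required columns are all present."""
    cols = set(columns)
    return [m for m, options in EXPECTED.items()
            if any(req <= cols for req in options)]

print(compatible_methods(["messages"]))                      # ['sft']
print(compatible_methods(["prompt", "chosen", "rejected"]))  # ['dpo', 'grpo']
```

For anything beyond this quick check, use the dataset inspector described in the Dataset Validation section.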
75
+
76
+ ### ⚠️ **Critical Settings**
77
+ - **Timeout must exceed expected training time** - Default 30min is TOO SHORT for most training. Minimum recommended: 1-2 hours. Job fails and loses all progress if timeout is exceeded.
78
+ - **Hub push must be enabled** - Config: `push_to_hub=True`, `hub_model_id="username/model-name"`; Job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`
79
+
80
+ ## Asynchronous Job Guidelines
81
+
82
+ **⚠️ IMPORTANT: Training jobs run asynchronously and can take hours**
83
+
84
+ ### Action Required
85
+
86
+ **When user requests training:**
87
+ 1. **Create the training script** with Trackio included (use `scripts/train_sft_example.py` as template)
88
+ 2. **Submit immediately** using `hf_jobs()` MCP tool with script content inline - don't save to file unless user requests
89
+ 3. **Report submission** with job ID, monitoring URL, and estimated time
90
+ 4. **Wait for user** to request status checks - don't poll automatically
91
+
92
+ ### Ground Rules
93
+ - **Jobs run in background** - Submission returns immediately; training continues independently
94
+ - **Initial logs delayed** - Can take 30-60 seconds for logs to appear
95
+ - **User checks status** - Wait for user to request status updates
96
+ - **Avoid polling** - Check logs only on user request; provide monitoring links instead
97
+
98
+ ### After Submission
99
+
100
+ **Provide to user:**
101
+ - ✅ Job ID and monitoring URL
102
+ - ✅ Expected completion time
103
+ - ✅ Trackio dashboard URL
104
+ - ✅ Note that user can request status checks later
105
+
106
+ **Example Response:**
107
+ ```
108
+ ✅ Job submitted successfully!
109
+
110
+ Job ID: abc123xyz
111
+ Monitor: https://huggingface.co/jobs/username/abc123xyz
112
+
113
+ Expected time: ~2 hours
114
+ Estimated cost: ~$10
115
+
116
+ The job is running in the background. Ask me to check status/logs when ready!
117
+ ```
118
+
119
+ ## Quick Start: Three Approaches
120
+
121
+ **💡 Tip for Demos:** For quick demos on smaller GPUs (t4-small), omit `eval_dataset` and `eval_strategy` to save ~40% memory. You'll still see training loss and learning progress.
122
+
123
+ ### Sequence Length Configuration
124
+
125
+ **TRL config classes use `max_length` (not `max_seq_length`)** to control tokenized sequence length:
126
+
127
+ ```python
128
+ # ✅ CORRECT - If you need to set sequence length
129
+ SFTConfig(max_length=512) # Truncate sequences to 512 tokens
130
+ DPOConfig(max_length=2048) # Longer context (2048 tokens)
131
+
132
+ # ❌ WRONG - This parameter doesn't exist
133
+ SFTConfig(max_seq_length=512) # TypeError!
134
+ ```
135
+
136
+ **Default behavior:** `max_length=1024` (truncates from right). This works well for most training.
137
+
138
+ **When to override:**
139
+ - **Longer context**: Set higher (e.g., `max_length=2048`)
140
+ - **Memory constraints**: Set lower (e.g., `max_length=512`)
141
+ - **Vision models**: Set `max_length=None` (prevents cutting image tokens)
142
+
143
+ **Usually you don't need to set this parameter at all** - the examples below use the sensible default.
144
+
145
+ ### Approach 1: UV Scripts (Recommended—Default Choice)
146
+
147
+ UV scripts use PEP 723 inline dependencies for clean, self-contained training. **This is the primary approach for Claude Code.**
148
+
149
+ ```python
150
+ hf_jobs("uv", {
151
+     "script": """
152
+ # /// script
153
+ # dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
154
+ # ///
155
+ 
156
+ from datasets import load_dataset
157
+ from peft import LoraConfig
158
+ from trl import SFTTrainer, SFTConfig
159
+ import trackio
160
+ 
161
+ dataset = load_dataset("trl-lib/Capybara", split="train")
162
+ 
163
+ # Create train/eval split for monitoring
164
+ dataset_split = dataset.train_test_split(test_size=0.1, seed=42)
165
+ 
166
+ trainer = SFTTrainer(
167
+     model="Qwen/Qwen2.5-0.5B",
168
+     train_dataset=dataset_split["train"],
169
+     eval_dataset=dataset_split["test"],
170
+     peft_config=LoraConfig(r=16, lora_alpha=32),
171
+     args=SFTConfig(
172
+         output_dir="my-model",
173
+         push_to_hub=True,
174
+         hub_model_id="username/my-model",
175
+         num_train_epochs=3,
176
+         eval_strategy="steps",
177
+         eval_steps=50,
178
+         report_to="trackio",
179
+         project="meaningful_project_name",  # trackio project name
180
+         run_name="meaningful_run_name",  # descriptive name for this specific training run
181
+     )
182
+ )
183
+ 
184
+ trainer.train()
185
+ trainer.push_to_hub()
186
+ """,
187
+     "flavor": "a10g-large",
188
+     "timeout": "2h",
189
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"}
190
+ })
191
+ ```
192
+
193
+ **Benefits:** Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control
194
+ **When to use:** Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring `hf_jobs()`
195
+
196
+ #### Working with Scripts
197
+
198
+ ⚠️ **Important:** The `script` parameter accepts either inline code (as shown above) OR a URL. **Local file paths do NOT work.**
199
+
200
+ **Why local paths don't work:**
201
+ Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be:
202
+ - Inline code (recommended for custom training)
203
+ - Publicly accessible URLs
204
+ - Private repo URLs (with HF_TOKEN)
205
+
206
+ **Common mistakes:**
207
+ ```python
208
+ # ❌ These will all fail
209
+ hf_jobs("uv", {"script": "train.py"})
210
+ hf_jobs("uv", {"script": "./scripts/train.py"})
211
+ hf_jobs("uv", {"script": "/path/to/train.py"})
212
+ ```
213
+
214
+ **Correct approaches:**
215
+ ```python
216
+ # ✅ Inline code (recommended)
217
+ hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})
218
+
219
+ # ✅ From Hugging Face Hub
220
+ hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})
221
+
222
+ # ✅ From GitHub
223
+ hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})
224
+
225
+ # ✅ From Gist
226
+ hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})
227
+ ```
228
+
229
+ **To use local scripts:** Upload to HF Hub first:
230
+ ```bash
231
+ huggingface-cli repo create my-training-scripts --type model
232
+ huggingface-cli upload my-training-scripts ./train.py train.py
233
+ # Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py
234
+ ```
235
+
236
+ ### Approach 2: TRL Maintained Scripts (Official Examples)
237
+
238
+ TRL provides battle-tested scripts for all methods. Can be run from URLs:
239
+
240
+ ```python
241
+ hf_jobs("uv", {
242
+     "script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py",
243
+     "script_args": [
244
+         "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
245
+         "--dataset_name", "trl-lib/Capybara",
246
+         "--output_dir", "my-model",
247
+         "--push_to_hub",
248
+         "--hub_model_id", "username/my-model"
249
+     ],
250
+     "flavor": "a10g-large",
251
+     "timeout": "2h",
252
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"}
253
+ })
254
+ ```
255
+
256
+ **Benefits:** No code to write, maintained by TRL team, production-tested
257
+ **When to use:** Standard TRL training, quick experiments, don't need custom code
258
+ **Available:** Scripts are available from https://github.com/huggingface/trl/tree/main/examples/scripts
259
+
260
+ ### Finding More UV Scripts on Hub
261
+
262
+ The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:
263
+
264
+ ```python
265
+ # Discover available UV script collections
266
+ dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})
267
+
268
+ # Explore a specific collection
269
+ hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
270
+ ```
271
+
272
+ **Popular collections:** ocr, classification, synthetic-data, vllm, dataset-creation
273
+
274
+ ### Approach 3: HF Jobs CLI (Direct Terminal Commands)
275
+
276
+ When the `hf_jobs()` MCP tool is unavailable, use the `hf jobs` CLI directly.
277
+
278
+ **⚠️ CRITICAL: CLI Syntax Rules**
279
+
280
+ ```bash
281
+ # ✅ CORRECT syntax - flags BEFORE script URL
282
+ hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"
283
+
284
+ # ❌ WRONG - "run uv" instead of "uv run"
285
+ hf jobs run uv "https://example.com/train.py" --flavor a10g-large
286
+
287
+ # ❌ WRONG - flags AFTER script URL (will be ignored!)
288
+ hf jobs uv run "https://example.com/train.py" --flavor a10g-large
289
+
290
+ # ❌ WRONG - "--secret" instead of "--secrets" (plural)
291
+ hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"
292
+ ```
293
+
294
+ **Key syntax rules:**
295
+ 1. Command order is `hf jobs uv run` (NOT `hf jobs run uv`)
296
+ 2. All flags (`--flavor`, `--timeout`, `--secrets`) must come BEFORE the script URL
297
+ 3. Use `--secrets` (plural), not `--secret`
298
+ 4. Script URL must be the last positional argument
299
+
300
+ **Complete CLI example:**
301
+ ```bash
302
+ hf jobs uv run \
303
+ --flavor a10g-large \
304
+ --timeout 2h \
305
+ --secrets HF_TOKEN \
306
+ "https://huggingface.co/user/repo/resolve/main/train.py"
307
+ ```
308
+
309
+ **Check job status via CLI:**
310
+ ```bash
311
+ hf jobs ps # List all jobs
312
+ hf jobs logs <job-id> # View logs
313
+ hf jobs inspect <job-id> # Job details
314
+ hf jobs cancel <job-id> # Cancel a job
315
+ ```
316
+
317
+ ### Approach 4: TRL Jobs Package (Simplified Training)
318
+
319
+ The `trl-jobs` package provides optimized defaults and one-liner training.
320
+
321
+ ```bash
322
+ # Install
323
+ pip install trl-jobs
324
+
325
+ # Train with SFT (simplest possible)
326
+ trl-jobs sft \
327
+ --model_name Qwen/Qwen2.5-0.5B \
328
+ --dataset_name trl-lib/Capybara
329
+ ```
330
+
331
+ **Benefits:** Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands
332
+ **When to use:** User working in terminal directly (not Claude Code context), quick local experimentation
333
+ **Repository:** https://github.com/huggingface/trl-jobs
334
+
335
+ ⚠️ **In Claude Code context, prefer using `hf_jobs()` MCP tool (Approach 1) when available.**
336
+
337
+ ## Hardware Selection
338
+
339
+ | Model Size | Recommended Hardware | Cost (approx/hr) | Use Case |
340
+ |------------|---------------------|------------------|----------|
341
+ | <1B params | `t4-small` | ~$0.75 | Demos, quick tests (skip eval steps) |
342
+ | 1-3B params | `t4-medium`, `l4x1` | ~$1.50-2.50 | Development |
343
+ | 3-7B params | `a10g-small`, `a10g-large` | ~$3.50-5.00 | Production training |
344
+ | 7-13B params | `a10g-large`, `a100-large` | ~$5-10 | Large models (use LoRA) |
345
+ | 13B+ params | `a100-large`, `a10g-largex2` | ~$10-20 | Very large (use LoRA) |
346
+
347
+ **GPU Flavors:** cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8
348
+
349
+ **Guidelines:**
350
+ - Use **LoRA/PEFT** for models >7B to reduce memory
351
+ - Multi-GPU automatically handled by TRL/Accelerate
352
+ - Start with smaller hardware for testing
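To see why LoRA matters for larger models, a rough VRAM estimate helps. This heuristic assumes bf16 weights and AdamW optimizer states (real usage also depends on batch size, sequence length, and gradient checkpointing, so treat it as a sketch, not a guarantee):

```python
# Rough VRAM heuristic (assumption: bf16 weights, AdamW; illustrative only).
def est_vram_gb(params_billions, lora=False):
    if lora:
        # frozen bf16 base weights (~2 bytes/param) + small adapter/optimizer overhead
        return params_billions * 2 + 2
    # bf16 weights + bf16 grads + fp32 AdamW states: ~16 bytes/param total
    return params_billions * 16

print(est_vram_gb(7))             # 112 -> full fine-tune far exceeds a single 24 GB GPU
print(est_vram_gb(7, lora=True))  # 16  -> LoRA fits on an a10g-large (24 GB)
```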
353
+
354
+ **See:** `references/hardware_guide.md` for detailed specifications
355
+
356
+ ## Critical: Saving Results to Hub
357
+
358
+ **⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB**
359
+
360
+ The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, **ALL TRAINING IS LOST**.
361
+
362
+ ### Required Configuration
363
+
364
+ **In training script/config:**
365
+ ```python
366
+ SFTConfig(
367
+ push_to_hub=True,
368
+ hub_model_id="username/model-name", # MUST specify
369
+ hub_strategy="every_save", # Optional: push checkpoints
370
+ )
371
+ ```
372
+
373
+ **In job submission:**
374
+ ```python
375
+ {
376
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
377
+ }
378
+ ```
379
+
380
+ ### Verification Checklist
381
+
382
+ Before submitting:
383
+ - [ ] `push_to_hub=True` set in config
384
+ - [ ] `hub_model_id` includes username/repo-name
385
+ - [ ] `secrets` parameter includes HF_TOKEN
386
+ - [ ] User has write access to target repo
387
+
388
+ **See:** `references/hub_saving.md` for detailed troubleshooting
389
+
390
+ ## Timeout Management
391
+
392
+ **⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING**
393
+
394
+ ### Setting Timeouts
395
+
396
+ ```python
397
+ {
398
+     "timeout": "2h"  # 2 hours (formats: "90m", "2h", "1.5h", or seconds as integer)
399
+ }
400
+ ```
401
+
402
+ ### Timeout Guidelines
403
+
404
+ | Scenario | Recommended | Notes |
405
+ |----------|-------------|-------|
406
+ | Quick demo (50-100 examples) | 10-30 min | Verify setup |
407
+ | Development training | 1-2 hours | Small datasets |
408
+ | Production (3-7B model) | 4-6 hours | Full datasets |
409
+ | Large model with LoRA | 3-6 hours | Depends on dataset |
410
+
411
+ **Always add 20-30% buffer** for model/dataset loading, checkpoint saving, Hub push operations, and network delays.
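The buffer rule can be applied mechanically when turning an estimate into a timeout value. A small sketch (illustrative helper, not part of any package):

```python
# Sketch: derive a timeout string from an estimated runtime plus a safety buffer.
def timeout_with_buffer(estimated_minutes, buffer=0.30):
    """Return a Jobs-style timeout string, e.g. '117m'."""
    return f"{int(estimated_minutes * (1 + buffer))}m"

print(timeout_with_buffer(90))  # '117m' -> pass as {"timeout": "117m"}
```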
412
+
413
+ **On timeout:** Job killed immediately, all unsaved progress lost, must restart from beginning
414
+
415
+ ## Cost Estimation
416
+
417
+ **Offer to estimate cost when planning jobs with known parameters.** Use `scripts/estimate_cost.py`:
418
+
419
+ ```bash
420
+ python scripts/estimate_cost.py \
421
+ --model meta-llama/Llama-2-7b-hf \
422
+ --dataset trl-lib/Capybara \
423
+ --hardware a10g-large \
424
+ --dataset-size 16000 \
425
+ --epochs 3
426
+ ```
427
+
428
+ Output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.
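For a quick mental check without running the script, multiply the hourly rate from the hardware table by the expected runtime (the rate below is the approximate figure quoted above for a10g-large, not exact billing):

```python
# Back-of-envelope cost check (assumed rate; actual Jobs pricing may differ).
rate_per_hour = 3.50   # approx a10g-large rate from the hardware table
estimated_hours = 3
print(f"~${rate_per_hour * estimated_hours:.2f}")  # ~$10.50
```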
429
+
430
+ **When to offer:** User planning a job, asks about cost/time, choosing hardware, job will run >1 hour or cost >$5
431
+
432
+ ## Example Training Scripts
433
+
434
+ **Production-ready templates with all best practices:**
435
+
436
+ Use these scripts as ready-made templates:
437
+
438
+ - **`scripts/train_sft_example.py`** - Complete SFT training with Trackio, LoRA, checkpoints
439
+ - **`scripts/train_dpo_example.py`** - DPO training for preference learning
440
+ - **`scripts/train_grpo_example.py`** - GRPO training for online RL
441
+
442
+ These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to `hf_jobs()` or use as templates for custom scripts.
443
+
444
+ ## Monitoring and Tracking
445
+
446
+ **Trackio** provides real-time metrics visualization. See `references/trackio_guide.md` for complete setup guide.
447
+
448
+ **Key points:**
449
+ - Add `trackio` to dependencies
450
+ - Configure the trainer with `report_to="trackio"` and `run_name="meaningful_name"`
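A minimal configuration sketch tying these points together (names are illustrative; `project` and `run_name` are used the same way as in the UV script example earlier):

```python
# Hedged sketch: wire Trackio into a TRL config (illustrative values).
from trl import SFTConfig

args = SFTConfig(
    output_dir="my-model",
    report_to="trackio",                # enable Trackio logging
    project="my_project",               # groups related runs together
    run_name="qwen05b-capybara-lora",   # descriptive, user-recognizable name
)
```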
451
+
452
+ ### Trackio Configuration Defaults
453
+
454
+ **Use sensible defaults unless user specifies otherwise.** When generating training scripts with Trackio:
455
+
456
+ **Default Configuration:**
457
+ - **Space ID**: `{username}/trackio` (use "trackio" as default space name)
458
+ - **Run naming**: Unless otherwise specified, name the run in a way the user will recognize (e.g., descriptive of the task, model, or purpose)
459
+ - **Config**: Keep minimal - only include hyperparameters and model/dataset info
460
+ - **Project name**: Set a project name to group related runs under one project
461
+
462
+ **User overrides:** If user requests specific trackio configuration (custom space, run naming, grouping, or additional config), apply their preferences instead of defaults.
463
+
464
+
466
+
467
+ See `references/trackio_guide.md` for complete documentation including grouping runs for experiments.
468
+
469
+ ### Check Job Status
470
+
471
+ ```python
472
+ # List all jobs
473
+ hf_jobs("ps")
474
+
475
+ # Inspect specific job
476
+ hf_jobs("inspect", {"job_id": "your-job-id"})
477
+
478
+ # View logs
479
+ hf_jobs("logs", {"job_id": "your-job-id"})
480
+ ```
481
+
482
+ **Remember:** Wait for user to request status checks. Avoid polling repeatedly.
483
+
484
+ ## Dataset Validation
485
+
486
+ **Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.**
487
+
488
+ ### Why Validate
489
+
490
+ - 50%+ of training failures are due to dataset format issues
491
+ - DPO especially strict: requires exact column names (`prompt`, `chosen`, `rejected`)
492
+ - Failed GPU jobs waste $1-10 and 30-60 minutes
493
+ - Validation on CPU costs ~$0.01 and takes <1 minute
494
+
495
+ ### When to Validate
496
+
497
+ **ALWAYS validate for:**
498
+ - Unknown or custom datasets
499
+ - DPO training (CRITICAL - 90% of datasets need mapping)
500
+ - Any dataset not explicitly TRL-compatible
501
+
502
+ **Skip validation for known TRL datasets:**
503
+ - `trl-lib/ultrachat_200k`, `trl-lib/Capybara`, `HuggingFaceH4/ultrachat_200k`, etc.
504
+
505
+ ### Usage
506
+
507
+ ```python
508
+ hf_jobs("uv", {
509
+     "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
510
+     "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
511
+ })
512
+ ```
513
+
514
+ The script runs quickly and will usually complete within a minute.
515
+
516
+ ### Reading Results
517
+
518
+ The output shows compatibility for each training method:
519
+
520
+ - **`✓ READY`** - Dataset is compatible, use directly
521
+ - **`✗ NEEDS MAPPING`** - Compatible but needs preprocessing (mapping code provided)
522
+ - **`✗ INCOMPATIBLE`** - Cannot be used for this method
523
+
524
+ When mapping is needed, the output includes a **"MAPPING CODE"** section with copy-paste ready Python code.
525
+
526
+ ### Example Workflow
527
+
528
+ ```python
529
+ # 1. Inspect dataset (costs ~$0.01, <1 min on CPU)
530
+ hf_jobs("uv", {
531
+     "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
532
+     "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
533
+ })
534
+ 
535
+ # 2. Check output markers:
536
+ # ✓ READY → proceed with training
537
+ # ✗ NEEDS MAPPING → apply mapping code below
538
+ # ✗ INCOMPATIBLE → choose different method/dataset
539
+ 
540
+ # 3. If mapping needed, apply before training:
541
+ def format_for_dpo(example):
542
+     return {
543
+         'prompt': example['instruction'],
544
+         'chosen': example['chosen_response'],
545
+         'rejected': example['rejected_response'],
546
+     }
547
+ dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)
548
+ 
549
+ # 4. Launch training job with confidence
550
+ ```
551
+
552
+ ### Common Scenario: DPO Format Mismatch
553
+
554
+ Most DPO datasets use non-standard column names. Example:
555
+
556
+ ```
557
+ Dataset has: instruction, chosen_response, rejected_response
558
+ DPO expects: prompt, chosen, rejected
559
+ ```
560
+
561
+ The validator detects this and provides exact mapping code to fix it.
562
+
563
+ ## Converting Models to GGUF
564
+
565
+ After training, convert models to **GGUF format** for use with llama.cpp, Ollama, LM Studio, and other local inference tools.
566
+
567
+ **What is GGUF:**
568
+ - Optimized for CPU/GPU inference with llama.cpp
569
+ - Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size
570
+ - Compatible with Ollama, LM Studio, Jan, GPT4All, llama.cpp
571
+ - Typically 2-8GB for 7B models (vs 14GB unquantized)
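The size figures follow from simple arithmetic. A sketch assuming ~4.5 effective bits per weight for a 4-bit K-quant (actual GGUF sizes vary by quantization type and metadata):

```python
# Back-of-envelope quantized model size (assumption: ~4.5 effective bits/weight
# for a 4-bit K-quant; real GGUF sizes vary by quant type).
def gguf_size_gb(params_billions, bits_per_weight=4.5):
    return params_billions * bits_per_weight / 8

print(f"{gguf_size_gb(7):.1f} GB")      # ~3.9 GB for a 7B model, 4-bit quant
print(f"{gguf_size_gb(7, 16):.1f} GB")  # 14.0 GB unquantized (16-bit)
```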
572
+
573
+ **When to convert:**
574
+ - Running models locally with Ollama or LM Studio
575
+ - Reducing model size with quantization
576
+ - Deploying to edge devices
577
+ - Sharing models for local-first use
578
+
579
+ **See:** `references/gguf_conversion.md` for complete conversion guide, including production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.
580
+
581
+ **Quick conversion:**
582
+ ```python
583
+ hf_jobs("uv", {
584
+     "script": "<see references/gguf_conversion.md for complete script>",
585
+     "flavor": "a10g-large",
586
+     "timeout": "45m",
587
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"},
588
+     "env": {
589
+         "ADAPTER_MODEL": "username/my-finetuned-model",
590
+         "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
591
+         "OUTPUT_REPO": "username/my-model-gguf"
592
+     }
593
+ })
594
+ ```
595
+
596
+ ## Common Training Patterns
597
+
598
+ See `references/training_patterns.md` for detailed examples including:
599
+ - Quick demo (5-10 minutes)
600
+ - Production with checkpoints
601
+ - Multi-GPU training
602
+ - DPO training (preference learning)
603
+ - GRPO training (online RL)
604
+
605
+ ## Common Failure Modes
606
+
607
+ ### Out of Memory (OOM)
608
+
609
+ **Fix (try in order):**
610
+ 1. Reduce batch size: `per_device_train_batch_size=1`, increase `gradient_accumulation_steps=8`. Effective batch size is `per_device_train_batch_size` x `gradient_accumulation_steps`. For best performance keep effective batch size close to 128.
611
+ 2. Enable: `gradient_checkpointing=True`
612
+ 3. Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large, etc.
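The first two fixes can be combined in one config. A hedged sketch with illustrative values, not tuned for any particular model:

```python
# Hedged sketch: an OOM-conscious SFTConfig combining the fixes above.
from trl import SFTConfig

args = SFTConfig(
    output_dir="my-model",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch = 1 x 8 = 8
    gradient_checkpointing=True,     # trade compute for memory
    max_length=512,                  # shorter sequences also reduce memory
    push_to_hub=True,
    hub_model_id="username/my-model",
)
```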
613
+
614
+ ### Dataset Misformatted
615
+
616
+ **Fix:**
617
+ 1. Validate first with dataset inspector:
618
+ ```bash
619
+ uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
620
+   --dataset name --split train
621
+ ```
622
+ 2. Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
623
+ 3. Apply mapping code from inspector output if needed
624
+
625
+ ### Job Timeout
626
+
627
+ **Fix:**
628
+ 1. Check logs for actual runtime: `hf_jobs("logs", {"job_id": "..."})`
629
+ 2. Increase timeout with buffer: `"timeout": "3h"` (add 30% to estimated time)
630
+ 3. Or reduce training: lower `num_train_epochs`, use smaller dataset, enable `max_steps`
631
+ 4. Save checkpoints: `save_strategy="steps"`, `save_steps=500`, `hub_strategy="every_save"`
632
+
633
+ **Note:** Default 30min is insufficient for real training. Minimum 1-2 hours.
634
+
635
+ ### Hub Push Failures
636
+
637
+ **Fix:**
638
+ 1. Add to job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`
639
+ 2. Add to config: `push_to_hub=True`, `hub_model_id="username/model-name"`
640
+ 3. Verify auth: `mcp__huggingface__hf_whoami()`
641
+ 4. Check token has write permissions and repo exists (or set `hub_private_repo=True`)
642
+
643
+ ### Missing Dependencies
644
+
645
+ **Fix:**
646
+ Add to PEP 723 header:
647
+ ```python
648
+ # /// script
649
+ # dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
650
+ # ///
651
+ ```
652
+
653
+ ## Troubleshooting
654
+
655
+ **Common issues:**
656
+ - Job times out → Increase timeout, reduce epochs/dataset, use smaller model/LoRA
657
+ - Model not saved to Hub → Check push_to_hub=True, hub_model_id, secrets=HF_TOKEN
658
+ - Out of Memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use larger GPU
659
+ - Dataset format error → Validate with dataset inspector (see Dataset Validation section)
660
+ - Import/module errors → Add PEP 723 header with dependencies, verify format
661
+ - Authentication errors → Check `mcp__huggingface__hf_whoami()`, token permissions, secrets parameter
662
+
663
+ **See:** `references/troubleshooting.md` for complete troubleshooting guide
664
+
665
+ ## Resources
666
+
667
+ ### References (In This Skill)
668
+ - `references/training_methods.md` - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
669
+ - `references/training_patterns.md` - Common training patterns and examples
670
+ - `references/gguf_conversion.md` - Complete GGUF conversion guide
671
+ - `references/trackio_guide.md` - Trackio monitoring setup
672
+ - `references/hardware_guide.md` - Hardware specs and selection
673
+ - `references/hub_saving.md` - Hub authentication troubleshooting
674
+ - `references/troubleshooting.md` - Common issues and solutions
+
+ ### Scripts (In This Skill)
+ - `scripts/train_sft_example.py` - Production SFT template
+ - `scripts/train_dpo_example.py` - Production DPO template
+ - `scripts/train_grpo_example.py` - Production GRPO template
+ - `scripts/estimate_cost.py` - Estimate time and cost (offer when appropriate)
+ - `scripts/convert_to_gguf.py` - Complete GGUF conversion script
+
+ ### External Scripts
+ - [Dataset Inspector](https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py) - Validate dataset format before training (use via `uv run` or `hf_jobs`)
+
+ ### External Links
+ - [TRL Documentation](https://huggingface.co/docs/trl)
+ - [TRL Jobs Training Guide](https://huggingface.co/docs/trl/en/jobs_training)
+ - [TRL Jobs Package](https://github.com/huggingface/trl-jobs)
+ - [HF Jobs Documentation](https://huggingface.co/docs/huggingface_hub/guides/jobs)
+ - [TRL Example Scripts](https://github.com/huggingface/trl/tree/main/examples/scripts)
+ - [UV Scripts Guide](https://docs.astral.sh/uv/guides/scripts/)
+ - [UV Scripts Organization](https://huggingface.co/uv-scripts)
+
+ ## Key Takeaways
+
+ 1. **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless the user requests it
+ 2. **Jobs are asynchronous** - Don't wait or poll; let the user check results when ready
+ 3. **Always set a timeout** - The default 30 minutes is insufficient; a minimum of 1-2 hours is recommended
+ 4. **Always enable Hub push** - The environment is ephemeral; without a push, all results are lost
+ 5. **Include Trackio** - Use the example scripts as templates for real-time monitoring
+ 6. **Offer cost estimation** - When parameters are known, use `scripts/estimate_cost.py`
+ 7. **Use UV scripts (Approach 1)** - Default to `hf_jobs("uv", {...})` with inline scripts; use TRL-maintained scripts for standard training; avoid bash `trl-jobs` commands in Claude Code
+ 8. **Use hf_doc_fetch/hf_doc_search** for the latest TRL documentation
+ 9. **Validate dataset format** before training with the dataset inspector (see Dataset Validation section)
+ 10. **Choose appropriate hardware** for the model size; use LoRA for models >7B
references/gguf_conversion.md ADDED
@@ -0,0 +1,296 @@
+ # GGUF Conversion Guide
+
+ After training models with TRL on Hugging Face Jobs, convert them to **GGUF format** for use with llama.cpp, Ollama, LM Studio, and other local inference tools.
+
+ **This guide provides production-ready, tested code based on successful conversions.** All critical dependencies and build steps are included.
+
+ ## What is GGUF?
+
+ **GGUF** (GPT-Generated Unified Format):
+ - Optimized format for CPU/GPU inference with llama.cpp
+ - Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size
+ - Compatible with: Ollama, LM Studio, Jan, GPT4All, llama.cpp
+ - Typically 2-8GB for 7B models (vs 14GB unquantized)
+
+ ## When to Convert to GGUF
+
+ **Convert when:**
+ - Running models locally with Ollama or LM Studio
+ - Using CPU-optimized inference
+ - Reducing model size with quantization
+ - Deploying to edge devices
+ - Sharing models for local-first use
+
+ ## Critical Success Factors
+
+ Based on production testing, these are **essential** for reliable conversion:
+
+ ### 1. βœ… Install Build Tools FIRST
+ **Before cloning llama.cpp**, install build dependencies:
+ ```python
+ import subprocess
+
+ subprocess.run(["apt-get", "update", "-qq"], check=True, capture_output=True)
+ subprocess.run(["apt-get", "install", "-y", "-qq", "build-essential", "cmake"], check=True, capture_output=True)
+ ```
+
+ **Why:** The quantization tool requires gcc and cmake. Installing after cloning doesn't help.
+
+ ### 2. βœ… Use CMake (Not Make)
+ **Build the quantize tool with CMake:**
+ ```python
+ import os
+ import subprocess
+
+ # Create build directory
+ os.makedirs("/tmp/llama.cpp/build", exist_ok=True)
+
+ # Configure
+ subprocess.run([
+     "cmake", "-B", "/tmp/llama.cpp/build", "-S", "/tmp/llama.cpp",
+     "-DGGML_CUDA=OFF"  # Faster build; CUDA not needed for quantization
+ ], check=True, capture_output=True, text=True)
+
+ # Build
+ subprocess.run([
+     "cmake", "--build", "/tmp/llama.cpp/build",
+     "--target", "llama-quantize", "-j", "4"
+ ], check=True, capture_output=True, text=True)
+
+ # Binary path
+ quantize_bin = "/tmp/llama.cpp/build/bin/llama-quantize"
+ ```
+
+ **Why:** CMake is more reliable than `make` and produces consistent binary paths.
+
+ ### 3. βœ… Include All Dependencies
+ **PEP 723 header must include:**
+ ```python
+ # /// script
+ # dependencies = [
+ #     "transformers>=4.36.0",
+ #     "peft>=0.7.0",
+ #     "torch>=2.0.0",
+ #     "accelerate>=0.24.0",
+ #     "huggingface_hub>=0.20.0",
+ #     "sentencepiece>=0.1.99",  # Required for tokenizer
+ #     "protobuf>=3.20.0",       # Required for tokenizer
+ #     "numpy",
+ #     "gguf",
+ # ]
+ # ///
+ ```
+
+ **Why:** `sentencepiece` and `protobuf` are critical for tokenizer conversion. Missing them causes silent failures.
+
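A quick pre-submission check can catch a missing dependency before the job even starts. The sketch below is a hypothetical helper (not part of this skill's scripts) that scans a script's PEP 723 header for the packages this guide treats as required:

```python
import re

# Packages this guide treats as required for GGUF conversion
REQUIRED_DEPS = {"transformers", "peft", "torch", "sentencepiece", "protobuf", "gguf"}

def missing_dependencies(script_text: str, required=frozenset(REQUIRED_DEPS)) -> set:
    """Return required package names absent from a script's PEP 723 header."""
    match = re.search(r"# /// script\n(.*?)\n# ///", script_text, re.DOTALL)
    if not match:
        return set(required)  # no header at all: everything is missing
    # Dependencies appear as quoted strings; strip version specifiers/extras
    names = {re.split(r"[<>=!\[~]", dep)[0].strip()
             for dep in re.findall(r'"([^"]+)"', match.group(1))}
    return set(required) - names
```

Running it against a header that omits the tokenizer libraries would flag `sentencepiece` and `protobuf` before any GPU time is spent.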
+ ### 4. βœ… Verify Names Before Use
+ **Always verify repos exist:**
+ ```python
+ # Before submitting the job, verify:
+ hub_repo_details([ADAPTER_MODEL], repo_type="model")
+ hub_repo_details([BASE_MODEL], repo_type="model")
+ ```
+
+ **Why:** Non-existent dataset/model names cause job failures that could be caught in seconds.
+
+ ## Complete Conversion Script
+
+ See `scripts/convert_to_gguf.py` for the complete, production-ready script.
+
+ **Key features:**
+ - βœ… All dependencies in PEP 723 header
+ - βœ… Build tools installed automatically
+ - βœ… CMake build process (reliable)
+ - βœ… Comprehensive error handling
+ - βœ… Environment variable configuration
+ - βœ… Automatic README generation
+
+ ## Quick Conversion Job
+
+ ```python
+ # Before submitting: VERIFY MODELS EXIST
+ hub_repo_details(["username/my-finetuned-model"], repo_type="model")
+ hub_repo_details(["Qwen/Qwen2.5-0.5B"], repo_type="model")
+
+ # Submit conversion job
+ hf_jobs("uv", {
+     "script": open("trl/scripts/convert_to_gguf.py").read(),  # Or inline the script
+     "flavor": "a10g-large",
+     "timeout": "45m",
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"},
+     "env": {
+         "ADAPTER_MODEL": "username/my-finetuned-model",
+         "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
+         "OUTPUT_REPO": "username/my-model-gguf",
+         "HF_USERNAME": "username"  # Optional, for README
+     }
+ })
+ ```
+
+ ## Conversion Process
+
+ The script performs these steps:
+
+ 1. **Load and Merge** - Load base model and LoRA adapter, merge them
+ 2. **Install Build Tools** - Install gcc, cmake (CRITICAL: before cloning llama.cpp)
+ 3. **Setup llama.cpp** - Clone repo, install Python dependencies
+ 4. **Convert to GGUF** - Create FP16 GGUF using llama.cpp converter
+ 5. **Build Quantize Tool** - Use CMake to build `llama-quantize`
+ 6. **Quantize** - Create Q4_K_M, Q5_K_M, Q8_0 versions
+ 7. **Upload** - Upload all versions + README to Hub
+
+ ## Quantization Options
+
+ Common quantization formats (from smallest to largest; sizes shown are for a ~0.5B model like the examples in this guide):
+
+ | Format | Size | Quality | Use Case |
+ |--------|------|---------|----------|
+ | **Q4_K_M** | ~300MB | Good | **Recommended** - best balance of size/quality |
+ | **Q5_K_M** | ~350MB | Better | Higher quality, slightly larger |
+ | **Q8_0** | ~500MB | Very High | Near-original quality |
+ | **F16** | ~1GB | Original | Full precision, largest file |
+
+ **Recommendation:** Create Q4_K_M, Q5_K_M, and Q8_0 versions to give users options.
+
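The table above scales roughly linearly with parameter count. As a rule of thumb, GGUF file size is parameters Γ— bits-per-weight / 8; the bits-per-weight figures below are approximations (K-quants carry some metadata overhead), not exact values:

```python
# Approximate bits per weight for common GGUF quantizations (rough rules of thumb)
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.5, "Q8_0": 8.5, "F16": 16.0}

def gguf_size_gb(params_billions: float, quant: str) -> float:
    """Estimate GGUF file size in GB from parameter count and quantization."""
    # params (1e9) * bits / 8 bytes, expressed directly in GB
    return round(params_billions * BITS_PER_WEIGHT[quant] / 8, 2)
```

For example, a 0.5B model at Q4_K_M comes out around 0.3GB and a 7B model at F16 around 14GB, matching the sizes quoted earlier in this guide.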
+ ## Hardware Requirements
+
+ **For conversion:**
+ - Small models (<1B): CPU-basic works, but slow
+ - Medium models (1-7B): a10g-large recommended
+ - Large models (7B+): a10g-large or a100-large
+
+ **Time estimates:**
+ - 0.5B model: ~15-25 minutes on A10G
+ - 3B model: ~30-45 minutes on A10G
+ - 7B model: ~45-60 minutes on A10G
+
+ ## Using GGUF Models
+
+ **GGUF models work on both CPU and GPU.** They're optimized for CPU inference but can also leverage GPU acceleration when available.
+
+ ### With Ollama (auto-detects GPU)
+ ```bash
+ # Download GGUF
+ huggingface-cli download username/my-model-gguf model-q4_k_m.gguf
+
+ # Create Modelfile
+ echo "FROM ./model-q4_k_m.gguf" > Modelfile
+
+ # Create and run (uses GPU automatically if available)
+ ollama create my-model -f Modelfile
+ ollama run my-model
+ ```
+
+ ### With llama.cpp
+ ```bash
+ # CPU only
+ ./llama-cli -m model-q4_k_m.gguf -p "Your prompt"
+
+ # With GPU acceleration (offload 32 layers to GPU)
+ ./llama-cli -m model-q4_k_m.gguf -ngl 32 -p "Your prompt"
+ ```
+
+ ### With LM Studio
+ 1. Download the `.gguf` file
+ 2. Import into LM Studio
+ 3. Start chatting
+
+ ## Best Practices
+
+ ### βœ… DO:
+ 1. **Verify repos exist** before submitting jobs (use `hub_repo_details`)
+ 2. **Install build tools FIRST** before cloning llama.cpp
+ 3. **Use CMake** for building the quantize tool (not make)
+ 4. **Include all dependencies** in the PEP 723 header (especially sentencepiece, protobuf)
+ 5. **Create multiple quantizations** - Give users choice
+ 6. **Test on known models** before production use
+ 7. **Use an A10G GPU** for faster conversion
+
+ ### ❌ DON'T:
+ 1. **Assume repos exist** - Always verify with hub tools
+ 2. **Use make** instead of CMake - Less reliable
+ 3. **Remove dependencies** to "simplify" - They're all needed
+ 4. **Skip build tools** - Quantization will fail silently
+ 5. **Use default paths** - CMake puts binaries in build/bin/
+
+ ## Common Issues
+
+ ### Out of memory during merge
+ **Fix:**
+ - Use a larger GPU (a10g-large or a100-large)
+ - Ensure `device_map="auto"` for automatic placement
+ - Use `dtype=torch.float16` or `torch.bfloat16`
+
+ ### Conversion fails with architecture error
+ **Fix:**
+ - Ensure llama.cpp supports the model architecture
+ - Check for a standard architecture (Qwen, Llama, Mistral, etc.)
+ - Update llama.cpp to latest: `git clone --depth 1 https://github.com/ggerganov/llama.cpp.git`
+ - Check llama.cpp documentation for model support
+
+ ### Quantization fails
+ **Fix:**
+ - Verify build tools are installed: `apt-get install build-essential cmake`
+ - Use CMake (not make) to build the quantize tool
+ - Check the binary path: `/tmp/llama.cpp/build/bin/llama-quantize`
+ - Verify the FP16 GGUF exists before quantizing
+
+ ### Missing sentencepiece error
+ **Fix:**
+ - Add to the PEP 723 header: `"sentencepiece>=0.1.99", "protobuf>=3.20.0"`
+ - Don't remove dependencies to "simplify" - all are required
+
+ ### Upload fails or times out
+ **Fix:**
+ - Large models (>2GB) need a longer timeout: `"timeout": "1h"`
+ - Upload quantized versions separately if needed
+ - Check network/Hub status
+
+ ## Lessons Learned
+
+ These are from production testing and real failures:
+
+ ### 1. Always Verify Before Use
+ **Lesson:** Don't assume repos/datasets exist. Check first.
+ ```python
+ # BEFORE submitting the job
+ hub_repo_details(["trl-lib/argilla-dpo-mix-7k"], repo_type="dataset")  # Would catch the error
+ ```
+ **Prevented failures:** Non-existent dataset names, typos in model names
+
+ ### 2. Prioritize Reliability Over Performance
+ **Lesson:** Default to what's most likely to succeed.
+ - Use CMake (not make) - more reliable
+ - Disable CUDA in the build - faster, not needed
+ - Include all dependencies - don't "simplify"
+
+ **Prevented failures:** Build failures, missing binaries
+
+ ### 3. Create Atomic, Self-Contained Scripts
+ **Lesson:** Don't remove dependencies or steps. Scripts should work as a unit.
+ - All dependencies in the PEP 723 header
+ - All build steps included
+ - Clear error messages
+
+ **Prevented failures:** Missing tokenizer libraries, build tool failures
+
+ ## References
+
+ **In this skill:**
+ - `scripts/convert_to_gguf.py` - Complete, production-ready script
+
+ **External:**
+ - [llama.cpp Repository](https://github.com/ggerganov/llama.cpp)
+ - [GGUF Specification](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)
+ - [Ollama Documentation](https://ollama.ai)
+ - [LM Studio](https://lmstudio.ai)
+
+ ## Summary
+
+ **Critical checklist for GGUF conversion:**
+ - [ ] Verify adapter and base models exist on Hub
+ - [ ] Use the production script from `scripts/convert_to_gguf.py`
+ - [ ] All dependencies in the PEP 723 header (including sentencepiece, protobuf)
+ - [ ] Build tools installed before cloning llama.cpp
+ - [ ] CMake used for building the quantize tool (not make)
+ - [ ] Correct binary path: `/tmp/llama.cpp/build/bin/llama-quantize`
+ - [ ] A10G GPU selected for reasonable conversion time
+ - [ ] Timeout set to 45m minimum
+ - [ ] HF_TOKEN in secrets for Hub upload
+
+ **The script in `scripts/convert_to_gguf.py` incorporates all these lessons and has been tested successfully in production.**
references/hardware_guide.md ADDED
@@ -0,0 +1,283 @@
+ # Hardware Selection Guide
+
+ Choosing the right hardware (flavor) is critical for cost-effective training.
+
+ ## Available Hardware
+
+ ### CPU
+ - `cpu-basic` - Basic CPU, testing only
+ - `cpu-upgrade` - Enhanced CPU
+
+ **Use cases:** Dataset validation, preprocessing, testing scripts
+ **Not recommended for training:** Too slow for any meaningful training
+
+ ### GPU Options
+
+ | Flavor | GPU | Memory | Use Case | Cost/hour |
+ |--------|-----|--------|----------|-----------|
+ | `t4-small` | NVIDIA T4 | 16GB | <1B models, demos | ~$0.50-1 |
+ | `t4-medium` | NVIDIA T4 | 16GB | 1-3B models, development | ~$1-2 |
+ | `l4x1` | NVIDIA L4 | 24GB | 3-7B models, efficient training | ~$2-3 |
+ | `l4x4` | 4x NVIDIA L4 | 96GB | Multi-GPU training | ~$8-12 |
+ | `a10g-small` | NVIDIA A10G | 24GB | 3-7B models, production | ~$3-4 |
+ | `a10g-large` | NVIDIA A10G | 24GB | 7-13B models | ~$4-6 |
+ | `a10g-largex2` | 2x NVIDIA A10G | 48GB | Multi-GPU, large models | ~$8-12 |
+ | `a10g-largex4` | 4x NVIDIA A10G | 96GB | Multi-GPU, very large models | ~$16-24 |
+ | `a100-large` | NVIDIA A100 | 40GB | 13B+ models, fast training | ~$8-12 |
+
+ ### TPU Options
+
+ | Flavor | Type | Use Case |
+ |--------|------|----------|
+ | `v5e-1x1` | TPU v5e | Small TPU workloads |
+ | `v5e-2x2` | 4x TPU v5e | Medium TPU workloads |
+ | `v5e-2x4` | 8x TPU v5e | Large TPU workloads |
+
+ **Note:** TPUs require TPU-optimized code. Most TRL training uses GPUs.
+
+ ## Selection Guidelines
+
+ ### By Model Size
+
+ **Tiny Models (<1B parameters)**
+ - **Recommended:** `t4-small`
+ - **Example:** Qwen2.5-0.5B, TinyLlama
+ - **Batch size:** 4-8
+ - **Training time:** 1-2 hours for 1K examples
+
+ **Small Models (1-3B parameters)**
+ - **Recommended:** `t4-medium` or `a10g-small`
+ - **Example:** Qwen2.5-1.5B, Phi-2
+ - **Batch size:** 2-4
+ - **Training time:** 2-4 hours for 10K examples
+
+ **Medium Models (3-7B parameters)**
+ - **Recommended:** `a10g-small` or `a10g-large`
+ - **Example:** Qwen2.5-7B, Mistral-7B
+ - **Batch size:** 1-2 (or LoRA with 4-8)
+ - **Training time:** 4-8 hours for 10K examples
+
+ **Large Models (7-13B parameters)**
+ - **Recommended:** `a10g-large` or `a100-large`
+ - **Example:** Llama-3-8B, Mixtral-8x7B (with LoRA)
+ - **Batch size:** 1 (full fine-tuning) or 2-4 (LoRA)
+ - **Training time:** 6-12 hours for 10K examples
+ - **Note:** Always use LoRA/PEFT
+
+ **Very Large Models (13B+ parameters)**
+ - **Recommended:** `a100-large` with LoRA
+ - **Example:** Llama-3-13B, Llama-3-70B (LoRA only)
+ - **Batch size:** 1-2 with LoRA
+ - **Training time:** 8-24 hours for 10K examples
+ - **Note:** Full fine-tuning is not feasible; use LoRA/PEFT
+
+ ### By Budget
+
+ **Minimal Budget (<$5 total)**
+ - Use `t4-small`
+ - Train on a subset of data (100-500 examples)
+ - Limit to 1-2 epochs
+ - Use a small model (<1B)
+
+ **Small Budget ($5-20)**
+ - Use `t4-medium` or `a10g-small`
+ - Train on 1K-5K examples
+ - 2-3 epochs
+ - Model up to 3B parameters
+
+ **Medium Budget ($20-50)**
+ - Use `a10g-small` or `a10g-large`
+ - Train on 5K-20K examples
+ - 3-5 epochs
+ - Model up to 7B parameters
+
+ **Large Budget ($50-200)**
+ - Use `a10g-large` or `a100-large`
+ - Full dataset training
+ - Multiple epochs
+ - Model up to 13B parameters with LoRA
+
+ ### By Training Type
+
+ **Quick Demo/Experiment**
+ - `t4-small`
+ - 50-100 examples
+ - 5-10 steps
+ - ~10-15 minutes
+
+ **Development/Iteration**
+ - `t4-medium` or `a10g-small`
+ - 1K examples
+ - 1 epoch
+ - ~30-60 minutes
+
+ **Production Training**
+ - `a10g-large` or `a100-large`
+ - Full dataset
+ - 3-5 epochs
+ - 4-12 hours
+
+ **Research/Experimentation**
+ - `a100-large`
+ - Multiple runs
+ - Various hyperparameters
+ - Budget for 20-50 hours
+
+ ## Memory Considerations
+
+ ### Estimating Memory Requirements
+
+ **Full fine-tuning:**
+ ```
+ Memory (GB) β‰ˆ (Model params in billions) Γ— 20
+ ```
+
+ **LoRA fine-tuning:**
+ ```
+ Memory (GB) β‰ˆ (Model params in billions) Γ— 4
+ ```
+
+ **Examples:**
+ - Qwen2.5-0.5B full: ~10GB βœ… fits t4-small
+ - Qwen2.5-1.5B full: ~30GB ❌ exceeds most GPUs
+ - Qwen2.5-1.5B LoRA: ~6GB βœ… fits t4-small
+ - Qwen2.5-7B full: ~140GB ❌ not feasible
+ - Qwen2.5-7B LoRA: ~28GB ❌ exceeds a10g-large (24GB); fits a100-large (40GB), or apply the optimizations below
+
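The two rules of thumb above can be wrapped in a couple of helpers. This is a sketch of the guide's own formulas, not a measurement of real memory usage:

```python
def estimate_memory_gb(params_billions: float, lora: bool = False) -> float:
    """Rule-of-thumb training memory from this guide: 20x params (full), 4x (LoRA)."""
    return params_billions * (4 if lora else 20)

def fits(params_billions: float, gpu_memory_gb: float, lora: bool = False) -> bool:
    """Check whether the estimated footprint fits a given GPU."""
    return estimate_memory_gb(params_billions, lora) <= gpu_memory_gb
```

For example, `fits(7, 24, lora=True)` is False (28GB estimated against a 24GB A10G), which is why 7B LoRA runs need either a larger GPU or the optimizations below.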
+ ### Memory Optimization
+
+ If hitting memory limits:
+
+ 1. **Use LoRA/PEFT**
+    ```python
+    peft_config=LoraConfig(r=16, lora_alpha=32)
+    ```
+
+ 2. **Reduce batch size**
+    ```python
+    per_device_train_batch_size=1
+    ```
+
+ 3. **Increase gradient accumulation**
+    ```python
+    gradient_accumulation_steps=8  # Effective batch size = 1Γ—8
+    ```
+
+ 4. **Enable gradient checkpointing**
+    ```python
+    gradient_checkpointing=True
+    ```
+
+ 5. **Use mixed precision**
+    ```python
+    bf16=True  # or fp16=True
+    ```
+
+ 6. **Upgrade to a larger GPU**
+    - t4 β†’ a10g β†’ a100
+
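Gradient accumulation trades memory for steps without changing what the optimizer sees. The effective batch size works out as a simple product (a minimal sketch of the arithmetic):

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Batch size seen by the optimizer: per-device batch x accumulation x GPUs."""
    return per_device * grad_accum * num_gpus
```

So `per_device_train_batch_size=1` with `gradient_accumulation_steps=8` keeps the effective batch size at 8 while only one example at a time occupies GPU memory.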
+ ## Cost Estimation
+
+ ### Formula
+
+ ```
+ Total Cost = (Hours of training) Γ— (Cost per hour)
+ ```
+
+ ### Example Calculations
+
+ **Quick demo:**
+ - Hardware: t4-small ($0.75/hour)
+ - Time: 15 minutes (0.25 hours)
+ - Cost: $0.19
+
+ **Development training:**
+ - Hardware: a10g-small ($3.50/hour)
+ - Time: 2 hours
+ - Cost: $7.00
+
+ **Production training:**
+ - Hardware: a10g-large ($5/hour)
+ - Time: 6 hours
+ - Cost: $30.00
+
+ **Large model with LoRA:**
+ - Hardware: a100-large ($10/hour)
+ - Time: 8 hours
+ - Cost: $80.00
+
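The formula is trivial to encode, which makes it easy to quote a cost whenever hardware and expected duration are known (the hourly rates above are approximate, so treat results as estimates):

```python
def training_cost(hours: float, rate_per_hour: float) -> float:
    """Total cost = training hours x hourly rate, rounded to cents."""
    return round(hours * rate_per_hour, 2)
```

For example, the quick demo above is `training_cost(0.25, 0.75)` and the production run is `training_cost(6, 5)`.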
+ ### Cost Optimization Tips
+
+ 1. **Start small:** Test on t4-small with a subset
+ 2. **Use LoRA:** 4-5x cheaper than full fine-tuning
+ 3. **Optimize hyperparameters:** Fewer epochs if possible
+ 4. **Set an appropriate timeout:** Don't waste compute on stalled jobs
+ 5. **Use checkpointing:** Resume if a job fails
+ 6. **Monitor costs:** Check running jobs regularly
+
+ ## Multi-GPU Training
+
+ TRL automatically handles multi-GPU training with Accelerate when using multi-GPU flavors.
+
+ **Multi-GPU flavors:**
+ - `l4x4` - 4x L4 GPUs
+ - `a10g-largex2` - 2x A10G GPUs
+ - `a10g-largex4` - 4x A10G GPUs
+
+ **When to use:**
+ - Models >13B parameters
+ - Need faster training (linear speedup)
+ - Large datasets (>50K examples)
+
+ **Example:**
+ ```python
+ hf_jobs("uv", {
+     "script": "train.py",
+     "flavor": "a10g-largex2",  # 2 GPUs
+     "timeout": "4h",
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"}
+ })
+ ```
+
+ No code changes neededβ€”TRL/Accelerate handles distribution automatically.
+
+ ## Choosing Between Options
+
+ ### a10g vs a100
+
+ **Choose a10g when:**
+ - Model <13B parameters
+ - Budget conscious
+ - Training time not critical
+
+ **Choose a100 when:**
+ - Model 13B+ parameters
+ - Need fastest training
+ - Memory requirements high
+ - Budget allows
+
+ ### Single vs Multi-GPU
+
+ **Choose single GPU when:**
+ - Model <7B parameters
+ - Budget constrained
+ - Simpler debugging
+
+ **Choose multi-GPU when:**
+ - Model >13B parameters
+ - Need faster training
+ - Large batch sizes required
+ - Cost-effective for large jobs
+
+ ## Quick Reference
+
+ ```python
+ # Model size β†’ Hardware selection
+ HARDWARE_MAP = {
+     "<1B": "t4-small",
+     "1-3B": "a10g-small",
+     "3-7B": "a10g-large",
+     "7-13B": "a10g-large (LoRA) or a100-large",
+     ">13B": "a100-large (LoRA required)"
+ }
+ ```
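The map above can also be expressed as a small selector function. The thresholds below mirror the guide's table; this is a sketch for convenience, not part of any official API:

```python
def pick_flavor(params_billions: float) -> str:
    """Suggest a job flavor from model size, following the guide's table."""
    if params_billions < 1:
        return "t4-small"
    if params_billions <= 3:
        return "a10g-small"
    if params_billions <= 7:
        return "a10g-large"
    if params_billions <= 13:
        return "a10g-large (LoRA) or a100-large"
    return "a100-large (LoRA required)"
```

Example: `pick_flavor(0.5)` suggests `t4-small`, while `pick_flavor(70)` points at `a100-large (LoRA required)`.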
references/hub_saving.md ADDED
@@ -0,0 +1,364 @@
+ # Saving Training Results to Hugging Face Hub
+
+ **⚠️ CRITICAL:** Training environments are ephemeral. ALL results are lost when a job completes unless pushed to the Hub.
+
+ ## Why Hub Push is Required
+
+ When running on Hugging Face Jobs:
+ - Environment is temporary
+ - All files deleted on job completion
+ - No local disk persistence
+ - Cannot access results after job ends
+
+ **Without Hub push, training is completely wasted.**
+
+ ## Required Configuration
+
+ ### 1. Training Configuration
+
+ In your SFTConfig or trainer config:
+
+ ```python
+ SFTConfig(
+     push_to_hub=True,                     # Enable Hub push
+     hub_model_id="username/model-name",   # Target repository
+ )
+ ```
+
+ ### 2. Job Configuration
+
+ When submitting the job:
+
+ ```python
+ hf_jobs("uv", {
+     "script": "train.py",
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Provide authentication
+ })
+ ```
+
+ **The `$HF_TOKEN` placeholder is automatically replaced with your Hugging Face token.**
+
+ ## Complete Example
+
+ ```python
+ # train.py
+ # /// script
+ # dependencies = ["trl"]
+ # ///
+
+ from trl import SFTTrainer, SFTConfig
+ from datasets import load_dataset
+
+ dataset = load_dataset("trl-lib/Capybara", split="train")
+
+ # Configure with Hub push
+ config = SFTConfig(
+     output_dir="my-model",
+     num_train_epochs=3,
+
+     # βœ… CRITICAL: Hub push configuration
+     push_to_hub=True,
+     hub_model_id="myusername/my-trained-model",
+     hub_token=None,  # None uses the HF_TOKEN from the environment
+ )
+
+ trainer = SFTTrainer(
+     model="Qwen/Qwen2.5-0.5B",
+     train_dataset=dataset,
+     args=config,
+ )
+
+ trainer.train()
+
+ # βœ… Push final model
+ trainer.push_to_hub()
+
+ print("βœ… Model saved to: https://huggingface.co/myusername/my-trained-model")
+ ```
+
+ **Submit with authentication:**
+
+ ```python
+ hf_jobs("uv", {
+     "script": "train.py",
+     "flavor": "a10g-large",
+     "timeout": "2h",
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # βœ… Required!
+ })
+ ```
+
+ ## What Gets Saved
+
+ When `push_to_hub=True`:
+
+ 1. **Model weights** - Final trained parameters
+ 2. **Tokenizer** - Associated tokenizer
+ 3. **Configuration** - Model config (config.json)
+ 4. **Training arguments** - Hyperparameters used
+ 5. **Model card** - Auto-generated documentation
+ 6. **Checkpoints** - If `save_strategy="steps"` enabled
+
+ ## Checkpoint Saving
+
+ Save intermediate checkpoints during training:
+
+ ```python
+ SFTConfig(
+     output_dir="my-model",
+     push_to_hub=True,
+     hub_model_id="username/my-model",
+
+     # Checkpoint configuration
+     save_strategy="steps",
+     save_steps=100,       # Save every 100 steps
+     save_total_limit=3,   # Keep only last 3 checkpoints
+ )
+ ```
+
+ **Benefits:**
+ - Resume training if a job fails
+ - Compare checkpoint performance
+ - Use intermediate models
+
+ **Checkpoints are pushed to:** `username/my-model` (same repo)
+
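To reason about how many checkpoints a run will actually leave on the Hub, the settings above combine like this. A rough sketch that ignores gradient accumulation and assumes one optimizer step per batch:

```python
import math

def checkpoints_kept(num_examples: int, batch_size: int, epochs: int,
                     save_steps: int, save_total_limit: int) -> int:
    """Rough count of checkpoints remaining after training finishes."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    saved = total_steps // save_steps      # one checkpoint every save_steps steps
    return min(saved, save_total_limit)    # older ones are rotated out
```

With 1,000 examples, batch size 4, 3 epochs, and `save_steps=100`, 7 checkpoints are written but `save_total_limit=3` keeps only the last 3.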
+ ## Authentication Methods
+
+ ### Method 1: Automatic Token (Recommended)
+
+ ```python
+ "secrets": {"HF_TOKEN": "$HF_TOKEN"}
+ ```
+
+ Uses your logged-in Hugging Face token automatically.
+
+ ### Method 2: Explicit Token
+
+ ```python
+ "secrets": {"HF_TOKEN": "hf_abc123..."}
+ ```
+
+ Provide the token explicitly (not recommended for security).
+
+ ### Method 3: Environment Variable
+
+ ```python
+ "env": {"HF_TOKEN": "hf_abc123..."}
+ ```
+
+ Pass as a regular environment variable (less secure than secrets).
+
+ **Always prefer Method 1** for security and convenience.
+
+ ## Verification Checklist
+
+ Before submitting any training job, verify:
+
+ - [ ] `push_to_hub=True` in training config
+ - [ ] `hub_model_id` is specified (format: `username/model-name`)
+ - [ ] `secrets={"HF_TOKEN": "$HF_TOKEN"}` in job config
+ - [ ] Repository name doesn't conflict with existing repos
+ - [ ] You have write access to the target namespace
+
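The checklist above can be partially automated before submission. The sketch below is a hypothetical helper operating on plain dicts (not a real TRL or Jobs API); it flags the configuration mistakes that lose results:

```python
def preflight_errors(training_cfg: dict, job_cfg: dict) -> list:
    """Return a list of problems that would lose or block training results."""
    errors = []
    if not training_cfg.get("push_to_hub"):
        errors.append("push_to_hub is not enabled; results will be lost")
    if "/" not in training_cfg.get("hub_model_id", ""):
        errors.append("hub_model_id must look like 'username/model-name'")
    if job_cfg.get("secrets", {}).get("HF_TOKEN") is None:
        errors.append("job secrets must include HF_TOKEN")
    return errors
```

An empty list means the three critical settings are in place; anything else should be fixed before spending GPU time.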
+ ## Repository Setup
+
+ ### Automatic Creation
+
+ If the repository doesn't exist, it's created automatically on the first push.
+
+ ### Manual Creation
+
+ Create the repository before training:
+
+ ```python
+ from huggingface_hub import HfApi
+
+ api = HfApi()
+ api.create_repo(
+     repo_id="username/model-name",
+     repo_type="model",
+     private=False,  # or True for a private repo
+ )
+ ```
+
+ ### Repository Naming
+
+ **Valid names:**
+ - `username/my-model`
+ - `username/model-name`
+ - `organization/model-name`
+
+ **Invalid names:**
+ - `model-name` (missing username)
+ - `username/model name` (spaces not allowed)
+ - `username/MODEL` (uppercase discouraged)
+
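The naming rules above fit in a small validator. This sketch treats uppercase names as invalid, following the list above, even though the Hub technically only discourages them:

```python
def valid_repo_id(repo_id: str) -> bool:
    """Check 'namespace/name' shape: exactly one slash, no spaces, lowercase name."""
    parts = repo_id.split("/")
    if len(parts) != 2 or not all(parts):
        return False  # missing namespace or name
    if " " in repo_id:
        return False  # spaces are not allowed
    name = parts[1]
    return name == name.lower()  # uppercase discouraged; rejected here
```

Checking `hub_model_id` with something like this before submission catches the malformed names listed above.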
+ ## Troubleshooting
+
+ ### Error: 401 Unauthorized
+
+ **Cause:** HF_TOKEN not provided or invalid
+
+ **Solutions:**
+ 1. Verify `secrets={"HF_TOKEN": "$HF_TOKEN"}` in job config
+ 2. Check you're logged in: `huggingface-cli whoami`
+ 3. Re-login: `huggingface-cli login`
+
+ ### Error: 403 Forbidden
+
+ **Cause:** No write access to repository
+
+ **Solutions:**
+ 1. Check the repository namespace matches your username
+ 2. Verify you're a member of the organization (if using an org namespace)
+ 3. Check the repository isn't private (if accessing an org repo)
+
+ ### Error: Repository not found
+
+ **Cause:** Repository doesn't exist and auto-creation failed
+
+ **Solutions:**
+ 1. Manually create the repository first
+ 2. Check the repository name format
+ 3. Verify the namespace exists
+
+ ### Error: Push failed during training
+
+ **Cause:** Network issues or Hub unavailable
+
+ **What happens:** Training itself continues; only the push fails, and checkpoints pushed earlier may already be on the Hub.
+
+ **Solutions:**
+ 1. Check network/Hub status and retry
+ 2. Re-run the push manually before the job completes (see Manual Push After Training below)
+
+ ### Issue: Model saved but not visible
+
+ **Possible causes:**
+ 1. Repository is privateβ€”check https://huggingface.co/username
+ 2. Wrong namespaceβ€”verify `hub_model_id` matches your login
+ 3. Push still in progressβ€”wait a few minutes
+
+ ## Manual Push After Training
+
+ If training completes but the push fails, push manually:
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ # Load from local checkpoint
+ model = AutoModel.from_pretrained("./output_dir")
+ tokenizer = AutoTokenizer.from_pretrained("./output_dir")
+
+ # Push to Hub
+ model.push_to_hub("username/model-name", token="hf_abc123...")
+ tokenizer.push_to_hub("username/model-name", token="hf_abc123...")
+ ```
+
+ **Note:** Only possible if the job hasn't completed yet (the files still exist).
+
+ ## Best Practices
+
+ 1. **Always enable `push_to_hub=True`**
+ 2. **Use checkpoint saving** for long training runs
+ 3. **Verify Hub push** in logs before the job completes
+ 4. **Set an appropriate `save_total_limit`** to avoid excessive checkpoints
+ 5. **Use descriptive repo names** (e.g., `qwen-capybara-sft` not `model1`)
+ 6. **Add a model card** with training details
+ 7. **Tag models** with relevant tags (e.g., `text-generation`, `fine-tuned`)
+
+ ## Monitoring Push Progress
+
+ Check logs for push progress:
+
+ ```python
+ hf_jobs("logs", {"job_id": "your-job-id"})
+ ```
+
+ **Look for:**
+ ```
+ Pushing model to username/model-name...
+ Upload file pytorch_model.bin: 100%
+ βœ… Model pushed successfully
+ ```
+
+ ## Example: Full Production Setup
+
+ ```python
+ # production_train.py
+ # /// script
+ # dependencies = ["trl>=0.12.0", "peft>=0.7.0"]
+ # ///
+
+ from datasets import load_dataset
+ from peft import LoraConfig
+ from trl import SFTTrainer, SFTConfig
+ import os
+
+ # Verify token is available
+ assert "HF_TOKEN" in os.environ, "HF_TOKEN not found in environment!"
+
+ # Load dataset
+ dataset = load_dataset("trl-lib/Capybara", split="train")
+ print(f"βœ… Dataset loaded: {len(dataset)} examples")
+
+ # Configure with comprehensive Hub settings
+ config = SFTConfig(
+     output_dir="qwen-capybara-sft",
+
+     # Hub configuration
+     push_to_hub=True,
+     hub_model_id="myusername/qwen-capybara-sft",
+     hub_strategy="checkpoint",  # Push checkpoints
+
+     # Checkpoint configuration
+     save_strategy="steps",
+     save_steps=100,
+     save_total_limit=3,
+
+     # Training settings
+     num_train_epochs=3,
+     per_device_train_batch_size=4,
+
+     # Logging
+     logging_steps=10,
+     logging_first_step=True,
+ )
+
+ # Train with LoRA
+ trainer = SFTTrainer(
+     model="Qwen/Qwen2.5-0.5B",
+     train_dataset=dataset,
+     args=config,
+     peft_config=LoraConfig(r=16, lora_alpha=32),
+ )
+
+ print("πŸš€ Starting training...")
+ trainer.train()
+
+ print("πŸ’Ύ Pushing final model to Hub...")
+ trainer.push_to_hub()
+
+ print("βœ… Training complete!")
+ print("Model available at: https://huggingface.co/myusername/qwen-capybara-sft")
+ ```
+
+ **Submit:**
+
+ ```python
+ hf_jobs("uv", {
+     "script": "production_train.py",
+     "flavor": "a10g-large",
+     "timeout": "6h",
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"}
+ })
+ ```
+
+ ## Key Takeaway
+
+ **Without `push_to_hub=True` and `secrets={"HF_TOKEN": "$HF_TOKEN"}`, all training results are permanently lost.**
+
+ Always verify both are configured before submitting any training job.
references/reliability_principles.md ADDED
@@ -0,0 +1,371 @@
# Reliability Principles for Training Jobs

These principles are derived from real production failures and successful fixes. Following them prevents common failure modes and ensures reliable job execution.

## Principle 1: Always Verify Before Use

**Rule:** Never assume repos, datasets, or resources exist. Verify with tools first.

### What It Prevents

- **Non-existent datasets** - Jobs fail immediately when the dataset doesn't exist
- **Typos in names** - Simple mistakes like "argilla-dpo-mix-7k" vs "ultrafeedback_binarized"
- **Incorrect paths** - Old or moved repos, renamed files
- **Missing dependencies** - Undocumented requirements

### How to Apply

**Before submitting ANY job:**

```python
# Verify dataset exists
dataset_search({"query": "dataset-name", "author": "author-name", "limit": 5})
hub_repo_details(["author/dataset-name"], repo_type="dataset")

# Verify model exists
hub_repo_details(["org/model-name"], repo_type="model")

# Check script/file paths (for URL-based scripts)
# Verify before using: https://github.com/user/repo/blob/main/script.py
```

**Examples that would have caught errors:**

```python
# ❌ WRONG: Assumed dataset exists
hf_jobs("uv", {
    "script": """...""",
    "env": {"DATASET": "trl-lib/argilla-dpo-mix-7k"}  # Doesn't exist!
})

# βœ… CORRECT: Verify first
dataset_search({"query": "argilla dpo", "author": "trl-lib"})
# Would show: "trl-lib/ultrafeedback_binarized" is the correct name

hub_repo_details(["trl-lib/ultrafeedback_binarized"], repo_type="dataset")
# Confirms it exists before using
```

### Implementation Checklist

- [ ] Check dataset exists before training
- [ ] Verify base model exists before fine-tuning
- [ ] Confirm adapter model exists before GGUF conversion
- [ ] Test script URLs are valid before submitting
- [ ] Validate file paths in repositories
- [ ] Check for recent updates/renames of resources

**Time cost:** 5-10 seconds
**Time saved:** Hours of failed job time + debugging

---

## Principle 2: Prioritize Reliability Over Performance

**Rule:** Default to what is most likely to succeed, not what is theoretically fastest.

### What It Prevents

- **Hardware incompatibilities** - Features that fail on certain GPUs
- **Unstable optimizations** - Speed-ups that cause crashes
- **Complex configurations** - More failure points
- **Build system issues** - Unreliable compilation methods

### How to Apply

**Choose reliability:**

```python
# ❌ RISKY: Aggressive optimization that may fail
SFTConfig(
    torch_compile=True,      # Can fail on T4, A10G GPUs
    optim="adamw_bnb_8bit",  # Requires specific setup
    fp16=False,              # May cause training instability
    ...
)

# βœ… SAFE: Proven defaults
SFTConfig(
    # torch_compile=True,    # Enable on H100 for ~20% speedup
    optim="adamw_torch",     # Standard, always works
    fp16=True,               # Stable and fast
    ...
)
```

**For build processes:**

```python
# ❌ UNRELIABLE: Uses make (platform-dependent)
subprocess.run(["make", "-C", "/tmp/llama.cpp", "llama-quantize"], check=True)

# βœ… RELIABLE: Uses CMake (consistent, documented)
subprocess.run([
    "cmake", "-B", "/tmp/llama.cpp/build", "-S", "/tmp/llama.cpp",
    "-DGGML_CUDA=OFF"  # Disable CUDA for faster, more reliable build
], check=True)

subprocess.run([
    "cmake", "--build", "/tmp/llama.cpp/build",
    "--target", "llama-quantize", "-j", "4"
], check=True)
```

### Real-World Example

**The `torch.compile` failure:**
- Added for a "20% speedup" on H100
- **Failed fatally on T4-medium** with a cryptic error
- Misdiagnosed as a dataset issue (cost hours)
- **Fix:** Disable by default, add as an optional comment

**Result:** Reliability > 20% performance gain

### Implementation Checklist

- [ ] Use proven, standard configurations by default
- [ ] Comment out performance optimizations with hardware notes
- [ ] Use stable build systems (CMake > make)
- [ ] Test on target hardware before production
- [ ] Document known incompatibilities
- [ ] Provide "safe" and "fast" variants when needed

**Performance loss:** 10-20% in the best case
**Reliability gain:** 95%+ success rate vs 60-70%

---

## Principle 3: Create Atomic, Self-Contained Scripts

**Rule:** Scripts should work as complete, independent units. Don't remove parts to "simplify."

### What It Prevents

- **Missing dependencies** - Removed "unnecessary" packages that are actually required
- **Incomplete processes** - Skipped steps that seem redundant
- **Environment assumptions** - Scripts that need pre-setup
- **Partial failures** - Some parts work, others fail silently

### How to Apply

**Complete dependency specifications:**

```python
# ❌ INCOMPLETE: "Simplified" by removing dependencies
# /// script
# dependencies = [
#     "transformers",
#     "peft",
#     "torch",
# ]
# ///

# βœ… COMPLETE: All dependencies explicit
# /// script
# dependencies = [
#     "transformers>=4.36.0",
#     "peft>=0.7.0",
#     "torch>=2.0.0",
#     "accelerate>=0.24.0",
#     "huggingface_hub>=0.20.0",
#     "sentencepiece>=0.1.99",  # Required for tokenizers
#     "protobuf>=3.20.0",       # Required for tokenizers
#     "numpy",
#     "gguf",
# ]
# ///
```

**Complete build processes:**

```python
# ❌ INCOMPLETE: Assumes build tools exist
subprocess.run(["git", "clone", "https://github.com/ggerganov/llama.cpp.git", "/tmp/llama.cpp"])
subprocess.run(["make", "-C", "/tmp/llama.cpp", "llama-quantize"])  # FAILS: no gcc/make

# βœ… COMPLETE: Installs all requirements
subprocess.run(["apt-get", "update", "-qq"], check=True)
subprocess.run(["apt-get", "install", "-y", "-qq", "build-essential", "cmake"], check=True)
subprocess.run(["git", "clone", "https://github.com/ggerganov/llama.cpp.git", "/tmp/llama.cpp"])
# ... then build
```

### Real-World Example

**The `sentencepiece` failure:**
- Original script had it: worked fine
- "Simplified" version removed it: "doesn't look necessary"
- **GGUF conversion failed silently** - the tokenizer couldn't convert
- Hard to debug: no obvious error message
- **Fix:** Restore all original dependencies

**Result:** Don't remove dependencies without thorough testing

### Implementation Checklist

- [ ] All dependencies in PEP 723 header with version pins
- [ ] All system packages installed by script
- [ ] No assumptions about pre-existing environment
- [ ] No "optional" steps that are actually required
- [ ] Test scripts in clean environment
- [ ] Document why each dependency is needed

**Complexity:** Slightly longer scripts
**Reliability:** Scripts "just work" every time

---

## Principle 4: Provide Clear Error Context

**Rule:** When things fail, make it obvious what went wrong and how to fix it.

### How to Apply

**Wrap subprocess calls:**

```python
# ❌ UNCLEAR: Silent failure
subprocess.run([...], check=True, capture_output=True)

# βœ… CLEAR: Shows what failed
try:
    result = subprocess.run(
        [...],
        check=True,
        capture_output=True,
        text=True
    )
    print(result.stdout)
    if result.stderr:
        print("Warnings:", result.stderr)
except subprocess.CalledProcessError as e:
    print("❌ Command failed!")
    print("STDOUT:", e.stdout)
    print("STDERR:", e.stderr)
    raise
```

**Validate inputs:**

```python
# ❌ UNCLEAR: Fails later with cryptic error
model = load_model(MODEL_NAME)

# βœ… CLEAR: Fails fast with clear message
if not MODEL_NAME:
    raise ValueError("MODEL_NAME environment variable not set!")

print(f"Loading model: {MODEL_NAME}")
try:
    model = load_model(MODEL_NAME)
    print("βœ… Model loaded successfully")
except Exception as e:
    print(f"❌ Failed to load model: {MODEL_NAME}")
    print(f"Error: {e}")
    print("Hint: Check that model exists on Hub")
    raise
```

### Implementation Checklist

- [ ] Wrap external calls with try/except
- [ ] Print stdout/stderr on failure
- [ ] Validate environment variables early
- [ ] Add progress indicators (βœ…, ❌, πŸ”„)
- [ ] Include hints for common failures
- [ ] Log configuration at start

---

## Principle 5: Test the Happy Path on Known-Good Inputs

**Rule:** Before using new code in production, test with inputs you know work.

### How to Apply

**Known-good test inputs:**

```python
# For training
TEST_DATASET = "trl-lib/Capybara"  # Small, well-formatted, widely used
TEST_MODEL = "Qwen/Qwen2.5-0.5B"   # Small, fast, reliable

# For GGUF conversion
TEST_ADAPTER = "evalstate/qwen-capybara-medium"  # Known working model
TEST_BASE = "Qwen/Qwen2.5-0.5B"                  # Compatible base
```

**Testing workflow:**

1. Test with known-good inputs first
2. If that works, try production inputs
3. If production fails, you know it's the inputs (not the code)
4. Isolate the difference

### Implementation Checklist

- [ ] Maintain list of known-good test models/datasets
- [ ] Test new scripts with test inputs first
- [ ] Document what makes inputs "good"
- [ ] Keep test jobs cheap (small models, short timeouts)
- [ ] Only move to production after test succeeds

**Time cost:** 5-10 minutes for test run
**Debugging time saved:** Hours

---

## Summary: The Reliability Checklist

Before submitting ANY job:

### Pre-Flight Checks
- [ ] **Verified** all repos/datasets exist (hub_repo_details)
- [ ] **Tested** with known-good inputs if new code
- [ ] **Using** proven hardware/configuration
- [ ] **Included** all dependencies in PEP 723 header
- [ ] **Installed** system requirements (build tools, etc.)
- [ ] **Set** appropriate timeout (not the default 30m)
- [ ] **Configured** Hub push with HF_TOKEN
- [ ] **Added** clear error handling

### Script Quality
- [ ] Self-contained (no external setup needed)
- [ ] Complete dependencies listed
- [ ] Build tools installed by script
- [ ] Progress indicators included
- [ ] Error messages are clear
- [ ] Configuration logged at start

### Job Configuration
- [ ] Timeout > expected runtime + 30% buffer
- [ ] Hardware appropriate for model size
- [ ] Secrets include HF_TOKEN
- [ ] Environment variables set correctly
- [ ] Cost estimated and acceptable

**Following these principles transforms the job success rate from ~60-70% to ~95%+**

---

## When Principles Conflict

Sometimes reliability and performance conflict. Here's how to choose:

| Scenario | Choose | Rationale |
|----------|--------|-----------|
| Demo/test | Reliability | Fast failure is worse than slow success |
| Production (first run) | Reliability | Prove it works before optimizing |
| Production (proven) | Performance | Safe to optimize after validation |
| Time-critical | Reliability | Failures cause more delay than slow runs |
| Cost-critical | Balanced | Test with small model, then optimize |

**General rule:** Reliability first, optimize second.

---

## Further Reading

- `troubleshooting.md` - Common issues and fixes
- `training_patterns.md` - Proven training configurations
- `gguf_conversion.md` - Production GGUF workflow
references/trackio_guide.md ADDED
@@ -0,0 +1,189 @@
# Trackio Integration for TRL Training

**Trackio** is an experiment tracking library that provides real-time metrics visualization for remote training on Hugging Face Jobs infrastructure.

⚠️ **IMPORTANT**: For Jobs training (remote cloud GPUs):
- Training happens on ephemeral cloud runners (not your local machine)
- Trackio syncs metrics to a Hugging Face Space for real-time monitoring
- Without a Space, metrics are lost when the job completes
- The Space dashboard persists your training metrics permanently

## Setting Up Trackio for Jobs

**Step 1: Add trackio dependency**
```python
# /// script
# dependencies = [
#     "trl>=0.12.0",
#     "trackio",  # Required!
# ]
# ///
```

**Step 2: Create a Trackio Space (one-time setup)**

**Option A: Let Trackio auto-create (Recommended)**
Pass a `space_id` to `trackio.init()` and Trackio will automatically create the Space if it doesn't exist.

**Option B: Create manually**
- Create a Space via the Hub UI at https://huggingface.co/new-space
- Select the Gradio SDK
- OR use the command: `huggingface-cli repo create my-trackio-dashboard --type space --space_sdk gradio`

**Step 3: Initialize Trackio with space_id**
```python
import trackio

trackio.init(
    project="my-training",
    space_id="username/trackio",  # CRITICAL for Jobs! Replace 'username' with your HF username
    config={
        "model": "Qwen/Qwen2.5-0.5B",
        "dataset": "trl-lib/Capybara",
        "learning_rate": 2e-5,
    }
)
```

**Step 4: Configure TRL to use Trackio**
```python
SFTConfig(
    report_to="trackio",
    # ... other config
)
```

**Step 5: Finish tracking**
```python
trainer.train()
trackio.finish()  # Ensures final metrics are synced
```

## What Trackio Tracks

Trackio automatically logs:
- βœ… Training loss
- βœ… Learning rate
- βœ… GPU utilization
- βœ… Memory usage
- βœ… Training throughput
- βœ… Custom metrics

## How It Works with Jobs

1. **Training runs** β†’ Metrics logged to a local SQLite DB
2. **Every 5 minutes** β†’ Trackio syncs the DB to an HF Dataset (Parquet)
3. **Space dashboard** β†’ Reads from the Dataset, displays metrics in real time
4. **Job completes** β†’ Final sync ensures all metrics are persisted

## Default Configuration Pattern

**Use sensible defaults for trackio configuration unless the user requests otherwise.**

### Recommended Defaults

```python
import trackio

trackio.init(
    project="qwen-capybara-sft",
    name="baseline-run",          # Descriptive name the user will recognize
    space_id="username/trackio",  # Default space: {username}/trackio
    config={
        # Keep config minimal - hyperparameters and model/dataset info only
        "model": "Qwen/Qwen2.5-0.5B",
        "dataset": "trl-lib/Capybara",
        "learning_rate": 2e-5,
        "num_epochs": 3,
    }
)
```

**Key principles:**
- **Space ID**: Use `{username}/trackio` with "trackio" as the default space name
- **Run naming**: Unless otherwise specified, name the run in a way the user will recognize
- **Config**: Keep it minimal - don't automatically capture job metadata unless requested
- **Grouping**: Optional - only use it if the user requests organizing related experiments

## Grouping Runs (Optional)

The `group` parameter helps organize related runs together in the dashboard sidebar. This is useful when the user is running multiple experiments with different configurations but wants to compare them together:

```python
# Example: Group runs by experiment type
trackio.init(project="my-project", run_name="baseline-run-1", group="baseline")
trackio.init(project="my-project", run_name="augmented-run-1", group="augmented")
trackio.init(project="my-project", run_name="tuned-run-1", group="tuned")
```

Runs with the same group name can be grouped together in the sidebar, making it easier to compare related experiments. You can group by any configuration parameter:

```python
# Hyperparameter sweep - group by learning rate
trackio.init(project="hyperparam-sweep", run_name="lr-0.001-run", group="lr_0.001")
trackio.init(project="hyperparam-sweep", run_name="lr-0.01-run", group="lr_0.01")
```

## Environment Variables for Jobs

You can configure trackio using environment variables instead of passing parameters to `trackio.init()`. This is useful for managing configuration across multiple jobs.

**`HF_TOKEN`**
Required for creating Spaces and writing to datasets (passed via `secrets`):
```python
hf_jobs("uv", {
    "script": "...",
    "secrets": {
        "HF_TOKEN": "$HF_TOKEN"  # Enables Space creation and Hub push
    }
})
```

### Example with Environment Variables

```python
hf_jobs("uv", {
    "script": """
# Training script - trackio config from environment
import trackio
from datetime import datetime

# Auto-generate run name
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M")
run_name = f"sft_qwen25_{timestamp}"

# Project and space_id can come from environment variables
trackio.init(run_name=run_name, group="SFT")

# ... training code ...
trackio.finish()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**When to use environment variables:**
- Managing multiple jobs with the same configuration
- Keeping training scripts portable across projects
- Separating configuration from code

**When to use direct parameters:**
- Single job with a specific configuration
- When clarity in code is preferred
- When each job has a different project/space

## Viewing the Dashboard

After starting training:
1. Navigate to the Space: `https://huggingface.co/spaces/username/trackio`
2. The Gradio dashboard shows all tracked experiments
3. Filter by project, compare runs, view charts with smoothing

## Recommendation

- **Trackio**: Best for real-time monitoring during long training runs
- **Weights & Biases**: Best for team collaboration, requires an account
references/training_methods.md ADDED
@@ -0,0 +1,150 @@
# TRL Training Methods Overview

TRL (Transformer Reinforcement Learning) provides multiple training methods for fine-tuning and aligning language models. This reference provides a brief overview of each method.

## Supervised Fine-Tuning (SFT)

**What it is:** Standard instruction tuning with supervised learning on demonstration data.

**When to use:**
- Initial fine-tuning of base models on task-specific data
- Teaching new capabilities or domains
- Most common starting point for fine-tuning

**Dataset format:** Conversational format with a "messages" field, OR a plain "text" field, OR prompt/completion pairs
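As a rough sketch, one record in each of the three accepted formats looks like the following (field names follow TRL's dataset-format conventions; the content strings are made up for illustration):

```python
# Hypothetical example records for the three accepted SFT formats.

# 1. Conversational format: a "messages" list of chat turns
conversational = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# 2. Plain-text format: a single "text" field with the full training text
text_only = {"text": "What is the capital of France? The capital of France is Paris."}

# 3. Prompt/completion format: separate fields for input and target
prompt_completion = {
    "prompt": "What is the capital of France?",
    "completion": "The capital of France is Paris.",
}
```

A dataset only needs to use one of these layouts consistently; the trainer infers the format from the column names.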

**Example:**
```python
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        eval_strategy="no",  # Disable eval for simple example
        # max_length=1024 is the default - only set if you need a different length
    )
)
trainer.train()
```

**Note:** For production training with evaluation monitoring, see `scripts/train_sft_example.py`

**Documentation:** `hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")`

## Direct Preference Optimization (DPO)

**What it is:** Alignment method that trains directly on preference pairs (chosen vs rejected responses) without requiring a reward model.

**When to use:**
- Aligning models to human preferences
- Improving response quality after SFT
- You have paired preference data (chosen/rejected responses)

**Dataset format:** Preference pairs with "chosen" and "rejected" fields

**Example:**
```python
from datasets import load_dataset
from trl import DPOTrainer, DPOConfig

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # Use an instruct model
    train_dataset=dataset,
    args=DPOConfig(
        output_dir="dpo-model",
        beta=0.1,  # KL penalty coefficient
        eval_strategy="no",  # Disable eval for simple example
        # max_length=1024 is the default - only set if you need a different length
    )
)
trainer.train()
```

**Note:** For production training with evaluation monitoring, see `scripts/train_dpo_example.py`

**Documentation:** `hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")`

## Group Relative Policy Optimization (GRPO)

**What it is:** Online RL method that optimizes relative to group performance, useful for tasks with verifiable rewards.

**When to use:**
- Tasks with automatic reward signals (code execution, math verification)
- Online learning scenarios
- When offline DPO data is insufficient

**Dataset format:** Prompt-only format (the model generates responses, and the reward is computed online)

**Example:**
```python
# Use the TRL-maintained script
hf_jobs("uv", {
    "script": "https://raw.githubusercontent.com/huggingface/trl/main/examples/scripts/grpo.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B-Instruct",
        "--dataset_name", "trl-lib/math_shepherd",
        "--output_dir", "grpo-model"
    ],
    "flavor": "a10g-large",
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**Documentation:** `hf_doc_fetch("https://huggingface.co/docs/trl/grpo_trainer")`

## Reward Modeling

**What it is:** Train a reward model to score responses, used as a component in RLHF pipelines.

**When to use:**
- Building an RLHF pipeline
- Need automatic quality scoring
- Creating reward signals for PPO training

**Dataset format:** Preference pairs with "chosen" and "rejected" responses
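As with DPO, each record pairs one preferred and one dispreferred response to the same prompt. A hypothetical record (the field names follow the "chosen"/"rejected" convention above; the strings are illustrative):

```python
# Hypothetical preference-pair record, as consumed by reward modeling and DPO.
preference_record = {
    "prompt": "Explain overfitting in one sentence.",
    "chosen": "Overfitting is when a model memorizes the training data and fails to generalize.",
    "rejected": "Overfitting is bad.",
}

# Both responses answer the same prompt; they differ only in quality
assert set(preference_record) == {"prompt", "chosen", "rejected"}
```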

**Documentation:** `hf_doc_fetch("https://huggingface.co/docs/trl/reward_trainer")`

## Method Selection Guide

| Method | Complexity | Data Required | Use Case |
|--------|-----------|---------------|----------|
| **SFT** | Low | Demonstrations | Initial fine-tuning |
| **DPO** | Medium | Paired preferences | Post-SFT alignment |
| **GRPO** | Medium | Prompts + reward fn | Online RL with automatic rewards |
| **Reward** | Medium | Paired preferences | Building RLHF pipeline |

## Recommended Pipeline

**For most use cases:**
1. **Start with SFT** - Fine-tune the base model on task data
2. **Follow with DPO** - Align to preferences using paired data
3. **Optional: GGUF conversion** - Deploy for local inference

**For advanced RL scenarios:**
1. **Start with SFT** - Fine-tune the base model
2. **Train a reward model** - On preference data

## Dataset Format Reference

For complete dataset format specifications, use:
```python
hf_doc_fetch("https://huggingface.co/docs/trl/dataset_formats")
```

Or validate your dataset:
```bash
uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
  --dataset your/dataset --split train
```

## See Also

- `references/training_patterns.md` - Common training patterns and examples
- `scripts/train_sft_example.py` - Complete SFT template
- `scripts/train_dpo_example.py` - Complete DPO template
- [Dataset Inspector](https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py) - Dataset format validation tool
references/training_patterns.md ADDED
@@ -0,0 +1,203 @@
# Common Training Patterns

This guide provides common training patterns and use cases for TRL on Hugging Face Jobs.

## Multi-GPU Training

Automatic distributed training across multiple GPUs. TRL/Accelerate handles distribution automatically:

```python
hf_jobs("uv", {
    "script": """
# Your training script here (same as single GPU)
# No changes needed - Accelerate detects multiple GPUs
""",
    "flavor": "a10g-largex2",  # 2x A10G GPUs
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**Tips for multi-GPU:**
- No code changes needed
- Use `per_device_train_batch_size` (per GPU, not total)
- Effective batch size = `per_device_train_batch_size` Γ— `num_gpus` Γ— `gradient_accumulation_steps`
- Monitor GPU utilization to ensure both GPUs are being used
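For instance, the effective-batch-size formula works out as follows for the 2x A10G flavor (the per-device batch and accumulation values here are illustrative, not recommendations):

```python
# Illustrative effective batch size on a10g-largex2 (2 GPUs)
per_device_train_batch_size = 4
num_gpus = 2
gradient_accumulation_steps = 8

effective_batch_size = (
    per_device_train_batch_size * num_gpus * gradient_accumulation_steps
)
print(effective_batch_size)  # 64 samples per optimizer step
```

Keeping the effective batch size constant when changing GPU count (by adjusting `gradient_accumulation_steps`) preserves comparable training dynamics.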

## DPO Training (Preference Learning)

Train with preference data for alignment:

```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "trackio"]
# ///

from datasets import load_dataset
from trl import DPOTrainer, DPOConfig
import trackio

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# Create train/eval split
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

config = DPOConfig(
    output_dir="dpo-model",
    push_to_hub=True,
    hub_model_id="username/dpo-model",
    num_train_epochs=1,
    beta=0.1,  # KL penalty coefficient
    eval_strategy="steps",
    eval_steps=50,
    report_to="trackio",
    run_name="baseline_run",  # Use a meaningful run name
    # max_length=1024,  # Default - only set if you need a different sequence length
)

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # Use an instruct model as base
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],  # IMPORTANT: Provide eval_dataset when eval_strategy is enabled
    args=config,
)

trainer.train()
trainer.push_to_hub()
trackio.finish()
""",
    "flavor": "a10g-large",
    "timeout": "3h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**For DPO documentation:** Use `hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")`

## GRPO Training (Online RL)

Group Relative Policy Optimization for online reinforcement learning:

```python
hf_jobs("uv", {
    "script": "https://raw.githubusercontent.com/huggingface/trl/main/examples/scripts/grpo.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B-Instruct",
        "--dataset_name", "trl-lib/math_shepherd",
        "--output_dir", "grpo-model",
        "--push_to_hub",
        "--hub_model_id", "username/grpo-model"
    ],
    "flavor": "a10g-large",
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**For GRPO documentation:** Use `hf_doc_fetch("https://huggingface.co/docs/trl/grpo_trainer")`

## Trackio Configuration

**Use sensible defaults for trackio setup.** See `references/trackio_guide.md` for complete documentation, including grouping runs for experiments.

### Basic Pattern

```python
import trackio

trackio.init(
    project="my-training",
    run_name="baseline-run",      # Descriptive name the user will recognize
    space_id="username/trackio",  # Default space: {username}/trackio
    config={
        # Keep config minimal - hyperparameters and model/dataset info only
        "model": "Qwen/Qwen2.5-0.5B",
        "dataset": "trl-lib/Capybara",
        "learning_rate": 2e-5,
    }
)

# Your training code...

trackio.finish()
```

### Grouping for Experiments (Optional)

When the user wants to compare related runs, use the `group` parameter:

```python
# Hyperparameter sweep
trackio.init(project="hyperparam-sweep", run_name="lr-0.001", group="lr_0.001")
trackio.init(project="hyperparam-sweep", run_name="lr-0.01", group="lr_0.01")
```

## Pattern Selection Guide

| Use Case | Pattern | Hardware | Time |
|----------|---------|----------|------|
| SFT training | `scripts/train_sft_example.py` | a10g-large | 2-6 hours |
| Large dataset (>10K) | Multi-GPU | a10g-largex2 | 4-12 hours |
| Preference learning | DPO Training | a10g-large | 2-4 hours |
| Online RL | GRPO Training | a10g-large | 3-6 hours |

## Critical: Evaluation Dataset Requirements

**⚠️ IMPORTANT**: If you set `eval_strategy="steps"` or `eval_strategy="epoch"`, you **MUST** provide an `eval_dataset` to the trainer, or the training will hang.

### βœ… CORRECT - With eval dataset:
```python
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],  # ← MUST provide when eval_strategy is enabled
    args=SFTConfig(eval_strategy="steps", ...),
)
```

### ❌ WRONG - Will hang:
```python
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
    # NO eval_dataset but eval_strategy="steps" ← WILL HANG
    args=SFTConfig(eval_strategy="steps", ...),
)
```

### Option: Disable evaluation if no eval dataset
```python
config = SFTConfig(
    eval_strategy="no",  # ← Explicitly disable evaluation
    # ... other config
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
    # No eval_dataset needed
    args=config,
)
```

## Best Practices

1. **Use train/eval splits** - Create an evaluation split for monitoring progress
2. **Enable Trackio** - Monitor progress in real time
3. **Add a 20-30% buffer to the timeout** - Account for loading/saving overhead
4. **Test with TRL official scripts first** - Use maintained examples before custom code
5. **Always provide eval_dataset** - When using eval_strategy, or set it to "no"
6. **Use multi-GPU for large models** - 7B+ models benefit significantly

## See Also

- `scripts/train_sft_example.py` - Complete SFT template with Trackio and eval split
- `scripts/train_dpo_example.py` - Complete DPO template
- `scripts/train_grpo_example.py` - Complete GRPO template
- `references/hardware_guide.md` - Detailed hardware specifications
- `references/training_methods.md` - Overview of all TRL training methods
- `references/troubleshooting.md` - Common issues and solutions
references/troubleshooting.md ADDED
@@ -0,0 +1,282 @@
1
+ # Troubleshooting TRL Training Jobs
2
+
3
+ Common issues and solutions when training with TRL on Hugging Face Jobs.
4
+
5
+ ## Training Hangs at "Starting training..." Step
6
+
7
+ **Problem:** Job starts but hangs at the training step - never progresses, never times out, just sits there.
8
+
9
+ **Root Cause:** Using `eval_strategy="steps"` or `eval_strategy="epoch"` without providing an `eval_dataset` to the trainer.
10
+
11
+ **Solution:**
12
+
13
+ **Option A: Provide eval_dataset (recommended)**
14
+ ```python
15
+ # Create train/eval split
16
+ dataset_split = dataset.train_test_split(test_size=0.1, seed=42)
17
+
18
+ trainer = SFTTrainer(
19
+ model="Qwen/Qwen2.5-0.5B",
20
+ train_dataset=dataset_split["train"],
21
+ eval_dataset=dataset_split["test"], # ← MUST provide when eval_strategy is enabled
22
+ args=SFTConfig(
23
+ eval_strategy="steps",
24
+ eval_steps=50,
25
+ ...
26
+ ),
27
+ )
28
+ ```
29
+
30
+ **Option B: Disable evaluation**
31
+ ```python
32
+ trainer = SFTTrainer(
33
+ model="Qwen/Qwen2.5-0.5B",
34
+ train_dataset=dataset,
35
+ # No eval_dataset
36
+ args=SFTConfig(
37
+ eval_strategy="no", # ← Explicitly disable
38
+ ...
39
+ ),
40
+ )
41
+ ```
42
+
43
+ **Prevention:**
44
+ - Always create train/eval split for better monitoring
45
+ - Use `dataset.train_test_split(test_size=0.1, seed=42)`
46
+ - Check example scripts: `scripts/train_sft_example.py` includes proper eval setup
47
+
48
+ ## Job Times Out
49
+
50
+ **Problem:** Job terminates before training completes, all progress lost.
51
+
52
+ **Solutions:**
53
+ - Increase timeout parameter (e.g., `"timeout": "4h"`)
54
+ - Reduce `num_train_epochs` or use smaller dataset slice
55
+ - Use smaller model or enable LoRA/PEFT to speed up training
56
+ - Add 20-30% buffer to estimated time for loading/saving overhead
57
+
58
+ **Prevention:**
59
+ - Always start with a quick demo run to estimate timing
60
+ - Use `scripts/estimate_cost.py` to get time estimates
61
+ - Monitor first runs closely via Trackio or logs
62
+
63
+ ## Model Not Saved to Hub
64
+
65
+ **Problem:** Training completes but model doesn't appear on Hub - all work lost.
66
+
67
+ **Check:**
68
+ - [ ] `push_to_hub=True` in training config
69
+ - [ ] `hub_model_id` specified with username (e.g., `"username/model-name"`)
70
+ - [ ] `secrets={"HF_TOKEN": "$HF_TOKEN"}` in job submission
71
+ - [ ] User has write access to target repo
72
+ - [ ] Token has write permissions (check at https://huggingface.co/settings/tokens)
73
+ - [ ] Training script calls `trainer.push_to_hub()` at the end
74
+
75
+ **See:** `references/hub_saving.md` for detailed Hub authentication troubleshooting
76
+
77
+ ## Out of Memory (OOM)
78
+
79
+ **Problem:** Job fails with CUDA out of memory error.
80
+
81
+ **Solutions (in order of preference):**
82
+ 1. **Reduce batch size:** Lower `per_device_train_batch_size` (try 4 β†’ 2 β†’ 1)
83
+ 2. **Increase gradient accumulation:** Raise `gradient_accumulation_steps` to maintain effective batch size
84
+ 3. **Disable evaluation:** Remove `eval_dataset` and `eval_strategy` (saves ~40% memory, good for demos)
85
+ 4. **Enable LoRA/PEFT:** Use `peft_config=LoraConfig(r=8, lora_alpha=16)` to train adapters only (smaller rank = less memory)
86
+ 5. **Use larger GPU:** Switch from `t4-small` β†’ `l4x1` β†’ `a10g-large` β†’ `a100-large`
87
+ 6. **Enable gradient checkpointing:** Set `gradient_checkpointing=True` in config (slower but saves memory)
88
+ 7. **Use smaller model:** Try a smaller variant (e.g., 0.5B instead of 3B)
89
+
90
+ **Memory guidelines:**
91
+ - T4 (16GB): <1B models with LoRA
92
+ - A10G (24GB): 1-3B models with LoRA, <1B full fine-tune
93
+ - A100 (40GB/80GB): 7B+ models with LoRA, 3B full fine-tune
94
+
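When applying solutions 1-2, keep the effective batch size constant so training dynamics don't change; a quick sketch of the arithmetic:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    # The batch the optimizer actually sees per update step
    return per_device * grad_accum * num_gpus

# Halving the per-device batch while doubling accumulation preserves the
# effective batch size (and typically resolves OOM at some speed cost).
assert effective_batch_size(4, 4) == effective_batch_size(2, 8) == 16
```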
95
+ ## Parameter Naming Issues
96
+
97
+ **Problem:** `TypeError: SFTConfig.__init__() got an unexpected keyword argument 'max_seq_length'`
98
+
99
+ **Cause:** TRL config classes use `max_length`, not `max_seq_length`.
100
+
101
+ **Solution:**
102
+ ```python
103
+ # βœ… CORRECT - TRL uses max_length
104
+ SFTConfig(max_length=512)
105
+ DPOConfig(max_length=512)
106
+
107
+ # ❌ WRONG - This will fail
108
+ SFTConfig(max_seq_length=512)
109
+ ```
110
+
111
+ **Note:** Most TRL configs don't require explicit max_length - the default (1024) works well. Only set if you need a specific value.
112
+
113
+ ## Dataset Format Error
114
+
115
+ **Problem:** Training fails with dataset format errors or missing fields.
116
+
117
+ **Solutions:**
118
+ 1. **Check format documentation:**
119
+ ```python
120
+ hf_doc_fetch("https://huggingface.co/docs/trl/dataset_formats")
121
+ ```
122
+
123
+ 2. **Validate dataset before training:**
124
+ ```bash
125
+ uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
126
+ --dataset <dataset-name> --split train
127
+ ```
128
+ Or via hf_jobs:
129
+ ```python
130
+ hf_jobs("uv", {
131
+ "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
132
+ "script_args": ["--dataset", "dataset-name", "--split", "train"]
133
+ })
134
+ ```
135
+
136
+ 3. **Verify field names:**
137
+ - **SFT:** Needs "messages" field (conversational), OR "text" field, OR "prompt"/"completion"
138
+ - **DPO:** Needs "chosen" and "rejected" fields
139
+ - **GRPO:** Needs prompt-only format
140
+
141
+ 4. **Check dataset split:**
142
+ - Ensure split exists (e.g., `split="train"`)
143
+ - Preview dataset: `load_dataset("name", split="train[:5]")`
144
+
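These field checks can be done locally in a few lines before submitting a job; a simplified sketch of the logic `dataset_inspector.py` applies (the helper name is illustrative):

```python
def guess_trl_method(columns):
    cols = set(columns)
    if {"chosen", "rejected"} <= cols:
        return "DPO"
    if "messages" in cols or "text" in cols or {"prompt", "completion"} <= cols:
        return "SFT"
    if "prompt" in cols:
        return "GRPO"  # prompt-only format
    return None

print(guess_trl_method(["prompt", "chosen", "rejected"]))
```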
145
+ ## Import/Module Errors
146
+
147
+ **Problem:** Job fails with "ModuleNotFoundError" or import errors.
148
+
149
+ **Solutions:**
150
+ 1. **Add PEP 723 header with dependencies:**
151
+ ```python
152
+ # /// script
153
+ # dependencies = [
154
+ # "trl>=0.12.0",
155
+ # "peft>=0.7.0",
156
+ # "transformers>=4.36.0",
157
+ # ]
158
+ # ///
159
+ ```
160
+
161
+ 2. **Verify exact format:**
162
+ - Must have `# ///` delimiters (with space after `#`)
163
+ - Dependencies must be valid PyPI package names
164
+ - Check spelling and version constraints
165
+
166
+ 3. **Test locally first:**
167
+ ```bash
168
+ uv run train.py # Tests if dependencies are correct
169
+ ```
170
+
171
+ ## Authentication Errors
172
+
173
+ **Problem:** Job fails with authentication or permission errors when pushing to Hub.
174
+
175
+ **Solutions:**
176
+ 1. **Verify authentication:**
177
+ ```python
178
+ mcp__huggingface__hf_whoami() # Check who's authenticated
179
+ ```
180
+
181
+ 2. **Check token permissions:**
182
+ - Go to https://huggingface.co/settings/tokens
183
+ - Ensure token has "write" permission
184
+ - Token must not be "read-only"
185
+
186
+ 3. **Verify token in job:**
187
+ ```python
188
+ "secrets": {"HF_TOKEN": "$HF_TOKEN"} # Must be in job config
189
+ ```
190
+
191
+ 4. **Check repo permissions:**
192
+ - User must have write access to target repo
193
+ - If org repo, user must be member with write access
194
+ - Repo must exist or user must have permission to create
195
+
196
+ ## Job Stuck or Not Starting
197
+
198
+ **Problem:** Job shows "pending" or "starting" for extended period.
199
+
200
+ **Solutions:**
201
+ - Check Jobs dashboard for status: https://huggingface.co/jobs
202
+ - Verify hardware availability (some GPU types may have queues)
203
+ - Try different hardware flavor if one is heavily utilized
204
+ - Check for account billing issues (Jobs requires a paid plan)
205
+
206
+ **Typical startup times:**
207
+ - CPU jobs: 10-30 seconds
208
+ - GPU jobs: 30-90 seconds
209
+ - If >3 minutes: likely queued or stuck
210
+
211
+ ## Training Loss Not Decreasing
212
+
213
+ **Problem:** Training runs but loss stays flat or doesn't improve.
214
+
215
+ **Solutions:**
216
+ 1. **Check learning rate:** May be too low (try 2e-5 to 5e-5) or too high (try 1e-6)
217
+ 2. **Verify dataset quality:** Inspect examples to ensure they're reasonable
218
+ 3. **Check model size:** Very small models may not have capacity for task
219
+ 4. **Increase training steps:** May need more epochs or larger dataset
220
+ 5. **Verify dataset format:** Wrong format may cause degraded training
221
+
222
+ ## Logs Not Appearing
223
+
224
+ **Problem:** Cannot see training logs or progress.
225
+
226
+ **Solutions:**
227
+ 1. **Wait 30-60 seconds:** Initial logs can be delayed
228
+ 2. **Check logs via MCP tool:**
229
+ ```python
230
+ hf_jobs("logs", {"job_id": "your-job-id"})
231
+ ```
232
+ 3. **Use Trackio for real-time monitoring:** See `references/trackio_guide.md`
233
+ 4. **Verify job is actually running:**
234
+ ```python
235
+ hf_jobs("inspect", {"job_id": "your-job-id"})
236
+ ```
237
+
238
+ ## Checkpoint/Resume Issues
239
+
240
+ **Problem:** Cannot resume from checkpoint or checkpoint not saved.
241
+
242
+ **Solutions:**
243
+ 1. **Enable checkpoint saving:**
244
+ ```python
245
+ SFTConfig(
246
+ save_strategy="steps",
247
+ save_steps=100,
248
+ hub_strategy="every_save", # Push each checkpoint
249
+ )
250
+ ```
251
+
252
+ 2. **Verify checkpoints pushed to Hub:** Check model repo for checkpoint folders
253
+
254
+ 3. **Resume from checkpoint:**
255
+ ```python
256
+ trainer = SFTTrainer(
257
+ model="username/model-name", # Can be checkpoint path
258
+ resume_from_checkpoint="username/model-name/checkpoint-1000",
259
+ )
260
+ ```
261
+
262
+ ## Getting Help
263
+
264
+ If issues persist:
265
+
266
+ 1. **Check TRL documentation:**
267
+ ```python
268
+ hf_doc_search("your issue", product="trl")
269
+ ```
270
+
271
+ 2. **Check Jobs documentation:**
272
+ ```python
273
+ hf_doc_fetch("https://huggingface.co/docs/huggingface_hub/guides/jobs")
274
+ ```
275
+
276
+ 3. **Review related guides:**
277
+ - `references/hub_saving.md` - Hub authentication issues
278
+ - `references/hardware_guide.md` - Hardware selection and specs
279
+ - `references/training_patterns.md` - Eval dataset requirements
280
+ - SKILL.md "Working with Scripts" section - Script format and URL issues
281
+
282
+ 4. **Ask in HF forums:** https://discuss.huggingface.co/
scripts/convert_to_gguf.py ADDED
@@ -0,0 +1,350 @@
1
+ #!/usr/bin/env python3
2
+ # /// script
3
+ # dependencies = [
4
+ # "transformers>=4.36.0",
5
+ # "peft>=0.7.0",
6
+ # "torch>=2.0.0",
7
+ # "accelerate>=0.24.0",
8
+ # "huggingface_hub>=0.20.0",
9
+ # "sentencepiece>=0.1.99",
10
+ # "protobuf>=3.20.0",
11
+ # "numpy",
12
+ # "gguf",
13
+ # ]
14
+ # ///
15
+
16
+ """
17
+ GGUF Conversion Script - Production Ready
18
+
19
+ This script converts a LoRA fine-tuned model to GGUF format for use with:
20
+ - llama.cpp
21
+ - Ollama
22
+ - LM Studio
23
+ - Other GGUF-compatible tools
24
+
25
+ Usage:
26
+ Set environment variables:
27
+ - ADAPTER_MODEL: Your fine-tuned model (e.g., "username/my-finetuned-model")
28
+ - BASE_MODEL: Base model used for fine-tuning (e.g., "Qwen/Qwen2.5-0.5B")
29
+ - OUTPUT_REPO: Where to upload GGUF files (e.g., "username/my-model-gguf")
30
+ - HF_USERNAME: Your Hugging Face username (optional, for README)
31
+
32
+ Dependencies: All required packages are declared in PEP 723 header above.
33
+ Build tools (gcc, cmake) are installed automatically by this script.
34
+ """
35
+
36
+ import os
37
+ import torch
38
+ from transformers import AutoModelForCausalLM, AutoTokenizer
39
+ from peft import PeftModel
40
+ from huggingface_hub import HfApi
41
+ import subprocess
42
+
43
+ print("πŸ”„ GGUF Conversion Script")
44
+ print("=" * 60)
45
+
46
+ # Configuration from environment variables
47
+ ADAPTER_MODEL = os.environ.get("ADAPTER_MODEL", "evalstate/qwen-capybara-medium")
48
+ BASE_MODEL = os.environ.get("BASE_MODEL", "Qwen/Qwen2.5-0.5B")
49
+ OUTPUT_REPO = os.environ.get("OUTPUT_REPO", "evalstate/qwen-capybara-medium-gguf")
50
+ username = os.environ.get("HF_USERNAME", ADAPTER_MODEL.split('/')[0])
51
+
52
+ print(f"\nπŸ“¦ Configuration:")
53
+ print(f" Base model: {BASE_MODEL}")
54
+ print(f" Adapter model: {ADAPTER_MODEL}")
55
+ print(f" Output repo: {OUTPUT_REPO}")
56
+
57
+ # Step 1: Load base model and adapter
58
+ print("\nπŸ”§ Step 1: Loading base model and LoRA adapter...")
59
+ print(" (This may take a few minutes)")
60
+
61
+ base_model = AutoModelForCausalLM.from_pretrained(
62
+ BASE_MODEL,
63
+ torch_dtype=torch.float16,
64
+ device_map="auto",
65
+ trust_remote_code=True,
66
+ )
67
+ print(" βœ… Base model loaded")
68
+
69
+ # Load and merge adapter
70
+ print(" Loading LoRA adapter...")
71
+ model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)
72
+ print(" βœ… Adapter loaded")
73
+
74
+ print(" Merging adapter with base model...")
75
+ merged_model = model.merge_and_unload()
76
+ print(" βœ… Models merged!")
77
+
78
+ # Load tokenizer
79
+ tokenizer = AutoTokenizer.from_pretrained(ADAPTER_MODEL, trust_remote_code=True)
80
+ print(" βœ… Tokenizer loaded")
81
+
82
+ # Step 2: Save merged model temporarily
83
+ print("\nπŸ’Ύ Step 2: Saving merged model...")
84
+ merged_dir = "/tmp/merged_model"
85
+ merged_model.save_pretrained(merged_dir, safe_serialization=True)
86
+ tokenizer.save_pretrained(merged_dir)
87
+ print(f" βœ… Merged model saved to {merged_dir}")
88
+
89
+ # Step 3: Install llama.cpp for conversion
90
+ print("\nπŸ“₯ Step 3: Setting up llama.cpp for GGUF conversion...")
91
+
92
+ # CRITICAL: Install build tools FIRST (before cloning llama.cpp)
93
+ print(" Installing build tools...")
94
+ subprocess.run(
95
+ ["apt-get", "update", "-qq"],
96
+ check=True,
97
+ capture_output=True
98
+ )
99
+ subprocess.run(
100
+ ["apt-get", "install", "-y", "-qq", "build-essential", "cmake"],
101
+ check=True,
102
+ capture_output=True
103
+ )
104
+ print(" βœ… Build tools installed")
105
+
106
+ print(" Cloning llama.cpp repository...")
107
+ subprocess.run(
108
+ ["git", "clone", "https://github.com/ggerganov/llama.cpp.git", "/tmp/llama.cpp"],
109
+ check=True,
110
+ capture_output=True
111
+ )
112
+ print(" βœ… llama.cpp cloned")
113
+
114
+ print(" Installing Python dependencies...")
115
+ subprocess.run(
116
+ ["pip", "install", "-r", "/tmp/llama.cpp/requirements.txt"],
117
+ check=True,
118
+ capture_output=True
119
+ )
120
+ # sentencepiece and protobuf are needed for tokenizer conversion
121
+ subprocess.run(
122
+ ["pip", "install", "sentencepiece", "protobuf"],
123
+ check=True,
124
+ capture_output=True
125
+ )
126
+ print(" βœ… Dependencies installed")
127
+
128
+ # Step 4: Convert to GGUF (FP16)
129
+ print("\nπŸ”„ Step 4: Converting to GGUF format (FP16)...")
130
+ gguf_output_dir = "/tmp/gguf_output"
131
+ os.makedirs(gguf_output_dir, exist_ok=True)
132
+
133
+ convert_script = "/tmp/llama.cpp/convert_hf_to_gguf.py"
134
+ model_name = ADAPTER_MODEL.split('/')[-1]
135
+ gguf_file = f"{gguf_output_dir}/{model_name}-f16.gguf"
136
+
137
+ print(f" Running: python {convert_script} {merged_dir}")
138
+ try:
139
+ result = subprocess.run(
140
+ [
141
+ "python", convert_script,
142
+ merged_dir,
143
+ "--outfile", gguf_file,
144
+ "--outtype", "f16"
145
+ ],
146
+ check=True,
147
+ capture_output=True,
148
+ text=True
149
+ )
150
+ print(result.stdout)
151
+ if result.stderr:
152
+ print("Warnings:", result.stderr)
153
+ except subprocess.CalledProcessError as e:
154
+ print(f"❌ Conversion failed!")
155
+ print("STDOUT:", e.stdout)
156
+ print("STDERR:", e.stderr)
157
+ raise
158
+ print(f" βœ… FP16 GGUF created: {gguf_file}")
159
+
160
+ # Step 5: Quantize to different formats
161
+ print("\nβš™οΈ Step 5: Creating quantized versions...")
162
+
163
+ # Build quantize tool using CMake (more reliable than make)
164
+ print(" Building quantize tool with CMake...")
165
+ try:
166
+ # Create build directory
167
+ os.makedirs("/tmp/llama.cpp/build", exist_ok=True)
168
+
169
+ # Configure with CMake
170
+ subprocess.run(
171
+ ["cmake", "-B", "/tmp/llama.cpp/build", "-S", "/tmp/llama.cpp",
172
+ "-DGGML_CUDA=OFF"], # Disable CUDA for faster build
173
+ check=True,
174
+ capture_output=True,
175
+ text=True
176
+ )
177
+
178
+ # Build just the quantize tool
179
+ subprocess.run(
180
+ ["cmake", "--build", "/tmp/llama.cpp/build", "--target", "llama-quantize", "-j", "4"],
181
+ check=True,
182
+ capture_output=True,
183
+ text=True
184
+ )
185
+ print(" βœ… Quantize tool built")
186
+ except subprocess.CalledProcessError as e:
187
+ print(f" ❌ Build failed!")
188
+ print("STDOUT:", e.stdout)
189
+ print("STDERR:", e.stderr)
190
+ raise
191
+
192
+ # Use the CMake build output path
193
+ quantize_bin = "/tmp/llama.cpp/build/bin/llama-quantize"
194
+
195
+ # Common quantization formats
196
+ quant_formats = [
197
+ ("Q4_K_M", "4-bit, medium quality (recommended)"),
198
+ ("Q5_K_M", "5-bit, higher quality"),
199
+ ("Q8_0", "8-bit, very high quality"),
200
+ ]
201
+
202
+ quantized_files = []
203
+ for quant_type, description in quant_formats:
204
+ print(f" Creating {quant_type} quantization ({description})...")
205
+ quant_file = f"{gguf_output_dir}/{model_name}-{quant_type.lower()}.gguf"
206
+
207
+ subprocess.run(
208
+ [quantize_bin, gguf_file, quant_file, quant_type],
209
+ check=True,
210
+ capture_output=True
211
+ )
212
+ quantized_files.append((quant_file, quant_type))
213
+
214
+ # Get file size
215
+ size_mb = os.path.getsize(quant_file) / (1024 * 1024)
216
+ print(f" βœ… {quant_type}: {size_mb:.1f} MB")
217
+
218
+ # Step 6: Upload to Hub
219
+ print("\n☁️ Step 6: Uploading to Hugging Face Hub...")
220
+ api = HfApi()
221
+
222
+ # Create repo
223
+ print(f" Creating repository: {OUTPUT_REPO}")
224
+ try:
225
+ api.create_repo(repo_id=OUTPUT_REPO, repo_type="model", exist_ok=True)
226
+ print(" βœ… Repository created")
227
+ except Exception as e:
228
+ print(f" ℹ️ Repository may already exist: {e}")
229
+
230
+ # Upload FP16 version
231
+ print(" Uploading FP16 GGUF...")
232
+ api.upload_file(
233
+ path_or_fileobj=gguf_file,
234
+ path_in_repo=f"{model_name}-f16.gguf",
235
+ repo_id=OUTPUT_REPO,
236
+ )
237
+ print(" βœ… FP16 uploaded")
238
+
239
+ # Upload quantized versions
240
+ for quant_file, quant_type in quantized_files:
241
+ print(f" Uploading {quant_type}...")
242
+ api.upload_file(
243
+ path_or_fileobj=quant_file,
244
+ path_in_repo=f"{model_name}-{quant_type.lower()}.gguf",
245
+ repo_id=OUTPUT_REPO,
246
+ )
247
+ print(f" βœ… {quant_type} uploaded")
248
+
249
+ # Create README
250
+ print("\nπŸ“ Creating README...")
251
+ readme_content = f"""---
252
+ base_model: {BASE_MODEL}
253
+ tags:
254
+ - gguf
255
+ - llama.cpp
256
+ - quantized
257
+ - trl
258
+ - sft
259
+ ---
260
+
261
+ # {OUTPUT_REPO.split('/')[-1]}
262
+
263
+ This is a GGUF conversion of [{ADAPTER_MODEL}](https://huggingface.co/{ADAPTER_MODEL}), which is a LoRA fine-tuned version of [{BASE_MODEL}](https://huggingface.co/{BASE_MODEL}).
264
+
265
+ ## Model Details
266
+
267
+ - **Base Model:** {BASE_MODEL}
268
+ - **Fine-tuned Model:** {ADAPTER_MODEL}
269
+ - **Training:** Supervised Fine-Tuning (SFT) with TRL
270
+ - **Format:** GGUF (for llama.cpp, Ollama, LM Studio, etc.)
271
+
272
+ ## Available Quantizations
273
+
274
+ | File | Quant | Size | Description | Use Case |
275
+ |------|-------|------|-------------|----------|
276
+ | {model_name}-f16.gguf | F16 | ~1GB | Full precision | Best quality, slower |
277
+ | {model_name}-q8_0.gguf | Q8_0 | ~500MB | 8-bit | High quality |
278
+ | {model_name}-q5_k_m.gguf | Q5_K_M | ~350MB | 5-bit medium | Good quality, smaller |
279
+ | {model_name}-q4_k_m.gguf | Q4_K_M | ~300MB | 4-bit medium | Recommended - good balance |
280
+
281
+ ## Usage
282
+
283
+ ### With llama.cpp
284
+
285
+ ```bash
286
+ # Download model
287
+ huggingface-cli download {OUTPUT_REPO} {model_name}-q4_k_m.gguf
288
+
289
+ # Run with llama.cpp
290
+ ./llama-cli -m {model_name}-q4_k_m.gguf -p "Your prompt here"
291
+ ```
292
+
293
+ ### With Ollama
294
+
295
+ 1. Create a `Modelfile`:
296
+ ```
297
+ FROM ./{model_name}-q4_k_m.gguf
298
+ ```
299
+
300
+ 2. Create the model:
301
+ ```bash
302
+ ollama create my-model -f Modelfile
303
+ ollama run my-model
304
+ ```
305
+
306
+ ### With LM Studio
307
+
308
+ 1. Download the `.gguf` file
309
+ 2. Import into LM Studio
310
+ 3. Start chatting!
311
+
312
+ ## License
313
+
314
+ Inherits the license from the base model: {BASE_MODEL}
315
+
316
+ ## Citation
317
+
318
+ ```bibtex
319
+ @misc{{{OUTPUT_REPO.split('/')[-1].replace('-', '_')},
320
+ author = {{{username}}},
321
+ title = {{{OUTPUT_REPO.split('/')[-1]}}},
322
+ year = {{2025}},
323
+ publisher = {{Hugging Face}},
324
+ url = {{https://huggingface.co/{OUTPUT_REPO}}}
325
+ }}
326
+ ```
327
+
328
+ ---
329
+
330
+ *Converted to GGUF format using llama.cpp*
331
+ """
332
+
333
+ api.upload_file(
334
+ path_or_fileobj=readme_content.encode(),
335
+ path_in_repo="README.md",
336
+ repo_id=OUTPUT_REPO,
337
+ )
338
+ print(" βœ… README uploaded")
339
+
340
+ print("\n" + "=" * 60)
341
+ print("βœ… GGUF Conversion Complete!")
342
+ print(f"πŸ“¦ Repository: https://huggingface.co/{OUTPUT_REPO}")
343
+ print(f"\nπŸ“₯ Download with:")
344
+ print(f" huggingface-cli download {OUTPUT_REPO} {model_name}-q4_k_m.gguf")
345
+ print(f"\nπŸš€ Use with Ollama:")
346
+ print(" 1. Download the GGUF file")
347
+ print(f" 2. Create Modelfile: FROM ./{model_name}-q4_k_m.gguf")
348
+ print(" 3. ollama create my-model -f Modelfile")
349
+ print(" 4. ollama run my-model")
350
+ print("=" * 60)
scripts/dataset_inspector.py ADDED
@@ -0,0 +1,416 @@
1
+ #!/usr/bin/env python3
2
+ # /// script
3
+ # dependencies = []
4
+ # ///
5
+ """
6
+ Dataset Format Inspector for TRL Training (LLM-Optimized Output)
7
+
8
+ Inspects Hugging Face datasets to determine TRL training compatibility.
9
+ Uses Datasets Server API for instant results - no dataset download needed!
10
+
11
+ ULTRA-EFFICIENT: Uses HF Datasets Server API - completes in <2 seconds.
12
+
13
+ Usage with HF Jobs:
14
+ hf_jobs("uv", {
15
+ "script": "https://huggingface.co/datasets/evalstate/trl-helpers/raw/main/dataset_inspector.py",
16
+ "script_args": ["--dataset", "your/dataset", "--split", "train"]
17
+ })
18
+ """
19
+
20
+ import argparse
21
+ import sys
22
+ import json
23
+ import urllib.request
24
+ import urllib.parse
25
+ from typing import List, Dict, Any
26
+
27
+
28
+ def parse_args():
29
+ parser = argparse.ArgumentParser(description="Inspect dataset format for TRL training")
30
+ parser.add_argument("--dataset", type=str, required=True, help="Dataset name")
31
+ parser.add_argument("--split", type=str, default="train", help="Dataset split (default: train)")
32
+ parser.add_argument("--config", type=str, default="default", help="Dataset config name (default: default)")
33
+ parser.add_argument("--preview", type=int, default=150, help="Max chars per field preview")
34
+ parser.add_argument("--samples", type=int, default=5, help="Number of samples to fetch (default: 5)")
35
+ parser.add_argument("--json-output", action="store_true", help="Output as JSON")
36
+ return parser.parse_args()
37
+
38
+
39
+ def api_request(url: str) -> Dict:
40
+ """Make API request to Datasets Server"""
41
+ try:
42
+ with urllib.request.urlopen(url, timeout=10) as response:
43
+ return json.loads(response.read().decode())
44
+ except urllib.error.HTTPError as e:
45
+ if e.code == 404:
46
+ return None
47
+ raise Exception(f"API request failed: {e.code} {e.reason}")
48
+ except Exception as e:
49
+ raise Exception(f"API request failed: {str(e)}")
50
+
51
+
52
+ def get_splits(dataset: str) -> Dict:
53
+ """Get available splits for dataset"""
54
+ url = f"https://datasets-server.huggingface.co/splits?dataset={urllib.parse.quote(dataset)}"
55
+ return api_request(url)
56
+
57
+
58
+ def get_rows(dataset: str, config: str, split: str, offset: int = 0, length: int = 5) -> Dict:
59
+ """Get rows from dataset"""
60
+ url = f"https://datasets-server.huggingface.co/rows?dataset={urllib.parse.quote(dataset)}&config={config}&split={split}&offset={offset}&length={length}"
61
+ return api_request(url)
62
+
63
+
64
+ def find_columns(columns: List[str], patterns: List[str]) -> List[str]:
65
+ """Find columns matching patterns"""
66
+ return [c for c in columns if any(p in c.lower() for p in patterns)]
67
+
68
+
69
+ def check_sft_compatibility(columns: List[str]) -> Dict[str, Any]:
70
+ """Check SFT compatibility"""
71
+ has_messages = "messages" in columns
72
+ has_text = "text" in columns
73
+ has_prompt_completion = "prompt" in columns and "completion" in columns
74
+
75
+ ready = has_messages or has_text or has_prompt_completion
76
+
77
+ possible_prompt = find_columns(columns, ["prompt", "instruction", "question", "input"])
78
+ possible_response = find_columns(columns, ["response", "completion", "output", "answer"])
79
+
80
+ return {
81
+ "ready": ready,
82
+ "reason": "messages" if has_messages else "text" if has_text else "prompt+completion" if has_prompt_completion else None,
83
+ "possible_prompt": possible_prompt[0] if possible_prompt else None,
84
+ "possible_response": possible_response[0] if possible_response else None,
85
+ "has_context": "context" in columns,
86
+ }
87
+
88
+
89
+ def check_dpo_compatibility(columns: List[str]) -> Dict[str, Any]:
90
+ """Check DPO compatibility"""
91
+ has_standard = "prompt" in columns and "chosen" in columns and "rejected" in columns
92
+
93
+ possible_prompt = find_columns(columns, ["prompt", "instruction", "question", "input"])
94
+ possible_chosen = find_columns(columns, ["chosen", "preferred", "winner"])
95
+ possible_rejected = find_columns(columns, ["rejected", "dispreferred", "loser"])
96
+
97
+ can_map = bool(possible_prompt and possible_chosen and possible_rejected)
98
+
99
+ return {
100
+ "ready": has_standard,
101
+ "can_map": can_map,
102
+ "prompt_col": possible_prompt[0] if possible_prompt else None,
103
+ "chosen_col": possible_chosen[0] if possible_chosen else None,
104
+ "rejected_col": possible_rejected[0] if possible_rejected else None,
105
+ }
106
+
107
+
108
+ def check_grpo_compatibility(columns: List[str]) -> Dict[str, Any]:
109
+ """Check GRPO compatibility"""
110
+ has_prompt = "prompt" in columns
111
+ has_no_responses = "chosen" not in columns and "rejected" not in columns
112
+
113
+ possible_prompt = find_columns(columns, ["prompt", "instruction", "question", "input"])
114
+
115
+ return {
116
+ "ready": has_prompt and has_no_responses,
117
+ "can_map": bool(possible_prompt) and has_no_responses,
118
+ "prompt_col": possible_prompt[0] if possible_prompt else None,
119
+ }
120
+
121
+
122
+ def check_kto_compatibility(columns: List[str]) -> Dict[str, Any]:
123
+ """Check KTO compatibility"""
124
+ return {"ready": "prompt" in columns and "completion" in columns and "label" in columns}
125
+
126
+
127
+ def generate_mapping_code(method: str, info: Dict[str, Any]) -> str:
128
+ """Generate mapping code for a training method"""
129
+ if method == "SFT":
130
+ if info["ready"]:
131
+ return None
132
+
133
+ prompt_col = info.get("possible_prompt")
134
+ response_col = info.get("possible_response")
135
+ has_context = info.get("has_context", False)
136
+
137
+ if not prompt_col:
138
+ return None
139
+
140
+ if has_context and response_col:
141
+ return f"""def format_for_sft(example):
142
+ text = f"Instruction: {{example['{prompt_col}']}}\\n\\n"
143
+ if example.get('context'):
144
+ text += f"Context: {{example['context']}}\\n\\n"
145
+ text += f"Response: {{example['{response_col}']}}"
146
+ return {{'text': text}}
147
+
148
+ dataset = dataset.map(format_for_sft, remove_columns=dataset.column_names)"""
149
+ elif response_col:
150
+ return f"""def format_for_sft(example):
151
+ return {{'text': f"{{example['{prompt_col}']}}\\n\\n{{example['{response_col}']}}"}}
152
+
153
+ dataset = dataset.map(format_for_sft, remove_columns=dataset.column_names)"""
154
+ else:
155
+ return f"""def format_for_sft(example):
156
+ return {{'text': example['{prompt_col}']}}
157
+
158
+ dataset = dataset.map(format_for_sft, remove_columns=dataset.column_names)"""
159
+
160
+ elif method == "DPO":
161
+ if info["ready"] or not info["can_map"]:
162
+ return None
163
+
164
+ return f"""def format_for_dpo(example):
165
+ return {{
166
+ 'prompt': example['{info['prompt_col']}'],
167
+ 'chosen': example['{info['chosen_col']}'],
168
+ 'rejected': example['{info['rejected_col']}'],
169
+ }}
170
+
171
+ dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)"""
172
+
173
+ elif method == "GRPO":
174
+ if info["ready"] or not info["can_map"]:
175
+ return None
176
+
177
+ return f"""def format_for_grpo(example):
178
+ return {{'prompt': example['{info['prompt_col']}']}}
179
+
180
+ dataset = dataset.map(format_for_grpo, remove_columns=dataset.column_names)"""
181
+
182
+ return None
183
+
184
+
185
+ def format_value_preview(value: Any, max_chars: int) -> str:
186
+ """Format value for preview"""
187
+ if value is None:
188
+ return "None"
189
+ elif isinstance(value, str):
190
+ return value[:max_chars] + ("..." if len(value) > max_chars else "")
191
+ elif isinstance(value, list):
192
+ if len(value) > 0 and isinstance(value[0], dict):
193
+ return f"[{len(value)} items] Keys: {list(value[0].keys())}"
194
+ preview = str(value)
195
+ return preview[:max_chars] + ("..." if len(preview) > max_chars else "")
196
+ else:
197
+ preview = str(value)
198
+ return preview[:max_chars] + ("..." if len(preview) > max_chars else "")
199
+
200
+
201
+ def main():
202
+ args = parse_args()
203
+
204
+ print(f"Fetching dataset info via Datasets Server API...")
205
+
206
+ try:
207
+ # Get splits info
208
+ splits_data = get_splits(args.dataset)
209
+ if not splits_data or "splits" not in splits_data:
210
+ print(f"ERROR: Could not fetch splits for dataset '{args.dataset}'")
211
+ print(f" Dataset may not exist or is not accessible via Datasets Server API")
212
+ sys.exit(1)
213
+
214
+ # Find the right config
215
+ available_configs = set()
216
+ split_found = False
217
+ config_to_use = args.config
218
+
219
+ for split_info in splits_data["splits"]:
220
+ available_configs.add(split_info["config"])
221
+ if split_info["config"] == args.config and split_info["split"] == args.split:
222
+ split_found = True
223
+
224
+ # If default config not found, try first available
225
+ if not split_found and available_configs:
226
+ config_to_use = list(available_configs)[0]
227
+ print(f"Config '{args.config}' not found, trying '{config_to_use}'...")
228
+
229
+ # Get rows
230
+ rows_data = get_rows(args.dataset, config_to_use, args.split, offset=0, length=args.samples)
231
+
232
+ if not rows_data or "rows" not in rows_data:
233
+ print(f"ERROR: Could not fetch rows for dataset '{args.dataset}'")
234
+ print(f" Split '{args.split}' may not exist")
235
+ print(f" Available configs: {', '.join(sorted(available_configs))}")
236
+ sys.exit(1)
237
+
238
+ rows = rows_data["rows"]
239
+ if not rows:
240
+ print(f"ERROR: No rows found in split '{args.split}'")
241
+ sys.exit(1)
242
+
243
+ # Extract column info from first row
244
+ first_row = rows[0]["row"]
245
+ columns = list(first_row.keys())
246
+ features = rows_data.get("features", [])
247
+
248
+ # Get total count if available
249
+ total_examples = "Unknown"
250
+ for split_info in splits_data["splits"]:
251
+ if split_info["config"] == config_to_use and split_info["split"] == args.split:
252
+ total_examples = f"{split_info.get('num_examples', 'Unknown'):,}" if isinstance(split_info.get('num_examples'), int) else "Unknown"
253
+ break
254
+
255
+ except Exception as e:
256
+ print(f"ERROR: {str(e)}")
257
+ sys.exit(1)
258
+
259
+ # Run compatibility checks
260
+ sft_info = check_sft_compatibility(columns)
261
+ dpo_info = check_dpo_compatibility(columns)
262
+ grpo_info = check_grpo_compatibility(columns)
263
+ kto_info = check_kto_compatibility(columns)
264
+
265
+ # Determine recommended methods
266
+ recommended = []
267
+ if sft_info["ready"]:
268
+ recommended.append("SFT")
269
+ elif sft_info["possible_prompt"]:
270
+ recommended.append("SFT (needs mapping)")
271
+
272
+ if dpo_info["ready"]:
273
+ recommended.append("DPO")
274
+ elif dpo_info["can_map"]:
275
+ recommended.append("DPO (needs mapping)")
276
+
277
+ if grpo_info["ready"]:
278
+ recommended.append("GRPO")
279
+ elif grpo_info["can_map"]:
280
+ recommended.append("GRPO (needs mapping)")
281
+
282
+ if kto_info["ready"]:
283
+ recommended.append("KTO")
284
+
285
+ # JSON output mode
286
+ if args.json_output:
287
+ result = {
288
+ "dataset": args.dataset,
289
+ "config": config_to_use,
290
+ "split": args.split,
291
+ "total_examples": total_examples,
292
+ "columns": columns,
293
+ "features": [{"name": f["name"], "type": f["type"]} for f in features] if features else [],
294
+ "compatibility": {
295
+ "SFT": sft_info,
296
+ "DPO": dpo_info,
297
+ "GRPO": grpo_info,
298
+ "KTO": kto_info,
299
+ },
300
+ "recommended_methods": recommended,
301
+ }
302
+ print(json.dumps(result, indent=2))
303
+ sys.exit(0)
304
+
305
+ # Human-readable output optimized for LLM parsing
306
+ print("=" * 80)
307
+ print(f"DATASET INSPECTION RESULTS")
308
+ print("=" * 80)
309
+
310
+ print(f"\nDataset: {args.dataset}")
311
+ print(f"Config: {config_to_use}")
312
+ print(f"Split: {args.split}")
313
+ print(f"Total examples: {total_examples}")
314
+ print(f"Samples fetched: {len(rows)}")
315
+
316
+ print(f"\n{'COLUMNS':-<80}")
317
+ if features:
318
+ for feature in features:
319
+ print(f" {feature['name']}: {feature['type']}")
320
+ else:
321
+ for col in columns:
322
+ print(f" {col}: (type info not available)")
323
+
324
+ print(f"\n{'EXAMPLE DATA':-<80}")
325
+ example = first_row
326
+ for col in columns:
327
+ value = example.get(col)
328
+ display = format_value_preview(value, args.preview)
329
+ print(f"\n{col}:")
330
+ print(f" {display}")
331
+
332
+ print(f"\n{'TRAINING METHOD COMPATIBILITY':-<80}")
333
+
334
+ # SFT
335
+ print(f"\n[SFT] {'βœ“ READY' if sft_info['ready'] else 'βœ— NEEDS MAPPING'}")
336
+ if sft_info["ready"]:
337
+ print(f" Reason: Dataset has '{sft_info['reason']}' field")
338
+ print(f" Action: Use directly with SFTTrainer")
339
+ elif sft_info["possible_prompt"]:
340
+ print(f" Detected: prompt='{sft_info['possible_prompt']}' response='{sft_info['possible_response']}'")
341
+ print(f" Action: Apply mapping code (see below)")
342
+ else:
343
+ print(f" Status: Cannot determine mapping - manual inspection needed")
344
+
345
+ # DPO
346
+ print(f"\n[DPO] {'βœ“ READY' if dpo_info['ready'] else 'βœ— NEEDS MAPPING' if dpo_info['can_map'] else 'βœ— INCOMPATIBLE'}")
347
+ if dpo_info["ready"]:
348
+ print(f" Reason: Dataset has 'prompt', 'chosen', 'rejected' fields")
349
+ print(f" Action: Use directly with DPOTrainer")
350
+ elif dpo_info["can_map"]:
351
+ print(f" Detected: prompt='{dpo_info['prompt_col']}' chosen='{dpo_info['chosen_col']}' rejected='{dpo_info['rejected_col']}'")
352
+ print(f" Action: Apply mapping code (see below)")
353
+ else:
354
+ print(f" Status: Missing required fields (prompt + chosen + rejected)")
355
+
356
+ # GRPO
357
+ print(f"\n[GRPO] {'βœ“ READY' if grpo_info['ready'] else 'βœ— NEEDS MAPPING' if grpo_info['can_map'] else 'βœ— INCOMPATIBLE'}")
358
+ if grpo_info["ready"]:
359
+ print(f" Reason: Dataset has 'prompt' field")
360
+ print(f" Action: Use directly with GRPOTrainer")
361
+ elif grpo_info["can_map"]:
362
+ print(f" Detected: prompt='{grpo_info['prompt_col']}'")
363
+ print(f" Action: Apply mapping code (see below)")
364
+ else:
365
+ print(f" Status: Missing prompt field")
366
+
367
+ # KTO
368
+ print(f"\n[KTO] {'βœ“ READY' if kto_info['ready'] else 'βœ— INCOMPATIBLE'}")
369
+ if kto_info["ready"]:
370
+ print(f" Reason: Dataset has 'prompt', 'completion', 'label' fields")
371
+ print(f" Action: Use directly with KTOTrainer")
372
+ else:
373
+ print(f" Status: Missing required fields (prompt + completion + label)")
374
+
375
+ # Mapping code
376
+ print(f"\n{'MAPPING CODE (if needed)':-<80}")
377
+
378
+ mapping_needed = False
379
+
380
+ sft_mapping = generate_mapping_code("SFT", sft_info)
381
+ if sft_mapping:
382
+ print(f"\n# For SFT Training:")
383
+ print(sft_mapping)
384
+ mapping_needed = True
385
+
386
+ dpo_mapping = generate_mapping_code("DPO", dpo_info)
387
+ if dpo_mapping:
388
+ print(f"\n# For DPO Training:")
389
+ print(dpo_mapping)
390
+ mapping_needed = True
391
+
392
+ grpo_mapping = generate_mapping_code("GRPO", grpo_info)
393
+ if grpo_mapping:
394
+ print(f"\n# For GRPO Training:")
395
+ print(grpo_mapping)
396
+ mapping_needed = True
397
+
398
+ if not mapping_needed:
399
+ print("\nNo mapping needed - dataset is ready for training!")
400
+
401
+ print(f"\n{'SUMMARY':-<80}")
402
+ print(f"Recommended training methods: {', '.join(recommended) if recommended else 'None (dataset needs formatting)'}")
403
+ print(f"\nNote: Used Datasets Server API (instant, no download required)")
404
+
405
+ print("\n" + "=" * 80)
406
+ sys.exit(0)
407
+
408
+
409
+ if __name__ == "__main__":
410
+ try:
411
+ main()
412
+ except KeyboardInterrupt:
413
+ sys.exit(0)
414
+ except Exception as e:
415
+ print(f"ERROR: {e}", file=sys.stderr)
416
+ sys.exit(1)
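
The SFT mapping snippets the inspector emits can be sanity-checked on a single example before running a full `dataset.map`. This standalone sketch uses hypothetical `question`/`answer` columns in place of whatever column names the inspector actually detects:

```python
# Standalone sketch of the generated SFT mapping. The column names
# "question" and "answer" are illustrative, not detected from a real dataset.
def format_for_sft(example):
    return {"text": f"{example['question']}\n\n{example['answer']}"}

sample = {"question": "What is 2 + 2?", "answer": "4"}
print(format_for_sft(sample))
```

Running the function on one dict like this mirrors what `dataset.map(format_for_sft, remove_columns=...)` does per row.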
scripts/estimate_cost.py ADDED
@@ -0,0 +1,149 @@
+ #!/usr/bin/env python3
+ # /// script
+ # dependencies = []
+ # ///
+ """
+ Estimate training time and cost for TRL jobs.
+ 
+ Usage:
+     python estimate_cost.py --model <model> --dataset <dataset> --hardware <flavor>
+ 
+ Example:
+     python estimate_cost.py --model Qwen/Qwen2.5-0.5B --dataset trl-lib/Capybara --hardware a10g-large
+ """
+ 
+ import argparse
+ 
+ # Hardware costs per hour (approximate)
+ HARDWARE_COSTS = {
+     "t4-small": 0.75,
+     "t4-medium": 1.50,
+     "l4x1": 2.50,
+     "a10g-small": 3.50,
+     "a10g-large": 5.00,
+     "a10g-largex2": 10.00,
+     "a10g-largex4": 20.00,
+     "a100-large": 10.00,
+ }
+ 
+ # Model sizes in billions of parameters
+ MODEL_SIZES = {
+     "0.5B": 0.5,
+     "1.5B": 1.5,
+     "3B": 3,
+     "7B": 7,
+     "13B": 13,
+ }
+ 
+ 
+ def estimate_training_time(model_params, dataset_size, epochs, hardware):
+     """Estimate training time in hours."""
+     # Rough estimates based on empirical observations.
+     # These are approximations; actual times will vary.
+     base_time_per_1k_examples = 0.1  # hours for a 1B model on a10g-large
+ 
+     # Scale with model size, dataset size, and epochs
+     time = base_time_per_1k_examples * model_params * (dataset_size / 1000) * epochs
+ 
+     # Adjust for hardware (relative to the a10g-large baseline)
+     hardware_multipliers = {
+         "t4-small": 2.0,
+         "t4-medium": 1.5,
+         "l4x1": 1.2,
+         "a10g-small": 1.3,
+         "a10g-large": 1.0,
+         "a10g-largex2": 0.6,
+         "a10g-largex4": 0.4,
+         "a100-large": 0.7,
+     }
+ 
+     multiplier = hardware_multipliers.get(hardware, 1.0)
+     time *= multiplier
+ 
+     return time
+ 
+ 
+ def parse_args():
+     parser = argparse.ArgumentParser(description="Estimate training cost for TRL jobs")
+     parser.add_argument("--model", required=True, help="Model name or size (e.g., 'Qwen/Qwen2.5-0.5B' or '0.5B')")
+     parser.add_argument("--dataset", required=True, help="Dataset name")
+     parser.add_argument("--hardware", required=True, choices=HARDWARE_COSTS.keys(), help="Hardware flavor")
+     parser.add_argument("--dataset-size", type=int, help="Override dataset size (number of examples)")
+     parser.add_argument("--epochs", type=int, default=3, help="Number of training epochs")
+     return parser.parse_args()
+ 
+ 
+ def extract_model_size(model_name):
+     """Extract model size in billions of parameters from the model name."""
+     for size_str, size_val in MODEL_SIZES.items():
+         if size_str in model_name:
+             return size_val
+ 
+     # Try to parse directly (e.g., "7B" -> 7.0)
+     try:
+         if "B" in model_name:
+             return float(model_name.replace("B", ""))
+     except ValueError:
+         pass
+ 
+     return 1.0  # Default to 1B if the size can't be determined
+ 
+ 
+ def main():
+     args = parse_args()
+ 
+     # Extract model parameters
+     model_params = extract_model_size(args.model)
+     print(f"πŸ“Š Model: {args.model} (~{model_params}B parameters)")
+ 
+     # Estimate dataset size (loading the dataset would give the real size)
+     if args.dataset_size:
+         dataset_size = args.dataset_size
+     else:
+         # Common dataset sizes (approximations)
+         dataset_sizes = {
+             "trl-lib/Capybara": 16000,
+             "Anthropic/hh-rlhf": 160000,
+         }
+         dataset_size = dataset_sizes.get(args.dataset, 10000)
+ 
+     print(f"πŸ“¦ Dataset: {args.dataset} (~{dataset_size} examples)")
+     print(f"πŸ”„ Epochs: {args.epochs}")
+     print(f"πŸ’» Hardware: {args.hardware}")
+     print()
+ 
+     # Estimate training time and cost
+     estimated_hours = estimate_training_time(model_params, dataset_size, args.epochs, args.hardware)
+     estimated_cost = estimated_hours * HARDWARE_COSTS[args.hardware]
+ 
+     # Recommend a timeout with a buffer
+     recommended_timeout_hours = estimated_hours * 1.3  # 30% buffer
+ 
+     print(f"⏱️  Estimated training time: {estimated_hours:.1f} hours")
+     print(f"πŸ’° Estimated cost: ${estimated_cost:.2f}")
+     print(f"⏰ Recommended timeout: {recommended_timeout_hours:.1f}h (with 30% buffer)")
+     print()
+ 
+     # Warnings and recommendations
+     if estimated_hours > 4:
+         print("⚠️  Long training time - consider:")
+         print("   - Using faster hardware")
+         print("   - Reducing epochs")
+         print("   - Using a smaller dataset subset for testing")
+ 
+     if model_params >= 7 and args.hardware not in ["a10g-largex2", "a10g-largex4", "a100-large"]:
+         print("⚠️  Large model - consider using:")
+         print("   - Larger GPU (a100-large)")
+         print("   - Multi-GPU setup (a10g-largex2 or a10g-largex4)")
+         print("   - LoRA/PEFT for memory efficiency")
+ 
+     print()
+     print("πŸ“‹ Example job configuration:")
+     print(f"""
+ hf_jobs("uv", {{
+     "script": "your_training_script.py",
+     "flavor": "{args.hardware}",
+     "timeout": "{recommended_timeout_hours:.0f}h",
+     "secrets": {{"HF_TOKEN": "$HF_TOKEN"}}
+ }})
+ """)
+ 
+ 
+ if __name__ == "__main__":
+     main()
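
To make the estimator's arithmetic concrete, here is the same formula evaluated by hand for the docstring's example (a ~0.5B model on ~16,000 Capybara examples, 3 epochs, a10g-large). The constants are the script's own rough heuristics, not measured figures:

```python
# Reproduce the estimate for: 0.5B model, 16,000 examples, 3 epochs,
# a10g-large ($5.00/h, hardware multiplier 1.0).
base_time_per_1k = 0.1  # hours per 1k examples for a 1B model on a10g-large
hours = base_time_per_1k * 0.5 * (16000 / 1000) * 3 * 1.0
cost = hours * 5.00
timeout = hours * 1.3  # 30% buffer

print(f"{hours:.1f} h, ${cost:.2f}, timeout {timeout:.1f} h")  # 2.4 h, $12.00, timeout 3.1 h
```

So this configuration would trigger no "long training time" warning (under 4 hours) and the script would suggest a ~3h timeout.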
scripts/train_dpo_example.py ADDED
@@ -0,0 +1,105 @@
+ #!/usr/bin/env python3
+ # /// script
+ # dependencies = [
+ #     "trl>=0.12.0",
+ #     "transformers>=4.36.0",
+ #     "accelerate>=0.24.0",
+ #     "trackio",
+ # ]
+ # ///
+ 
+ """
+ Production-ready DPO training example for preference learning.
+ 
+ DPO (Direct Preference Optimization) trains models on preference pairs
+ (chosen vs rejected responses) without requiring a reward model.
+ 
+ Usage with hf_jobs MCP tool:
+     hf_jobs("uv", {
+         "script": '''<paste this entire file>''',
+         "flavor": "a10g-large",
+         "timeout": "3h",
+         "secrets": {"HF_TOKEN": "$HF_TOKEN"},
+     })
+ 
+ Or submit the script content directly inline without saving to a file.
+ """
+ 
+ import trackio
+ from datasets import load_dataset
+ from trl import DPOTrainer, DPOConfig
+ 
+ 
+ # Load preference dataset
+ print("πŸ“¦ Loading dataset...")
+ dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
+ print(f"βœ… Dataset loaded: {len(dataset)} preference pairs")
+ 
+ # Create train/eval split
+ print("πŸ”€ Creating train/eval split...")
+ dataset_split = dataset.train_test_split(test_size=0.1, seed=42)
+ train_dataset = dataset_split["train"]
+ eval_dataset = dataset_split["test"]
+ print(f"  Train: {len(train_dataset)} pairs")
+ print(f"  Eval: {len(eval_dataset)} pairs")
+ 
+ # Training configuration
+ config = DPOConfig(
+     # CRITICAL: Hub settings
+     output_dir="qwen-dpo-aligned",
+     push_to_hub=True,
+     hub_model_id="username/qwen-dpo-aligned",
+     hub_strategy="every_save",
+ 
+     # DPO-specific parameters
+     beta=0.1,  # KL penalty coefficient (higher = stay closer to reference)
+ 
+     # Training parameters
+     num_train_epochs=1,  # DPO typically needs fewer epochs than SFT
+     per_device_train_batch_size=4,
+     gradient_accumulation_steps=4,
+     learning_rate=5e-7,  # DPO uses a much lower LR than SFT
+     # max_length=1024,  # Default - only set if you need a different sequence length
+ 
+     # Logging & checkpointing
+     logging_steps=10,
+     save_strategy="steps",
+     save_steps=100,
+     save_total_limit=2,
+ 
+     # Evaluation - IMPORTANT: only enable if eval_dataset is provided
+     eval_strategy="steps",
+     eval_steps=100,
+ 
+     # Optimization
+     warmup_ratio=0.1,
+     lr_scheduler_type="cosine",
+ 
+     # Monitoring
+     report_to="trackio",  # Integrate with Trackio
+     project="meaningful_project_name",  # Trackio project name
+     run_name="baseline-run",  # Descriptive name for this training run
+ )
+ 
+ # Initialize and train
+ # Note: DPO requires an instruct-tuned model as the base
+ print("🎯 Initializing trainer...")
+ trainer = DPOTrainer(
+     model="Qwen/Qwen2.5-0.5B-Instruct",  # Use an instruct model, not a base model
+     train_dataset=train_dataset,
+     eval_dataset=eval_dataset,  # CRITICAL: must provide eval_dataset when eval_strategy is enabled
+     args=config,
+ )
+ 
+ print("πŸš€ Starting DPO training...")
+ trainer.train()
+ 
+ print("πŸ’Ύ Pushing to Hub...")
+ trainer.push_to_hub()
+ 
+ # Finish Trackio tracking
+ trackio.finish()
+ 
+ print("βœ… Complete! Model at: https://huggingface.co/username/qwen-dpo-aligned")
+ print("πŸ“Š View metrics at: https://huggingface.co/spaces/username/trackio")
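
One number worth checking when adapting the DPO config above is the effective batch size, since the low DPO learning rate is usually tuned against it. On a single-GPU flavor it works out as:

```python
# Effective batch size for the config above on a single-GPU flavor.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1  # a10g-large is a single-GPU flavor

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)  # 16
```

Doubling the GPUs (e.g. a10g-largex2) doubles this figure unless you halve one of the other factors.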
scripts/train_grpo_example.py ADDED
@@ -0,0 +1,88 @@
+ #!/usr/bin/env python3
+ # /// script
+ # dependencies = [
+ #     "trl>=0.12.0",
+ #     "transformers>=4.36.0",
+ #     "accelerate>=0.24.0",
+ #     "trackio",
+ # ]
+ # ///
+ 
+ """
+ Production-ready GRPO training example for online RL.
+ 
+ GRPO (Group Relative Policy Optimization) is an online RL method that
+ optimizes relative to group performance. Best for tasks with automatic
+ reward signals like code execution or math verification.
+ 
+ Usage with hf_jobs MCP tool:
+     hf_jobs("uv", {
+         "script": '''<paste this entire file>''',
+         "flavor": "a10g-large",
+         "timeout": "4h",
+         "secrets": {"HF_TOKEN": "$HF_TOKEN"},
+     })
+ 
+ Or submit the script content directly inline without saving to a file.
+ 
+ Note: For most GRPO use cases, the TRL maintained script is recommended:
+ https://raw.githubusercontent.com/huggingface/trl/main/examples/scripts/grpo.py
+ """
+ 
+ import trackio
+ from datasets import load_dataset
+ from trl import GRPOTrainer, GRPOConfig
+ 
+ 
+ # Load dataset (GRPO uses prompt-only format)
+ dataset = load_dataset("trl-lib/math_shepherd", split="train")
+ print(f"βœ… Dataset loaded: {len(dataset)} prompts")
+ 
+ 
+ # Reward function - GRPOTrainer requires at least one reward function.
+ # This placeholder rewards completions close to a target length; replace it
+ # with a task-specific reward (e.g. math answer verification) for real runs.
+ def reward_len(completions, **kwargs):
+     return [-abs(len(completion) - 200) for completion in completions]
+ 
+ 
+ # Training configuration
+ config = GRPOConfig(
+     # CRITICAL: Hub settings
+     output_dir="qwen-grpo-math",
+     push_to_hub=True,
+     hub_model_id="username/qwen-grpo-math",
+     hub_strategy="every_save",
+ 
+     # Training parameters
+     num_train_epochs=1,
+     per_device_train_batch_size=4,
+     gradient_accumulation_steps=4,
+     learning_rate=1e-6,
+ 
+     # Logging & checkpointing
+     logging_steps=10,
+     save_strategy="steps",
+     save_steps=100,
+     save_total_limit=2,
+ 
+     # Optimization
+     warmup_ratio=0.1,
+     lr_scheduler_type="cosine",
+ 
+     # Monitoring
+     report_to="trackio",  # Integrate with Trackio
+     project="meaningful_project_name",  # Trackio project name
+     run_name="baseline-run",  # Descriptive name for this training run
+ )
+ 
+ # Initialize and train
+ # Note: GRPO requires an instruct-tuned model as the base
+ trainer = GRPOTrainer(
+     model="Qwen/Qwen2.5-0.5B-Instruct",
+     reward_funcs=reward_len,
+     train_dataset=dataset,
+     args=config,
+ )
+ 
+ print("πŸš€ Starting GRPO training...")
+ trainer.train()
+ 
+ print("πŸ’Ύ Pushing to Hub...")
+ trainer.push_to_hub()
+ 
+ # Finish Trackio tracking
+ trackio.finish()
+ 
+ print("βœ… Complete! Model at: https://huggingface.co/username/qwen-grpo-math")
+ print("πŸ“Š View metrics at: https://huggingface.co/spaces/username/trackio")
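
Since GRPO shines on automatically verifiable tasks, a minimal sketch of a verifiable reward function may be useful. It assumes a hypothetical `answer` column is passed through from the dataset as a keyword argument; the name and exact-match criterion are illustrative, not part of the script above:

```python
# Hypothetical exact-match reward: 1.0 if the completion contains the
# reference answer string, else 0.0. The "answer" column is an assumption.
def exact_match_reward(completions, answer, **kwargs):
    return [1.0 if ref in comp else 0.0 for comp, ref in zip(completions, answer)]

print(exact_match_reward(["The result is 42.", "No idea."], ["42", "42"]))  # [1.0, 0.0]
```

Binary rewards like this are common for math-style tasks, since the group-relative advantage in GRPO only needs rewards to rank completions within a group.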
scripts/train_sft_example.py ADDED
@@ -0,0 +1,119 @@
+ #!/usr/bin/env python3
+ # /// script
+ # dependencies = [
+ #     "trl>=0.12.0",
+ #     "peft>=0.7.0",
+ #     "transformers>=4.36.0",
+ #     "accelerate>=0.24.0",
+ #     "trackio",  # For real-time monitoring
+ # ]
+ # ///
+ 
+ """
+ Production-ready SFT training example with all best practices.
+ 
+ This script demonstrates:
+ - Trackio integration for real-time monitoring
+ - LoRA/PEFT for efficient training
+ - Proper Hub saving configuration
+ - Train/eval split for monitoring
+ - Checkpoint management
+ - Optimized training parameters
+ 
+ Usage with hf_jobs MCP tool:
+     hf_jobs("uv", {
+         "script": '''<paste this entire file>''',
+         "flavor": "a10g-large",
+         "timeout": "3h",
+         "secrets": {"HF_TOKEN": "$HF_TOKEN"},
+     })
+ 
+ Or submit the script content directly inline without saving to a file.
+ """
+ 
+ import trackio
+ from datasets import load_dataset
+ from peft import LoraConfig
+ from trl import SFTTrainer, SFTConfig
+ 
+ 
+ # Load dataset
+ print("πŸ“¦ Loading dataset...")
+ dataset = load_dataset("trl-lib/Capybara", split="train")
+ print(f"βœ… Dataset loaded: {len(dataset)} examples")
+ 
+ # Create train/eval split
+ print("πŸ”€ Creating train/eval split...")
+ dataset_split = dataset.train_test_split(test_size=0.1, seed=42)
+ train_dataset = dataset_split["train"]
+ eval_dataset = dataset_split["test"]
+ print(f"  Train: {len(train_dataset)} examples")
+ print(f"  Eval: {len(eval_dataset)} examples")
+ 
+ # Note: For memory-constrained demos, skip eval by using the full dataset as
+ # train_dataset and removing eval_dataset, eval_strategy, and eval_steps below
+ 
+ # Training configuration
+ config = SFTConfig(
+     # CRITICAL: Hub settings
+     output_dir="qwen-capybara-sft",
+     push_to_hub=True,
+     hub_model_id="username/qwen-capybara-sft",
+     hub_strategy="every_save",  # Push checkpoints
+ 
+     # Training parameters
+     num_train_epochs=3,
+     per_device_train_batch_size=4,
+     gradient_accumulation_steps=4,
+     learning_rate=2e-5,
+     # max_length=1024,  # Default - only set if you need a different sequence length
+ 
+     # Logging & checkpointing
+     logging_steps=10,
+     save_strategy="steps",
+     save_steps=100,
+     save_total_limit=2,
+ 
+     # Evaluation - IMPORTANT: only enable if eval_dataset is provided
+     eval_strategy="steps",
+     eval_steps=100,
+ 
+     # Optimization
+     warmup_ratio=0.1,
+     lr_scheduler_type="cosine",
+ 
+     # Monitoring
+     report_to="trackio",  # Integrate with Trackio
+     project="meaningful_project_name",  # Trackio project name
+     run_name="baseline-run",  # Descriptive name for this training run
+ )
+ 
+ # LoRA configuration
+ peft_config = LoraConfig(
+     r=16,
+     lora_alpha=32,
+     lora_dropout=0.05,
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=["q_proj", "v_proj"],
+ )
+ 
+ # Initialize and train
+ print("🎯 Initializing trainer...")
+ trainer = SFTTrainer(
+     model="Qwen/Qwen2.5-0.5B",
+     train_dataset=train_dataset,
+     eval_dataset=eval_dataset,  # CRITICAL: must provide eval_dataset when eval_strategy is enabled
+     args=config,
+     peft_config=peft_config,
+ )
+ 
+ print("πŸš€ Starting training...")
+ trainer.train()
+ 
+ print("πŸ’Ύ Pushing to Hub...")
+ trainer.push_to_hub()
+ 
+ # Finish Trackio tracking
+ trackio.finish()
+ 
+ print("βœ… Complete! Model at: https://huggingface.co/username/qwen-capybara-sft")
+ print("πŸ“Š View metrics at: https://huggingface.co/spaces/username/trackio")
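
For intuition on why the LoRA settings above keep training cheap: each adapted linear layer of shape (d_out, d_in) adds r * (d_in + d_out) trainable weights. A back-of-envelope count, assuming approximate Qwen2.5-0.5B shapes (hidden size 896, 24 layers, v_proj output 128 under grouped-query attention; these dimensions are assumptions for illustration, not read from the script):

```python
# Rough LoRA trainable-parameter count for r=16 on q_proj and v_proj,
# assuming hypothetical Qwen2.5-0.5B-like shapes.
r = 16
hidden, layers, v_out = 896, 24, 128

per_layer = r * (hidden + hidden) + r * (hidden + v_out)  # q_proj + v_proj adapters
total = per_layer * layers
print(f"{total:,} trainable LoRA params (~{100 * total / 500e6:.2f}% of 0.5B)")
```

On the order of a million trainable parameters versus the full half-billion, which is why the a10g-large flavor is comfortable for this job.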