<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>Tiny Browser Planner: Research Story</title>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div class="container">
<header>
<h1>A 1B model can explain the correct browser action<br>before it can reliably choose it.</h1>
<p class="subtitle">Fine-tuning MiniCPM5-1B with LoRA to study what a small model learns about planning.</p>
<div class="links">
<a href="https://huggingface.co/Georgefifth/tiny-browser-planner-reason" class="btn" target="_blank" rel="noopener noreferrer">Model</a>
<a href="https://huggingface.co/datasets/Georgefifth/tiny-browser-planner-reason-dataset" class="btn" target="_blank" rel="noopener noreferrer">Dataset</a>
<a href="https://huggingface.co/Georgefifth/tiny-browser-planner-reason/blob/main/demo_colab.ipynb" class="btn colab" target="_blank" rel="noopener noreferrer">Download Notebook</a>
</div>
</header>
<section>
<h2>The Problem</h2>
<p>Large browser agents (7B+) are expensive and slow. Can a <strong>1B-parameter model</strong> learn to plan browser actions (search, navigate, extract, backtrack), or does it just memorize heuristics?</p>
<div class="action-box">
<span class="tag">search</span>
<span class="tag">open_page</span>
<span class="tag">extract</span>
<span class="tag">back</span>
<span class="tag">refine_search</span>
<span class="tag">finish</span>
</div>
</section>
<section>
<h2>Experiment Timeline</h2>
<div class="timeline">
<div class="step">
<div class="step-num">v1</div>
<div class="step-body">
<strong>Linear data</strong>: 1142 samples, perfect regularity
<div class="result">Learned patterns, couldn't generalize</div>
</div>
</div>
<div class="step">
<div class="step-num">v2/v3</div>
<div class="step-body">
<strong>More data</strong>: 1504 → 1948 samples
<div class="result">Still 72%; more data didn't help</div>
</div>
</div>
<div class="step">
<div class="step-num">v4</div>
<div class="step-body">
<strong>Hard samples</strong>: 243 targeted examples, 89 decision points
<div class="result winner">200 quality examples &gt; 2000 generic ones</div>
</div>
</div>
<div class="step highlight">
<div class="step-num">Ablation</div>
<div class="step-body">
<strong>The Action Space Paradox</strong>
<p>Adding <code>back</code> gave the model a powerful tool, but the model over-generalized it. Both models (with and without <code>back</code>) learned a single heuristic instead of conditional selection.</p>
<table>
<tr><th>Model</th><th>Wrong page</th><th>Paywall</th><th>Standard</th><th>Refine</th><th>Total</th></tr>
<tr><td>vA (no back)</td><td class="pass">search ✅</td><td class="pass">search ✅</td><td class="pass">extract ✅</td><td class="pass">search ✅</td><td class="pass">11/11</td></tr>
<tr><td>vB (with back)</td><td class="pass">back ✅</td><td class="warn">back ✅</td><td class="fail">back ❌</td><td class="fail">back ❌</td><td class="fail">4/12</td></tr>
</table>
<div class="result insight">More actions → more capability → more confusion. The model gains expressive power at the cost of decision quality.</div>
</div>
</div>
<div class="step highlight">
<div class="step-num">Reason-First</div>
<div class="step-body">
<strong>Fixed with 40 samples and 7.8 seconds of training</strong>
<p>Hypothesis: the model <em>does</em> understand state, but action-only format encourages shortcutting. Forced reasoning before action.</p>
<div class="format-box">
<code>Observation → <span class="em">Reason: ...</span> → Action: ...</code>
</div>
<table>
<tr><th>Scenario</th><th>vB (action-only)</th><th>Reason-First</th></tr>
<tr><td>Wrong page</td><td class="pass">back ✅</td><td class="pass">back ✅</td></tr>
<tr><td>Paywall</td><td class="warn">1/2</td><td class="pass">back ✅</td></tr>
<tr><td>Standard page</td><td class="fail">back ❌</td><td class="pass">extract ✅</td></tr>
<tr><td>Needs refine</td><td class="fail">back ❌</td><td class="pass">refine_search ✅</td></tr>
<tr><td><strong>Total</strong></td><td class="fail"><strong>4/12</strong></td><td class="pass"><strong>10/12</strong></td></tr>
</table>
<div class="result winner">Reasoning before acting removed the "always back" shortcut in these scenarios (10/12 vs 4/12).</div>
</div>
</div>
</div>
</section>
<section>
<h2>Why It Works</h2>
<div class="example">
<div class="example-header wrong">Action-Only (vB)</div>
<div class="example-body">
<p><strong>Input:</strong> Task: Find Apple stock price; page shows price prominently</p>
<p><strong>Output:</strong> <span class="bad">back</span></p>
<p class="note">Heuristic: "obstacle → back" applied to everything</p>
</div>
</div>
<div class="example">
<div class="example-header correct">Reason-First</div>
<div class="example-body">
<p><strong>Input:</strong> Task: Find Apple stock price; page shows price prominently</p>
<p><strong>Output:</strong> <span class="good">Reason: Target information is present on the page</span></p>
<p><strong>Output:</strong> <span class="good">Action: extract ✅</span></p>
</div>
</div>
<p class="insight">The model already understood the state. Writing out the reasoning step first exposes that knowledge.</p>
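<p>As a rough illustration of how the two formats differ, the sketch below builds one chat-style sample each way. The reason-first system prompt and the user-prompt layout are borrowed from the demo snippet on this page. The helper names (<code>build_user_prompt</code>, <code>make_sample</code>), the action-only system prompt, and the message layout are assumptions for illustration and may not match the released dataset's schema.</p>
<pre><code>SYSTEM_REASON_FIRST = "First reason, then output the action."
# Assumption: the action-only system prompt is not given in this write-up.
SYSTEM_ACTION_ONLY = "Output the next action."


def build_user_prompt(task, history):
    """Format a task plus a list of (action, observation) pairs."""
    lines = [f"[{action}] {observation}" for action, observation in history]
    return (
        f"Task: {task}\n\nHistory:\n"
        + "\n".join(lines)
        + "\n\nWhat is the next action?"
    )


def make_sample(task, history, action, reason=None):
    """Return chat messages; add a Reason line only when a reason is given."""
    if reason is None:
        # Action-only: the target is the bare action name.
        system, target = SYSTEM_ACTION_ONLY, action
    else:
        # Reason-first: the reason is written before the action.
        system = SYSTEM_REASON_FIRST
        target = f"Reason: {reason}\nAction: {action}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": build_user_prompt(task, history)},
        {"role": "assistant", "content": target},
    ]


history = [
    ("search", "Search completed."),
    ("open_page", "Price displayed prominently at $198"),
]
sample = make_sample(
    "Find Apple stock price",
    history,
    action="extract",
    reason="Target information is present on the page",
)
print(sample[-1]["content"])</code></pre>
<p class="note">Calling <code>make_sample</code> without <code>reason</code> yields the action-only variant of the same state, which makes it easy to compare the two formats side by side.</p>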
</section>
<section>
<h3>Key Insights</h3>
<div class="cards">
<div class="card">
<div class="card-icon">📊</div>
<h4>Data Quality &gt; Quantity</h4>
<p>200 targeted hard samples outperformed 2000 generic ones.</p>
</div>
<div class="card">
<div class="card-icon">🎯</div>
<h4>Reasoning &gt; Data</h4>
<p>40 reason-first samples outperformed 88 action-only samples by fixing heuristic shortcutting.</p>
</div>
<div class="card">
<div class="card-icon">🧠</div>
<h4>State Understanding</h4>
<p>Correct reasoning in all passing cases suggests the model understands state; the bottleneck is action selection, not understanding.</p>
</div>
</div>
</section>
<section>
<h3>Try It Yourself</h3>
<p>1. <a href="https://huggingface.co/Georgefifth/tiny-browser-planner-reason/blob/main/demo_colab.ipynb" target="_blank" rel="noopener noreferrer">Open the notebook</a> → click the ⬇ Download button<br>2. Go to <a href="https://colab.research.google.com" target="_blank" rel="noopener noreferrer">colab.research.google.com</a><br>3. File → Upload notebook → select the downloaded file<br>4. Runtime → Run all (free GPU included)</p>
<p>Or run locally:</p>
<pre><code>from unsloth import FastLanguageModel
import torch

# Load the fine-tuned planner in 4-bit to keep GPU memory low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Georgefifth/tiny-browser-planner-reason",
    max_seq_length=2048,
    load_in_4bit=True,
    dtype=torch.bfloat16,
)
FastLanguageModel.for_inference(model)  # switch to inference mode

msgs = [
    {"role": "system", "content": "First reason, then output the action."},
    {"role": "user", "content": "Task: Find Apple stock price\n\nHistory:\n[search] Search completed.\n[open_page] Price displayed prominently at $198\n\nWhat is the next action?"},
]
prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outs = model.generate(**inputs, max_new_tokens=48)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Output:
# Reason: Target information is present on the page
# Action: extract</code></pre>
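<p>To act on the reply, the text has to be split back into a reason and an action. The sketch below is one possible way to do that with the standard <code>re</code> module, assuming the reply follows the <code>Reason: ... / Action: ...</code> layout shown above. The function name <code>parse_reason_action</code> is hypothetical and not part of the released notebook. The action list is taken from the six tags at the top of this page.</p>
<pre><code>import re

# The six actions listed in "The Problem" section.
ACTIONS = ("search", "open_page", "extract", "back", "refine_search", "finish")


def parse_reason_action(text):
    """Return (reason, action) parsed from a reason-first reply.

    reason is None when no "Reason:" line is found. action is None
    when no "Action:" line is found or the name is not a known action.
    """
    reason_match = re.search(r"Reason:\s*(.+)", text)
    action_match = re.search(r"Action:\s*([a-z_]+)", text)
    reason = reason_match.group(1).strip() if reason_match else None
    action = action_match.group(1) if action_match else None
    if action not in ACTIONS:
        action = None  # reject anything outside the action space
    return reason, action


reply = "Reason: Target information is present on the page\nAction: extract"
print(parse_reason_action(reply))
# ('Target information is present on the page', 'extract')</code></pre>
<p class="note">Returning <code>None</code> for an unknown action lets the caller decide whether to retry generation or fall back to a default, rather than executing an action the planner was never trained on.</p>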
</section>
<footer>
<p>Built with <a href="https://huggingface.co/openbmb/MiniCPM5-1B">MiniCPM5-1B</a> + <a href="https://github.com/unslothai/unsloth">Unsloth</a> | <a href="https://huggingface.co/spaces/Georgefifth/tiny-browser-planner">Space</a></p>
</footer>
</div>
</body>
</html>