	<!doctype html>
	<html lang="en">
	<head>
	<meta charset="utf-8" />
	<meta name="viewport" content="width=device-width" />
	<title>Tiny Browser Planner — Research Story</title>
	<link rel="stylesheet" href="style.css" />
	</head>
	<body>
	<div class="container">
	<header>
	<h1>A 1B model can explain the correct browser action<br>before it can reliably choose it.</h1>
	<p class="subtitle">Fine-tuning MiniCPM5-1B with LoRA to study what a small model learns about planning.</p>
	<div class="links">
	<a href="https://huggingface.co/Georgefifth/tiny-browser-planner-reason" class="btn" target="_blank" rel="noopener noreferrer">Model</a>
	<a href="https://huggingface.co/datasets/Georgefifth/tiny-browser-planner-reason-dataset" class="btn" target="_blank" rel="noopener noreferrer">Dataset</a>
	<a href="https://huggingface.co/Georgefifth/tiny-browser-planner-reason/blob/main/demo_colab.ipynb" class="btn colab" target="_blank" rel="noopener noreferrer">Download Notebook</a>
	</div>
	</header>

	<section>
	<h2>The Problem</h2>
	<p>Large browser agents (7B+) are expensive and slow. Can a <strong>1B parameter model</strong> learn to plan browser actions — search, navigate, extract, backtrack — or does it just memorize heuristics?</p>
	<div class="action-box">
	<span class="tag">search</span>
	<span class="tag">open_page</span>
	<span class="tag">extract</span>
	<span class="tag">back</span>
	<span class="tag">refine_search</span>
	<span class="tag">finish</span>
	</div>
	</section>

	<section>
	<h2>Experiment Timeline</h2>
	<div class="timeline">
	<div class="step">
	<div class="step-num">v1</div>
	<div class="step-body">
	<strong>Linear data</strong> — 1142 samples, perfect regularity
	<div class="result">Learned patterns, couldn't generalize</div>
	</div>
	</div>
	<div class="step">
	<div class="step-num">v2/v3</div>
	<div class="step-body">
	<strong>More data</strong> — 1504 → 1948 samples
	<div class="result">Result stayed at 72%; more data alone didn't help</div>
	</div>
	</div>
	<div class="step">
	<div class="step-num">v4</div>
	<div class="step-body">
	<strong>Hard samples</strong> — 243 targeted examples, 89 decision points
	<div class="result winner">~200 quality examples &gt; ~2000 generic ones</div>
	</div>
	</div>
	<div class="step highlight">
	<div class="step-num">Ablation</div>
	<div class="step-body">
	<strong>The Action Space Paradox</strong>
	<p>Adding <code>back</code> gave the model a powerful tool, but the model over-generalized it. Both models (with and without <code>back</code>) learned a single heuristic instead of conditional selection.</p>
	<table>
	<tr><th>Model</th><th>Wrong page</th><th>Paywall</th><th>Standard</th><th>Refine</th><th>Total</th></tr>
	<tr><td>vA (no back)</td><td class="pass">search ✅</td><td class="pass">search ✅</td><td class="pass">extract ✅</td><td class="pass">search ✅</td><td class="pass">11/11</td></tr>
	<tr><td>vB (with back)</td><td class="pass">back ✅</td><td class="warn">back (1/2)</td><td class="fail">back ❌</td><td class="fail">back ❌</td><td class="fail">4/12</td></tr>
	</table>
	<div class="result insight">More actions → more capability → more confusion. The model gains expressive power at the cost of decision quality.</div>
	</div>
	</div>
	<div class="step highlight">
	<div class="step-num">Reason-First</div>
	<div class="step-body">
	<strong>Fixed with 40 samples and 7.8 seconds of training</strong>
	<p>Hypothesis: the model <em>does</em> understand state, but the action-only format encourages shortcutting. The fix: force the model to state its reasoning before it outputs the action.</p>
	<div class="format-box">
	<code>Observation → <span class="em">Reason: ...</span> → Action: ...</code>
	</div>
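	<p>As a rough illustration of this format, a single reason-first training example might be assembled as below. This is a sketch only: the <code>messages</code> layout and the helper name <code>build_reason_first_sample</code> are assumptions for illustration, not the dataset's confirmed schema. The system prompt, task text, and Reason/Action wording are taken from the example on this page.</p>

```python
import json

# System prompt as used in the inference example on this page.
SYSTEM_PROMPT = "First reason, then output the action."


def build_reason_first_sample(task, history, reason, action):
    """Return one chat-style example whose target puts the reason before the action.

    The "messages" layout is an assumed schema, not the published dataset format.
    """
    history_text = "\n".join(f"[{name}] {observation}" for name, observation in history)
    user = f"Task: {task}\n\nHistory:\n{history_text}\n\nWhat is the next action?"
    assistant = f"Reason: {reason}\nAction: {action}"
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }


sample = build_reason_first_sample(
    task="Find Apple stock price",
    history=[
        ("search", "Search completed."),
        ("open_page", "Price displayed prominently at $198"),
    ],
    reason="Target information is present on the page",
    action="extract",
)
print(json.dumps(sample, indent=2))
```

	<p>Putting the reason first in the target means the action tokens are generated after, and conditioned on, the stated reason.</p>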
	<table>
	<tr><th>Scenario</th><th>vB (action-only)</th><th>Reason-First</th></tr>
	<tr><td>Wrong page</td><td class="pass">back ✅</td><td class="pass">back ✅</td></tr>
	<tr><td>Paywall</td><td class="warn">1/2</td><td class="pass">back ✅</td></tr>
	<tr><td>Standard page</td><td class="fail">back ❌</td><td class="pass">extract ✅</td></tr>
	<tr><td>Needs refine</td><td class="fail">back ❌</td><td class="pass">refine_search ✅</td></tr>
	<tr><td><strong>Total</strong></td><td class="fail"><strong>4/12</strong></td><td class="pass"><strong>10/12</strong></td></tr>
	</table>
	<div class="result winner">Reasoning first removes most heuristic shortcutting (4/12 → 10/12).</div>
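	<p>For readers who prefer percentages, the totals in the table can be converted with a few lines of Python. The only inputs are the two totals reported above; the helper name <code>pass_rate</code> is illustrative.</p>

```python
# Totals copied from the table above: (passed, total) per model.
totals = {"vB (action-only)": (4, 12), "Reason-First": (10, 12)}


def pass_rate(passed, total):
    """Fraction of evaluation scenarios answered with the correct action."""
    return passed / total


rates = {name: pass_rate(*counts) for name, counts in totals.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")

# Absolute improvement, expressed in percentage points.
gain = rates["Reason-First"] - rates["vB (action-only)"]
print(f"Gain: {gain * 100:.1f} percentage points")
```

	<p>With only 12 evaluation scenarios, each scenario moves the rate by roughly 8 points, so these percentages are indicative rather than precise.</p>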
	</div>
	</div>
	</div>
	</section>

	<section>
	<h2>Why It Works</h2>
	<div class="example">
	<div class="example-header wrong">Action-Only (vB)</div>
	<div class="example-body">
	<p><strong>Input:</strong> Task: Find Apple stock price — Page shows price prominently</p>
	<p><strong>Output:</strong> <span class="bad">back</span></p>
	<p class="note">Heuristic: "obstacle → back" applied to everything</p>
	</div>
	</div>
	<div class="example">
	<div class="example-header correct">Reason-First</div>
	<div class="example-body">
	<p><strong>Input:</strong> Task: Find Apple stock price — Page shows price prominently</p>
	<p><strong>Output:</strong> <span class="good">Reason: Target information is present on the page</span></p>
	<p><strong>Output:</strong> <span class="good">Action: extract ✅</span></p>
	</div>
	</div>
	<p class="insight">The model already understood the state. The explicit reasoning step exposes that knowledge.</p>
	</section>

	<section>
	<h3>Key Insights</h3>
	<div class="cards">
	<div class="card">
	<div class="card-icon">📊</div>
	<h4>Data Quality &gt; Quantity</h4>
	<p>Roughly 200 targeted hard samples (243 in v4) outperformed roughly 2000 generic ones (1948 in v3).</p>
	</div>
	<div class="card">
	<div class="card-icon">🎯</div>
	<h4>Reasoning > Data</h4>
	<p>40 reason-first samples outperformed 88 action-only samples by fixing heuristic shortcutting.</p>
	</div>
	<div class="card">
	<div class="card-icon">🧠</div>
	<h4>State Understanding</h4>
	<p>Correct reasoning in all passing cases suggests the model understands state; the bottleneck is action selection, not understanding.</p>
	</div>
	</div>
	</section>

	<section>
	<h3>Try It Yourself</h3>
	<p>1. <a href="https://huggingface.co/Georgefifth/tiny-browser-planner-reason/blob/main/demo_colab.ipynb" target="_blank" rel="noopener noreferrer">Open the notebook</a> → click the ⬇ Download button<br>2. Go to <a href="https://colab.research.google.com" target="_blank" rel="noopener noreferrer">colab.research.google.com</a><br>3. File → Upload notebook → select the downloaded file<br>4. Runtime → Run all (free GPU included)</p>
	<p>Or run locally:</p>
	<pre><code>from unsloth import FastLanguageModel
import torch

# Load the fine-tuned planner in 4-bit; this example expects a CUDA GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    "Georgefifth/tiny-browser-planner-reason",
    max_seq_length=2048,
    load_in_4bit=True,
    dtype=torch.bfloat16,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

msgs = [
    {"role": "system", "content": "First reason, then output the action."},
    {"role": "user", "content": "Task: Find Apple stock price\n\nHistory:\n[search] Search completed.\n[open_page] Price displayed prominently at $198\n\nWhat is the next action?"},
]
prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outs = model.generate(**inputs, max_new_tokens=48)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Output:
# Reason: Target information is present on the page
# Action: extract</code></pre>
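	<p>To act on the completion, the <code>Reason:</code> and <code>Action:</code> lines need to be split apart and the action checked against the six actions listed at the top of this page. The snippet below is one possible way to do that. The function name <code>parse_reason_action</code> and the choice to return <code>None</code> for unknown actions are assumptions for illustration, not part of the released model or notebook.</p>

```python
import re

# The six actions listed in "The Problem" section of this page.
ALLOWED_ACTIONS = {"search", "open_page", "extract", "back", "refine_search", "finish"}


def parse_reason_action(text):
    """Split a 'Reason: ... / Action: ...' completion into (reason, action).

    Either element is None when it is missing; the action is also None
    when it is not one of ALLOWED_ACTIONS.
    """
    reason_match = re.search(r"Reason:\s*(.+)", text)
    action_match = re.search(r"Action:\s*([a-z_]+)", text)
    reason = reason_match.group(1).strip() if reason_match else None
    action = action_match.group(1) if action_match else None
    if action not in ALLOWED_ACTIONS:
        action = None
    return reason, action


completion = "Reason: Target information is present on the page\nAction: extract"
reason, action = parse_reason_action(completion)
print(reason)
print(action)
```

	<p>Rejecting actions outside the known set keeps a malformed completion from being executed as if it were a valid browser step.</p>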
	</section>

	<footer>
	<p>Built with <a href="https://huggingface.co/openbmb/MiniCPM5-1B">MiniCPM5-1B</a> + <a href="https://github.com/unslothai/unsloth">Unsloth</a> | <a href="https://huggingface.co/spaces/Georgefifth/tiny-browser-planner">Space</a></p>
	</footer>
	</div>
	</body>
	</html>