<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>Tiny Browser Planner: Research Story</title>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div class="container">
<header>
<h1>A 1B model can explain the correct browser action<br>before it can reliably choose it.</h1>
<p class="subtitle">Fine-tuning MiniCPM5-1B with LoRA to study what a small model learns about planning.</p>
<div class="links">
<a href="https://huggingface.co/Georgefifth/tiny-browser-planner-reason" class="btn" target="_blank" rel="noopener noreferrer">Model</a>
<a href="https://huggingface.co/datasets/Georgefifth/tiny-browser-planner-reason-dataset" class="btn" target="_blank" rel="noopener noreferrer">Dataset</a>
<a href="https://huggingface.co/Georgefifth/tiny-browser-planner-reason/blob/main/demo_colab.ipynb" class="btn colab" target="_blank" rel="noopener noreferrer">Download Notebook</a>
</div>
</header>
<section>
<h2>The Problem</h2>
<p>Large browser agents (7B+) are expensive and slow. Can a <strong>1B-parameter model</strong> learn to plan browser actions (search, navigate, extract, backtrack), or does it just memorize heuristics?</p>
<div class="action-box">
<span class="tag">search</span>
<span class="tag">open_page</span>
<span class="tag">extract</span>
<span class="tag">back</span>
<span class="tag">refine_search</span>
<span class="tag">finish</span>
</div>
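<p>Because the planner picks from exactly these six actions, its output can be checked against a closed vocabulary. A minimal sketch, not taken from the project's code: <code>BrowserAction</code> and <code>parse_action</code> are hypothetical names, and the normalization (trim, lowercase) is an assumption about how generated text might be cleaned.</p>
<pre><code>
```python
from enum import Enum
from typing import Optional


class BrowserAction(str, Enum):
    """The six planner actions as a closed vocabulary (hypothetical helper)."""

    SEARCH = "search"
    OPEN_PAGE = "open_page"
    EXTRACT = "extract"
    BACK = "back"
    REFINE_SEARCH = "refine_search"
    FINISH = "finish"


def parse_action(text: str) -> Optional[BrowserAction]:
    """Normalize a generated action name; return None if it is not one of the six."""
    try:
        return BrowserAction(text.strip().lower())
    except ValueError:
        return None


for raw in ["extract", " Refine_Search ", "click"]:
    action = parse_action(raw)
    print(repr(raw), "->", action.value if action is not None else "invalid")
```
</code></pre>
<p>Treating anything outside the vocabulary as invalid (rather than mapping it to the nearest action) keeps format errors visible when scoring.</p>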
</section>
<section>
<h2>Experiment Timeline</h2>
<div class="timeline">
<div class="step">
<div class="step-num">v1</div>
<div class="step-body">
<strong>Linear data</strong>: 1142 samples with perfectly regular structure
<div class="result">Learned patterns, couldn't generalize</div>
</div>
</div>
<div class="step">
<div class="step-num">v2/v3</div>
<div class="step-body">
<strong>More data</strong>: 1504 → 1948 samples
<div class="result">Still 72%; more data didn't help</div>
</div>
</div>
<div class="step">
<div class="step-num">v4</div>
<div class="step-body">
<strong>Hard samples</strong>: 243 targeted examples, 89 decision points
<div class="result winner">200 quality examples > 2000 generic ones</div>
</div>
</div>
<div class="step highlight">
<div class="step-num">Ablation</div>
<div class="step-body">
<strong>The Action Space Paradox</strong>
<p>Adding <code>back</code> gave the model a powerful tool, but it over-generalized. Both models (with and without <code>back</code>) learned a single heuristic instead of conditional selection.</p>
<table>
<tr><th>Model</th><th>Wrong page</th><th>Paywall</th><th>Standard page</th><th>Needs refine</th><th>Total</th></tr>
<tr><td>vA (no back)</td><td class="pass">search ✅</td><td class="pass">search ✅</td><td class="pass">extract ✅</td><td class="pass">search ✅</td><td class="pass">11/11</td></tr>
<tr><td>vB (with back)</td><td class="pass">back ✅</td><td class="warn">back ⚠</td><td class="fail">back ❌</td><td class="fail">back ❌</td><td class="fail">4/12</td></tr>
</table>
<div class="result insight">More actions bring more capability, but also more confusion: the extra expressive power comes at the cost of decision quality.</div>
</div>
</div>
<div class="step highlight">
<div class="step-num">Reason-First</div>
<div class="step-body">
<strong>Fixed with 40 samples and 7.8 seconds of training</strong>
<p>Hypothesis: the model <em>does</em> understand the state, but the action-only format encourages shortcutting. The fix: train the model to state a reason before it outputs the action.</p>
<div class="format-box">
<code>Observation → <span class="em">Reason: ...</span> → Action: ...</code>
</div>
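<p>A minimal sketch of how one Observation → Reason → Action example could be rendered as a chat-format training pair. It assumes the system prompt and user-message layout used in the inference snippet on this page; the dataset's real schema may differ, and <code>to_reason_first_example</code> is a hypothetical helper.</p>
<pre><code>
```python
from typing import Dict, List

# Assumption: same system prompt as the inference example on this page;
# the dataset's actual training prompt and schema may differ.
SYSTEM_PROMPT = "First reason, then output the action."


def to_reason_first_example(
    task: str, history: List[str], reason: str, action: str
) -> List[Dict[str, str]]:
    """Build one chat-format example whose target puts the reason before the action."""
    user = (
        f"Task: {task}\n\nHistory:\n"
        + "\n".join(history)
        + "\n\nWhat is the next action?"
    )
    # Supervised target: the reason line comes first, so the action is only
    # generated after the model has described the page state.
    target = f"Reason: {reason}\nAction: {action}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
        {"role": "assistant", "content": target},
    ]


example = to_reason_first_example(
    task="Find Apple stock price",
    history=["[search] Search completed.", "[open_page] Price displayed prominently at $198"],
    reason="Target information is present on the page",
    action="extract",
)
print(example[-1]["content"])
```
</code></pre>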
<table>
<tr><th>Scenario</th><th>vB (action-only)</th><th>Reason-First</th></tr>
<tr><td>Wrong page</td><td class="pass">back ✅</td><td class="pass">back ✅</td></tr>
<tr><td>Paywall</td><td class="warn">1/2</td><td class="pass">back ✅</td></tr>
<tr><td>Standard page</td><td class="fail">back ❌</td><td class="pass">extract ✅</td></tr>
<tr><td>Needs refine</td><td class="fail">back ❌</td><td class="pass">refine_search ✅</td></tr>
<tr><td><strong>Total</strong></td><td class="fail"><strong>4/12</strong></td><td class="pass"><strong>10/12</strong></td></tr>
</table>
<div class="result winner">Reasoning eliminates heuristic shortcutting.</div>
</div>
</div>
</div>
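<p>The totals in the tables above are pass counts over test cases grouped by scenario. A minimal sketch of such a scorer follows. The four cases and the one-rule <code>always_back</code> policy are hypothetical stand-ins (the real evaluation set is not reproduced here), chosen to show how a single heuristic can pass some scenarios while failing others.</p>
<pre><code>
```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical test cases: (scenario, observation, expected action).
CASES: List[Tuple[str, str, str]] = [
    ("wrong_page", "Page is about a different company", "back"),
    ("paywall", "Article is behind a paywall", "back"),
    ("standard", "Price displayed prominently at $198", "extract"),
    ("needs_refine", "Search results are all unrelated", "refine_search"),
]


def always_back(observation: str) -> str:
    """One-rule shortcut policy: answer 'back' regardless of the observation."""
    return "back"


def score(
    policy: Callable[[str], str], cases: List[Tuple[str, str, str]]
) -> Dict[str, Tuple[int, int]]:
    """Return {scenario: (passed, total)} plus an 'all' entry for the overall count."""
    tally: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for scenario, observation, expected in cases:
        passed = int(policy(observation) == expected)
        for key in (scenario, "all"):
            tally[key][0] += passed
            tally[key][1] += 1
    return {key: (p, t) for key, (p, t) in tally.items()}


for scenario, (passed, total) in score(always_back, CASES).items():
    print(f"{scenario}: {passed}/{total}")
```
</code></pre>
<p>Reporting per-scenario counts, not just the total, is what exposes the shortcut: the "always back" rule looks perfect on wrong-page and paywall cases and fails everywhere else.</p>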
</section>
<section>
<h2>Why It Works</h2>
<div class="example">
<div class="example-header wrong">Action-Only (vB)</div>
<div class="example-body">
<p><strong>Input:</strong> Task: Find Apple stock price → Page shows price prominently</p>
<p><strong>Output:</strong> <span class="bad">back</span></p>
<p class="note">Heuristic: "obstacle → back", applied even when there is no obstacle</p>
</div>
</div>
<div class="example">
<div class="example-header correct">Reason-First</div>
<div class="example-body">
<p><strong>Input:</strong> Task: Find Apple stock price → Page shows price prominently</p>
<p><strong>Output:</strong> <span class="good">Reason: Target information is present on the page</span></p>
<p><strong>Output:</strong> <span class="good">Action: extract ✅</span></p>
</div>
</div>
<p class="insight">The model already understood the state. Writing the reason first exposes that knowledge before an action is chosen.</p>
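<p>To score reason-first outputs automatically, the reason and the action have to be split apart again. A minimal regex-based sketch, assuming the model emits one <code>Reason:</code> line followed by one <code>Action:</code> line as in the example above; <code>parse_reason_first</code> is a hypothetical helper, and outputs with no reason or an unknown action are rejected.</p>
<pre><code>
```python
import re
from typing import Optional, Tuple

# Assumed layout: "Reason: ..." on one line, then "Action: NAME" on the next.
PATTERN = re.compile(r"Reason:\s*(.+?)\s*\n\s*Action:\s*([a-z_]+)")
VALID_ACTIONS = {"search", "open_page", "extract", "back", "refine_search", "finish"}


def parse_reason_first(text: str) -> Optional[Tuple[str, str]]:
    """Return (reason, action) if a reason precedes a known action, else None."""
    match = PATTERN.search(text)
    if match is None or match.group(2) not in VALID_ACTIONS:
        return None
    return match.group(1), match.group(2)


print(parse_reason_first("Reason: Target information is present on the page\nAction: extract"))
print(parse_reason_first("Action: back"))  # no reason given, so rejected
```
</code></pre>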
</section>
<section>
<h3>Key Insights</h3>
<div class="cards">
<div class="card">
<div class="card-icon">📊</div>
<h4>Data Quality > Quantity</h4>
<p>200 targeted hard samples outperformed 2000 generic ones.</p>
</div>
<div class="card">
<div class="card-icon">🎯</div>
<h4>Reasoning > Data</h4>
<p>40 reason-first samples outperformed 88 action-only samples by fixing heuristic shortcutting.</p>
</div>
<div class="card">
<div class="card-icon">🧠</div>
<h4>State Understanding</h4>
<p>Correct reasoning in all passing cases suggests the model understands the state; the bottleneck is action selection, not understanding.</p>
</div>
</div>
</section>
<section>
<h3>Try It Yourself</h3>
<p>1. <a href="https://huggingface.co/Georgefifth/tiny-browser-planner-reason/blob/main/demo_colab.ipynb" target="_blank" rel="noopener noreferrer">Open the notebook</a> and click the ⬇ Download button<br>2. Go to <a href="https://colab.research.google.com" target="_blank" rel="noopener noreferrer">colab.research.google.com</a><br>3. File → Upload notebook → select the downloaded file<br>4. Runtime → Change runtime type → select a GPU (the free tier includes one), then Runtime → Run all</p>
<p>Or run locally on a CUDA GPU:</p>
<pre><code>from unsloth import FastLanguageModel
import torch

# Load the fine-tuned planner in 4-bit. dtype=None lets Unsloth pick
# float16 on older GPUs (e.g. Colab's T4) and bfloat16 on Ampere or newer.
model, tokenizer = FastLanguageModel.from_pretrained(
    "Georgefifth/tiny-browser-planner-reason",
    max_seq_length=2048,
    load_in_4bit=True,
    dtype=None,
)
FastLanguageModel.for_inference(model)

msgs = [
    {"role": "system", "content": "First reason, then output the action."},
    {"role": "user", "content": "Task: Find Apple stock price\n\nHistory:\n[search] Search completed.\n[open_page] Price displayed prominently at $198\n\nWhat is the next action?"},
]
prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding so the chosen action is deterministic.
with torch.no_grad():
    outs = model.generate(**inputs, max_new_tokens=48, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Expected output:
# Reason: Target information is present on the page
# Action: extract</code></pre>
</section>
<footer>
<p>Built with <a href="https://huggingface.co/openbmb/MiniCPM5-1B">MiniCPM5-1B</a> + <a href="https://github.com/unslothai/unsloth">Unsloth</a> | <a href="https://huggingface.co/spaces/Georgefifth/tiny-browser-planner">Space</a></p>
</footer>
</div>
</body>
</html>