---
title: Tiny Browser Planner
emoji: π€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.36.1
python_version: 3.12.0
app_file: app.py
pinned: false
tags:
- track:backyard
- sponsor:openbmb
- achievement:welltuned
- achievement:fieldnotes
---
# Tiny Browser Planner
A 1B model that plans browser actions: MiniCPM5-1B fine-tuned with LoRA.
**The core finding:** A 1B model can explain the correct browser action before it can reliably choose it.
## Links
- **Space (Gradio Demo)**: https://huggingface.co/spaces/build-small-hackathon/tiny-browser-planner
- **Model**: https://huggingface.co/Georgefifth/tiny-browser-planner-reason
- **Dataset**: https://huggingface.co/datasets/Georgefifth/tiny-browser-planner-reason-dataset
- **Demo Video (X Post)**: https://x.com/i/status/2066584425140473947
## The Core Finding
> A 1B model can explain the correct browser action before it can reliably choose it.
## Experiments
### v1 → v4
Scaling up the data did not help: 200 targeted hard examples outperformed 2,000 generic ones.
### Ablation: Action Space Paradox
Adding `back` to the action space gave the model a powerful tool, but the model over-generalized it and used `back` for everything.
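
One way to spot this kind of collapse is to look at how the predicted actions are distributed across an evaluation run. The sketch below is illustrative only: `action_share` is a hypothetical helper, and `click` is an assumed action name (only `back` is named in this README).

```python
from collections import Counter


def action_share(predicted_actions):
    """Return each action's share of all predictions, most frequent first."""
    counts = Counter(predicted_actions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {action: count / total for action, count in counts.most_common()}


# Hypothetical predictions; a share close to 1.0 for one action suggests
# the model has collapsed onto it.
predictions = ["back", "back", "click", "back"]
print(action_share(predictions))  # {'back': 0.75, 'click': 0.25}
```

Comparing these shares against the shares in the ground-truth labels would show whether `back` is merely common in the data or genuinely over-predicted.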
### Reason-First
With just 40 reasoning examples (7.8 s of training), the model improved from 4/12 to 10/12. The reasoning head eliminates heuristic shortcutting.
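
As a rough illustration of what a reason-first training example could look like, the sketch below places the reasoning before the action in the completion. The field names, the `Reasoning:` / `Action:` labels, and the `format_reason_first` helper are all assumptions made for illustration; the actual dataset schema may differ.

```python
def format_reason_first(task, history, reasoning, action):
    """Build one prompt/completion pair with the reasoning ahead of the action.

    The layout here is a hypothetical format, not the dataset's real schema.
    """
    history_text = " > ".join(history) if history else "(none)"
    prompt = f"Task: {task}\nHistory: {history_text}\n"
    # Reasoning comes first, so the action is generated conditioned on it.
    completion = f"Reasoning: {reasoning}\nAction: {action}"
    return {"prompt": prompt, "completion": completion}


example = format_reason_first(
    task="Find the pricing page",
    history=["open homepage", "click Blog"],
    reasoning="The blog does not link to pricing, so return to the homepage.",
    action="back",
)
print(example["completion"])
```

Because the action token follows the explanation, the model has to commit to a justification before it commits to a choice.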
## Usage
Enter a task and browsing history. The model shows its reasoning, then picks the next action.
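
If you want to consume the output programmatically, a minimal parsing sketch might look like the following. It assumes the model emits its reasoning followed by a line of the form `Action: <name>`; that output format and the `parse_output` helper are assumptions, not something this README specifies.

```python
import re


def parse_output(text):
    """Split generated text into (reasoning, action).

    Assumes a hypothetical "Reasoning: ... / Action: <name>" layout.
    Returns action=None when no action line is found.
    """
    match = re.search(r"Action:\s*(\S+)", text)
    if match is None:
        return text.strip(), None
    reasoning = text[: match.start()].replace("Reasoning:", "", 1).strip()
    return reasoning, match.group(1)


reasoning, action = parse_output("Reasoning: The page is wrong.\nAction: back")
print(reasoning)  # The page is wrong.
print(action)     # back
```

Returning `None` for a missing action lets the caller decide whether to retry or fall back to a default.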