metadata
title: Tiny Browser Planner
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.36.1
python_version: 3.12.0
app_file: app.py
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:welltuned
  - achievement:fieldnotes

Tiny Browser Planner

A 1B model that plans browser actions: MiniCPM5-1B fine-tuned with LoRA.

The core finding: A 1B model can explain the correct browser action before it can reliably choose it.

The Core Finding

A 1B model can explain the correct browser action before it can reliably choose it.

Experiments

v1 → v4

Data scaling didn't help. 200 targeted hard examples outperformed 2000 generic ones.

Ablation: Action Space Paradox

Adding a `back` action to the action space created a powerful tool, but the model over-generalized it, using `back` for everything.
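One way to spot this kind of over-generalization is to compare how often each action is predicted with how often it appears in the reference labels. The sketch below is illustrative only: the function name, the `margin` threshold, and every action name other than `back` are assumptions, not part of this project's code.

```python
from collections import Counter


def overused_actions(predictions, references, margin=0.2):
    """Return actions whose predicted rate exceeds their reference rate by more than `margin`.

    Hypothetical helper: the threshold and the action vocabulary are assumptions.
    """
    if not predictions or not references:
        return {}
    pred_counts = Counter(predictions)
    ref_counts = Counter(references)
    flagged = {}
    for action, count in pred_counts.items():
        pred_rate = count / len(predictions)
        ref_rate = ref_counts[action] / len(references)  # Counter returns 0 for unseen actions
        if pred_rate - ref_rate > margin:
            flagged[action] = (pred_rate, ref_rate)
    return flagged


# Toy data: the model answers "back" three times out of four,
# while the references use it only once.
predictions = ["back", "back", "back", "click"]
references = ["click", "type", "back", "scroll"]
flagged = overused_actions(predictions, references)
print(flagged)
```

A large gap for a single action, as in this toy data, suggests the model is using that action as a default rather than choosing it per step.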

Reason-First

With just 40 reasoning examples (7.8 s of training), the model went from 4/12 to 10/12. The reasoning head eliminates heuristic shortcutting.
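A reason-first training example might be laid out as follows. This is a minimal sketch under assumed conventions: the field labels (`Task:`, `History:`, `Reasoning:`, `Action:`), the `build_example` helper, and the sample task are all hypothetical and may differ from the format actually used for fine-tuning.

```python
import json


def build_example(task, history, action, reasoning=None):
    """Build one prompt/completion pair.

    When `reasoning` is given, it is placed before the action so the model
    has to explain the step before committing to it (assumed layout).
    """
    history_text = "\n".join(f"{i}. {step}" for i, step in enumerate(history, 1)) or "(empty)"
    prompt = f"Task: {task}\nHistory:\n{history_text}\n"
    if reasoning is None:
        completion = f"Action: {action}"  # action-only target
    else:
        completion = f"Reasoning: {reasoning}\nAction: {action}"  # reason-first target
    return {"prompt": prompt, "completion": completion}


# Hypothetical sample; not taken from the project's dataset.
example = build_example(
    task="Find the return policy",
    history=["open shop homepage", "click 'Careers' by mistake"],
    action="back",
    reasoning="The Careers page is unrelated to returns, so leave it before searching again.",
)
print(json.dumps(example, indent=2))
```

Placing the reasoning before the action means the action token is generated conditioned on the explanation, which is the ordering the section title refers to.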

Usage

Enter a task and browsing history. The model shows its reasoning, then picks the next action.
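The flow can be sketched in a few lines: build a prompt from the two inputs, run the model, then split its output into reasoning and action. Everything here is an assumption for illustration: the prompt layout, the `Reasoning:` / `Action:` labels, and `fake_model`, which stands in for the fine-tuned model and is not the code in `app.py`.

```python
import re


def build_prompt(task, history):
    """Combine the task and browsing history into one prompt (assumed layout)."""
    return f"Task: {task}\nHistory: {history or '(empty)'}\n"


def split_output(text):
    """Split model output into (reasoning, action); either part may be None."""
    reasoning = re.search(r"Reasoning:\s*(.+?)(?=\nAction:|\Z)", text, re.DOTALL)
    action = re.search(r"^Action:\s*(\w+)", text, re.MULTILINE)
    return (
        reasoning.group(1).strip() if reasoning else None,
        action.group(1) if action else None,
    )


def fake_model(prompt):
    """Stand-in for the fine-tuned model; returns a fixed reason-first answer."""
    return "Reasoning: The current page is unrelated to the task.\nAction: back"


prompt = build_prompt("Find the return policy", "homepage > Careers")
reasoning, action = split_output(fake_model(prompt))
print("Reasoning:", reasoning)
print("Next action:", action)
```

Returning `None` for a missing part lets the caller show a fallback message when the model's output does not follow the expected layout.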