How to use Georgefifth/tiny-browser-planner-reason with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model from a local copy of openbmb/MiniCPM5-1B.
base_model = AutoModelForCausalLM.from_pretrained("./model/MiniCPM5-1B")
# Attach the LoRA adapter from this repository on top of the base model.
model = PeftModel.from_pretrained(base_model, "Georgefifth/tiny-browser-planner-reason")
```
---
base_model: openbmb/MiniCPM5-1B
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- browser-agent
- reasoning
- planning
- build-small
- peft
---
# TinyBrowserPlanner-Reason

## A 1B model can explain the correct browser action before it can reliably choose it.

This repository contains a LoRA adapter trained on top of MiniCPM5-1B for lightweight browser planning tasks.

The original goal was simple: can a 1B model decide the next browser action given a task and an observation?

Actions include:

* search
* open_page
* extract
* back
* finish
* refine_search
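
The action list above is a closed vocabulary, so raw model output can be checked against it before anything is executed. The following is a minimal sketch of that check, not code from this repository: the names `BrowserAction` and `parse_action` are hypothetical, and the normalization (trimming whitespace, lowercasing) is an assumption about how the output might look.

```python
from enum import Enum
from typing import Optional


class BrowserAction(str, Enum):
    """The six actions listed in this model card (class name is hypothetical)."""

    SEARCH = "search"
    OPEN_PAGE = "open_page"
    EXTRACT = "extract"
    BACK = "back"
    FINISH = "finish"
    REFINE_SEARCH = "refine_search"


def parse_action(text: str) -> Optional[BrowserAction]:
    """Map raw model output to a known action, or None if it is not one of the six."""
    candidate = text.strip().lower()  # assumed normalization
    try:
        return BrowserAction(candidate)
    except ValueError:
        return None


if __name__ == "__main__":
    print(parse_action(" Extract\n"))  # a known action after normalization
    print(parse_action("click"))       # not in the vocabulary
```

Returning `None` for anything outside the vocabulary lets the caller decide whether to retry or stop, rather than executing an action the planner was never trained on.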
## Key Findings

### 1. Data quality beats data quantity

Adding large amounts of similar trajectory data produced almost no improvement. However, adding only ~200 carefully designed hard examples significantly improved replanning behavior.

### 2. Adding actions creates both capability and confusion

Introducing the `back` action allowed the model to recover from wrong pages and paywalls. However, the model quickly learned to overuse `back` as a universal solution.
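
One way to make "overuse" measurable is to compare predicted actions with reference actions on an evaluation set. The sketch below is an assumption about how such a check could look, not the evaluation used for this adapter. The helper names and the toy lists are hypothetical and are not results from this project.

```python
from collections import Counter
from typing import Dict, Sequence


def action_shares(predicted: Sequence[str]) -> Dict[str, float]:
    """Fraction of predictions that went to each action."""
    counts = Counter(predicted)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {action: count / total for action, count in counts.items()}


def overuse_count(predicted: Sequence[str], gold: Sequence[str], action: str = "back") -> int:
    """Number of cases where `action` was predicted but the reference action differed."""
    return sum(1 for p, g in zip(predicted, gold) if p == action and g != action)


if __name__ == "__main__":
    # Toy data for illustration only.
    predicted = ["back", "back", "extract", "back"]
    gold = ["back", "open_page", "extract", "refine_search"]
    print(action_shares(predicted))
    print(overuse_count(predicted, gold))
```

A high share for `back` combined with a high overuse count would point to the shortcut behavior described above, as opposed to a dataset that simply contains many recovery cases.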
### 3. Reason-First training dramatically improves planning

* Action-only planning: 4/12
* Reason-First planning: 10/12

This improvement required only 40 reasoning examples and less than 10 seconds of additional training.

The most important result: the model already understood the state of the environment. It failed because it had learned shortcut action heuristics.

Forcing the model to generate an explicit reason before selecting an action is what moved the score from 4/12 to 10/12.
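
At inference time, reason-first decoding can be sketched as a prompt that ends at the `Reason:` label, followed by a parser that accepts an action only when a reason precedes it. This is a minimal sketch under stated assumptions: the `Task` / `Observation` / `Reason` / `Action` labels come from the examples in this card, but the exact template used in training is not documented here, and `build_prompt`, `parse_completion`, and `Decision` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    reason: str
    action: str


def build_prompt(task: str, observation: str) -> str:
    """End the prompt at 'Reason:' so the model must write its reason first (assumed layout)."""
    return f"Task:\n{task}\nObservation:\n{observation}\nReason:\n"


# A non-empty reason, then the 'Action:' label, then a single action token.
_COMPLETION_RE = re.compile(
    r"\s*(?P<reason>\S.*?)\s*Action:\s*(?P<action>[A-Za-z_]+)", re.DOTALL
)


def parse_completion(completion: str) -> Optional[Decision]:
    """Return (reason, action), or None if the reason or the action is missing."""
    match = _COMPLETION_RE.match(completion)
    if match is None:
        return None
    return Decision(match.group("reason").strip(), match.group("action").lower())


if __name__ == "__main__":
    print(build_prompt("Find Apple stock price", "Price displayed prominently on page."))
    print(parse_completion("The requested information is already available.\nAction:\nextract"))
```

Rejecting completions that jump straight to `Action:` keeps the parser aligned with the reason-first idea: an action without a stated reason is treated as a failed generation, not as a valid plan.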
## Example

* Task: Find Apple stock price
* Observation: Price displayed prominently on page.
* Reason: The requested information is already available.
* Action: extract

---

* Task: Find CEO of OpenAI
* Observation: Page discusses Microsoft CEO.
* Reason: The page is irrelevant to the requested information.
* Action: back
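
The two examples above can also illustrate the difference between an action-only and a reason-first training target. The sketch below serializes the same record both ways. It is an illustration only: the `PlannerExample` class, the function names, and the exact text layout are assumptions, since the training data format for this adapter is not published in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannerExample:
    task: str
    observation: str
    reason: str
    action: str


def to_action_only(ex: PlannerExample) -> str:
    """Target text where the action follows the observation directly."""
    return f"Task:\n{ex.task}\nObservation:\n{ex.observation}\nAction:\n{ex.action}"


def to_reason_first(ex: PlannerExample) -> str:
    """Target text where the reason must be generated before the action."""
    return (
        f"Task:\n{ex.task}\nObservation:\n{ex.observation}\n"
        f"Reason:\n{ex.reason}\nAction:\n{ex.action}"
    )


# The two examples shown in this model card.
EXAMPLES = [
    PlannerExample(
        task="Find Apple stock price",
        observation="Price displayed prominently on page.",
        reason="The requested information is already available.",
        action="extract",
    ),
    PlannerExample(
        task="Find CEO of OpenAI",
        observation="Page discusses Microsoft CEO.",
        reason="The page is irrelevant to the requested information.",
        action="back",
    ),
]

if __name__ == "__main__":
    print(to_action_only(EXAMPLES[0]))
    print()
    print(to_reason_first(EXAMPLES[1]))
```

In the reason-first layout the action token is conditioned on the generated reason, which is the property the Key Findings section credits for the improvement.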
## Training

* Base model: openbmb/MiniCPM5-1B
* Method: LoRA fine-tuning
* Framework: Unsloth + PEFT
## Limitations

The model performs well on simple browser planning and replanning scenarios. However, it still struggles with:

* multi-step recovery chains
* long-horizon planning
* complex search strategy generation
* comparison tasks requiring multiple sources
## Conclusion

This project suggests that explicit reasoning may act as a lightweight regularizer for small planning models. A 1B model can often explain the correct action before it can reliably choose it.

This repository contains only the LoRA adapter. The base model must be downloaded separately.