---
base_model: openbmb/MiniCPM5-1B
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- browser-agent
- reasoning
- planning
- build-small
- peft
---
# TinyBrowserPlanner-Reason
## A 1B model can explain the correct browser action before it can reliably choose it.
This repository contains a LoRA adapter trained on top of MiniCPM5-1B for lightweight browser planning tasks.
The original goal was simple: can a 1B model decide the next browser action given a task and an observation?
Actions include:
* `search`
* `open_page`
* `extract`
* `back`
* `finish`
* `refine_search`
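
Treating the action names as a closed vocabulary makes it easy to reject malformed generations before they reach the browser. The sketch below is a minimal illustration, assuming the six actions listed above are the full action space (the list says "include", so there may be others) and that the action is the first token of the model's output; `BrowserAction` and `parse_action` are hypothetical names, not part of this adapter.

```python
from enum import Enum


class BrowserAction(str, Enum):
    """Hypothetical encoding of the action names listed in this card.

    Assumption: these six names are the complete action space.
    """

    SEARCH = "search"
    OPEN_PAGE = "open_page"
    EXTRACT = "extract"
    BACK = "back"
    FINISH = "finish"
    REFINE_SEARCH = "refine_search"


def parse_action(text: str) -> BrowserAction:
    """Map raw model output to a known action, raising ValueError otherwise.

    Only the first whitespace-separated token is used, lower-cased, so
    surrounding whitespace, capitalisation and trailing text are ignored.
    """
    tokens = text.strip().split()
    if not tokens:
        raise ValueError("empty action")
    # Lookup by value raises ValueError for names outside the vocabulary.
    return BrowserAction(tokens[0].lower())
```

Rejecting unknown names with an exception, rather than silently mapping them to a default such as `finish`, keeps invalid generations visible during evaluation.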
## Key Findings
### 1. Data quality beats data quantity
Adding large amounts of similar trajectory data produced almost no improvement.
However, adding only ~200 carefully designed hard examples significantly improved replanning behavior.
### 2. Adding actions creates both capability and confusion
Introducing the `back` action allowed the model to recover from wrong pages and paywalls.
However, the model quickly learned to overuse `back` as a universal solution.
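
One way to quantify this kind of overuse is per-action precision: of all the times the model chose an action, how often was it the reference action? An overused action such as `back` shows up as a high prediction count paired with low precision. This is a minimal diagnostic sketch, not the evaluation used for this card; `action_precision` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Sequence


def action_precision(gold: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Per-action precision over aligned gold/predicted action sequences.

    Only actions the model actually predicted appear in the result.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    chosen = Counter(predicted)  # how often each action was predicted
    correct = Counter(p for g, p in zip(gold, predicted) if g == p)  # correct predictions per action
    return {action: correct[action] / n for action, n in chosen.items()}
```

With made-up labels where the model answers `back` three times but only one of those matches the reference, `back` gets a precision of 1/3 while a correctly used `open_page` stays at 1.0.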
### 3. Reason-First training dramatically improves planning
* Action-only planning: 4/12
* Reason-First planning: 10/12

This gain required only 40 reasoning examples and less than 10 seconds of additional training.

The most important result: the model already understood the state of the environment. It failed because it had learned shortcut action heuristics. Forcing the model to explicitly generate a reason before selecting an action dramatically improved decision quality.
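
The difference between the two setups lies only in what the model is trained to emit. The sketch below is a minimal illustration, assuming a plain-text layout modelled on the Task/Observation/Reason/Action fields used in this card's examples. The exact prompt template used for training is not stated here, so `build_example` and its format are hypothetical.

```python
from typing import Optional, Tuple


def build_example(task: str, observation: str, action: str,
                  reason: Optional[str] = None) -> Tuple[str, str]:
    """Return (prompt, target) for one supervised example (hypothetical format).

    With a reason, the target puts the reason before the action (Reason-First).
    Without one, the target is the action alone (Action-only).
    """
    prompt = f"Task: {task}\nObservation: {observation}\n"
    if reason is None:
        target = f"Action: {action}"
    else:
        target = f"Reason: {reason}\nAction: {action}"
    return prompt, target
```

Because decoding runs left to right, a Reason-First target makes the model generate the action conditioned on the reason it has just written. In the Action-only target, nothing precedes the action, which leaves more room for shortcut heuristics.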
## Examples

* Task: Find Apple stock price
* Observation: Price displayed prominently on page.
* Reason: The requested information is already available.
* Action: `extract`

---

* Task: Find CEO of OpenAI
* Observation: Page discusses Microsoft CEO.
* Reason: The page is irrelevant to the requested information.
* Action: `back`
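
At inference time, a Reason-First completion must be split back into its reason and its action before the action can be executed. The sketch below is a minimal version, assuming the completion follows the `Reason: ... Action: ...` layout of the examples above and that the action is a single word. `parse_decision`, `Decision` and `ACTIONS` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumption: the action names listed in this card form the full action space.
ACTIONS = {"search", "open_page", "extract", "back", "finish", "refine_search"}

# Non-greedy reason up to the first "Action:"; DOTALL lets reasons span lines.
_PATTERN = re.compile(r"Reason:\s*(?P<reason>.*?)\s*Action:\s*(?P<action>\w+)", re.DOTALL)


class Decision(NamedTuple):
    reason: str
    action: str


def parse_decision(completion: str) -> Decision:
    """Extract (reason, action) from a Reason-First completion."""
    match = _PATTERN.search(completion)
    if match is None:
        raise ValueError("completion has no 'Reason: ... Action: ...' pair")
    action = match.group("action").lower()
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return Decision(match.group("reason"), action)
```

Keeping the parsed reason alongside the action makes failures easier to inspect: a correct reason followed by a wrong action points to the shortcut-heuristic problem described above.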
## Training

* Base model: `openbmb/MiniCPM5-1B`
* Method: LoRA fine-tuning
* Framework: Unsloth + PEFT
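
For reference, standard LoRA keeps each adapted base weight $W_0 \in \mathbb{R}^{d \times k}$ frozen and trains only a low-rank update, scaled by $\alpha / r$. This is also how PEFT's `LoraConfig` parameters `r` and `lora_alpha` combine by default. $B$ is initialised to zero, so training starts from the base model's behaviour. The dimensions in the worked count below are illustrative only: this card does not state the adapter's rank or which matrices it targets.

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

$$
\underbrace{r(d + k)}_{\text{LoRA}} = 16 \times (2048 + 2048) = 65{,}536
\quad \text{vs.} \quad
\underbrace{dk}_{\text{full}} = 2048^2 = 4{,}194{,}304
\quad (1.5625\%)
$$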
## Limitations
The model performs well on simple browser planning and replanning scenarios.
However, it still struggles with:
* multi-step recovery chains
* long-horizon planning
* complex search strategy generation
* comparison tasks requiring multiple sources
## Conclusion
This project suggests that explicit reasoning may act as a lightweight regularizer for small planning models.
A 1B model can often explain the correct action before it can reliably choose it.
This repository contains only the LoRA adapter.
The base model must be downloaded separately.