---
base_model: openbmb/MiniCPM5-1B
library_name: peft
pipeline_tag: text-generation
tags:
  - lora
  - browser-agent
  - reasoning
  - planning
  - build-small
  - peft
---

# TinyBrowserPlanner-Reason

## A 1B model can explain the correct browser action before it can reliably choose it.

This repository contains a LoRA adapter trained on top of MiniCPM5-1B for lightweight browser planning tasks.

The original goal was simple:

Can a 1B model decide the next browser action given a task and an observation?

Actions include:

* search
* open_page
* extract
* back
* finish
* refine_search
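
Because the action set is small and closed, a caller can validate the model's output before executing anything. The following is a minimal sketch, not code from this repository: the names `BrowserAction` and `parse_action` are hypothetical, and only the six action strings come from the list above.

```python
from enum import Enum


class BrowserAction(str, Enum):
    """The six actions listed above; the class name is illustrative."""

    SEARCH = "search"
    OPEN_PAGE = "open_page"
    EXTRACT = "extract"
    BACK = "back"
    FINISH = "finish"
    REFINE_SEARCH = "refine_search"


def parse_action(text: str) -> BrowserAction:
    """Normalize a raw model string and map it to a known action.

    Raises ValueError for anything outside the action set, so a caller
    can treat an unknown action as a failure instead of executing it.
    """
    return BrowserAction(text.strip().lower())
```

Rejecting unknown strings early keeps a malformed generation from being passed on to the browser environment.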

## Key Findings

### 1. Data quality beats data quantity

Adding large amounts of similar trajectory data produced almost no improvement.

However, adding only ~200 carefully designed hard examples significantly improved replanning behavior.

### 2. Adding actions creates both capability and confusion

Introducing the `back` action allowed the model to recover from wrong pages and paywalls.

However, the model quickly learned to overuse `back` as a universal solution.
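
One simple way to notice this kind of overuse is to look at how often each action appears in the model's predictions. The sketch below is an assumption about how such a check could be written; the function name `action_shares` is hypothetical and the sample predictions are made up for illustration, not results from this project.

```python
from collections import Counter


def action_shares(predicted_actions):
    """Return each action's fraction of all predictions."""
    counts = Counter(predicted_actions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {action: n / total for action, n in counts.items()}


# Made-up predictions for illustration only, not results from this project.
sample = ["back", "back", "extract", "back", "search", "back"]
shares = action_shares(sample)
print(shares["back"])
```

A share for `back` that is far above its share in the reference labels would point to the shortcut behavior described above.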

### 3. Reason-First training dramatically improves planning

* Action-only planning: 4/12
* Reason-First planning: 10/12

This improvement required only 40 reasoning examples and less than 10 seconds of additional training.

The most important result is that the model already understood the state of the environment. It failed because it had learned shortcut action heuristics.

Forcing the model to explicitly generate a reason before selecting an action dramatically improved decision quality.
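
The difference between the two setups comes down to what the training target looks like. The sketch below shows one plausible way to build both targets. The exact template used for training is not given here, so the field labels, the single-line layout, and all three function names are assumptions.

```python
def build_prompt(task: str, observation: str) -> str:
    """Hypothetical prompt layout using the Task/Observation fields."""
    return f"Task: {task}\nObservation: {observation}\n"


def action_only_target(action: str) -> str:
    """Target text when the model is trained to emit only the action."""
    return f"Action: {action}"


def reason_first_target(reason: str, action: str) -> str:
    """Target text with the reason first, so the action is generated
    conditioned on the reason the model has just stated."""
    return f"Reason: {reason}\nAction: {action}"
```

Since generation is left to right, placing the reason before the action means the action tokens are predicted with the stated reason already in context.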

## Example

Task:

Find Apple stock price

Observation:

Price displayed prominently on page.

Reason:

The requested information is already available.

Action:

extract

---

Task:

Find CEO of OpenAI

Observation:

Page discusses Microsoft CEO.

Reason:

The page is irrelevant to the requested information.

Action:

back
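
To use outputs like the ones above programmatically, the reason and the action have to be pulled out of the generated text. This is a minimal sketch that assumes the model emits `Reason:` followed by `Action:` as in the examples; the function name `parse_reason_action` is hypothetical, and real generations may need more defensive handling.

```python
import re

# Matches "Reason: ... Action: <name>" across line breaks.
_REASON_ACTION = re.compile(
    r"Reason:\s*(?P<reason>.*?)\s*Action:\s*(?P<action>[a-z_]+)",
    re.DOTALL | re.IGNORECASE,
)


def parse_reason_action(generated: str):
    """Extract (reason, action) from generated text, or None if absent."""
    match = _REASON_ACTION.search(generated)
    if match is None:
        return None
    return match.group("reason").strip(), match.group("action").lower()
```

Returning `None` when the fields are missing lets the caller decide whether to retry generation or fall back to a default action.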

## Training

* Base model: `openbmb/MiniCPM5-1B`
* Method: LoRA fine-tuning
* Framework: Unsloth + PEFT

## Limitations

The model performs well on simple browser planning and replanning scenarios.

However, it still struggles with:

* multi-step recovery chains
* long-horizon planning
* complex search strategy generation
* comparison tasks requiring multiple sources

## Conclusion

This project suggests that explicit reasoning may act as a lightweight regularizer for small planning models.

A 1B model can often explain the correct action before it can reliably choose it.

This repository contains only the LoRA adapter.

The base model must be downloaded separately.
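
Loading therefore takes two steps: fetch the base model, then attach the adapter. The sketch below uses the standard `transformers` and `peft` loading calls. The helper name `load_planner` is hypothetical, and `adapter_id` is a placeholder for this repository's Hub id or a local path, which is not stated here. Some base models also require passing `trust_remote_code=True`; whether this one does is not confirmed here.

```python
def load_planner(adapter_id: str, base_id: str = "openbmb/MiniCPM5-1B"):
    """Load the base model, then attach the LoRA adapter on top of it."""
    # Imported lazily so that defining this function does not require
    # the libraries (or any download) until it is actually called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    # adapter_id is a placeholder: pass this repository's id or a local path.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_planner` downloads the base weights on first use, so it needs network access and enough memory for a 1B-parameter model.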