---
base_model: openbmb/MiniCPM5-1B
library_name: peft
pipeline_tag: text-generation
tags:
  - lora
  - browser-agent
  - reasoning
  - planning
  - build-small
  - peft
---

# TinyBrowserPlanner-Reason

## A 1B model can explain the correct browser action before it can reliably choose it.

This repository contains a LoRA adapter trained on top of MiniCPM5-1B for lightweight browser planning tasks.

The original goal was simple:

Can a 1B model decide the next browser action given a task and an observation?

Actions include:

* search
* open_page
* extract
* back
* finish
* refine_search
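The action space above is small enough to pin down as a closed set. The following is a minimal sketch, not the project's actual code: the `BrowserAction` enum and the `parse_action` helper are hypothetical names introduced here to show how a raw model output could be checked against the six listed actions.

```python
from enum import Enum
from typing import Optional


class BrowserAction(str, Enum):
    """The six actions listed in this model card."""
    SEARCH = "search"
    OPEN_PAGE = "open_page"
    EXTRACT = "extract"
    BACK = "back"
    FINISH = "finish"
    REFINE_SEARCH = "refine_search"


def parse_action(text: str) -> Optional[BrowserAction]:
    """Return the matching action, or None if the text is not a known action.

    Hypothetical helper: whitespace and letter case are normalized first.
    """
    candidate = text.strip().lower()
    try:
        return BrowserAction(candidate)
    except ValueError:
        return None
```

Rejecting anything outside the enum keeps a malformed generation from being silently executed as an action.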

## Key Findings

### 1. Data quality beats data quantity

Adding large amounts of similar trajectory data produced almost no improvement.

However, adding only ~200 carefully designed hard examples significantly improved replanning behavior.

### 2. Adding actions creates both capability and confusion

Introducing the `back` action allowed the model to recover from wrong pages and paywalls.

However, the model quickly learned to overuse `back` as a universal solution.
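One way to spot this kind of overuse is to look at how often each action is predicted across an evaluation set. The sketch below assumes predictions are available as a plain list of action strings; the `action_shares` function and the sample predictions are illustrative and are not taken from the project's evaluation.

```python
from collections import Counter


def action_shares(predicted_actions):
    """Return the fraction of predictions that went to each action."""
    counts = Counter(predicted_actions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {action: n / total for action, n in counts.items()}


# Made-up predictions for illustration only, not real model output.
predictions = ["back", "back", "extract", "back"]
shares = action_shares(predictions)
print(shares)
```

A share for `back` that is far above its share in the gold labels would suggest the model is treating it as a default rather than as a recovery step.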

### 3. Reason-First training dramatically improves planning

* Action-only planning: 4/12
* Reason-First planning: 10/12

This improvement required only 40 reasoning examples and less than 10 seconds of additional training.

The most important result:

The model already understood the state of the environment.

With action-only training, it failed because it had learned shortcut action heuristics.

Forcing the model to explicitly generate a reason before selecting an action dramatically improved decision quality.
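The reason-before-action idea can be sketched as a prompt that ends at `Reason:` and a parser that only accepts an action once a reason has been generated. This is an illustration under assumptions: the exact prompt template used in training is not given in this card, and `build_prompt`, `parse_completion`, and the `Task/Observation/Reason/Action` layout are modeled on the examples in this README rather than on the training code.

```python
import re

# The six actions listed in this model card.
ACTIONS = {"search", "open_page", "extract", "back", "finish", "refine_search"}

# Optional "Reason:" label, free-text reason, then a final "Action: <name>".
_OUTPUT_RE = re.compile(
    r"^\s*(?:Reason:)?\s*(?P<reason>.*?)\s*Action:\s*(?P<action>\w+)\s*$",
    re.DOTALL,
)


def build_prompt(task, observation):
    """Build a prompt that stops at 'Reason:' so the reason is generated first."""
    return f"Task: {task}\nObservation: {observation}\nReason:"


def parse_completion(completion):
    """Return (reason, action), or None if the output is malformed."""
    match = _OUTPUT_RE.match(completion)
    if match is None:
        return None
    action = match.group("action").lower()
    if action not in ACTIONS:
        return None
    return match.group("reason"), action
```

Because the prompt ends at `Reason:`, the action token is always conditioned on an explicit explanation, which is the ordering the finding above relies on.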

## Example

* Task: Find Apple stock price
* Observation: Price displayed prominently on page.
* Reason: The requested information is already available.
* Action: extract

---

* Task: Find CEO of OpenAI
* Observation: Page discusses Microsoft CEO.
* Reason: The page is irrelevant to the requested information.
* Action: back

## Training

* Base model: openbmb/MiniCPM5-1B
* Method: LoRA fine-tuning
* Framework: Unsloth + PEFT

## Limitations

The model performs well on simple browser planning and replanning scenarios.

However, it still struggles with:

* multi-step recovery chains
* long-horizon planning
* complex search strategy generation
* comparison tasks requiring multiple sources

## Conclusion

This project suggests that explicit reasoning may act as a lightweight regularizer for small planning models.

A 1B model can often explain the correct action before it can reliably choose it.

This repository contains only the LoRA adapter.

The base model must be downloaded separately.
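Since the adapter and base model are separate, loading typically means fetching the base model first and attaching the adapter with PEFT. The sketch below is one way this might look, with several assumptions: `ADAPTER_ID` is a hypothetical placeholder to be replaced with this repository's actual id, `load_planner` is an illustrative helper, and `trust_remote_code=True` is assumed to be needed for the base model and may not be.

```python
BASE_MODEL_ID = "openbmb/MiniCPM5-1B"  # base model named in this card
ADAPTER_ID = "your-username/TinyBrowserPlanner-Reason"  # hypothetical placeholder


def load_planner(adapter_id=ADAPTER_ID, base_model_id=BASE_MODEL_ID):
    """Download the base model, then attach the LoRA adapter on top of it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Assumption: the base model may ship custom code, hence trust_remote_code.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
    base = AutoModelForCausalLM.from_pretrained(base_model_id, trust_remote_code=True)

    # Wrap the base model with the adapter weights from this repository.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_planner()` requires network access and the `transformers` and `peft` packages; nothing is downloaded until the function is invoked.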