Georgefifth/tiny-browser-planner-reason

	---
	base_model: openbmb/MiniCPM5-1B
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- lora
	- browser-agent
	- reasoning
	- planning
	- build-small
	- peft
	---

	# TinyBrowserPlanner-Reason

	## A 1B model can explain the correct browser action before it can reliably choose it.

	This repository contains a LoRA adapter trained on top of MiniCPM5-1B for lightweight browser planning tasks.

	The original goal was simple:

	Can a 1B model decide the next browser action given a task and an observation?

	Actions include:

	* search
	* open_page
	* extract
	* back
	* finish
	* refine_search
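The action set above can be captured as a small enumeration so that free-form model output is checked before anything is executed. This is a minimal sketch: the `BrowserAction` enum and the `parse_action` helper are hypothetical names and are not part of this repository.

```python
from enum import Enum


class BrowserAction(str, Enum):
    """The six planner actions listed in this model card."""

    SEARCH = "search"
    OPEN_PAGE = "open_page"
    EXTRACT = "extract"
    BACK = "back"
    FINISH = "finish"
    REFINE_SEARCH = "refine_search"


def parse_action(text: str) -> BrowserAction:
    """Map raw model text to a known action, raising ValueError otherwise."""
    # Assumption: the model emits the bare action name, possibly with
    # surrounding whitespace or different casing.
    return BrowserAction(text.strip().lower())
```

Rejecting unknown strings early keeps a hallucinated action name from reaching the browser driver.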

	## Key Findings

	### 1. Data quality beats data quantity

	Adding large amounts of similar trajectory data produced almost no improvement.

	However, adding only ~200 carefully designed hard examples significantly improved replanning behavior.

	### 2. Adding actions creates both capability and confusion

	Introducing the `back` action allowed the model to recover from wrong pages and paywalls.

	However, the model quickly learned to overuse `back` as a universal solution.
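One way to make this overuse visible is to compare how often `back` is predicted with how often it is the labeled action. The sketch below assumes predictions and labels are plain lists of action strings. The `action_rate` helper and the sample data are hypothetical and are not results from this project.

```python
from collections import Counter


def action_rate(actions: list[str], name: str) -> float:
    """Fraction of entries in `actions` equal to `name` (0.0 if empty)."""
    if not actions:
        return 0.0
    return Counter(actions)[name] / len(actions)


# Made-up illustration data, not measurements from this project.
gold = ["extract", "back", "open_page", "refine_search"]
predicted = ["back", "back", "open_page", "back"]

# A predicted rate well above the labeled rate suggests `back` is overused.
overuse = action_rate(predicted, "back") - action_rate(gold, "back")
print(f"back is over-predicted by {overuse:.2f}")
```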

	### 3. Reason-First training dramatically improves planning

	Action-only planning: 4/12

	Reason-First planning: 10/12

	This gain required only 40 reasoning examples and less than 10 seconds of additional training.

	The most important result: the model already understood the state of the environment.

	It failed because it had learned shortcut action heuristics.

	Forcing the model to explicitly generate a reason before selecting an action dramatically improved decision quality.
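The difference between the two setups comes down to what the model is trained to emit. The following sketch contrasts the two target formats. The field labels follow the Example section of this card, while the `build_prompt`, `action_only_target`, and `reason_first_target` helpers and the exact prompt template are assumptions, not the training code used here.

```python
def build_prompt(task: str, observation: str) -> str:
    """Shared input: the task and the current page observation."""
    return f"Task: {task}\nObservation: {observation}\n"


def action_only_target(action: str) -> str:
    """Action-only: the model must jump straight to the action."""
    return f"Action: {action}"


def reason_first_target(reason: str, action: str) -> str:
    """Reason-First: the reason comes first, so the action tokens are
    generated after (and conditioned on) the stated reason."""
    return f"Reason: {reason}\nAction: {action}"


prompt = build_prompt("Find CEO of OpenAI", "Page discusses Microsoft CEO.")
target = reason_first_target(
    "The page is irrelevant to the requested information.", "back"
)
print(prompt + target)
```

Because generation is left to right, putting the reason first means the action is chosen with the model's own description of the page already in context.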

	## Example

	Task:

	Find Apple stock price

	Observation:

	Price displayed prominently on page.

	Reason:

	The requested information is already available.

	Action:

	extract

	---

	Task:

	Find CEO of OpenAI

	Observation:

	Page discusses Microsoft CEO.

	Reason:

	The page is irrelevant to the requested information.

	Action:

	back
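Output in this format can be split into its reason and action with a small parser. This is a sketch that assumes the model emits a `Reason:` label followed by an `Action:` label, as in the examples above. `parse_plan` is a hypothetical helper, not code shipped with the adapter.

```python
import re

VALID_ACTIONS = {
    "search", "open_page", "extract", "back", "finish", "refine_search",
}

# The reason may span several lines; the action is a single identifier.
_PLAN_RE = re.compile(
    r"Reason:\s*(?P<reason>.*?)\s*Action:\s*(?P<action>\w+)", re.DOTALL
)


def parse_plan(text: str) -> tuple[str, str]:
    """Return (reason, action) or raise ValueError on malformed output."""
    match = _PLAN_RE.search(text)
    if match is None:
        raise ValueError("expected 'Reason: ...' followed by 'Action: ...'")
    action = match.group("action").lower()
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return match.group("reason"), action


output = (
    "Reason:\n\nThe requested information is already available.\n\n"
    "Action:\n\nextract"
)
print(parse_plan(output))
```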

	## Training

	* Base model: `openbmb/MiniCPM5-1B`
	* Method: LoRA fine-tuning
	* Framework: Unsloth + PEFT

	## Limitations

	The model performs well on simple browser planning and replanning scenarios.

	However, it still struggles with:

	* multi-step recovery chains
	* long-horizon planning
	* complex search strategy generation
	* comparison tasks requiring multiple sources

	## Conclusion

	This project suggests that explicit reasoning may act as a lightweight regularizer for small planning models.

	A 1B model can often explain the correct action before it can reliably choose it.

	This repository contains only the LoRA adapter.

	The base model must be downloaded separately.
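Since only the adapter weights are published, inference needs the base model loaded first and the adapter attached on top. The sketch below uses the standard `transformers` and `peft` loading calls. The adapter repository id is inferred from this repository's name, `trust_remote_code=True` is an assumption about what the base model requires, and the prompt layout in the hypothetical `plan` helper simply mirrors the Task/Observation examples in this card, so check the base model's own card before relying on any of it.

```python
BASE_MODEL = "openbmb/MiniCPM5-1B"
# Assumption: adapter repo id inferred from this repository's name.
ADAPTER = "Georgefifth/tiny-browser-planner-reason"


def load_planner(base_model: str = BASE_MODEL, adapter: str = ADAPTER):
    """Download the base model, then attach the LoRA adapter to it."""
    # Imported lazily so the heavy dependencies are only needed when called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base model may ship custom code on the Hub.
    tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
    base = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
    model = PeftModel.from_pretrained(base, adapter)
    return tokenizer, model.eval()


def plan(tokenizer, model, task: str, observation: str) -> str:
    """Generate the planner's raw text for one task/observation pair."""
    # Assumption: the prompt mirrors the Task/Observation layout shown above.
    prompt = f"Task: {task}\nObservation: {observation}\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=48)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_planner()` followed by `plan(tokenizer, model, "Find Apple stock price", "Price displayed prominently on page.")`.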