---
base_model: openbmb/MiniCPM5-1B
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- browser-agent
- reasoning
- planning
- build-small
- peft
---

# TinyBrowserPlanner-Reason

## A 1B model can explain the correct browser action before it can reliably choose it.

This repository contains a LoRA adapter trained on top of MiniCPM5-1B for lightweight browser planning tasks.

The original goal was simple:

Can a 1B model decide the next browser action given a task and an observation?

Actions include:

* search
* open_page
* extract
* back
* finish
* refine_search

## Key Findings

### 1. Data quality beats data quantity

Adding large amounts of similar trajectory data produced almost no improvement.

However, adding only ~200 carefully designed hard examples significantly improved replanning behavior.

### 2. Adding actions creates both capability and confusion

Introducing the `back` action allowed the model to recover from wrong pages and paywalls.

However, the model quickly learned to overuse `back` as a universal solution.

### 3. Reason-First training dramatically improves planning

* Action-only planning: 4/12
* Reason-First planning: 10/12

This gain came from only 40 reasoning examples and less than 10 seconds of additional training.

The most important result: the model already understood the state of the environment. It failed because it learned shortcut action heuristics.

Forcing the model to explicitly generate a reason before selecting an action dramatically improved decision quality.

## Example

Task: Find Apple stock price

Observation: Price displayed prominently on page.

Reason: The requested information is already available.

Action: extract

---

Task: Find CEO of OpenAI

Observation: Page discusses Microsoft CEO.

Reason: The page is irrelevant to the requested information.

Action: back

## Training

* Base model: openbmb/MiniCPM5-1B
* Method: LoRA fine-tuning
* Framework: Unsloth + PEFT

## Limitations

The model performs well on simple browser planning and replanning scenarios.
However, it still struggles with:

* multi-step recovery chains
* long-horizon planning
* complex search strategy generation
* comparison tasks requiring multiple sources

## Conclusion

This project suggests that explicit reasoning may act as a lightweight regularizer for small planning models.

A 1B model can often explain the correct action before it can reliably choose it.

This repository contains only the LoRA adapter. The base model must be downloaded separately.
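
The Reason-First format from the Example section (a reason followed by an action) can be consumed with a small helper. The sketch below is illustrative only: the names `build_prompt`, `parse_step`, and `Step` are hypothetical, and the exact prompt template used during training is an assumption based on the examples in this README. It involves no model call; it only shows how a prompt might be laid out and how a completion might be validated against the six listed actions.

```python
import re
from typing import NamedTuple, Optional

# Action names copied from the action list in this README.
ACTIONS = ("search", "open_page", "extract", "back", "finish", "refine_search")


class Step(NamedTuple):
    reason: str
    action: Optional[str]  # None when no valid action could be parsed


def build_prompt(task: str, observation: str) -> str:
    """Lay out a task/observation pair so the model writes the reason first.

    Assumption: this mirrors the Example section, not a confirmed template.
    """
    return f"Task: {task}\nObservation: {observation}\nReason:"


def parse_step(completion: str) -> Step:
    """Split a completion shaped like '<reason> Action: <name>'."""
    text = completion.strip()
    # Drop a leading "Reason:" label if the model repeats it.
    if text.lower().startswith("reason:"):
        text = text[len("reason:"):]
    match = re.search(r"Action:\s*([A-Za-z_]+)", text)
    if match is None:
        return Step(reason=text.strip(), action=None)
    action = match.group(1).lower()
    reason = text[:match.start()].strip()
    # Reject anything outside the known action set instead of guessing.
    return Step(reason=reason, action=action if action in ACTIONS else None)


step = parse_step("The requested information is already available.\nAction: extract")
print(step.action)  # extract
```

Returning `None` for an unknown or missing action lets the caller decide whether to retry or fall back, rather than silently executing a malformed step.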