---
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- vision-language
- qwen3.5-vl
- phone-agent
- tool-use
---

# PhoneBuddy-4B

PhoneBuddy-4B is the main PhoneBuddy Real+Mock reinforcement-learning checkpoint.

Project page: https://phonebuddyai.github.io/

GitHub: https://github.com/PhoneBuddyAI/phonebuddy

## Model Details

- Model family: Qwen3.5 VL style checkpoint
- `model_type`: `qwen3_5`
- Processor: `Qwen3VLProcessor`
- Checkpoint role: main Real+Mock RL checkpoint
- Tool-call format: Qwen-style XML as defined in `chat_template.jinja`

The model card and repository are initially published as private for validation.

## Tool-Call Format

PhoneBuddy-4B follows the Qwen-style XML tool-call format defined by the bundled `chat_template.jinja`, for example:

```xml
<tool_call>
<function=example_function_name>
<parameter=example_parameter_1>
value_1
</parameter>
</function>
</tool_call>
```

Use the tokenizer or processor chat template from this repository when constructing prompts with tools.

## Loading Environment

These checkpoints use Qwen3.5 VL style model metadata:

- `model_type`: `qwen3_5`
- Architecture: `Qwen3_5ForConditionalGeneration`
- Processor: `Qwen3VLProcessor`
- Tokenizer metadata: `TokenizersBackend`

Use the matching Qwen3.5 VL / PhoneBuddy training or inference environment that registers these classes. In a generic public Transformers environment, compatibility depends on whether that build includes `qwen3_5` and the tokenizer backend used by this checkpoint.

A minimal processor load can be tested with:

```python
from transformers import AutoProcessor

repo_id = "PhoneBuddyAI/PhoneBuddy-4B"
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=False)
```

Full config, tokenizer, and model loading should be done in an environment that supports the classes above. For example, public `transformers==4.57.6` does not register `model_type=qwen3_5`, and `AutoTokenizer` cannot import `TokenizersBackend`. In that environment, such failures indicate a version/class incompatibility, not missing checkpoint files.

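
The compatibility check described above can be sketched as a small pre-flight script that runs before any weights are downloaded. This is a minimal sketch, not part of the PhoneBuddy tooling: the helper names `registered_model_types` and `missing_model_types` are hypothetical, and it assumes the installed Transformers build exposes its config registry as `CONFIG_MAPPING_NAMES` in `transformers.models.auto.configuration_auto`, which is an internal detail that may move between releases.

```python
import importlib


def registered_model_types():
    """Return the model types known to the installed Transformers build.

    Assumption: the registry lives at
    transformers.models.auto.configuration_auto.CONFIG_MAPPING_NAMES.
    Returns an empty set if Transformers is not installed or the
    registry is not found under that name.
    """
    try:
        auto = importlib.import_module(
            "transformers.models.auto.configuration_auto"
        )
    except ImportError:
        return set()
    return set(getattr(auto, "CONFIG_MAPPING_NAMES", {}))


def missing_model_types(required=("qwen3_5",), available=None):
    """List the required model types that the environment does not register.

    `available` can be passed explicitly (useful for testing); by default
    the installed Transformers build is inspected.
    """
    if available is None:
        available = registered_model_types()
    return [name for name in required if name not in set(available)]


if __name__ == "__main__":
    missing = missing_model_types()
    if missing:
        # A non-empty list points to a version/class incompatibility,
        # not to missing checkpoint files.
        print("Unsupported model types in this environment:", missing)
    else:
        print("This environment registers qwen3_5.")
```

A non-empty result suggests switching to the matching Qwen3.5 VL / PhoneBuddy environment before attempting a full config, tokenizer, or model load.
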
## Intended Use

PhoneBuddy is designed for research on phone agents, multimodal tool use, and visual action reasoning. See the project page and GitHub repository for code and usage details.