Image-Text-to-Text
PEFT
Safetensors
MLX
GGUF
English
ui-grounding
screen-grounding
browser-agent
claude-computer-use
codex
browser-use
skyvern
hybrid-ai
compound-ai
specialist-model
lora
ollama
apple-silicon
qwen3-vl
gpt-4v-alternative
cost-effective-ai
conversational
Instructions to use renezander030/browserground with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use renezander030/browserground with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-VL-2B-Instruct") model = PeftModel.from_pretrained(base_model, "renezander030/browserground") - MLX
How to use renezander030/browserground with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("renezander030/browserground") config = load_config("renezander030/browserground") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi new
How to use renezander030/browserground with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "renezander030/browserground"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "renezander030/browserground" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use renezander030/browserground with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "renezander030/browserground"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default renezander030/browserground
Run Hermes
hermes
funnel: add renezander.com + Upwork callouts (top + Work-with-me section)
Browse files
README.md
CHANGED
|
@@ -34,6 +34,15 @@ datasets:
|
|
| 34 |
|
| 35 |
> **The local UI-grounding specialist for hybrid AI agents.** Drop in a screenshot + text target, get a strict JSON bbox. 2B params. MLX-native. Apache 2.0.
|
| 36 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 37 |
## Why this exists β the hybrid AI argument
|
| 38 |
|
| 39 |
Today, most AI agents route **every** screenshot to a cloud frontier model (GPT-4V, Claude Vision, Gemini) just to find click coordinates. That's a $0.01β0.05 multimodal call adding 800msβ2s of latency, repeated 20β50Γ per agent run. Cost and latency compound. Screenshots full of private UI leave your machine.
|
|
@@ -163,6 +172,22 @@ Full training scripts (private repo, request access): [renezander030/imgparse-ti
|
|
| 163 |
- **Custom agent stacks** that need a $0/call grounding step instead of GPT-4V per screenshot
|
| 164 |
- **Self-hosted compound-AI systems** with a routing layer (specialist model for grounding, general LLM for planning)
|
| 165 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 166 |
## Citation
|
| 167 |
|
| 168 |
```bibtex
|
|
|
|
| 34 |
|
| 35 |
> **The local UI-grounding specialist for hybrid AI agents.** Drop in a screenshot + text target, get a strict JSON bbox. 2B params. MLX-native. Apache 2.0.
|
| 36 |
|
| 37 |
+
---
|
| 38 |
+
|
| 39 |
+
> **Want a specialist local model for *your* agent stack?**
|
| 40 |
+
> Built by **Rene Zander**, freelance AI engineer (DE/EN, remote). Custom fine-tunes, hybrid-AI architectures, on-prem deployments.
|
| 41 |
+
> β Hire directly on **[Upwork](https://www.upwork.com/freelancers/reneza)**
|
| 42 |
+
> β Or reach out via **[renezander.com](https://renezander.com)**
|
| 43 |
+
|
| 44 |
+
---
|
| 45 |
+
|
| 46 |
## Why this exists β the hybrid AI argument
|
| 47 |
|
| 48 |
Today, most AI agents route **every** screenshot to a cloud frontier model (GPT-4V, Claude Vision, Gemini) just to find click coordinates. That's a $0.01β0.05 multimodal call adding 800msβ2s of latency, repeated 20β50Γ per agent run. Cost and latency compound. Screenshots full of private UI leave your machine.
|
|
|
|
| 172 |
- **Custom agent stacks** that need a $0/call grounding step instead of GPT-4V per screenshot
|
| 173 |
- **Self-hosted compound-AI systems** with a routing layer (specialist model for grounding, general LLM for planning)
|
| 174 |
|
| 175 |
+
## Work with me
|
| 176 |
+
|
| 177 |
+
This adapter is a public reference of the recipe I deliver to freelance clients: small, fast, structured-output local specialists that slot into compound-AI agent stacks and cut cloud-LLM bills without losing capability.
|
| 178 |
+
|
| 179 |
+
If you need one of these, I can build it:
|
| 180 |
+
|
| 181 |
+
- a **UI-grounding model trained on your own product's screenshots** β your dashboard, your app, your customer interfaces β for higher recall on the elements your agents actually click
|
| 182 |
+
- a **hybrid agent architecture** that routes narrow tasks (grounding, OCR, classification, embedding, extraction) to local specialist models and reserves cloud frontier LLMs for the reasoning that actually needs them
|
| 183 |
+
- an **on-prem agent deployment** β Apple Silicon (MLX), CUDA box, or your existing K8s β with no screenshots leaving your infrastructure
|
| 184 |
+
- a **structured-output evaluation harness** that tells you when the local model is actually good enough to replace the cloud call in production
|
| 185 |
+
|
| 186 |
+
**Two ways to engage:**
|
| 187 |
+
|
| 188 |
+
- **Upwork** β contract-ready, vetted, pay-as-you-go: <https://www.upwork.com/freelancers/reneza>
|
| 189 |
+
- **Direct** β for longer engagements, retainers, or a quick conversation: <https://renezander.com>
|
| 190 |
+
|
| 191 |
## Citation
|
| 192 |
|
| 193 |
```bibtex
|