---
license: apache-2.0
base_model: allenai/MolmoWeb-8B
teacher_model: stepfun-ai/Step-3.5-Flash
tags:
- web-agent
- authenticator
- registration
- distilled
- turboquant
- sar-trajectory
- pixel-native
- step-3.5-logic
datasets:
- allenai/MolmoWeb-HumanTrajs
---
# Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed
This model is a reasoning visual agent designed to speed up complex web account management tasks. It distills the 196B **Step-3.5-Flash** teacher (the "Brain") into the 8B **MolmoWeb** student (the "Eyes") and applies **TurboQuant** extreme compression. The result is a specialist that fits in ~3GB of VRAM while aiming to preserve the teacher's reasoning quality.
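As a rough sanity check on the memory figure, the weight footprint can be estimated from the parameter count and the average bit width. This is a minimal sketch: it assumes exactly 8 billion parameters at the 3.5-bit width listed under Limitations, counts weights only (no activations, KV cache, or layers kept at higher precision), and `estimate_weight_bytes` is a hypothetical helper, not part of TurboQuant.

```python
def estimate_weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Return the approximate size of the quantized weights in bytes."""
    return num_params * bits_per_weight / 8


# Assumed values: 8e9 parameters, 3.5-bit average width (weights only).
weight_bytes = estimate_weight_bytes(8_000_000_000, 3.5)
print(f"{weight_bytes / 1e9:.2f} GB, {weight_bytes / 2**30:.2f} GiB")
```

Under these assumptions the weights alone occupy about 3.5 GB (3.26 GiB); runtime overhead comes on top of that.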
## Model Details
### Model Description
- **Developed by:** @macmacmacmac
- **Model type:** Vision-Language Model (VLM) / Agentic Specialist
- **Language(s) (NLP):** English (optimized for account registration, login, and MFA flows)
- **License:** Apache-2.0
- **Finetuned from model:** allenai/MolmoWeb-8B
- **Teacher Model:** stepfun-ai/Step-3.5-Flash (196B Sparse MoE)
### Model Sources
- **Repository:** https://huggingface.co/macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed
- **Trajectory Data:** allenai/MolmoWeb-HumanTrajs (Identity Subset)
- **Quantization Tech:** TurboQuant (March 2026 Release)
## Uses
### Direct Use
Specifically intended for **Identity Steering**:
- Creating new accounts (Sign-up flows).
- Managing existing credentials (Login/MFA handling).
- Password recovery and security settings adjustment.
- Navigating high-anxiety UI (Cookie walls, pop-ups, system prompts).
### Out-of-Scope Use
- General purpose chatbot tasks (Poetry, coding, general trivia).
- High-stakes financial transfers without human-in-the-loop.
- Medical diagnosis or legal advice.
- Non-web based automation (OS-level file management).
## Bias, Risks, and Limitations
- **Coordinate Drift:** Extreme TurboQuant compression (3.5-bit) can occasionally cause 2-5px drift in click accuracy on ultra-high-density displays.
- **Hallucination:** While reasoning is aligned with Step-3.5, the model may occasionally misinterpret legacy HTML that deviates significantly from standard human trajectories.
- **Privacy:** While the model runs locally, the *content* of the screen is processed. Users must ensure the environment is secure.
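One way to absorb the coordinate drift described above is to grow each click target by a small tolerance before deciding whether a predicted click landed on it. The sketch below is illustrative only: `Box`, `click_hits`, and the 5 px default (the upper end of the stated 2-5px range) are assumptions, not part of the model's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a UI element, in pixels."""
    left: float
    top: float
    right: float
    bottom: float


def click_hits(box: Box, x: float, y: float, tolerance_px: float = 5.0) -> bool:
    """Return True if (x, y) lands inside the box grown by tolerance_px."""
    return (box.left - tolerance_px <= x <= box.right + tolerance_px
            and box.top - tolerance_px <= y <= box.bottom + tolerance_px)


button = Box(left=100, top=200, right=180, bottom=230)
print(click_hits(button, 183, 215))  # 3 px right of the button edge
print(click_hits(button, 190, 215))  # 10 px right of the button edge
```

A larger tolerance is more forgiving of drift but raises the chance of hitting a neighboring element on dense pages, so the value is worth tuning per display.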
### Recommendations
Users should deploy this model with the **Web UI Overlay** (link soon) so that the agent's internal reasoning (`<|thought|>`) is visible to the user, reducing anxiety during automated actions.
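Until the overlay is available, the reasoning could be surfaced by separating it from the rest of the raw output. This is a minimal sketch under loud assumptions: only the `<|thought|>` marker appears in this card, so the closing marker `<|/thought|>`, the helper `split_thoughts`, and the sample output string are all hypothetical.

```python
import re

# Assumption: reasoning is wrapped as <|thought|>...<|/thought|>.
# Only the opening marker appears in this card; the closing one is hypothetical.
THOUGHT_RE = re.compile(r"<\|thought\|>(.*?)<\|/thought\|>", re.DOTALL)


def split_thoughts(output: str) -> tuple[list[str], str]:
    """Return (thoughts, remainder) from a raw model output string."""
    thoughts = [t.strip() for t in THOUGHT_RE.findall(output)]
    remainder = THOUGHT_RE.sub("", output).strip()
    return thoughts, remainder


# Made-up sample output, for illustration only.
raw = "<|thought|>The sign-up button is top right.<|/thought|> click(912, 48)"
thoughts, action = split_thoughts(raw)
print(thoughts)
print(action)
```

Showing `thoughts` to the user before executing the remainder keeps each automated step explainable.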
## How to Get Started with the Model
```python
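import turboquant as tq  # unused below; assumed to be needed for loading TurboQuant weights
from transformers import AutoModelForImageTextToText

# Optimized for 3GB VRAM deployment
model = AutoModelForImageTextToText.from_pretrained(
    "macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed",  # repo ID from Model Sources
    device_map="auto",        # place weights on the available device(s)
    trust_remote_code=True,   # allow custom modeling code from the repo to run
)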