---
license: apache-2.0
base_model: allenai/MolmoWeb-8B
teacher_model: stepfun-ai/Step-3.5-Flash
tags:
- web-agent
- authenticator
- registration
- distilled
- turboquant
- sar-trajectory
- pixel-native
- step-3.5-logic
datasets:
- allenai/MolmoWeb-HumanTrajs
---

# Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed

This model is a reasoning-focused visual web agent built to complete web account-management tasks quickly. It was made by distilling the 196B **Step-3.5-Flash** teacher (the "brain") into the 8B **MolmoWeb** student (the "eyes") and then compressing the result with **TurboQuant**. The outcome is a specialist that fits in ~3GB of VRAM and aims to retain the teacher's reasoning quality on these tasks.

## Model Details

### Model Description

- **Developed by:** @macmacmacmac
- **Model type:** Vision-Language Model (VLM) / Agentic Specialist
- **Language(s) (NLP):** English (optimized for account registration, logins, and MFA)
- **License:** Apache-2.0
- **Finetuned from model:** allenai/MolmoWeb-8B
- **Teacher Model:** stepfun-ai/Step-3.5-Flash (196B Sparse MoE)

### Model Sources

- **Repository:** [macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed](https://huggingface.co/macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed)
- **Trajectory Data:** allenai/MolmoWeb-HumanTrajs (identity subset)
- **Quantization Tech:** TurboQuant (March 2026 release)

## Uses

### Direct Use

Specifically intended for **Identity Steering**:

- Creating new accounts (sign-up flows).
- Managing existing credentials (login and MFA handling).
- Recovering passwords and adjusting security settings.
- Navigating disruptive UI elements (cookie walls, pop-ups, system prompts).

### Out-of-Scope Use

- General-purpose chatbot tasks (poetry, coding, general trivia).
- High-stakes financial transfers without a human in the loop.
- Medical diagnosis or legal advice.
- Non-web automation (e.g., OS-level file management).

## Bias, Risks, and Limitations

- **Coordinate Drift:** The aggressive TurboQuant compression (3.5-bit) can occasionally shift predicted click coordinates by 2-5 px on ultra-high-density displays.
- **Hallucination:** Although its reasoning is distilled from Step-3.5-Flash, the model may occasionally misinterpret legacy HTML layouts that differ significantly from those in the human training trajectories.
- **Privacy:** Although the model runs locally, it processes everything visible on screen, which may include credentials. Users must ensure the environment is secure.

### Recommendations

Users should deploy this model with the **Web UI Overlay** (link coming soon) so that the agent's internal reasoning (`<|thought|>`) stays visible to the user during automated actions, reducing uncertainty about what the agent is doing.

## How to Get Started with the Model

```python
import turboquant as tq
from transformers import AutoModelForImageTextToText

# Targets a ~3GB VRAM deployment
model = AutoModelForImageTextToText.from_pretrained(
    "macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed",
    device_map="auto",       # let Accelerate place layers on available GPU(s)/CPU
    trust_remote_code=True,  # allow the repository's custom modeling code to run
)
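
# Back-of-envelope check for the VRAM target above (an illustrative sketch,
# not part of the original card; the helper name is hypothetical): memory for
# the quantized weights alone is roughly parameters x bits-per-weight / 8 bytes.
# KV cache and activations need additional memory on top of this figure.
def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# e.g. 8B parameters at a uniform 3.5 bits per weight -> about 3.5 GB;
# a mixed-precision scheme that stores some layers at fewer bits needs less.
weights_gb = estimate_weight_gb(8e9, 3.5)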