| --- |
| license: apache-2.0 |
| base_model: allenai/MolmoWeb-8B |
| teacher_model: stepfun-ai/Step-3.5-Flash |
| tags: |
| - web-agent |
| - authenticator |
| - registration |
| - distilled |
| - turboquant |
| - sar-trajectory |
| - pixel-native |
| - step-3.5-logic |
| datasets: |
| - allenai/MolmoWeb-HumanTrajs |
| --- |
| |
| # Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed |
|
|
| This model is a "God-tier" reasoning visual agent designed specifically to speed-run complex web account management tasks. By distilling the 196B **Step-3.5-Flash** "Brain" into the 8B **MolmoWeb** "Eyes" and applying **TurboQuant** extreme compression, we have created a specialist that fits in ~3GB of VRAM while maintaining frontier-level reasoning. |
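
As a rough sanity check on the ~3GB figure, the sketch below estimates the weights-only footprint of an 8B-parameter model at a few bit widths. The helper name `weight_footprint_gib` is hypothetical, and the estimate ignores activations, the KV cache, and any layers kept at higher precision; the exact bit allocation of the "Mixed" scheme is not stated in this card, so the numbers are illustrative only.

```python
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Weights-only memory in GiB for n_params stored at bits_per_weight."""
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                    # bytes -> GiB


if __name__ == "__main__":
    # 16-bit is the uncompressed baseline; 3.5-bit is the width quoted
    # under "Coordinate Drift" in this card.
    for bits in (16, 4, 3.5, 3):
        print(f"{bits:>4} bits -> {weight_footprint_gib(8e9, bits):.2f} GiB")
```

Under these assumptions, an average of roughly 3 to 3.5 bits per weight is what puts an 8B model in the neighborhood of 3GB.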
|
|
| ## Model Details |
|
|
| ### Model Description |
|
|
| - **Developed by:** @macmacmacmac |
| - **Model type:** Vision-Language Model (VLM) / Agentic Specialist |
| - **Language(s) (NLP):** English (optimized for account registration, logins, and MFA) |
| - **License:** Apache-2.0 |
| - **Finetuned from model:** allenai/MolmoWeb-8B |
| - **Teacher Model:** stepfun-ai/Step-3.5-Flash (196B Sparse MoE) |
|
|
| ### Model Sources |
|
|
| - **Repository:** https://huggingface.co/macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed |
| - **Trajectory Data:** allenai/MolmoWeb-HumanTrajs (Identity Subset) |
| - **Quantization Tech:** TurboQuant (March 2026 Release) |
|
|
| ## Uses |
|
|
| ### Direct Use |
|
|
| Specifically intended for **Identity Steering**: |
| - Creating new accounts (Sign-up flows). |
| - Managing existing credentials (Login/MFA handling). |
| - Password recovery and security settings adjustment. |
| - Navigating high-anxiety UI (Cookie walls, pop-ups, system prompts). |
|
|
| ### Out-of-Scope Use |
|
|
| - General-purpose chatbot tasks (poetry, coding, general trivia). |
| - High-stakes financial transfers without human-in-the-loop. |
| - Medical diagnosis or legal advice. |
| - Non-web-based automation (OS-level file management). |
|
|
| ## Bias, Risks, and Limitations |
|
|
| - **Coordinate Drift:** Extreme TurboQuant compression (3.5-bit) can occasionally cause 2-5px drift in click accuracy on ultra-high-density displays. |
| - **Hallucination:** Although its reasoning is aligned with Step-3.5, the model may occasionally misinterpret legacy HTML pages that deviate significantly from those seen in the standard human trajectories. |
| - **Privacy:** While the model runs locally, the *content* of the screen is processed. Users must ensure the environment is secure. |
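
One way a caller might guard against the coordinate drift described above is to accept a click only if it would still land inside the target after shifting by the worst-case drift. The sketch below is an assumption-laden illustration, not part of this model's tooling: `Box`, `click_is_drift_safe`, and `center_of` are hypothetical names, and the target bounding box is assumed to come from some other source (for example the DOM or an accessibility tree).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a click target, in screen pixels."""
    left: float
    top: float
    right: float
    bottom: float


def click_is_drift_safe(x: float, y: float, box: Box, max_drift_px: float = 5.0) -> bool:
    """True if (x, y) stays inside box after moving up to max_drift_px on each axis.

    Insetting the box by max_drift_px on every side is a conservative check:
    any drift of at most that distance keeps the click on the target.
    """
    return (
        box.left + max_drift_px <= x <= box.right - max_drift_px
        and box.top + max_drift_px <= y <= box.bottom - max_drift_px
    )


def center_of(box: Box) -> tuple[float, float]:
    """Fallback click point with the largest margin to every edge."""
    return ((box.left + box.right) / 2, (box.top + box.bottom) / 2)
```

Targets narrower than twice the drift bound (under 10px at the 5px upper figure) can never pass this check, which is a reasonable signal to zoom in or ask for confirmation before clicking.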
|
|
| ### Recommendations |
|
|
| Users should deploy this model with the **Web UI Overlay** (link coming soon) so that the agent's internal reasoning (`<|thought|>`) is visible to the user, reducing anxiety during automated actions. |
|
|
| ## How to Get Started with the Model |
|
|
| ```python |
| import turboquant as tq |
| from transformers import AutoModelForImageTextToText |
| |
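| # Optimized for 3GB VRAM deployment |
| model = AutoModelForImageTextToText.from_pretrained( |
|     "macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed",  # repository ID from Model Sources |
|     device_map="auto",  # place weights on the available device(s) automatically |
|     trust_remote_code=True,  # allow the custom model code shipped with the repository to run |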
| ) |