	---
	license: apache-2.0
	base_model: allenai/MolmoWeb-8B
	teacher_model: stepfun-ai/Step-3.5-Flash
	tags:
	- web-agent
	- authenticator
	- registration
	- distilled
	- turboquant
	- sar-trajectory
	- pixel-native
	- step-3.5-logic
	datasets:
	- allenai/MolmoWeb-HumanTrajs
	---

	# Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed

	This model is a reasoning visual agent built to complete complex web account management tasks quickly. It distills the 196B Step-3.5-Flash teacher (the "Brain") into the 8B MolmoWeb student (the "Eyes") and then applies TurboQuant extreme compression. The result is a specialist that fits in ~3GB of VRAM while aiming to retain the teacher's frontier-level reasoning.
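The ~3GB figure can be sanity-checked with simple arithmetic. The sketch below assumes exactly 8 billion weights and a uniform 3.5 bits per weight (the bit width listed under limitations). The real checkpoint is a mixed-precision build, so its footprint may differ, and activations and caches are not counted.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Assumption: the "8B" student has exactly 8e9 weights.
student_params = 8e9

print(f"FP16:    {weight_memory_gib(student_params, 16):.2f} GiB")   # 14.90 GiB
print(f"3.5-bit: {weight_memory_gib(student_params, 3.5):.2f} GiB")  # 3.26 GiB
```

Under these assumptions the weights alone come to roughly 3.3 GiB, which is in line with the stated ~3GB.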

	## Model Details

	### Model Description

	- Developed by: @macmacmacmac
	- Model type: Vision-Language Model (VLM) / Agentic Specialist
	- Language(s) (NLP): English (optimized for account registration, login, and MFA flows)
	- License: Apache-2.0
	- Finetuned from model: allenai/MolmoWeb-8B
	- Teacher Model: stepfun-ai/Step-3.5-Flash (196B Sparse MoE)

	### Model Sources

	- Repository: https://huggingface.co/macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed
	- Trajectory Data: allenai/MolmoWeb-HumanTrajs (Identity Subset)
	- Quantization Tech: TurboQuant (March 2026 Release)

	## Uses

	### Direct Use

	Specifically intended for Identity Steering:
	- Creating new accounts (Sign-up flows).
	- Managing existing credentials (Login/MFA handling).
	- Password recovery and security settings adjustment.
	- Navigating high-anxiety UI (Cookie walls, pop-ups, system prompts).

	### Out-of-Scope Use

	- General purpose chatbot tasks (Poetry, coding, general trivia).
	- High-stakes financial transfers without human-in-the-loop.
	- Medical diagnosis or legal advice.
	- Non-web-based automation (e.g., OS-level file management).

	## Bias, Risks, and Limitations

	- Coordinate Drift: Extreme TurboQuant compression (3.5-bit) can occasionally shift predicted click coordinates by 2-5px on ultra-high-density displays.
	- Hallucination: Although its reasoning is aligned with Step-3.5, the model may occasionally misinterpret legacy HTML pages that deviate significantly from the human trajectories it was trained on.
	- Privacy: The model runs locally, but it still processes everything shown on screen. Users must ensure the environment is secure.
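One way to guard against the coordinate drift described above is to check whether a predicted click would still land on its target after a worst-case shift. This is a minimal sketch, not part of the model or any shipped tooling. `Box` and `click_is_drift_safe` are hypothetical names, and it assumes the caller can obtain the target's bounding box (for example from the browser) in the same pixel units as the click.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of a click target, in pixels (hypothetical helper)."""
    left: float
    top: float
    width: float
    height: float


def click_is_drift_safe(box: Box, x: float, y: float, max_drift_px: float = 5.0) -> bool:
    """True if (x, y) stays inside box even when shifted by max_drift_px on either axis."""
    return (
        box.left + max_drift_px <= x <= box.left + box.width - max_drift_px
        and box.top + max_drift_px <= y <= box.top + box.height - max_drift_px
    )


button = Box(left=100, top=200, width=80, height=24)
checkbox = Box(left=10, top=10, width=8, height=8)

print(click_is_drift_safe(button, 140, 212))   # True: 5px of slack on every side
print(click_is_drift_safe(checkbox, 14, 14))   # False: an 8px target cannot absorb 5px of drift
```

Targets that fail the check are candidates for a human confirmation step or a zoomed-in retry.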

	### Recommendations

	Users should deploy this model with the Web UI Overlay (link soon), which displays the agent's internal reasoning (`<|thought|>`) so that users can follow what the agent is doing and feel less anxious during automated actions.
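Until the overlay is available, the reasoning can be surfaced by pulling it out of the raw model output. The sketch below is only an illustration: this card does not document the output format, so it assumes each reasoning segment starts at a `<|thought|>` marker and runs until the next `<|...|>` style token or the end of the text. The `<|action|>` token in the sample string is hypothetical.

```python
import re

# Assumed format: "<|thought|>" opens a reasoning segment that ends at the
# next "<|...|>" token or at the end of the string.
THOUGHT_PATTERN = re.compile(r"<\|thought\|>(.*?)(?=<\|[^|>]+\|>|\Z)", re.DOTALL)


def extract_thoughts(raw_output: str) -> list[str]:
    """Return the reasoning segments found in raw model output, whitespace-trimmed."""
    return [segment.strip() for segment in THOUGHT_PATTERN.findall(raw_output)]


# Hypothetical sample output; "<|action|>" is an invented token for illustration.
sample = "<|thought|>The sign-up button is in the top right.<|action|>click(512, 88)"

for thought in extract_thoughts(sample):
    print(f"[agent reasoning] {thought}")
```

Logging or displaying these segments before each action executes gives users a chance to intervene.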

	## How to Get Started with the Model

	```python
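	import turboquant as tq  # TurboQuant package; not referenced directly in this snippet
	from transformers import AutoModelForImageTextToText

	# Load the quantized checkpoint; intended to fit in ~3GB of VRAM
	model = AutoModelForImageTextToText.from_pretrained(
	    "macmacmacmac/Step-3.5-Flash-Distill-MolmoWeb-8B-TQ-Mixed",  # repository ID from Model Sources
	    device_map="auto",       # place weights on the available device(s)
	    trust_remote_code=True,  # allow custom model code from the repository to run
	)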