Instructions for using egotools-dev/egotools-8b-v3_3 with Transformers and with local serving apps (vLLM, SGLang, Docker).

Libraries

How to use egotools-dev/egotools-8b-v3_3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="egotools-dev/egotools-8b-v3_3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

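processor = AutoProcessor.from_pretrained("egotools-dev/egotools-8b-v3_3")
# Loads the checkpoint-300 weights stored at the repository root. dtype="auto" uses the
# dtype recorded in the checkpoint config; device_map="auto" (needs accelerate) places layers on available GPUs.
model = AutoModelForImageTextToText.from_pretrained(
    "egotools-dev/egotools-8b-v3_3", dtype="auto", device_map="auto"
)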
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

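outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping the prompt and special tokens.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))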

Local Apps

vLLM

How to use egotools-dev/egotools-8b-v3_3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "egotools-dev/egotools-8b-v3_3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "egotools-dev/egotools-8b-v3_3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
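Both vLLM and SGLang expose an OpenAI-compatible endpoint, so the curl request above can also be sent from Python. This is a sketch assuming the `openai` client package is installed and a server is already running locally; `build_messages` and `describe_image` are illustrative helper names, and the `max_tokens` cap is an arbitrary choice rather than a recommended setting.

```python
MODEL_ID = "egotools-dev/egotools-8b-v3_3"
IMAGE_URL = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"


def build_messages(prompt: str, image_url: str) -> list[dict]:
    """Build the same chat payload that the curl example sends."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


def describe_image(base_url: str = "http://localhost:8000/v1") -> str:
    """Send one chat request; needs `pip install openai` and a running server."""
    from openai import OpenAI  # imported lazily so build_messages works without it

    # A local server started without --api-key accepts any placeholder key.
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    response = client.chat.completions.create(
        model=MODEL_ID,
        messages=build_messages("Describe this image in one sentence.", IMAGE_URL),
        max_tokens=64,  # assumed cap on the reply length
    )
    return response.choices[0].message.content
```

For the SGLang server, the same helper would be called as `describe_image("http://localhost:30000/v1")`.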

Use Docker

docker model run hf.co/egotools-dev/egotools-8b-v3_3

SGLang

How to use egotools-dev/egotools-8b-v3_3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "egotools-dev/egotools-8b-v3_3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "egotools-dev/egotools-8b-v3_3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "egotools-dev/egotools-8b-v3_3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the SGLang pip example above (port 30000).

Docker Model Runner
How to use egotools-dev/egotools-8b-v3_3 with Docker Model Runner:
```
docker model run hf.co/egotools-dev/egotools-8b-v3_3
```


Latest commit cd3f102 by shulin16: "Add files using upload-large-folder tool" (verified, 18 days ago).


metadata

base_model: Qwen/Qwen3-VL-8B-Instruct
library_name: transformers
pipeline_tag: image-text-to-text
tags:
  - qwen3-vl
  - video-language-model
  - egocentric-video
  - ms-swift
  - sft

EgoTools 8B v3.3

This repository stores intermediate checkpoints from full-parameter supervised fine-tuning (SFT) of Qwen/Qwen3-VL-8B-Instruct on the EgoTools v3.3 dataset.

Available checkpoints:

| Checkpoint | Location | Step | Epoch | Notes |
|---|---|---|---|---|
| checkpoint-300 | repository root | 300 / 907 | 0.3309 | First uploaded intermediate checkpoint. |
| checkpoint-600 | `checkpoint-600/` | 600 / 907 | 0.6619 | Second uploaded intermediate checkpoint. |

The repository root currently holds the checkpoint-300 model files; checkpoint-600 is stored in the `checkpoint-600/` subdirectory.
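Because checkpoint-600 lives in a subdirectory, it can be selected with the `subfolder` argument of `from_pretrained`. This is a sketch assuming `checkpoint-600/` contains a complete Transformers model (config and weights); the processor is loaded from the repository root for both checkpoints, and `checkpoint_kwargs` and `load_checkpoint` are hypothetical helper names.

```python
REPO_ID = "egotools-dev/egotools-8b-v3_3"
# Uploaded step -> subfolder inside the repo (None means the repository root).
CHECKPOINT_SUBFOLDERS = {300: None, 600: "checkpoint-600"}


def checkpoint_kwargs(step: int) -> dict:
    """Return the extra from_pretrained() kwargs that select one checkpoint."""
    if step not in CHECKPOINT_SUBFOLDERS:
        raise ValueError(f"no uploaded checkpoint for step {step}")
    subfolder = CHECKPOINT_SUBFOLDERS[step]
    return {} if subfolder is None else {"subfolder": subfolder}


def load_checkpoint(step: int):
    """Load processor and model for one checkpoint (downloads ~8B parameters)."""
    from transformers import AutoModelForImageTextToText, AutoProcessor

    # Assumption: both checkpoints share the processor files at the repo root.
    processor = AutoProcessor.from_pretrained(REPO_ID)
    model = AutoModelForImageTextToText.from_pretrained(
        REPO_ID, dtype="auto", device_map="auto", **checkpoint_kwargs(step)
    )
    return processor, model
```

`load_checkpoint(600)` would then be a drop-in replacement for the direct-loading lines in the Transformers example.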

Training Setup

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen3-VL-8B-Instruct` |
| Framework | `ms-swift` / Transformers |
| Tuning type | Full-parameter SFT |
| Trainable params | 8.19B of 8.77B (language model trainable; ViT and aligner frozen) |
| GPUs | 8 × NVIDIA A100-SXM4-40GB |
| Precision | BF16 |
| DeepSpeed | ZeRO-3, no optimizer or parameter offload |
| Attention | FlashAttention |
| Per-device batch size | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 128 samples (8 GPUs × 2 × 8) |
| Epochs | 1 |
| Max steps | 907 |
| Learning rate | `2.3e-6` |
| LR scheduler | `constant` |
| Warmup | 0 |
| Weight decay | 0.1 |
| Max sequence length | 8192 tokens |
| Video frame sampling | up to 64 frames |
| Video token budget | 128 |
| Image token budget | 1024 |
| Save interval | every 300 steps |
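The batch-size and step figures above are consistent with each other and with the 116,031-row dataset described under Training Data. The sketch below checks that arithmetic; it assumes each row is one training sample with no packing or dropped rows, and approximates the epoch fraction as samples seen divided by total rows.

```python
import math

# Values copied from the training-setup table and the data mix.
NUM_GPUS = 8
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
NUM_ROWS = 116_031  # total SFT rows

effective_batch = NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM
steps_per_epoch = math.ceil(NUM_ROWS / effective_batch)  # last batch is partial


def epoch_at(step: int) -> float:
    """Approximate fraction of the dataset seen after `step` optimizer steps."""
    return step * effective_batch / NUM_ROWS


print(effective_batch, steps_per_epoch)
print(round(epoch_at(300), 4), round(epoch_at(600), 4))
```

The printed values match the tables: an effective batch of 128, 907 steps for one epoch, and epoch fractions of 0.3309 and 0.6619 at steps 300 and 600.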

Important note: this run used a constant 2.3e-6 LR. Earlier V2 exploratory runs used 5e-6 with cosine decay and 3% warmup; these v3.3 checkpoints do not use that schedule.
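To make the difference between the two schedules concrete, the sketch below computes the per-step learning rate for the v3.3 constant schedule and for a linear-warmup plus cosine-decay schedule with the V2 settings (peak 5e-6, 3% warmup). The exact cosine formula and the rounding of warmup steps up to a whole step are assumptions modeled on the common Hugging Face-style implementation, and reusing 907 total steps for the V2-style curve is hypothetical, purely for comparison.

```python
import math

PEAK_LR_V2 = 5e-6          # earlier V2 exploratory runs
CONSTANT_LR_V33 = 2.3e-6   # this v3.3 run


def constant_lr(step: int, lr: float = CONSTANT_LR_V33) -> float:
    """v3.3 schedule: the same learning rate at every step, no warmup."""
    return lr


def warmup_cosine_lr(step: int, total_steps: int, peak: float = PEAK_LR_V2,
                     warmup_ratio: float = 0.03) -> float:
    """Assumed V2-style schedule: linear warmup to `peak`, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 907  # hypothetical: reuse the v3.3 step count for the comparison
for step in (0, 300, 600, 907):
    print(step, constant_lr(step), f"{warmup_cosine_lr(step, TOTAL_STEPS):.3e}")
```

Under the constant schedule, the LR logged at both checkpoints is the same 2.3e-6, which is what the Checkpoint Metrics table reports.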

Training Data

Dataset: EgoTools v3.3 SFT, converted to ms-swift video-clip format.

Main local training file:

data_v3_3/egotools_v3_3_sft_final_clips.swift.jsonl
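ms-swift custom multimodal datasets commonly store one JSON object per line with a `messages` list and a `videos` list, where each `<video>` tag in the text corresponds to one path. The sketch below assumes this file follows that layout (not confirmed here); the example row is hypothetical, and `check_record` and `check_file` are illustrative helpers that verify the tag and path counts agree.

```python
import json

# Hypothetical row: the clip path, question and answer are placeholders, not real data.
example_line = json.dumps({
    "messages": [
        {"role": "user", "content": "<video>What is the camera wearer doing?\nA. cutting vegetables\nB. washing dishes"},
        {"role": "assistant", "content": "A"},
    ],
    "videos": ["clips/example_clip.mp4"],
})


def check_record(line: str) -> dict:
    """Parse one JSONL line and check that <video> tags match the video paths."""
    record = json.loads(line)
    n_tags = sum(m["content"].count("<video>") for m in record["messages"])
    n_videos = len(record.get("videos", []))
    if n_tags != n_videos:
        raise ValueError(f"{n_tags} <video> tags but {n_videos} video paths")
    return record


def check_file(path: str) -> int:
    """Validate every non-empty line of a JSONL file and return the row count."""
    rows = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                check_record(line)
                rows += 1
    return rows
```

If the file holds the full mix, `check_file("data_v3_3/egotools_v3_3_sft_final_clips.swift.jsonl")` should return 116,031.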

Overall Mix

| Family | Rows | Ratio |
|---|---|---|
| Multiple-choice QA | 104,613 | 90.16% |
| Caption / narration completion | 9,473 | 8.16% |
| Open-ended QA | 1,945 | 1.68% |
| Total | 116,031 | 100.00% |

Sample Type Mix

| Sample type | Rows | Ratio |
|---|---|---|
| `mcq` | 63,276 | 54.53% |
| `narration_mcq` | 17,591 | 15.16% |
| `egoschema_caption_mcq` | 11,830 | 10.20% |
| `egoplan_next_action_mcq` | 7,990 | 6.89% |
| `caption_completion` | 7,532 | 6.49% |
| `egoschema_fused_mcq` | 3,926 | 3.38% |
| `egothink_open_qa` | 1,945 | 1.68% |
| `narration_completion` | 1,941 | 1.67% |
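The sample types roll up exactly into the three families of the overall mix. The sketch below recomputes the family totals and percentages from the per-type counts; the grouping rule (names ending in `_mcq` are multiple-choice, names ending in `_completion` are caption or narration completion) is inferred from the names and from the fact that the counts add up, not stated by the source.

```python
# Counts copied from the Sample Type Mix table.
SAMPLE_TYPES = {
    "mcq": 63_276,
    "narration_mcq": 17_591,
    "egoschema_caption_mcq": 11_830,
    "egoplan_next_action_mcq": 7_990,
    "caption_completion": 7_532,
    "egoschema_fused_mcq": 3_926,
    "egothink_open_qa": 1_945,
    "narration_completion": 1_941,
}


def family(sample_type: str) -> str:
    """Inferred mapping from sample type to the Overall Mix family."""
    if sample_type == "mcq" or sample_type.endswith("_mcq"):
        return "Multiple-choice QA"
    if sample_type.endswith("_completion"):
        return "Caption / narration completion"
    return "Open-ended QA"


total = sum(SAMPLE_TYPES.values())
families: dict[str, int] = {}
for name, rows in SAMPLE_TYPES.items():
    families[family(name)] = families.get(family(name), 0) + rows

for name, rows in families.items():
    print(f"{name}: {rows:,} ({rows / total:.2%})")
```

It prints 104,613 (90.16%), 9,473 (8.16%) and 1,945 (1.68%), matching the Overall Mix table.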

Option / Answer Balance

The MCQ portion was deterministically balanced so that, within each option count, correct answers are spread almost evenly across the option letters.

| Option count | Answer distribution |
|---|---|
| 4 options | A: 1,998; B: 1,997; C: 1,998; D: 1,997 |
| 5 options | A: 6,669; B: 6,669; C: 6,670; D: 6,669; E: 6,670 |
| 8 options | A: 7,910; B: 7,909; C: 7,910; D: 7,910; E: 7,909; F: 7,910; G: 7,909; H: 7,909 |
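One simple way to obtain per-letter counts that differ by at most one is to rotate the correct answer round-robin through the letters within each option-count group. This is a sketch of that idea, not necessarily the procedure used for EgoTools v3.3 (a single round-robin pass would not reproduce the exact per-letter split above), and `rebalance`, `balance` and `letter_counts` are hypothetical names.

```python
import string
from collections import Counter


def rebalance(options: list[str], answer_idx: int, target_idx: int) -> tuple[list[str], int]:
    """Move the correct option to `target_idx`, keeping the other options in order."""
    correct = options[answer_idx]
    others = [o for i, o in enumerate(options) if i != answer_idx]
    reordered = others[:target_idx] + [correct] + others[target_idx:]
    return reordered, target_idx


def balance(questions: list[tuple[list[str], int]]) -> list[tuple[list[str], int]]:
    """Deterministically round-robin the answer position per option count."""
    seen_per_count: Counter = Counter()
    balanced = []
    for options, answer_idx in questions:
        k = len(options)
        target = seen_per_count[k] % k
        seen_per_count[k] += 1
        balanced.append(rebalance(options, answer_idx, target))
    return balanced


def letter_counts(questions: list[tuple[list[str], int]]) -> Counter:
    """Count correct-answer letters (A, B, C, ...) across questions."""
    return Counter(string.ascii_uppercase[idx] for _, idx in questions)
```

For example, ten hypothetical 4-option questions that all start with answer A end up as A: 3, B: 3, C: 2, D: 2, while each correct option text is preserved.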

Video Coverage

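| Field | Value |
|---|---|
| Unique video references | 362 |
| Unique generated clips | 13,100 |
| Missing video rows | 0 |
| Full train-video references | 92,572 |
| Train-segment clip references | 23,459 |

The full-video and clip reference counts sum to 116,031, the total number of training rows.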

Checkpoint Metrics

| Checkpoint | Loss | Token accuracy | LR |
|---|---|---|---|
| checkpoint-300 | 0.8521 | 0.7638 | 2.3e-6 |
| checkpoint-600 | 0.8500 | 0.7705 | 2.3e-6 |

No evaluation set was run for these intermediate checkpoints, so the loss and token accuracy above are training metrics.