EgoTools 8B v3.3

This repository stores intermediate checkpoints from full-parameter SFT of Qwen/Qwen3-VL-8B-Instruct on EgoTools v3.3.

Available checkpoints:

Checkpoint Location Step Epoch Notes
checkpoint-300 repository root 300 / 907 0.3309 First uploaded intermediate checkpoint.
checkpoint-600 checkpoint-600/ 600 / 907 0.6619 Second uploaded intermediate checkpoint.

The repository root currently contains the checkpoint-300 model files. checkpoint-600 is stored in the checkpoint-600/ subdirectory.

Training Setup

Field Value
Base model Qwen/Qwen3-VL-8B-Instruct
Framework ms-swift / Transformers
Tuning type Full-parameter SFT
Trainable params 8.19B / 8.77B, VLM LLM trainable; ViT and aligner frozen
GPUs 8 x NVIDIA A100-SXM4-40GB
Precision BF16
DeepSpeed ZeRO-3, no optimizer/parameter offload
Attention FlashAttention
Per-device batch size 2
Gradient accumulation 8
Effective batch size 128 samples
Epochs 1
Max steps 907
Learning rate 2.3e-6
LR scheduler constant
Warmup 0
Weight decay 0.1
Max sequence length 8192
Video frame sampling up to 64 frames
Video token budget 128
Image token budget 1024
Save interval every 300 steps

Important note: this run used a constant 2.3e-6 LR. Earlier V2 exploratory runs used 5e-6 with cosine decay and 3% warmup; these v3.3 checkpoints do not use that schedule.

Training Data

Dataset: EgoTools v3.3 SFT, converted to ms-swift video-clip format.

Main local training file:

data_v3_3/egotools_v3_3_sft_final_clips.swift.jsonl

Overall Mix

Family Rows Ratio
Multiple-choice QA 104,613 90.16%
Caption / narration completion 9,473 8.16%
Open-ended QA 1,945 1.68%
Total 116,031 100.00%

Sample Type Mix

Sample type Rows Ratio
mcq 63,276 54.53%
narration_mcq 17,591 15.16%
egoschema_caption_mcq 11,830 10.20%
egoplan_next_action_mcq 7,990 6.89%
caption_completion 7,532 6.49%
egoschema_fused_mcq 3,926 3.38%
egothink_open_qa 1,945 1.68%
narration_completion 1,941 1.67%

Option / Answer Balance

The MCQ portion was deterministically balanced by option count.

Option count Answer distribution
4 options A: 1,998; B: 1,997; C: 1,998; D: 1,997
5 options A: 6,669; B: 6,669; C: 6,670; D: 6,669; E: 6,670
8 options A: 7,910; B: 7,909; C: 7,910; D: 7,910; E: 7,909; F: 7,910; G: 7,909; H: 7,909

Video Coverage

Field Value
Unique video references 362
Unique generated clips 13,100
Missing video rows 0
Full train-video references 92,572
Train-segment clip references 23,459

Checkpoint Metrics

Checkpoint Loss Token accuracy LR
checkpoint-300 0.8521 0.7638 2.3e-6
checkpoint-600 0.8500 0.7705 2.3e-6

No evaluation set was run for these intermediate checkpoints.

Downloads last month
27
Safetensors
Model size
770k params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for egotools/egotools-8b-v3_3

Finetuned
(262)
this model
Quantizations
1 model