Instructions to use ViTeX-Bench/ViTeX-Edit-14B with libraries, inference providers, notebooks, and local apps.

How to use ViTeX-Bench/ViTeX-Edit-14B with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# Switch device_map to "mps" for Apple devices.
pipe = DiffusionPipeline.from_pretrained(
    "ViTeX-Bench/ViTeX-Edit-14B",
    dtype=torch.bfloat16,
    device_map="cuda",
)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
Anonymous Authors committed on
Commit · 3ceb528
1 Parent(s): 78f3fef
Rename '(Corp)' variant to '(Composite)' to match the paper
The locality-preserving post-processing variant is referred to as
"ViTeX-14B (Composite)" in the paper and on the leaderboard. Update the
model card and the make_corp_baseline.py docstring to match. The
script filename and the default output directory (ViTeX-14B_Corp/) are
left as legacy identifiers so existing eval pipelines keep working.
- README.md +2 -2
- make_corp_baseline.py +1 -1
README.md
CHANGED
```diff
@@ -35,7 +35,7 @@ This repository is fully self-contained – it bundles the trained weights, the
 ├── README.md
 ├── requirements.txt
 ├── inference_example.py      run ViTeX-14B on one (video, mask, glyph) tuple
-├── make_corp_baseline.py     build the ViTeX-14B (Corp) variant from raw predictions
+├── make_corp_baseline.py     build the ViTeX-14B (Composite) variant from raw predictions
 ├── vitex_14b.safetensors     (8 GB – trained adapter weights)
 ├── diffsynth/                (bundled inference library)
 └── base_model/               (70 GB – frozen base model files)
@@ -80,7 +80,7 @@ python inference_example.py \
 
 The script automatically uses the bundled `base_model/` and `vitex_14b.safetensors` – no extra downloads.
 
-## Locality-preserving variant: ViTeX-14B (Corp)
+## Locality-preserving variant: ViTeX-14B (Composite)
 
 `make_corp_baseline.py` is a deterministic, training-free post-processing wrapper that composes ViTeX-14B's predicted text region back onto the source video. Two per-frame operations:
 
```
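The README describes the (Composite) variant as a training-free composite that pastes the model's predicted text region back onto the source video, frame by frame. A minimal sketch of such a per-frame composite is shown below; the helper name and array conventions are illustrative assumptions, not the actual implementation in `make_corp_baseline.py`.

```python
import numpy as np

def composite_frame(source, prediction, mask):
    """Locality-preserving composite for one frame (illustrative sketch).

    Keeps the model's predicted pixels only inside the (dilated) text
    mask and the untouched source pixels everywhere else. `source` and
    `prediction` are HxWx3 uint8 frames; `mask` is an HxW array that is
    nonzero inside the text region.
    """
    m = (mask > 0)[..., None]          # HxWx1 boolean, broadcast over RGB
    return np.where(m, prediction, source)
```

Because the operation is a pure pixel copy under a fixed mask, it is deterministic and adds no trainable parameters, which matches the README's "training-free post-processing" framing.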
make_corp_baseline.py
CHANGED
```diff
@@ -1,4 +1,4 @@
-"""Build the ViTeX-14B (Corp) baseline.
+"""Build the ViTeX-14B (Composite) baseline.
 
 For each test clip:
 1. Read source video, ViTeX-14B prediction, and the dilated text mask.
```
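Step 1 of the docstring reads a *dilated* text mask, presumably so the composited region fully covers anti-aliased text edges rather than clipping them. For reference, binary dilation with a square structuring element can be sketched in pure NumPy as below; the real script may use OpenCV or SciPy instead, and `radius` is an assumed parameter name.

```python
import numpy as np

def dilate_mask(mask, radius=1):
    """Binary dilation by a (2*radius+1) x (2*radius+1) square element.

    Pure-NumPy sketch: OR together all shifted copies of the mask within
    the structuring element's footprint.
    """
    padded = np.pad(mask.astype(bool), radius)   # pads with False
    h, w = mask.shape
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out
```

A single True pixel dilated with `radius=1` grows into a 3x3 block, which is the margin the composite inherits around each glyph.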