Image-to-Image
MLX
Safetensors
English
Chinese
qwen2_5_vl
apple-silicon
lance
bytedance
multimodal
text-to-image
image-editing
vqa
qwen2.5-vl
quantized
8-bit precision
How to use mlx-community/Lance-3B-8bit with MLX:

```shell
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Lance-3B-8bit mlx-community/Lance-3B-8bit
```
Upload: Lance_3B 8-bit affine quant (retry)

Files added in this commit:
- README.md +140 -0
- config.json +71 -0
- generation_config.json +12 -0
- llm_config.json +61 -0
- model.safetensors +3 -0
- quantization_report.json +16 -0
- tokenizer.json +0 -0
- vae.safetensors +3 -0
- vit.safetensors +3 -0
- vocab.json +0 -0
README.md (added)

```yaml
---
license: apache-2.0
language:
- en
- zh
library_name: mlx
pipeline_tag: image-to-image
tags:
- mlx
- apple-silicon
- lance
- bytedance
- multimodal
- text-to-image
- image-editing
- vqa
- qwen2.5-vl
- quantized
- 8-bit
base_model: bytedance-research/Lance
---
```

> 📂 Part of the **[Lance MLX collection](https://huggingface.co/collections/mlx-community/lance-mlx-6a0f3cd5648a74f8283fc8a4)** on mlx-community.

# Lance-3B-8bit (MLX, image specialist, 8-bit quantized)

8-bit groupwise affine quantization of [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16), the image-specialist Lance checkpoint. It was produced with mlx-lm's `quantize_model` utility and a per-tower skip predicate. The bulk of the LLM weights (attention projections, MLPs, embeddings, `lm_head`) is quantized. `time_embedder`, `llm2vae`, and `vae_in_proj` stay at bf16 for numerical safety. `vae_in_proj` is skipped automatically because its input dimension is not a multiple of the group size.

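"Groupwise affine" means that each run of `group_size` consecutive weights along a row gets its own scale and bias. Every weight is then stored as an 8-bit integer code. The sketch below is a simplified min/max version in plain Python, for illustration only. MLX's real kernel (`mx.quantize`) handles rounding and edge cases differently, so this is not the library's implementation.

```python
import math


def quantize_group(values, bits=8):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = 2**bits - 1
    # A constant group has zero range; any nonzero scale works there.
    scale = (hi - lo) / levels if hi > lo else 1.0
    bias = lo
    codes = [round((v - bias) / scale) for v in values]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Affine reconstruction: w is approximately scale * code + bias."""
    return [scale * c + bias for c in codes]


def quantize_row(row, group_size=64, bits=8):
    """Split a weight row into groups of group_size and quantize each one."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


# Toy row of 128 weights -> two groups of 64, each with its own scale/bias.
row = [math.sin(i / 7.0) for i in range(128)]
groups = quantize_row(row)
restored = [w for codes, s, b in groups for w in dequantize_group(codes, s, b)]
max_err = max(abs(orig - approx) for orig, approx in zip(row, restored))
# Rounding error is at most half a quantization step within each group.
print(len(groups), max_err <= max(s for _, s, _ in groups) / 2 + 1e-12)
# prints: 2 True
```

The divisibility check in `quantize_row` is the same constraint that leaves `vae_in_proj.vae2llm` (input width 48) at bf16.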
## Status

🟢 **Production-ready for image tasks on Apple Silicon as of 2026-05-21.**

| Capability | Status | Speedup vs bf16 |
|---|---|---|
| t2i (text → image) | ✅ Photorealistic, prompt-aligned | **~2.7× faster** (75 s vs 201 s at 768² × 30 steps, CFG=4.0) |
| image_edit (instruction-based) | ✅ Preserves identity and style | ~2.5× faster (expected, not measured) |
| x2t_image (image VQA) | ✅ Content-correct | Similar or faster |

**Memory footprint:** the quantized LLM weights (`model.safetensors`) take 6.59 GB on disk, 53% of the 12.37 GB bf16 original. The bf16 ViT and VAE files add about 2.75 GB. Runtime RAM is ~8–10 GB, comfortable on a 16 GB Mac.

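The ~53% figure follows from the storage layout. Assume each group of 64 weights carries one 16-bit scale and one 16-bit bias, which is how MLX stores them for bf16 models as far as we know. Then every weight costs 8 + 32/64 = 8.5 bits instead of 16. The measured ratio sits slightly above that ideal, presumably because of the tensors left at bf16. Byte counts below are copied from `quantization_report.json` and the LFS pointers in this commit.

```python
# Per-weight storage under groupwise affine quantization (assumption:
# one 16-bit scale and one 16-bit bias per group).
def effective_bits(bits=8, group_size=64, scale_bits=16, bias_bits=16):
    return bits + (scale_bits + bias_bits) / group_size


# Byte counts copied from quantization_report.json.
BF16_BYTES = 12_371_046_496
QUANT_BYTES = 6_585_364_576

# LFS sizes of the three weight files in this repo.
FILE_BYTES = {
    "model.safetensors": 6_585_590_531,
    "vit.safetensors": 1_337_407_631,
    "vae.safetensors": 1_409_401_642,
}

ideal_ratio = effective_bits() / 16        # 8.5 / 16 = 0.53125
measured_ratio = QUANT_BYTES / BF16_BYTES  # about 0.5323
repo_total_gb = sum(FILE_BYTES.values()) / 1e9

print(f"ideal {ideal_ratio:.4f}, measured {measured_ratio:.4f}, repo {repo_total_gb:.2f} GB")
```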
## Quality notes vs bf16

- **Photorealism and content fidelity are preserved.** Cats, dragons, portraits, etc. all generate cleanly.
- **Fine text on generated objects degrades slightly.** For example, "STOP" on a sign may render as a near-miss such as "SNICS". The rest of the content is correct: the sign has the right color and rectangular shape, and the glyphs are recognizably text-like.
- For prompts that don't require legible in-image text, output is visually indistinguishable from bf16 on casual inspection.

## Quickstart

```python
from huggingface_hub import snapshot_download

# Download the repo (or reuse the local cache) and get its directory path.
weights = snapshot_download("mlx-community/Lance-3B-8bit")
```

### Text-to-image

```python
from lance_mlx.pipeline.t2i import TextToImagePipeline

# `weights` is the local directory returned by snapshot_download() above.
pipe = TextToImagePipeline.from_pretrained(
    lance_weights_dir=weights,
    vae_safetensors=f"{weights}/vae.safetensors",  # bf16 Wan2.2 VAE shipped in this repo
)
# 768x768, 30 steps, CFG 4.0: the setting timed in the Status table.
image = pipe.generate(
    "A photorealistic tabby cat in a sunlit window.",
    height=768, width=768, num_steps=30, cfg_scale=4.0, seed=42,
)
image.save("cat.png")
```

### Image editing + VQA

Same API as the bf16 variant: `ImageEditPipeline` and `UnderstandingPipeline` both pick up the `quantization` block in `config.json` automatically via `lance_mlx.model._loader.load_lance_model`.

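You can check which quantization settings a local copy will load with by reading the `quantization` block directly. `read_quantization` is a hypothetical helper written for this card. It only mirrors what the loader is described as looking for and is not code from `lance_mlx`.

```python
import json
from pathlib import Path


def read_quantization(model_dir):
    """Return (bits, group_size, mode) from config.json, or None if absent.

    Hypothetical helper for illustration; not lance_mlx's loader.
    """
    config = json.loads((Path(model_dir) / "config.json").read_text())
    # This repo's config.json repeats the settings under "quantization_config".
    block = config.get("quantization") or config.get("quantization_config")
    if block is None:
        return None  # unquantized config
    return block["bits"], block["group_size"], block.get("mode", "affine")
```

With `weights` from the Quickstart, `read_quantization(weights)` returns `(8, 64, "affine")` for this repo.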
## What's quantized vs skipped

| Component | Quantization | Why |
|---|---|---|
| `embed_tokens` (151,936 × 2,048) | ✅ 8-bit | Large, tolerant of quantization |
| `lm_head` (151,936 × 2,048) | ✅ 8-bit | Large; used only in autoregressive decoding |
| 36 layers × `q/k/v/o_proj` (UND) | ✅ 8-bit | Bulk of LLM compute |
| 36 layers × `q/k/v/o_proj_moe_gen` (GEN) | ✅ 8-bit | Bulk of GEN compute |
| 36 layers × `mlp.{up,gate,down}_proj` | ✅ 8-bit | Bulk of LLM compute |
| 36 layers × `mlp_moe_gen.{up,gate,down}` | ✅ 8-bit | Bulk of GEN compute |
| `time_embedder.proj_in/out` | ❌ bf16 | Carries timestep information; numerically sensitive |
| `llm2vae` (flow head, 2048 × 48) | ❌ bf16 | Tiny and critical to flow prediction |
| `vae_in_proj.vae2llm` (2048 × 48) | ❌ bf16 | Auto-skipped: input dim 48 is not a multiple of group_size 64 |
| `latent_pos_embed.pos_embed` | ❌ bf16 | Custom parameter holder with no `to_quantized` |
| All RMSNorms + QK-norms | ❌ bf16 | F32 / bf16 norm scales preserved |
| Wan2.2 VAE (encoder + decoder) | ❌ bf16 | Pixel fidelity matters |
| Qwen2.5-VL ViT | ❌ bf16 | Semantic fidelity matters for x2t |

Recipe: 8-bit affine, group_size 64. `quantization_report.json` in this repo has full provenance.

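The skip rules above reduce to a small predicate. MLX's `nn.quantize` accepts a `class_predicate(path, module)` callback for this kind of decision. The plain-Python `should_quantize` below is a hypothetical stand-in that takes the module's properties as arguments, so it runs without MLX. The skip patterns are copied from `quantization_report.json`. The module paths in the loop are illustrative and are not Lance's exact names.

```python
# Explicit skips, copied from "skip_patterns" in quantization_report.json.
SKIP_PATTERNS = ("time_embedder.proj_in", "time_embedder.proj_out", "llm2vae")


def should_quantize(path, has_to_quantized, input_dim, group_size=64):
    """Hypothetical stand-in for a class_predicate: quantize this module?"""
    # Custom parameter holders (e.g. latent_pos_embed) have no quantized form.
    if not has_to_quantized:
        return False
    # Each group spans group_size input features, so the width must divide evenly.
    if input_dim % group_size != 0:
        return False
    # Name-based skips keep numerically sensitive layers at bf16.
    return not any(pattern in path for pattern in SKIP_PATTERNS)


# Illustrative module paths and input widths.
for path, has_tq, dim in [
    ("layers.0.self_attn.q_proj", True, 2048),
    ("layers.0.mlp.down_proj", True, 11008),
    ("llm2vae", True, 2048),
    ("vae_in_proj.vae2llm", True, 48),
    ("latent_pos_embed.pos_embed", False, 2048),
]:
    print(f"{path:30s} -> {'8-bit' if should_quantize(path, has_tq, dim) else 'bf16'}")
```

The order of checks follows the table. Modules without `to_quantized` are excluded first. The divisibility rule that auto-skips `vae_in_proj.vae2llm` comes next, followed by the explicit name-based skips.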
## Why no Video 8-bit yet

The video specialist (`Lance_3B_Video`) does **not** quantize cleanly to 8-bit with this recipe. t2v output collapses to a gray gradient whether the GEN tower is quantized or skipped, and smaller group sizes don't help. The video-specialist fine-tune has different weight distributions that affine 8-bit quantization can't capture.

Reza2kn/lance-quant's findings suggest **DWQ (distilled weight quantization)** with calibration is the right approach for Lance video at 8-bit and below. That is planned as a Phase 5c project. For now, use [`mlx-community/Lance-3B-Video-bf16`](https://huggingface.co/mlx-community/Lance-3B-Video-bf16) at bf16 for video tasks.

## Files in this repo

| File | Size | Notes |
|---|---|---|
| `model.safetensors` | 6.59 GB | Quantized LLM weights (2,033 tensors: each quantized Linear or Embedding is stored as `weight` + `scales` + `biases`) |
| `vit.safetensors` | 1.34 GB | bf16 (not quantized) |
| `vae.safetensors` | 1.41 GB | bf16 (not quantized) |
| `config.json` | – | Includes the `quantization` block (`bits=8, group_size=64, mode=affine`) |
| `quantization_report.json` | – | Provenance and footprint stats |
| `tokenizer.json` / `vocab.json` | – | Qwen2.5-VL vocabulary |

## Architecture (same as the bf16 variant)

See [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16) for the full architecture description.

## License

This MLX port + quantization: **Apache 2.0**.

Underlying weights:
- Lance: Apache 2.0 (ByteDance Intelligent Creation Lab).
- Wan2.2 VAE: Apache 2.0 (Alibaba).
- Qwen2.5-VL: Apache 2.0 (Alibaba).

## Citation

```bibtex
@article{fu2026lance,
  title={Lance: Unified Multimodal Modeling by Multi-Task Synergy},
  author={Fu, Fengyi and Huang, Mengqi and Wu, Shaojin and others},
  journal={arXiv preprint arXiv:2605.18678},
  year={2026}
}
```

## Links

- **MLX port code:** [`github.com/xocialize/lance-mlx`](https://github.com/xocialize/lance-mlx)
- **bf16 source:** [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16)
- **Standalone VAE:** [`mlx-community/Wan2.2-VAE-Lance-bf16`](https://huggingface.co/mlx-community/Wan2.2-VAE-Lance-bf16)
- **Video specialist (bf16; an alpha 8-bit build is pending):** [`mlx-community/Lance-3B-Video-bf16`](https://huggingface.co/mlx-community/Lance-3B-Video-bf16)

config.json (added)

```json
{
  "architectures": [
    "Qwen2_5_VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "vision_start_token_id": 151652,
  "vision_end_token_id": 151653,
  "vision_token_id": 151654,
  "image_token_id": 151655,
  "video_token_id": 151656,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 128000,
  "max_window_layers": 70,
  "model_type": "qwen2_5_vl",
  "num_attention_heads": 16,
  "num_hidden_layers": 36,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vision_config": {
    "depth": 32,
    "hidden_act": "silu",
    "hidden_size": 1280,
    "intermediate_size": 3420,
    "num_heads": 16,
    "in_chans": 3,
    "out_hidden_size": 2048,
    "patch_size": 14,
    "spatial_merge_size": 2,
    "spatial_patch_size": 14,
    "window_size": 112,
    "fullatt_block_indexes": [
      7,
      15,
      23,
      31
    ],
    "tokens_per_second": 2,
    "temporal_patch_size": 2
  },
  "rope_scaling": {
    "type": "mrope",
    "mrope_section": [
      16,
      24,
      24
    ]
  },
  "vocab_size": 151936,
  "quantization": {
    "group_size": 64,
    "bits": 8,
    "mode": "affine"
  },
  "quantization_config": {
    "group_size": 64,
    "bits": 8,
    "mode": "affine"
  }
}
```
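A few dimensions derived from this config help when reading the quantization table. The snippet copies the relevant fields and works out four things: the attention head size, the grouped-query layout, whether the Linear input widths divide evenly into groups of 64, and the M-RoPE split. The M-RoPE check rests on an assumption: our reading is that Qwen2.5-VL's three `mrope_section` entries count rotary frequency pairs (temporal, height, width), so they should sum to half the head dimension.

```python
# Fields copied from config.json.
config = {
    "hidden_size": 2048,
    "intermediate_size": 11008,
    "num_attention_heads": 16,
    "num_key_value_heads": 2,
    "rope_scaling": {"type": "mrope", "mrope_section": [16, 24, 24]},
}

head_dim = config["hidden_size"] // config["num_attention_heads"]
# Grouped-query attention: k_proj / v_proj only produce num_key_value_heads heads.
kv_proj_width = head_dim * config["num_key_value_heads"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]
# Every LLM Linear reads 2048 or 11008 features; both split into groups of 64.
widths_divisible = all(
    w % 64 == 0 for w in (config["hidden_size"], config["intermediate_size"])
)
# M-RoPE (assumption): sections count rotary frequency pairs, summing to head_dim / 2.
mrope_pairs = sum(config["rope_scaling"]["mrope_section"])

print(head_dim, kv_proj_width, queries_per_kv_head, widths_divisible, mrope_pairs == head_dim // 2)
# prints: 128 256 8 True True
```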
generation_config.json (added)

```json
{
  "bos_token_id": 151643,
  "pad_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "repetition_penalty": 1.05,
  "temperature": 0.000001,
  "transformers_version": "4.49.0"
}
```
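`generation_config.json` enables sampling but sets the temperature to 1e-6, which in practice makes decoding greedy. It also applies a mild 1.05 repetition penalty. The sketch below shows both effects on toy logits. The penalty follows the Hugging Face `transformers` convention: positive logits are divided by the penalty and negative ones are multiplied. The card does not say whether the `lance_mlx` pipelines read this file, so this only illustrates what the values mean.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """transformers-style penalty for tokens that were already generated."""
    out = list(logits)
    for token_id in set(seen_token_ids):
        value = out[token_id]
        # Positive logits shrink, negative logits become more negative.
        out[token_id] = value / penalty if value > 0 else value * penalty
    return out


logits = [2.0, 1.9, -1.0]
print(softmax_with_temperature(logits, 1.0))       # close tokens stay competitive
print(softmax_with_temperature(logits, 0.000001))  # [1.0, 0.0, 0.0]: effectively greedy
print(apply_repetition_penalty(logits, [0], 1.05))
```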
llm_config.json (added)

```json
{
  "architectures": [
    "Qwen2_5_VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "vision_start_token_id": 151652,
  "vision_end_token_id": 151653,
  "vision_token_id": 151654,
  "image_token_id": 151655,
  "video_token_id": 151656,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 128000,
  "max_window_layers": 70,
  "model_type": "qwen2_5_vl",
  "num_attention_heads": 16,
  "num_hidden_layers": 36,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vision_config": {
    "depth": 32,
    "hidden_act": "silu",
    "hidden_size": 1280,
    "intermediate_size": 3420,
    "num_heads": 16,
    "in_chans": 3,
    "out_hidden_size": 2048,
    "patch_size": 14,
    "spatial_merge_size": 2,
    "spatial_patch_size": 14,
    "window_size": 112,
    "fullatt_block_indexes": [
      7,
      15,
      23,
      31
    ],
    "tokens_per_second": 2,
    "temporal_patch_size": 2
  },
  "rope_scaling": {
    "type": "mrope",
    "mrope_section": [
      16,
      24,
      24
    ]
  },
  "vocab_size": 151936
}
```
model.safetensors (added, Git LFS pointer)

```text
version https://git-lfs.github.com/spec/v1
oid sha256:3cf99c5fb64a6663b5cc04eea73e78ea8920a1e16fcecabc509a3da335d3c072
size 6585590531
```
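The `.safetensors` entries in this commit are Git LFS pointers. The `oid` is the SHA-256 of the real file, and `size` is its length in bytes. You can check a downloaded copy against them with the standard library alone. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this note and are not part of `huggingface_hub`.

```python
import hashlib
import os

# Pointer text for model.safetensors, copied from this commit.
MODEL_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3cf99c5fb64a6663b5cc04eea73e78ea8920a1e16fcecabc509a3da335d3c072
size 6585590531
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, SHA-256 digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """True if the file at `path` has the pointer's size and SHA-256."""
    # Size check first: it is free and catches truncated downloads.
    if os.path.getsize(path) != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in chunks so multi-GB files never sit in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["sha256"]
```

With `weights` from the Quickstart, run `matches_pointer(f"{weights}/model.safetensors", parse_lfs_pointer(MODEL_POINTER))`. Hashing 6.59 GB takes a while, which is why the size comparison runs first.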
quantization_report.json (added)

```json
{
  "source_dir": "/Volumes/DEV_VOL1/VideoResearch/lance-mlx-models/Lance-3B-bf16",
  "bits": 8,
  "group_size": 64,
  "mode": "affine",
  "bf16_bytes": 12371046496,
  "quantized_bytes": 6585364576,
  "compression_ratio": 0.5323207360128573,
  "n_tensors_bf16": 1021,
  "n_tensors_quant": 2033,
  "skip_patterns": [
    "time_embedder.proj_in",
    "time_embedder.proj_out",
    "llm2vae"
  ]
}
```
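The tensor counts in this report can be reproduced from the README's quantization table and `num_hidden_layers` in `config.json`. The sketch assumes that each quantized Linear or Embedding keeps its `weight` tensor and gains exactly two more, `scales` and `biases`.

```python
# Layer count from config.json ("num_hidden_layers").
NUM_LAYERS = 36

# Quantized modules per decoder layer, per the README table.
PER_LAYER = {
    "q/k/v/o_proj (UND)": 4,
    "q/k/v/o_proj_moe_gen (GEN)": 4,
    "mlp.{up,gate,down}_proj": 3,
    "mlp_moe_gen.{up,gate,down}": 3,
}
TOP_LEVEL = 2  # embed_tokens and lm_head

# Tensor count before quantization, from quantization_report.json.
N_TENSORS_BF16 = 1021


def quantized_tensor_count(num_layers):
    """Predict n_tensors_quant: each quantized module adds scales + biases."""
    modules = num_layers * sum(PER_LAYER.values()) + TOP_LEVEL
    return N_TENSORS_BF16 + 2 * modules


print(quantized_tensor_count(NUM_LAYERS))  # 2033, matching the report
print(quantized_tensor_count(32))          # 1921: a 32-layer count would not match
```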
tokenizer.json (added)

The diff for this file is too large to render.
vae.safetensors (added, Git LFS pointer)

```text
version https://git-lfs.github.com/spec/v1
oid sha256:707e20bb83bdffff77774e04275d64b5ee8660f98390ce362538078d020b6807
size 1409401642
```
vit.safetensors (added, Git LFS pointer)

```text
version https://git-lfs.github.com/spec/v1
oid sha256:4abfe7f4b7a22d2119a11ff678f6dbc8ff310d6a10f4a0e019ce87ae3c1721ee
size 1337407631
```
vocab.json (added)

The diff for this file is too large to render.