---
tags:
- fp8
- fp8-dynamic
- internvl3.5
- internvl
language:
- multilingual
pipeline_tag: image-text-to-text
inference: false
license: mit
base_model: OpenGVLab/InternVL3_5-38B
base_model_relation: quantized
---

# InternVL3.5 38B FP8

This is an FP8 dynamically quantized (W8A8) version of `OpenGVLab/InternVL3_5-38B`, optimized for high-performance inference.

The quantization recipe keeps the vision-critical modules in full precision, which preserves the model's core visual understanding capabilities while reducing the memory footprint by nearly 40%.

**Notes**

- 32k max context length
- Reasoning parser is ready to use; thinking mode requires a system prompt
- Tool calling is still under investigation

## Model Details

| Attribute | Value |
| :--- | :--- |
| **Original Model** | [OpenGVLab/InternVL3_5-38B](https://huggingface.co/OpenGVLab/InternVL3_5-38B) |
| **Quantization Method** | FP8 Dynamic (W8A8) |

## Technical Specifications

### Quantization Details

* **Weights:** FP8 E4M3 with per-tensor scales.
* **Activations:** Dynamically quantized to FP8 E4M3 with per-tensor scales.
* **Preserved Modules (Full Precision):** Vision tower, embeddings, and the `mlp1` projector (the MLP that connects the vision tower to the language model).
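
To make the per-tensor scheme above concrete, here is a minimal pure-Python sketch that emulates FP8 E4M3 rounding with a single scale per tensor. It is an illustration only, not the recipe used to produce this checkpoint: the function names (`round_to_e4m3`, `quantize_per_tensor`, `dequantize`) are hypothetical, and it assumes the E4M3 "fn" variant (largest finite value 448) with saturating clamping. Real inference kernels use hardware FP8 types rather than this emulation.

```python
import math

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed "fn" variant)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest representable E4M3 value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    _, exp = math.frexp(mag)      # mag = m * 2**exp with 0.5 <= m < 1
    exp = max(exp - 1, -6)        # unbiased exponent; -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)       # spacing between values with 3 mantissa bits
    # Python's round() rounds halves to even, matching round-to-nearest-even.
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_per_tensor(values):
    """Quantize a flat list of floats with one scale shared by the whole tensor."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    tensor = [0.5, -1.25, 3.0, 0.01]  # made-up example values
    q, scale = quantize_per_tensor(tensor)
    print("scale:", scale)
    print("quantized:", q)
    print("dequantized:", dequantize(q, scale))
```

In this sketch, a weight tensor would be quantized once ahead of time and stored with its scale, while "dynamic" activation quantization would call `quantize_per_tensor` at inference time so that the scale follows each activation tensor's actual range.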