---
tags:
- fp8
- fp8-dynamic
- internvl3.5
- internvl
language:
- multilingual
pipeline_tag: image-text-to-text
inference: false
license: mit
base_model: OpenGVLab/InternVL3_5-38B
base_model_relation: quantized
---
# InternVL3.5 38B FP8
This is an FP8 dynamically quantized (W8A8) version of `OpenGVLab/InternVL3_5-38B`, optimized for high-performance inference.
The quantization recipe keeps the vision tower, embeddings, and the first MLP layer (mlp1) in full precision, which preserves the model's core visual understanding capabilities while reducing the memory footprint by nearly 40%.
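
As a rough illustration of where a reduction of this size comes from, the sketch below estimates weight storage when most parameters are stored in FP8 (1 byte each) and the rest stay in 16-bit (2 bytes each). The parameter split used here is a hypothetical round-number assumption, not a figure measured from this checkpoint, and the estimate ignores scale tensors and runtime memory such as the KV cache.

```python
def weight_footprint_gb(params_fp8: float, params_16bit: float) -> float:
    """Approximate weight storage in GB: 1 byte per FP8 param, 2 bytes per 16-bit param."""
    return (params_fp8 * 1 + params_16bit * 2) / 1e9


# Hypothetical split for illustration only (not taken from the model card).
QUANTIZED_PARAMS = 32e9   # assumed: parameters stored in FP8
PRESERVED_PARAMS = 6e9    # assumed: parameters kept in 16-bit

# Baseline: every parameter stored in 16-bit.
baseline_gb = weight_footprint_gb(0, QUANTIZED_PARAMS + PRESERVED_PARAMS)
# Mixed: quantized layers in FP8, preserved modules in 16-bit.
mixed_gb = weight_footprint_gb(QUANTIZED_PARAMS, PRESERVED_PARAMS)
reduction = 1 - mixed_gb / baseline_gb

print(f"{baseline_gb:.0f} GB -> {mixed_gb:.0f} GB ({reduction:.0%} smaller)")
```

Because the preserved modules stay at full precision, the saving is smaller than the 50% that quantizing every weight to 8 bits would give.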
**Notes**
- Maximum context length is 32k tokens.
- The reasoning parser is ready to use; thinking mode requires a system prompt.
- Tool calling is still under investigation.
## Model Details
| Attribute | Value |
| :--- | :--- |
| **Original Model** | [OpenGVLab/InternVL3_5-38B](https://huggingface.co/OpenGVLab/InternVL3_5-38B) |
| **Quantization Method** | FP8 Dynamic (W8A8) |
## Technical Specifications
### Quantization Details
* **Weights:** FP8 E4M3 with per-tensor scales.
* **Activations:** Dynamically quantized to FP8 E4M3 with per-tensor scales.
* **Preserved Modules (Full Precision):** Vision tower, embeddings, and the first MLP layer (mlp1).
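
The per-tensor scheme listed above can be sketched in plain Python. This is a minimal simulation under stated assumptions, not the code used to produce this checkpoint: the helper names (`round_to_e4m3`, `quantize_per_tensor`, `dequantize`) are hypothetical, the E4M3 variant is assumed to be the common one with a maximum finite value of 448, and values are kept as Python floats rather than packed into 8 bits.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the assumed E4M3 variant


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    _, e = math.frexp(mag)          # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)            # exponent bias 7: smallest normal exponent is -6
    step = 2.0 ** (exp - 3)         # 3 mantissa bits -> 8 steps per power of two
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_per_tensor(values):
    """Quantize a flat list of floats with one scale shared by the whole tensor."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(q_values, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in q_values]


# Weights: the scale is computed once, offline, and stored with the checkpoint.
w_q, w_scale = quantize_per_tensor([0.5, -2.0, 4.48])

# Activations ("dynamic"): the scale is recomputed from each incoming tensor at runtime.
a_q, a_scale = quantize_per_tensor([12.0, -3.0, 0.7])

print(dequantize(w_q, w_scale))
print(dequantize(a_q, a_scale))
```

With a single scale per tensor, the largest-magnitude element maps to the edge of the FP8 range, and every other element is rounded to one of eight steps within its power-of-two interval.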