---
tags:
- fp8
- fp8-dynamic
- internvl3.5
- internvl
language:
- multilingual
pipeline_tag: image-text-to-text
inference: false
license: mit
base_model: OpenGVLab/InternVL3_5-38B
base_model_relation: quantized
---

# InternVL3.5 38B FP8

This is an FP8 dynamically quantized (W8A8) version of `OpenGVLab/InternVL3_5-38B`, optimized for high-performance inference.

The quantization process uses a specialized recipe that preserves the model's core visual understanding capabilities while reducing the memory footprint by nearly 40%.


**Notes**
- 32k maximum context length
- Reasoning parser works out of the box; a system prompt is required to run in thinking mode
- Tool calling is still under investigation
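As a rough sketch, the model can be served with vLLM. The exact flags (and the reasoning-parser name, if you enable one) depend on your vLLM version and hardware, so treat this as a starting point rather than a verified command; `<this-repo-id>` is a placeholder for this model's Hub id:

```shell
# Illustrative only: check your vLLM version's docs for the current flags.
vllm serve <this-repo-id> \
  --max-model-len 32768 \
  --trust-remote-code
```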

## Model Details

| Attribute | Value |
| :--- | :--- |
| **Original Model** | [OpenGVLab/InternVL3_5-38B](https://huggingface.co/OpenGVLab/InternVL3_5-38B) |
| **Quantization Method** | FP8 Dynamic (W8A8) |

## Technical Specifications

### Quantization Details

*   **Weights:** FP8 E4M3 with per-tensor scales.
*   **Activations:** Dynamically quantized to FP8 E4M3 with per-tensor scales.
*   **Preserved Modules (Full Precision):** Vision tower, embeddings, and the first MLP layer (`mlp1`).
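To make the per-tensor scheme above concrete, here is a minimal pure-Python sketch of what "FP8 E4M3 with a per-tensor scale" means numerically: compute one scale from the tensor's absolute maximum, round each scaled value to the nearest representable E4M3 grid point, then dequantize. This is purely illustrative — the checkpoint itself was produced with a quantization toolchain, not this code — and `e4m3_values` / `quantize_dequantize` are hypothetical helper names:

```python
import bisect
import math

def e4m3_values():
    """Non-negative values representable in FP8 E4M3 (bias 7, 3 mantissa bits,
    no infinities; e=15, m=7 is the NaN encoding, so the max finite value is 448)."""
    vals = {0.0}
    for e in range(16):          # 4-bit exponent field
        for m in range(8):       # 3-bit mantissa field
            if e == 0:           # subnormals: m/8 * 2^-6
                vals.add((m / 8) * 2.0 ** -6)
            elif e == 15 and m == 7:
                continue         # NaN encoding, skip
            else:                # normals: (1 + m/8) * 2^(e - 7)
                vals.add((1 + m / 8) * 2.0 ** (e - 7))
    return sorted(vals)

GRID = e4m3_values()
E4M3_MAX = GRID[-1]  # 448.0

def quantize_dequantize(xs):
    """Per-tensor dynamic FP8: one scale from the tensor's amax, round to grid."""
    scale = max(abs(v) for v in xs) / E4M3_MAX or 1.0
    out = []
    for v in xs:
        t = min(max(abs(v) / scale, 0.0), E4M3_MAX)
        i = bisect.bisect_left(GRID, t)
        if i == 0:
            q = GRID[0]
        elif i == len(GRID):
            q = GRID[-1]
        else:  # pick the nearer of the two neighbouring grid points
            lo, hi = GRID[i - 1], GRID[i]
            q = lo if t - lo <= hi - t else hi
        out.append(math.copysign(q * scale, v))
    return out, scale

xs = [1.0, -0.5, 0.125, 2.0]
deq, scale = quantize_dequantize(xs)
```

Because the scale is derived from the data at hand ("dynamic"), no calibration set is needed; the trade-off is a small runtime cost to compute the amax per activation tensor.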