Simplismart
/

InternVL3_5-38B-FP8-Dynamic

Image-Text-to-Text

compressed-tensors

Model card Files Files and versions

InternVL3_5-38B-FP8-Dynamic / README.md

varad-simpli's picture

Update README.md

3c92457 verified 7 days ago

|

history blame contribute delete

1.18 kB

	---
	tags:
	- fp8
	- fp8-dynamic
	- internvl3.5
	- internvl
	language:
	- multilingual
	pipeline_tag: image-text-to-text
	inference: false
	license: mit
	base_model: OpenGVLab/InternVL3_5-38B
	base_model_relation: quantized
	---

	# InternVL3.5 38B FP8

	This is an FP8 dynamically quantized (W8A8) version of `OpenGVLab/InternVL3_5-38B`optimized for high-performance inference.

	The quantization process uses a specialized recipe that preserves the model's core visual understanding capabilities while reducing the memory footprint by nearly 40%.


	Notes
	- 32k max context length
	- reasoning parser ready to go, requires system prompt to run in thinking mode
	- still investigating tool calling

	## Model Details

	\| Attribute \| Value \|
	\| :--- \| :--- \|
	\| Original Model \| [OpenGVLab/InternVL3_5-38B](https://huggingface.co/OpenGVLab/InternVL3_5-38B) \|
	\| Quantization Method \| FP8 Dynamic (W8A8) \|

	## Technical Specifications

	### Quantization Details

	* Weights: FP8 E4M3 with per-tensor scales.
	* Activations: Dynamically quantized to FP8 E4M3 with per-tensor scales.
	* Preserved Modules (Full Precision): Vision tower, embeddings, and the first MLP layer (mlp1).