This is an **FP8 dynamic quantized** version of [OpenGVLab/InternVL3-38B](https://huggingface.co/OpenGVLab/InternVL3-38B), optimized for high-performance inference with vLLM.

The model uses **dynamic FP8 quantization** for ease of use and deployment, achieving a significant speedup with minimal accuracy degradation on vision-language tasks.

## 🚀 Key Features

- **FP8 Dynamic Quantization**: No calibration dataset needed
- **Vision-Language Optimized**: Specialized quantization recipe that preserves visual understanding
- **vLLM Ready**: Seamless integration with vLLM for production deployment
- **Memory Efficient**: ~50% memory reduction compared to the FP16 original
- **Performance Boost**: Significantly faster inference on H100/L40S GPUs
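As a minimal deployment sketch (the repository id below is a placeholder, and flags may vary with your vLLM version), the model can be served through vLLM's OpenAI-compatible server:

```shell
# Placeholder repo id -- replace with this repository's Hugging Face id.
# --trust-remote-code is required for InternVL's custom modeling code.
vllm serve <this-repo-id> --trust-remote-code
```

vLLM reads the FP8 quantization settings from the checkpoint's config, so no extra quantization flags should be needed for a pre-quantized model.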
## 📊 Model Details

- **Original Model**: [OpenGVLab/InternVL3-38B](https://huggingface.co/OpenGVLab/InternVL3-38B)
- **Source Model**: OpenGVLab/InternVL3-38B
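The ~50% memory figure follows directly from bytes per parameter: FP16 stores 2 bytes per weight, FP8 stores 1. A back-of-envelope check for 38B parameters (weights only; KV cache and activation memory are extra):

```python
# Approximate weight memory for 38B parameters at each precision.
params = 38e9
fp16_gib = params * 2 / 1024**3  # 2 bytes per parameter in FP16
fp8_gib = params * 1 / 1024**3   # 1 byte per parameter in FP8
print(f"FP16: ~{fp16_gib:.0f} GiB, FP8: ~{fp8_gib:.0f} GiB")  # → FP16: ~71 GiB, FP8: ~35 GiB
```

In practice the saving on weights is exactly 2x; the overall figure is "~50%" because non-quantized components (e.g. some layers kept in higher precision) and runtime buffers reduce the net ratio slightly.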