brandonbeiler committed · Commit 8cd511d · verified · 1 Parent(s): 6c37ff7

Update README.md

Files changed (1)
  1. README.md +18 -15
README.md CHANGED
@@ -14,24 +14,13 @@ tags:
14
  pipeline_tag: image-text-to-text
15
  inference: false
16
  license: mit
17
  ---
18
  # 🔥 InternVL3-78B-FP8-Dynamic: Optimized Vision-Language Model 🔥
19
  This is a **FP8 dynamic quantized** version of [OpenGVLab/InternVL3-78B](https://huggingface.co/OpenGVLab/InternVL3-78B), optimized for high-performance inference with vLLM.
20
  The model utilizes **dynamic FP8 quantization** for optimal ease of use and deployment, achieving significant speedup with minimal accuracy degradation on vision-language tasks.
21
- ## 🚀 Key Features
22
- - **FP8 Dynamic Quantization**: No calibration required, ready to use immediately
23
- - **Vision-Language Optimized**: Specialized quantization recipe that preserves visual understanding
24
- - **vLLM Ready**: Seamless integration with vLLM for production deployment
25
- - **Memory Efficient**: ~50% memory reduction compared to FP16 original
26
- - **Performance Boost**: Significantly faster inference on H100/L40S GPUs
27
- - **Easy Deployment**: No calibration dataset needed for quantization
28
- ## 📊 Model Details
29
- - **Original Model**: [OpenGVLab/InternVL3-78B](https://huggingface.co/OpenGVLab/InternVL3-78B)
30
- - **Source Model**: OpenGVLab/InternVL3-78B
31
- - **Quantized Model**: InternVL3-78B-FP8-Dynamic
32
- - **Quantization Method**: FP8 Dynamic (W8A8)
33
- - **Quantization Library**: [LLM Compressor](https://github.com/vllm-project/llm-compressor) v0.5.2.dev110+gf6010ce1
34
- - **Quantized by**: [brandonbeiler](https://huggingface.co/brandonbeiler)
35
  ## 🔧 Usage
36
  ### With vLLM (Recommended)
37
  ```python
@@ -50,6 +39,20 @@ response = model.generate("Describe this image: <image>", sampling_params)
50
  print(response[0].outputs[0].text)
51
  ```
52
 
53
  ## 🏗️ Technical Specifications
54
  ### Hardware Requirements
55
  - **Inference**: 83 GB VRAM (+ VRAM for context)
@@ -69,4 +72,4 @@ torch==2.7.0
69
  vllm==0.9.1
70
  ```
71
 
72
- *Quantized with ❤️ using LLM Compressor for the open-source community*
 
14
  pipeline_tag: image-text-to-text
15
  inference: false
16
  license: mit
17
+ base_model:
18
+ - OpenGVLab/InternVL3-78B
19
  ---
20
  # 🔥 InternVL3-78B-FP8-Dynamic: Optimized Vision-Language Model 🔥
21
  This is a **FP8 dynamic quantized** version of [OpenGVLab/InternVL3-78B](https://huggingface.co/OpenGVLab/InternVL3-78B), optimized for high-performance inference with vLLM.
22
  The model utilizes **dynamic FP8 quantization** for optimal ease of use and deployment, achieving significant speedup with minimal accuracy degradation on vision-language tasks.
23
+
24
  ## 🔧 Usage
25
  ### With vLLM (Recommended)
26
  ```python
 
39
  print(response[0].outputs[0].text)
40
  ```
41
 
42
+ ## 🚀 Key Features
43
+ - **FP8 Dynamic Quantization**
44
+ - **Vision-Language Optimized**: Specialized quantization recipe that preserves visual understanding
45
+ - **vLLM Ready**: Seamless integration with vLLM for production deployment
46
+ - **Memory Efficient**: ~50% memory reduction compared to FP16 original
47
+ - **Performance Boost**: Significantly faster inference on H100/L40S GPUs
48
+ ## 📊 Model Details
49
+ - **Original Model**: [OpenGVLab/InternVL3-78B](https://huggingface.co/OpenGVLab/InternVL3-78B)
50
+ - **Source Model**: OpenGVLab/InternVL3-78B
51
+ - **Quantized Model**: InternVL3-78B-FP8-Dynamic
52
+ - **Quantization Method**: FP8 Dynamic (W8A8)
53
+ - **Quantization Library**: [LLM Compressor](https://github.com/vllm-project/llm-compressor) v0.5.2.dev110+gf6010ce1
54
+ - **Quantized by**: [brandonbeiler](https://huggingface.co/brandonbeiler)
55
+
56
  ## 🏗️ Technical Specifications
57
  ### Hardware Requirements
58
  - **Inference**: 83 GB VRAM (+ VRAM for context)
 
72
  vllm==0.9.1
73
  ```
74
 
75
+ *Quantized with ❤️ using LLM Compressor for the open-source community*
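
The "~50% memory reduction compared to FP16" claim in the README can be sanity-checked with simple byte-width arithmetic. The sketch below is only a rough estimate, not a measurement: the parameter count is an assumption read off the "78B" in the model name, the helper `weight_memory_gb` is hypothetical, and the calculation covers weights only (no KV cache, activations, or layers that may have been kept in higher precision).

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: parameter count taken from the "78B" in the model name.
NUM_PARAMS = 78e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # FP16 stores 2 bytes per weight
fp8_gb = weight_memory_gb(NUM_PARAMS, 1)   # FP8 stores 1 byte per weight
reduction = 1 - fp8_gb / fp16_gb           # fraction of weight memory saved

print(f"FP16 weights: ~{fp16_gb:.0f} GB")
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")
print(f"Reduction:    ~{reduction:.0%}")
```

The raw FP8 estimate comes out somewhat below the inference figure listed under Hardware Requirements. That gap would be consistent with some components not being stored in one byte per weight, but the diff does not say which, so treat that as a guess.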