brandonbeiler committed on
Commit 0c193ea · verified · 1 Parent(s): 8c4f497

Update README.md

Files changed (1)
  1. README.md +16 -14
README.md CHANGED
@@ -20,20 +20,8 @@ base_model:
  # 🔥 InternVL3-38B-FP8-Dynamic: Optimized Vision-Language Model 🔥
  This is a **FP8 dynamic quantized** version of [OpenGVLab/InternVL3-38B](https://huggingface.co/OpenGVLab/InternVL3-38B), optimized for high-performance inference with vLLM.
  The model utilizes **dynamic FP8 quantization** for optimal ease of use and deployment, achieving significant speedup with minimal accuracy degradation on vision-language tasks.
- ## 🚀 Key Features
- - **FP8 Dynamic Quantization**
- - **Vision-Language Optimized**: Specialized quantization recipe that preserves visual understanding
- - **vLLM Ready**: Seamless integration with vLLM for production deployment
- - **Memory Efficient**: ~50% memory reduction compared to FP16 original
- - **Performance Boost**: Significant faster inference on H100/L40S GPUs
- ## 📊 Model Details
- - **Original Model**: [OpenGVLab/InternVL3-38B](https://huggingface.co/OpenGVLab/InternVL3-38B)
- - **Source Model**: OpenGVLab/InternVL3-38B
- - **Quantized Model**: InternVL3-38B-FP8-Dynamic
- - **Quantization Method**: FP8 Dynamic (W8A8)
- - **Quantization Library**: [LLM Compressor](https://github.com/vllm-project/llm-compressor) v0.5.2.dev112+g6800f811
- - **Quantized by**: [brandonbeiler](https://huggingface.co/brandonbeiler)
- ## 🔧 Usage
  ### With vLLM (Recommended)
  ```python
  from vllm import LLM, SamplingParams
@@ -51,6 +39,20 @@ response = model.generate("Describe this image: <image>", sampling_params)
  print(response[0].outputs[0].text)
  ```
 
  ## 🏗️ Technical Specifications
  ### Hardware Requirements
  - **Inference**: 47GB VRAM (+ Context)
 
  # 🔥 InternVL3-38B-FP8-Dynamic: Optimized Vision-Language Model 🔥
  This is a **FP8 dynamic quantized** version of [OpenGVLab/InternVL3-38B](https://huggingface.co/OpenGVLab/InternVL3-38B), optimized for high-performance inference with vLLM.
  The model utilizes **dynamic FP8 quantization** for optimal ease of use and deployment, achieving significant speedup with minimal accuracy degradation on vision-language tasks.
+
+ ## Just Run it!
  ### With vLLM (Recommended)
  ```python
  from vllm import LLM, SamplingParams
 
  print(response[0].outputs[0].text)
  ```
 
+ ## 🚀 Key Features
+ - **FP8 Dynamic Quantization**
+ - **Vision-Language Optimized**: Specialized quantization recipe that preserves visual understanding
+ - **vLLM Ready**: Seamless integration with vLLM for production deployment
+ - **Memory Efficient**: ~50% memory reduction compared to FP16 original
+ - **Performance Boost**: Significant faster inference on H100/L40S GPUs
+ ## 📊 Model Details
+ - **Original Model**: [OpenGVLab/InternVL3-38B](https://huggingface.co/OpenGVLab/InternVL3-38B)
+ - **Source Model**: OpenGVLab/InternVL3-38B
+ - **Quantized Model**: InternVL3-38B-FP8-Dynamic
+ - **Quantization Method**: FP8 Dynamic (W8A8)
+ - **Quantization Library**: [LLM Compressor](https://github.com/vllm-project/llm-compressor) v0.5.2.dev112+g6800f811
+ - **Quantized by**: [brandonbeiler](https://huggingface.co/brandonbeiler)
+
  ## 🏗️ Technical Specifications
  ### Hardware Requirements
  - **Inference**: 47GB VRAM (+ Context)
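The README in this diff lists the quantization method as FP8 Dynamic (W8A8): weights and activations both use 8-bit floating point, and the activation scales are computed at inference time from the data itself rather than from a calibration set. The plain-Python sketch below illustrates that idea. It is not vLLM's or LLM Compressor's kernel code. It assumes the E4M3 FP8 variant (largest finite value 448), emulates E4M3 rounding in software, and uses one scale per token (row). It also shows the back-of-the-envelope weight-memory arithmetic behind the "~50% memory reduction compared to FP16" claim. The 38e9 parameter count is read off the model name and is approximate, and the helper names (`round_to_e4m3`, `quantize_per_token`, `dequantize`, `weight_gb`) are hypothetical.

```python
import math

# Assumption: the E4M3 FP8 variant ("e4m3fn"), whose largest finite value is 448.
E4M3_MAX = 448.0
E4M3_MANTISSA_BITS = 3
E4M3_MIN_NORMAL_EXP = -6  # below 2**-6, values are subnormal with fixed spacing 2**-9


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    mag = min(abs(x), E4M3_MAX)  # saturate instead of overflowing
    exp = max(math.floor(math.log2(mag)), E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # gap between neighbouring FP8 values
    q = round(mag / step) * step  # Python's round() is round-half-to-even
    return math.copysign(min(q, E4M3_MAX), x)


def quantize_per_token(activations: list[list[float]]) -> tuple[list[list[float]], list[float]]:
    """Dynamic quantization (hypothetical helper): one scale per row, computed at run time."""
    quantized, scales = [], []
    for row in activations:
        amax = max(abs(v) for v in row)
        scale = amax / E4M3_MAX if amax > 0 else 1.0  # maps the row's largest value to +/-448
        scales.append(scale)
        quantized.append([round_to_e4m3(v / scale) for v in row])
    return quantized, scales


def dequantize(quantized: list[list[float]], scales: list[float]) -> list[list[float]]:
    """Undo the per-row scaling to recover approximate original values."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


def weight_gb(num_params: float, bytes_per_param: float) -> float:
    """Weight storage only, in decimal GB; ignores KV cache, activations, higher-precision layers."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    acts = [[0.5, -2.0, 3.1], [100.0, -0.01, 7.5]]
    q, scales = quantize_per_token(acts)
    print(dequantize(q, scales))
    print(weight_gb(38e9, 2), "GB in FP16 vs", weight_gb(38e9, 1), "GB in FP8")
```

Because each scale comes from its row's largest magnitude, the largest value in every token maps to ±448 and the other values keep three mantissa bits (a relative rounding error of at most 1/16 for normal values). The naive weight estimate is 38 GB in FP8 versus 76 GB in FP16. The README's 47GB VRAM figure is higher. This sketch does not model which layers, if any, stay in higher precision, nor runtime memory such as the KV cache.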