88plug-bot committed on
Commit c721cee · verified · 1 Parent(s): 24928ee

Update model card

  1. README.md +16 -1
README.md CHANGED
@@ -13,13 +13,16 @@ tags:
  - tts
  - quantized
  - int4
+ - INT4
  - w4a16
  - 4-bit
  - compressed-tensors
  - vllm
  - text-generation
+ - conversational
  - ptq
  - autoround
+ - llmcompressor
  pipeline_tag: text-generation
  library_name: transformers
  model_type: minicpmo
@@ -63,6 +66,8 @@ Note: The non-quantized modal encoders (SigLIP2 ~1 GB, Whisper ~390 MB, CosyVoic

  ## Quick Start

+ Tested with **vLLM v0.21.0** (`vllm/vllm-openai:v0.21.0-cu129-ubuntu2404`). Weights are in **compressed-tensors** format — vLLM detects and loads quantization automatically. No `--quantization` flag needed.
+
  ### vLLM — text output

  ```bash
@@ -238,4 +243,14 @@ llama-server \

  ## About

- Produced by [88plug AI Lab](https://huggingface.co/88plug) — zero-loss quantizations of frontier omni and voice models.
+ [**88plug AI Lab**](https://huggingface.co/88plug) produces production-grade compressed-tensors quantizations of frontier LLMs, VLMs, and omni models — built for native vLLM v0.21.0+ deployment with zero extra flags.
+
+ **W8A16** — INT8 weights + BF16 activations. Near-lossless on any Ampere+ GPU. Runs where FP8 hardware cannot.
+
+ **W4A16** — AutoRound with iters=200 and a mixed calibration corpus. Targets ≥ 99% MMLU recovery — the quality bar that makes W4A16 viable for production.
+
+ All weights are in compressed-tensors format. vLLM detects quantization automatically from `quantization_config` in `config.json`. No `--quantization` flag required.
+
+ **Also available:** [MiniCPM-o-4.5-W8A16 (INT8, ~9 GB)](https://huggingface.co/88plug/MiniCPM-o-4.5-W8A16) · [MiniCPM-o-4.5-W4A16 (INT4, ~4–5 GB)](https://huggingface.co/88plug/MiniCPM-o-4.5-W4A16)
+
+ Browse all releases → [huggingface.co/88plug](https://huggingface.co/88plug)
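
The new README text says vLLM picks up quantization from `quantization_config` in `config.json`. A minimal sketch of checking that field before serving is below. It assumes the model files have been downloaded to a local directory and that the exported block carries a `quant_method` key set to `compressed-tensors`; that key and value are an assumption about the exported config, not something this commit shows. `describe_quantization` and `load_config` are hypothetical helper names, and the sample dict is illustrative rather than copied from the repository.

```python
import json
from pathlib import Path
from typing import Optional


def describe_quantization(config: dict) -> Optional[str]:
    """Return the quantization method named in a model config, or None.

    Assumes the method is stored under quantization_config["quant_method"].
    """
    quant_cfg = config.get("quantization_config")
    if not isinstance(quant_cfg, dict):
        return None
    return quant_cfg.get("quant_method")


def load_config(model_dir: str) -> dict:
    """Read config.json from a local model directory (hypothetical path)."""
    with open(Path(model_dir) / "config.json", encoding="utf-8") as fh:
        return json.load(fh)


# Illustrative config fragment; values are assumed, not taken from the repo.
sample_config = {
    "model_type": "minicpmo",
    "quantization_config": {"quant_method": "compressed-tensors"},
}

method = describe_quantization(sample_config)
if method is None:
    print("No quantization_config found; the weights may be unquantized.")
else:
    print(f"Quantization method declared in config: {method}")
```

For a real checkout, `describe_quantization(load_config("/path/to/model"))` would run the same check against the downloaded `config.json`; a `None` result would suggest the loader has nothing to auto-detect.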