RedHatAI/quantization
kernel
License: apache-2.0
d26f884
1.38 GB
2 contributors
History: 36 commits
danieldk (HF Staff): Sync on vLLM 20240402 (d26f884, 11 months ago)
Name                Size      Last commit                                           Updated
build               -         Build (aarch64)                                       11 months ago
compressed_tensors  -         Sync with vLLM                                        about 1 year ago
core                -         Sync with vLLM                                        about 1 year ago
cutlass_extensions  -         Sync on vLLM 20240402                                 11 months ago
cutlass_w8a8        -         Sync on vLLM 20240402                                 11 months ago
fp8                 -         Sync with vLLM                                        about 1 year ago
gptq_marlin         -         Sync with vLLM                                        about 1 year ago
marlin              -         Add full Marlin support and tests for Marlin/CUTLASS  over 1 year ago
tests               -         Add full Marlin support and tests for Marlin/CUTLASS  over 1 year ago
torch-ext           -         Sync on vLLM 20240402                                 11 months ago
.gitattributes      1.56 kB   Build                                                 over 1 year ago
LICENSE             11.4 kB   Add cutlass_w8a8                                      over 1 year ago
README.md           195 Bytes Update README.md (#1)                                 about 1 year ago
build.toml          3.25 kB   Sync capabilities with upstream                       11 months ago
cuda_utils.h        1.41 kB   Sync on vLLM 20240402                                 11 months ago
dispatch_utils.h    1.49 kB   Add `scaled_(int|fp8)_quant` and `fp8_marlin_gemm`    over 1 year ago
flake.lock          3.03 kB   Update flake                                          11 months ago
flake.nix           335 Bytes Add support for ROCm                                  12 months ago
utils.cuh           1.84 kB   Sync on vLLM 20240402                                 11 months ago
vectorization.cuh   778 Bytes Sync with vLLM                                        about 1 year ago