Efficient Intelligence and Systems

community

AI & ML interests

Low-bit Quantization of Large Language Models (LLMs)

Recent Activity

AaronHuangWei submitted a paper 25 days ago

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

AaronHuangWei submitted a paper about 1 month ago

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Xingyu-Zheng authored a paper 2 months ago

First-Order Error Matters: Accurate Compensation for Quantized Large Language Models

Efficient-ML's models (52)

Efficient-ML/Qwen3-4B-base-gptq-w8-perchannel

Updated May 5, 2025

Efficient-ML/Qwen3-4B-base-gptq-w8-128

Updated May 5, 2025

Efficient-ML/Qwen3-4B-base-gptq-w4-perchannel

Updated May 5, 2025

Efficient-ML/Qwen3-4B-base-gptq-w4-128

Updated May 5, 2025

Efficient-ML/Qwen3-1.7B-base-gptq-w8-perchannel

Updated May 5, 2025

Efficient-ML/Qwen3-1.7B-base-gptq-w8-128

Updated May 5, 2025

Efficient-ML/Qwen3-1.7B-base-gptq-w4-perchannel

Updated May 5, 2025

Efficient-ML/Qwen3-1.7B-base-gptq-w4-128

Updated May 5, 2025

Efficient-ML/Qwen3-0.6B-base-gptq-w8-128

Updated May 5, 2025

Efficient-ML/Qwen3-0.6B-base-gptq-w8-perchannel

Updated May 5, 2025

Efficient-ML/Qwen3-0.6B-base-gptq-w4-128

Updated May 5, 2025

Efficient-ML/Qwen3-0.6B-base-gptq-w4-perchannel

Updated May 5, 2025

Efficient-ML/LLaMA-3-70B-GPTQ-4bit-b128

Updated Jun 4, 2024 • 2

Efficient-ML/LLaMA-3-8B-AWQ-4bit-b128

Text Generation • Updated Apr 28, 2024 • 1

Efficient-ML/LLaMA-3-8B-DB-LLM-2bit-fake

Text Generation • Updated Apr 26, 2024 • 3 • 2

Efficient-ML/LLaMA-3-8B-QuIP-2bit

Text Generation • Updated Apr 26, 2024 • 11 • 3

Efficient-ML/LLaMA-3-8B-IR-QLoRA

Updated Apr 25, 2024 • 1

Efficient-ML/LLaMA-3-8B-SmoothQuant-4bit-4bit

Text Generation • 8B • Updated Apr 22, 2024 • 6

Efficient-ML/LLaMA-3-8B-SmoothQuant-8bit-8bit

Text Generation • 8B • Updated Apr 22, 2024 • 5

Efficient-ML/LLaMA-3-8B-PB-LLM-1.7bit-fake

Text Generation • 8B • Updated Apr 22, 2024 • 9 • 1

Efficient-ML/LLaMA-3-8B-BiLLM-1.1bit-fake

Updated Apr 21, 2024

Efficient-ML/LLaMA-3-8B-GPTQ-4bit-b128

Updated Apr 21, 2024 • 3