HIGGS-per-tensor
meta-llama/Llama-3.2-1B-Instruct • Text Generation • 1B params • 6.48M downloads • 1.4k likes
inference-optimization/Llama-3.2-1B-Instruct-FP8-Dynamic • 1B params • 47 downloads
inference-optimization/Llama-3.2-1B-Instruct-NVFP4 • 0.8B params • 65 downloads
inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-heuristic-per-tensor • 1B params • 34 downloads
inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-hybrid-per-tensor • 1B params • 38 downloads
inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-noise-per-tensor • 1B params • 35 downloads
inference-optimization/Llama-3.2-1B-Instruct-5.5-bits-mode-heuristic-per-tensor • 1B params • 37 downloads
inference-optimization/Llama-3.2-1B-Instruct-5.5-bits-mode-hybrid-per-tensor • 1B params • 35 downloads
inference-optimization/Llama-3.2-1B-Instruct-5.5-bits-mode-noise-per-tensor • 1B params • 35 downloads
inference-optimization/Llama-3.2-1B-Instruct-6-bits-mode-heuristic-per-tensor • 1B params • 35 downloads
inference-optimization/Llama-3.2-1B-Instruct-6-bits-mode-hybrid-per-tensor • 1B params • 34 downloads
inference-optimization/Llama-3.2-1B-Instruct-6-bits-mode-noise-per-tensor • 1B params • 40 downloads
inference-optimization/Llama-3.2-1B-Instruct-6.5-bits-mode-heuristic-per-tensor • 1B params • 35 downloads
inference-optimization/Llama-3.2-1B-Instruct-6.5-bits-mode-hybrid-per-tensor • 1B params • 35 downloads
inference-optimization/Llama-3.2-1B-Instruct-6.5-bits-mode-noise-per-tensor • 1B params • 34 downloads
inference-optimization/Llama-3.2-1B-Instruct-7-bits-mode-heuristic-per-tensor • 1B params • 34 downloads
inference-optimization/Llama-3.2-1B-Instruct-7-bits-mode-hybrid-per-tensor • 1B params • 36 downloads
inference-optimization/Llama-3.2-1B-Instruct-7-bits-mode-noise-per-tensor • 1B params • 35 downloads
meta-llama/Llama-3.2-3B-Instruct • Text Generation • 3B params • 2.27M downloads • 2.12k likes
inference-optimization/Llama-3.2-3B-Instruct-FP8-Dynamic • 3B params • 40 downloads
inference-optimization/Llama-3.2-3B-Instruct-NVFP4 • 2B params • 308 downloads
inference-optimization/Llama-3.2-3B-Instruct-5-bits-mode-heuristic-per-tensor • 3B params • 32 downloads
inference-optimization/Llama-3.2-3B-Instruct-5-bits-mode-hybrid-per-tensor • 3B params • 34 downloads
inference-optimization/Llama-3.2-3B-Instruct-5-bits-mode-noise-per-tensor • 3B params • 34 downloads
inference-optimization/Llama-3.2-3B-Instruct-5.5-bits-mode-heuristic-per-tensor • 3B params • 35 downloads
inference-optimization/Llama-3.2-3B-Instruct-5.5-bits-mode-hybrid-per-tensor • 3B params • 34 downloads
inference-optimization/Llama-3.2-3B-Instruct-5.5-bits-mode-noise-per-tensor • 3B params • 38 downloads
inference-optimization/Llama-3.2-3B-Instruct-6-bits-mode-heuristic-per-tensor • 3B params • 65 downloads
inference-optimization/Llama-3.2-3B-Instruct-6-bits-mode-hybrid-per-tensor • 3B params • 46 downloads
inference-optimization/Llama-3.2-3B-Instruct-6-bits-mode-noise-per-tensor • 3B params • 39 downloads
inference-optimization/Llama-3.2-3B-Instruct-6.5-bits-mode-heuristic-per-tensor • 3B params • 34 downloads
inference-optimization/Llama-3.2-3B-Instruct-6.5-bits-mode-hybrid-per-tensor • 3B params • 37 downloads
inference-optimization/Llama-3.2-3B-Instruct-6.5-bits-mode-noise-per-tensor • 3B params • 32 downloads
inference-optimization/Llama-3.2-3B-Instruct-7-bits-mode-heuristic-per-tensor • 3B params • 37 downloads
inference-optimization/Llama-3.2-3B-Instruct-7-bits-mode-hybrid-per-tensor • 3B params • 32 downloads
inference-optimization/Llama-3.2-3B-Instruct-7-bits-mode-noise-per-tensor • 3B params • 35 downloads
meta-llama/Llama-3.1-8B-Instruct • Text Generation • 8B params • 9.68M downloads • 5.79k likes
RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic • Text Generation • 8B params • 62.1k downloads • 9 likes
RedHatAI/Llama-3.1-8B-Instruct-NVFP4 • Text Generation • 5B params • 19.3k downloads • 1 like
inference-optimization/Llama-3.1-8B-Instruct-5-bits-mode-heuristic-per-tensor • 5B params • 43 downloads
inference-optimization/Llama-3.1-8B-Instruct-5-bits-mode-hybrid-per-tensor • 5B params • 38 downloads
inference-optimization/Llama-3.1-8B-Instruct-5-bits-mode-noise-per-tensor • 5B params • 35 downloads
inference-optimization/Llama-3.1-8B-Instruct-5.5-bits-mode-heuristic-per-tensor • 6B params • 40 downloads
inference-optimization/Llama-3.1-8B-Instruct-5.5-bits-mode-hybrid-per-tensor • 6B params • 45 downloads
inference-optimization/Llama-3.1-8B-Instruct-5.5-bits-mode-noise-per-tensor • 6B params • 40 downloads
inference-optimization/Llama-3.1-8B-Instruct-6-bits-mode-heuristic-per-tensor • 6B params • 57 downloads
inference-optimization/Llama-3.1-8B-Instruct-6-bits-mode-hybrid-per-tensor • 6B params • 51 downloads
inference-optimization/Llama-3.1-8B-Instruct-6-bits-mode-noise-per-tensor • 6B params • 48 downloads
inference-optimization/Llama-3.1-8B-Instruct-6.5-bits-mode-heuristic-per-tensor • 7B params • 42 downloads
inference-optimization/Llama-3.1-8B-Instruct-6.5-bits-mode-hybrid-per-tensor • 7B params • 52 downloads
inference-optimization/Llama-3.1-8B-Instruct-6.5-bits-mode-noise-per-tensor • 7B params • 37 downloads
inference-optimization/Llama-3.1-8B-Instruct-7-bits-mode-heuristic-per-tensor • 7B params • 44 downloads
inference-optimization/Llama-3.1-8B-Instruct-7-bits-mode-hybrid-per-tensor • 7B params • 38 downloads
inference-optimization/Llama-3.1-8B-Instruct-7-bits-mode-noise-per-tensor • 7B params • 41 downloads
(model name missing from listing) • Text Generation • 8B params • 10.8M downloads • 1.07k likes
RedHatAI/Qwen3-8B-FP8-dynamic • Text Generation • 8B params • 30.1k downloads • 12 likes
(model name missing from listing) • Text Generation • 5B params • 3.3k downloads • 2 likes
inference-optimization/Qwen3-8B-5-bits-mode-heuristic-per-tensor • 6B params • 54 downloads
inference-optimization/Qwen3-8B-5-bits-mode-hybrid-per-tensor • 6B params • 59 downloads
inference-optimization/Qwen3-8B-5-bits-mode-noise-per-tensor • 6B params • 51 downloads
inference-optimization/Qwen3-8B-5.5-bits-mode-heuristic-per-tensor • 6B params • 54 downloads
inference-optimization/Qwen3-8B-5.5-bits-mode-hybrid-per-tensor • 6B params • 53 downloads
inference-optimization/Qwen3-8B-5.5-bits-mode-noise-per-tensor • 6B params • 53 downloads
inference-optimization/Qwen3-8B-6-bits-mode-heuristic-per-tensor • 6B params • 54 downloads
inference-optimization/Qwen3-8B-6-bits-mode-hybrid-per-tensor • 6B params • 50 downloads
inference-optimization/Qwen3-8B-6-bits-mode-noise-per-tensor • 6B params • 51 downloads
inference-optimization/Qwen3-8B-6.5-bits-mode-heuristic-per-tensor • 7B params • 61 downloads
inference-optimization/Qwen3-8B-6.5-bits-mode-hybrid-per-tensor • 7B params • 51 downloads
inference-optimization/Qwen3-8B-6.5-bits-mode-noise-per-tensor • 6B params • 54 downloads
inference-optimization/Qwen3-8B-7-bits-mode-heuristic-per-tensor • 7B params • 78 downloads
inference-optimization/Qwen3-8B-7-bits-mode-hybrid-per-tensor • 7B params • 76 downloads
inference-optimization/Qwen3-8B-7-bits-mode-noise-per-tensor • 6B params • 64 downloads
(model name missing from listing) • Text Generation • 1.57M downloads • 884 likes
RedHatAI/Qwen3-30B-A3B-FP8-dynamic • Text Generation • 31B params • 3.79k downloads • 3 likes
RedHatAI/Qwen3-30B-A3B-NVFP4 • Text Generation • 17B params • 23k downloads • 2 likes
inference-optimization/Qwen3-30B-A3B-5-bits-mode-heuristic-per-tensor • 19B params • 61 downloads
inference-optimization/Qwen3-30B-A3B-5-bits-mode-hybrid-per-tensor • 19B params • 48 downloads
inference-optimization/Qwen3-30B-A3B-5-bits-mode-noise-per-tensor • 19B params • 46 downloads
inference-optimization/Qwen3-30B-A3B-5.5-bits-mode-heuristic-per-tensor • 21B params • 42 downloads
inference-optimization/Qwen3-30B-A3B-5.5-bits-mode-hybrid-per-tensor • 21B params • 46 downloads
inference-optimization/Qwen3-30B-A3B-5.5-bits-mode-noise-per-tensor • 21B params • 48 downloads
inference-optimization/Qwen3-30B-A3B-6-bits-mode-heuristic-per-tensor • 23B params • 66 downloads
inference-optimization/Qwen3-30B-A3B-6-bits-mode-hybrid-per-tensor • 23B params • 52 downloads
inference-optimization/Qwen3-30B-A3B-6-bits-mode-noise-per-tensor • 23B params • 45 downloads
inference-optimization/Qwen3-30B-A3B-6.5-bits-mode-heuristic-per-tensor • 25B params • 49 downloads
inference-optimization/Qwen3-30B-A3B-6.5-bits-mode-hybrid-per-tensor • 25B params • 46 downloads
inference-optimization/Qwen3-30B-A3B-6.5-bits-mode-noise-per-tensor • 25B params • 48 downloads
inference-optimization/Qwen3-30B-A3B-7-bits-mode-heuristic-per-tensor • 27B params • 61 downloads
inference-optimization/Qwen3-30B-A3B-7-bits-mode-hybrid-per-tensor • 27B params • 52 downloads
inference-optimization/Qwen3-30B-A3B-7-bits-mode-noise-per-tensor • 27B params • 45 downloads
Qwen/Qwen3-30B-A3B-Instruct-2507 • Text Generation • 1.12M downloads • 805 likes
inference-optimization/Qwen3-30B-A3B-Instruct-2507-FP8-Dynamic
inference-optimization/Qwen3-30B-A3B-Instruct-2507-NVFP4
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5-bits-mode-heuristic-per-tensor • 19B params • 53 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5-bits-mode-hybrid-per-tensor • 19B params • 43 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5-bits-mode-noise-per-tensor • 19B params • 45 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5.5-bits-mode-heuristic-per-tensor • 21B params • 47 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5.5-bits-mode-hybrid-per-tensor • 21B params • 43 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5.5-bits-mode-noise-per-tensor • 21B params • 45 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-heuristic-per-tensor • 23B params • 41 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-hybrid-per-tensor • 23B params • 45 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-noise-per-tensor • 23B params • 43 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-heuristic-per-tensor • 25B params • 45 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-hybrid-per-tensor • 25B params • 42 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-noise-per-tensor • 25B params • 43 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-heuristic-per-tensor • 27B params • 42 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-hybrid-per-tensor • 27B params • 41 downloads
inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-noise-per-tensor • 26B params • 43 downloads
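A note on the size column: for the HIGGS repos it tracks the bit-width rather than the true parameter count. An 8B base model is listed as roughly 5B at 5 bits and 7B at 7 bits, and the 30B-A3B model as 19B through 27B, which is consistent with b-bit weights being packed into 8-bit container tensors, so the Hub counts about N·b/8 stored elements. This is an inference from the listed numbers, not something the repos state; a minimal sketch under that packing assumption:

```python
def apparent_size(params_b: float, bits_per_weight: float) -> float:
    """Stored-element count (in billions of elements) when b-bit weights
    are packed into 8-bit container tensors -- roughly what the Hub
    reports as 'model size' for these quantized repos (assumption)."""
    return params_b * bits_per_weight / 8

# An 8B model across the bit-widths in this listing: 5.0, 5.5, 6.0, 6.5, 7.0,
# matching the listed 5B/6B/6B/7B/7B sizes after rounding to whole billions.
for bits in (5, 5.5, 6, 6.5, 7):
    print(bits, apparent_size(8.0, bits))
```

The same rule lands close to the 30B-A3B rows as well (e.g. 30.5B at 6 bits gives about 22.9, listed as 23B), though embeddings and other unquantized tensors make the real counts deviate slightly.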