flashresearch/FlashResearch-DS-33k
EXL3 quants of cheapresearch/CheapResearch-4B-Thinking, quantized with exllamav3.
| Quant | Bits per weight (BPW) | Head bits (output layer) |
|---|---|---|
| 2.5_H6 | 2.5 | 6 |
| 3.0_H6 | 3.0 | 6 |
| 3.5_H6 | 3.5 | 6 |
| 4.0_H6 | 4.0 | 6 |
| 4.5_H6 | 4.5 | 6 |
| 5.0_H6 | 5.0 | 6 |
| 6.0_H6 | 6.0 | 6 |
| 8.0_H8 | 8.0 | 8 |
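As a rough guide when picking a quant, the weight size scales with BPW: approximately parameters × BPW / 8 bytes. The sketch below assumes a nominal 4 billion parameters, taken from the "4B" in the model name. The true parameter count, the output layer stored at 6 or 8 bits, and file metadata will all shift the real sizes, and runtime memory (KV cache, activations) comes on top, so treat the numbers as ballpark figures only.

```bash
#!/usr/bin/env bash
# Rough weight-size estimate per quant: parameters * bits-per-weight / 8 bytes.
# ASSUMPTION: a nominal 4e9 parameters (from the "4B" in the model name).
# The real parameter count, the higher-precision head layer and file overhead
# are ignored here, so these are ballpark figures, not exact file sizes.
PARAMS=4000000000

# Print the estimated weight size in GB (10^9 bytes) for a given BPW value.
estimate_gb() {
  awk -v p="$PARAMS" -v b="$1" 'BEGIN { printf "%.2f\n", p * b / 8 / 1e9 }'
}

# One line per BPW value from the table.
for bpw in 2.5 3.0 3.5 4.0 4.5 5.0 6.0 8.0; do
  echo "${bpw} bpw: ~$(estimate_gb "$bpw") GB"
done
```

For example, the 4.0 BPW quant comes out at roughly 2 GB of weights under this assumption.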
You can download an individual quant by selecting its revision (branch) with the Hugging Face CLI.

1. Install the CLI:

```bash
pip install -U "huggingface_hub[cli]"
```

2. Download a specific quant:

```bash
huggingface-cli download ArtusDev/cheapresearch_CheapResearch-4B-Thinking-EXL3 --revision "5.0bpw_H6" --local-dir ./
```
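The download command uses `--local-dir ./`, which places the files in the current directory. If you keep several quants side by side, a small wrapper can put each one in its own folder. This is a sketch under one loud assumption: that every branch follows the `5.0bpw_H6` naming used in the command, so a table entry such as `4.0_H6` lives on revision `4.0bpw_H6`. Only `5.0bpw_H6` is confirmed here; check the repository's branch list before relying on the others. The script name `get_quant.sh` and the helper functions are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: download one quant from the table into its own directory.
# ASSUMPTION: branches are named like "5.0bpw_H6", i.e. table entry "X_HY"
# maps to revision "Xbpw_HY". Verify against the repo's branches first.
REPO="ArtusDev/cheapresearch_CheapResearch-4B-Thinking-EXL3"

# Hypothetical helper: map a table name such as "4.0_H6" to "4.0bpw_H6".
quant_revision() {
  local bpw="${1%%_*}"   # text before the first underscore, e.g. "4.0"
  local head="${1#*_}"   # text after the first underscore, e.g. "H6"
  echo "${bpw}bpw_${head}"
}

# Hypothetical helper: download the quant into ./<revision> rather than ./
download_quant() {
  local rev
  rev="$(quant_revision "$1")"
  huggingface-cli download "$REPO" --revision "$rev" --local-dir "./$rev"
}

# Only download when a quant name is passed, e.g.: ./get_quant.sh 4.0_H6
if [ "$#" -gt 0 ]; then
  download_quant "$1"
fi
```

Running it without arguments does nothing, which makes it safe to source the helpers into another script.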
EXL3 quants can be run with any inference client that supports the EXL3 format, such as TabbyAPI. Refer to the client's documentation for setup instructions.
Made possible with cloud compute from lium.io
Base model: flashresearch/FlashResearch-4B-Thinking