python3 -m mlc_llm compile --quantization q4f16_1 --output 123b_r1.so . --overrides "tensor_parallel_shards=2" --device cuda
python3 -m mlc_llm chat --device cuda 123b_r1/ --model-lib /workspace/123b_r1.so
Model tree for cgg507/Behemoth-R1-123B-v2-q4f16_1
Base model: mistralai/Mistral-Large-Instruct-2411
Finetuned: TheDrummer/Behemoth-R1-123B-v2