---
title: Canada Quant Labs
emoji: π
colorFrom: red
colorTo: gray
sdk: static
pinned: false
---
# Canada Quant Labs

Canada's open-weight model lab.

We train, quantize, and deploy sovereign AI models on Canadian Blackwell silicon, for the regulated industries that can't run on someone else's API.
**What we do**
- Post-training on open base models (SFT, DPO, GRPO, RLAIF)
- Production quantization recipes (W4A16, NVFP4, MXFP4)
- Audited, air-gapped deployment with eval evidence and MRM docs
**Where we work**
- Legal · Medical · Defence · Finance
- Headquarters: Victoria, BC
- Compute: NVIDIA DGX B300 at Equinix Vancouver
**Upstream**
- Contributors to vLLM, llm-compressor, compressed-tensors
Partnerships · partnerships@cql.ca

Press · press@cql.ca

Web · [cql.ca](https://cql.ca)
---
## Open releases: DeepSeek-V4 quantization family
Four artifacts in the same lineage: one base model in two sizes (V4-Flash, V4-Pro), two routed-expert formats (W4A16, NVFP4), and the Multi-Token Prediction (MTP) draft head retained on three of the four. Attention is FP8 (128×128 blocks) across all four.
| Model | Base | Routed experts | MTP | On-disk | Min hardware (TP=2 unless noted) | When to pick |
|---|---|---|---|---|---|---|
| [DeepSeek-V4-Flash-W4A16-FP8](https://huggingface.co/canada-quant/DeepSeek-V4-Flash-W4A16-FP8) | V4-Flash | W4A16 INT4 g=128 | no | ~143 GB | H200 / DGX Spark / RTX PRO 6000 | maximum compatibility, no MTP needed |
| [DeepSeek-V4-Flash-W4A16-FP8-MTP](https://huggingface.co/canada-quant/DeepSeek-V4-Flash-W4A16-FP8-MTP) | V4-Flash | W4A16 INT4 g=128 | yes (BF16) | 159 GB | H200 / RTX PRO 6000 | best $/token interactive on V4-Flash |
| [DeepSeek-V4-Flash-NVFP4-FP8-MTP](https://huggingface.co/canada-quant/DeepSeek-V4-Flash-NVFP4-FP8-MTP) | V4-Flash | NVFP4 g=16 | yes (BF16) | 172 GB | RTX PRO 6000 / B300 | best Blackwell-native interactive on V4-Flash |
| [DeepSeek-V4-Pro-NVFP4-FP8-MTP](https://huggingface.co/canada-quant/DeepSeek-V4-Pro-NVFP4-FP8-MTP) | V4-Pro | NVFP4 g=16 | yes (byte-identical) | 913 GiB | 8× B300 (TP=8 + EP) | only choice for V4-Pro deployment; **+25% to 37% throughput vs upstream MXFP4** |
Upstream reference recipes: [`RedHatAI/DeepSeek-V4-Flash-NVFP4-FP8`](https://huggingface.co/RedHatAI/DeepSeek-V4-Flash-NVFP4-FP8) (Flash NVFP4 topology) and [`nvidia/DeepSeek-V3.2-NVFP4`](https://huggingface.co/nvidia/DeepSeek-V3.2-NVFP4) (Pro NVFP4, MTP-exclusion topology).
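For readers unfamiliar with the `g=128` and `g=16` notation in the table, the sketch below shows what group-wise weight quantization means in general. It is an illustration only, not the calibration recipe used for these artifacts: the function names are hypothetical, the INT4 path is plain symmetric round-to-nearest, and the scale widths passed to `effective_bits_per_weight` (16-bit scales for W4A16, 8-bit block scales for NVFP4) are assumptions about typical storage rather than figures taken from this card.

```python
from typing import List, Tuple


def quantize_groupwise_int4(
    weights: List[float], group_size: int = 128
) -> Tuple[List[int], List[float]]:
    """Symmetric INT4 quantization with one scale per group of weights.

    Illustrative only: real recipes calibrate scales on data instead of
    using plain round-to-nearest on the raw weights.
    """
    if len(weights) % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group onto the INT4 value 7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Clamp to the signed 4-bit range [-8, 7].
            codes.append(max(-8, min(7, round(w / scale))))
    return codes, scales


def dequantize_groupwise(
    codes: List[int], scales: List[float], group_size: int = 128
) -> List[float]:
    """Reconstruct approximate weights from 4-bit codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def effective_bits_per_weight(
    value_bits: int, scale_bits: int, group_size: int
) -> float:
    """Storage cost per weight once the per-group scale is amortized."""
    return value_bits + scale_bits / group_size


# Assumed scale widths: 16-bit for W4A16 g=128, 8-bit for NVFP4 g=16.
w4a16_bits = effective_bits_per_weight(4, 16, 128)  # 4.125
nvfp4_bits = effective_bits_per_weight(4, 8, 16)    # 4.5
```

Under those assumptions the smaller NVFP4 group carries more scale overhead per weight (4.5 bits against 4.125), which may partly account for the NVFP4 Flash artifact being larger on disk than its W4A16 counterpart.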
### Hardware shorthand
- **H200**: 8× NVIDIA H200 SXM5 (Hopper SM 9.0a, 141 GB HBM3e/GPU)
- **DGX Spark**: 2× NVIDIA DGX Spark (GB10, Blackwell SM 12.1a)
- **RTX PRO 6000**: NVIDIA RTX PRO 6000 Blackwell Server Edition (SM 12.0, sm_120, 96 GB GDDR7)
- **B300**: NVIDIA B300 SXM6 AC (Blackwell SM 10.3, sm_103a, 288 GB HBM3e/GPU)
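The per-GPU memory figures above can be checked against the on-disk sizes in the release table with some rough arithmetic. The sketch below is a back-of-envelope estimate, not a sizing tool: the helper names are hypothetical, it assumes weights shard evenly across the tensor-parallel ranks, it treats GB as 10^9 bytes, and it ignores KV cache, activations, and any tensors replicated on every rank.

```python
def gib_to_gb(gib: float) -> float:
    """Convert binary gibibytes to decimal gigabytes."""
    return gib * (1024 ** 3) / 1e9


def weight_headroom_gb(
    on_disk_gb: float, tensor_parallel: int, gpu_memory_gb: float
) -> float:
    """Memory left per GPU after an even split of the weights.

    Assumes perfect sharding; real deployments need extra room for
    KV cache, activations, and runtime buffers.
    """
    if tensor_parallel < 1:
        raise ValueError("tensor_parallel must be at least 1")
    return gpu_memory_gb - on_disk_gb / tensor_parallel


# Flash NVFP4 (172 GB) on two 96 GB RTX PRO 6000 cards.
flash_nvfp4 = weight_headroom_gb(172, 2, 96)

# Pro NVFP4 (913 GiB) on eight 288 GB B300 GPUs.
pro_nvfp4 = weight_headroom_gb(gib_to_gb(913), 8, 288)

print(f"Flash NVFP4 headroom per GPU: {flash_nvfp4:.1f} GB")
print(f"Pro NVFP4 headroom per GPU: {pro_nvfp4:.1f} GB")
```

With these inputs the Flash NVFP4 artifact leaves about 10 GB per GPU on a pair of RTX PRO 6000 cards, so context length and batch size would be tightly constrained at that minimum configuration.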
### Reproduction repos
Every artifact has a public reproduction repo with calibration scripts, vLLM patches, bench harnesses, and findings docs:
- [`canada-quant/dsv4-flash-w4a16-fp8`](https://github.com/canada-quant/dsv4-flash-w4a16-fp8)
- [`canada-quant/dsv4-flash-w4a16-fp8-mtp`](https://github.com/canada-quant/dsv4-flash-w4a16-fp8-mtp)
- [`canada-quant/dsv4-flash-nvfp4-fp8-mtp`](https://github.com/canada-quant/dsv4-flash-nvfp4-fp8-mtp)
- [`canada-quant/dsv4-pro-nvfp4-fp8-mtp`](https://github.com/canada-quant/dsv4-pro-nvfp4-fp8-mtp)
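As a starting point for trying an artifact, the sketch below assembles a `vllm serve` command line from a model ID and the parallelism settings listed in the release table. The helper `build_vllm_serve_command` is hypothetical; `--tensor-parallel-size` and `--enable-expert-parallel` are standard vLLM options, but whether a stock vLLM build loads these checkpoints is an assumption, since the repos above ship vLLM patches. Options for MTP speculative decoding are left out because this card does not state them.

```python
import shlex
from typing import List


def build_vllm_serve_command(
    model_id: str, tensor_parallel: int, expert_parallel: bool = False
) -> List[str]:
    """Assemble an argument list for `vllm serve` (hypothetical helper)."""
    if tensor_parallel < 1:
        raise ValueError("tensor_parallel must be at least 1")
    cmd = [
        "vllm", "serve", model_id,
        "--tensor-parallel-size", str(tensor_parallel),
    ]
    if expert_parallel:
        # The V4-Pro row in the table lists TP=8 plus expert parallelism.
        cmd.append("--enable-expert-parallel")
    return cmd


flash_cmd = build_vllm_serve_command(
    "canada-quant/DeepSeek-V4-Flash-W4A16-FP8", tensor_parallel=2
)
pro_cmd = build_vllm_serve_command(
    "canada-quant/DeepSeek-V4-Pro-NVFP4-FP8-MTP",
    tensor_parallel=8,
    expert_parallel=True,
)

# Print shell-safe command strings; run them manually on suitable hardware.
print(shlex.join(flash_cmd))
print(shlex.join(pro_cmd))
```

Building the command as a list keeps it safe to hand to `subprocess.run` without shell quoting issues; check each reproduction repo for the exact vLLM revision and any extra flags its bench harness uses.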
### Upstream contributions filed during this work
- vLLM: PRs [#42209](https://github.com/vllm-project/vllm/pull/42209) (merged; NVFP4 MoE for DSV4), [#43248](https://github.com/vllm-project/vllm/pull/43248), [#43288](https://github.com/vllm-project/vllm/pull/43288), [#43290](https://github.com/vllm-project/vllm/pull/43290), [#43319](https://github.com/vllm-project/vllm/pull/43319), [#43467](https://github.com/vllm-project/vllm/pull/43467); issues [#41511](https://github.com/vllm-project/vllm/issues/41511), [#41700](https://github.com/vllm-project/vllm/issues/41700) (landed via `jasl/vllm@1d6f5c4`)
- llm-compressor: issue [#2745](https://github.com/vllm-project/llm-compressor/issues/2745)
- compressed-tensors: issue [#711](https://github.com/vllm-project/compressed-tensors/issues/711)