Spaces:

canada-quant
/

README

Running

App Files Files Community

README / README.md

pastapaul

Initial org card: lab intro + DeepSeek-V4 family release table

3cf8366 verified 6 days ago

preview code

raw

history blame contribute delete

4.09 kB

	---
	title: Canada Quant Labs
	emoji: 🍁
	colorFrom: red
	colorTo: gray
	sdk: static
	pinned: false
	---

	# Canada Quant Labs

	Canada's open-weight model lab.

	We train, quantize, and deploy sovereign AI models on Canadian Blackwell silicon — for the regulated industries that can't run on someone else's API.

	What we do
	- Post-training on open base models (SFT, DPO, GRPO, RLAIF)
	- Production quantization recipes (W4A16, NVFP4, MXFP4)
	- Audited, air-gapped deployment with eval evidence and MRM docs

	Where we work
	- Legal · Medical · Defence · Finance
	- Headquarters: Victoria, BC
	- Compute: NVIDIA DGX B300 at Equinix Vancouver

	Upstream
	- Contributors to vLLM, llm-compressor, compressed-tensors

	Partnerships · partnerships@cql.ca
	Press · press@cql.ca
	Web · [cql.ca](https://cql.ca)

	---

	## Open releases — DeepSeek-V4 quantization family

	Four artifacts in the same lineage. One base model in two sizes (V4-Flash, V4-Pro); two routed-expert formats (W4A16, NVFP4); Multi-Token Prediction (MTP) draft head retained on three of four. Attention is FP8 block 128×128 across all four.

	\| Model \| Base \| Routed experts \| MTP \| On-disk \| Min hardware (TP=2) \| When to pick \|
	\|---\|---\|---\|---\|---\|---\|---\|
	\| [DeepSeek-V4-Flash-W4A16-FP8](https://huggingface.co/canada-quant/DeepSeek-V4-Flash-W4A16-FP8) \| V4-Flash \| W4A16 INT4 g=128 \| no \| ~143 GB \| H200 / DGX Spark / RTX PRO 6000 \| maximum compatibility, no MTP needed \|
	\| [DeepSeek-V4-Flash-W4A16-FP8-MTP](https://huggingface.co/canada-quant/DeepSeek-V4-Flash-W4A16-FP8-MTP) \| V4-Flash \| W4A16 INT4 g=128 \| yes (BF16) \| 159 GB \| H200 / RTX PRO 6000 \| best $/token interactive on V4-Flash \|
	\| [DeepSeek-V4-Flash-NVFP4-FP8-MTP](https://huggingface.co/canada-quant/DeepSeek-V4-Flash-NVFP4-FP8-MTP) \| V4-Flash \| NVFP4 g=16 \| yes (BF16) \| 172 GB \| RTX PRO 6000 / B300 \| best Blackwell-native interactive on V4-Flash \|
	\| [DeepSeek-V4-Pro-NVFP4-FP8-MTP](https://huggingface.co/canada-quant/DeepSeek-V4-Pro-NVFP4-FP8-MTP) \| V4-Pro \| NVFP4 g=16 \| yes (byte-identical) \| 913 GiB \| 8× B300 (TP=8 + EP) \| only choice for V4-Pro deployment; +25–37% throughput vs upstream MXFP4 \|

	Upstream reference recipes: [`RedHatAI/DeepSeek-V4-Flash-NVFP4-FP8`](https://huggingface.co/RedHatAI/DeepSeek-V4-Flash-NVFP4-FP8) (Flash NVFP4 topology) and [`nvidia/DeepSeek-V3.2-NVFP4`](https://huggingface.co/nvidia/DeepSeek-V3.2-NVFP4) (Pro NVFP4, MTP-exclusion topology).

	### Hardware shorthand

	- H200 — 8× NVIDIA H200 SXM5 (Hopper SM 9.0a, 141 GB HBM3e/GPU)
	- DGX Spark — 2× NVIDIA DGX Spark (GB10, Blackwell SM 12.1a)
	- RTX PRO 6000 — NVIDIA RTX PRO 6000 Blackwell Server Edition (SM 12.0, sm_120, 96 GB HBM)
	- B300 — NVIDIA B300 SXM6 AC (Blackwell SM 10.3, sm_103a, 288 GB HBM3e/GPU)

	### Reproduction repos

	Every artifact has a public reproduction repo with calibration scripts, vLLM patches, bench harnesses, and findings docs:

	- [`canada-quant/dsv4-flash-w4a16-fp8`](https://github.com/canada-quant/dsv4-flash-w4a16-fp8)
	- [`canada-quant/dsv4-flash-w4a16-fp8-mtp`](https://github.com/canada-quant/dsv4-flash-w4a16-fp8-mtp)
	- [`canada-quant/dsv4-flash-nvfp4-fp8-mtp`](https://github.com/canada-quant/dsv4-flash-nvfp4-fp8-mtp)
	- [`canada-quant/dsv4-pro-nvfp4-fp8-mtp`](https://github.com/canada-quant/dsv4-pro-nvfp4-fp8-mtp)

	### Upstream contributions filed during this work

	- vLLM: PRs [#42209](https://github.com/vllm-project/vllm/pull/42209) (merged — NVFP4 MoE for DSV4), [#43248](https://github.com/vllm-project/vllm/pull/43248), [#43288](https://github.com/vllm-project/vllm/pull/43288), [#43290](https://github.com/vllm-project/vllm/pull/43290), [#43319](https://github.com/vllm-project/vllm/pull/43319), [#43467](https://github.com/vllm-project/vllm/pull/43467), [#41511](https://github.com/vllm-project/vllm/issues/41511), [#41700](https://github.com/vllm-project/vllm/issues/41700) (landed via `jasl/vllm@1d6f5c4`)
	- llm-compressor: [#2745](https://github.com/vllm-project/llm-compressor/issues/2745)
	- compressed-tensors: [#711](https://github.com/vllm-project/compressed-tensors/issues/711)