Qwen3.5-0.8B: De-API Model Shards (.dms)
Pipeline-parallel shards of Qwen3.5-0.8B in .dms (De-API Model Shard) format.
What is .dms?
A self-contained binary format where one file = one node. Each .dms file bundles:
- Model architecture config (inference parameters only)
- Weight tensors (float16, 64-byte aligned)
- Tokenizer (included in node 0 shard only)
- Shard metadata (node ID, layer range, role)
No separate config.json, tokenizer.json, or index.json needed.
Shards
| File | Size | Layers | Role |
|---|---|---|---|
| qwen3.5-0.8b-node0.dms | 976 MB | 0-11 | embed + 9 linear attn + 3 full attn + tokenizer |
| qwen3.5-0.8b-node1.dms | 960 MB | 12-23 | 9 linear attn + 3 full attn + norm + lm_head |
Architecture
Qwen3.5 hybrid: GatedDeltaNet (linear attention / SSM) + GQA (full attention with RoPE).
- 24 layers, 1024 hidden, 8 attn heads, 2 KV heads, head_dim 256
- Pattern: [linear, linear, linear, full] × 6
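As a quick sanity check, the layer pattern and the shard boundaries can be expanded in a few lines of Python. This is only a sketch built from the numbers stated in this card; the names `layer_types`, `SHARDS`, and `count_types` are illustrative and do not come from the De-API code.

```python
# Layer layout sketch, using only the numbers stated in this card.
NUM_LAYERS = 24
PATTERN = ["linear", "linear", "linear", "full"]  # repeated 6 times

layer_types = PATTERN * (NUM_LAYERS // len(PATTERN))

# Shard boundaries from the Shards table (layers 0-11 and 12-23).
SHARDS = {0: range(0, 12), 1: range(12, 24)}


def count_types(layers):
    """Count linear-attention and full-attention layers in a layer range."""
    kinds = [layer_types[i] for i in layers]
    return {"linear": kinds.count("linear"), "full": kinds.count("full")}


# Grouped-query attention: several query heads share one KV head.
NUM_ATTN_HEADS = 8
NUM_KV_HEADS = 2
queries_per_kv_head = NUM_ATTN_HEADS // NUM_KV_HEADS

if __name__ == "__main__":
    for node_id, layers in SHARDS.items():
        print(f"node {node_id}:", count_types(layers))
    print("query heads per KV head:", queries_per_kv_head)
```

Each shard ends up with 9 linear-attention and 3 full-attention layers, which matches the Role column of the Shards table.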
Pipeline Flow
User → Gateway → Node 0 (embed + layers 0-11) → Node 1 (layers 12-23 + head) → Gateway
          ↑                                                                       |
          └───────────────────────── autoregressive loop ─────────────────────────┘
Each token flows through both nodes. Each node maintains its own KV/SSM cache.
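The loop can be illustrated with a toy model. The sketch below is not the actual `gateway.py` or `node.py` logic, which this card does not show; `ToyNode` and `generate` are hypothetical stand-ins in which integers replace tensors and a list replaces the KV/SSM cache.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Stand-in for one pipeline stage; a real node would run its layers."""
    name: str
    cache: list = field(default_factory=list)  # placeholder for KV/SSM state

    def forward(self, x: int) -> int:
        # Each node updates only its own cache, then passes its output on.
        self.cache.append(x)
        return x + 1  # toy computation, not real inference


def generate(prompt_token: int, max_new_tokens: int,
             node0: ToyNode, node1: ToyNode) -> list:
    """Gateway-side loop: every step visits node 0, then node 1."""
    tokens = [prompt_token]
    for _ in range(max_new_tokens):
        hidden = node0.forward(tokens[-1])  # embed + layers 0-11
        next_token = node1.forward(hidden)  # layers 12-23 + head
        tokens.append(next_token)           # fed back on the next step
    return tokens


if __name__ == "__main__":
    n0, n1 = ToyNode("node0"), ToyNode("node1")
    print(generate(0, 3, n0, n1))
    print(n0.cache, n1.cache)
```

The point of the sketch is the data flow: the gateway holds the token sequence, while each node keeps state only for the layers it owns.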
Usage
# Download shards
huggingface-cli download ZYLIM/qwen3.5-0.8b-deapi-shards --local-dir shards/
# Start nodes (each machine downloads only its shard)
python node.py --shard shards/qwen3.5-0.8b-node0.dms --port 9001
python node.py --shard shards/qwen3.5-0.8b-node1.dms --port 9002
# Start gateway
python gateway.py --port 8000 --gateway-shard shards/qwen3.5-0.8b-node0.dms
# Open dashboard
open http://localhost:8000
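Since each machine needs only its own shard, a single file can also be fetched from Python with `hf_hub_download` from `huggingface_hub`. This is a sketch that assumes the shard files sit at the repository root under the names listed in the Shards table; `shard_filename` and `download_shard` are illustrative helpers, not part of De-API.

```python
REPO_ID = "ZYLIM/qwen3.5-0.8b-deapi-shards"


def shard_filename(node_id: int) -> str:
    """Build the shard file name used in the Shards table."""
    return f"qwen3.5-0.8b-node{node_id}.dms"


def download_shard(node_id: int, local_dir: str = "shards") -> str:
    """Download one shard and return its local path.

    Assumes the file lives at the repository root.
    """
    # Imported lazily so shard_filename() works without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=shard_filename(node_id),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # On the machine that will run node 1:
    print(download_shard(1))
```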
Binary Format
┌─────────────────────────────────────┐
│ Magic: b"DEAPI\x01DM" (8 bytes)     │
│ Version: uint32 (4 bytes)           │
│ Header size: uint64 (8 bytes)       │
├─────────────────────────────────────┤
│ Header (JSON, utf-8)                │
│   .model_config                     │
│   .shard_info                       │
│   .tokenizer (node 0 only)          │
│   .tensors[] (name, dtype, shape,   │
│               offset, nbytes)       │
├─────────────────────────────────────┤
│ Tensor data (raw bytes, 64B align)  │
└─────────────────────────────────────┘
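A minimal reader for the preamble and JSON header might look like the sketch below. The layout above does not state byte order or padding, so little-endian and tightly packed fields are assumptions here, as is the idea that the JSON header starts immediately after the 20-byte preamble. Whether tensor `offset` values are relative to the file start or to the data section is also unspecified, so the sketch stops at the header. `read_dms_header` and `align_up` are illustrative names.

```python
import json
import struct

MAGIC = b"DEAPI\x01DM"
# Assumption: little-endian, no padding between the three preamble fields.
PREAMBLE = struct.Struct("<8sIQ")  # magic, version (uint32), header size (uint64)


def read_dms_header(path):
    """Return (version, header_dict, offset just past the JSON header)."""
    with open(path, "rb") as f:
        magic, version, header_size = PREAMBLE.unpack(f.read(PREAMBLE.size))
        if magic != MAGIC:
            raise ValueError(f"not a .dms file: bad magic {magic!r}")
        header = json.loads(f.read(header_size).decode("utf-8"))
    return version, header, PREAMBLE.size + header_size


def align_up(offset, alignment=64):
    """Round offset up to the next multiple of alignment (64 bytes here)."""
    return (offset + alignment - 1) // alignment * alignment


if __name__ == "__main__":
    version, header, end = read_dms_header("shards/qwen3.5-0.8b-node0.dms")
    print("version:", version)
    print("top-level keys:", sorted(header))
    print("tensor count:", len(header.get("tensors", [])))
```

Reading only the header is cheap even for a shard close to 1 GB, so this is a convenient way to inspect the layer range and tensor list before loading any weights.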