Qwen3.5-0.8B: De-API Model Shards (.dms)

Pipeline-parallel shards of Qwen3.5-0.8B in .dms (De-API Model Shard) format.

What is .dms?

A self-contained binary format where one file = one node. Each .dms file bundles:

  • Model architecture config (inference parameters only)
  • Weight tensors (float16, 64-byte aligned)
  • Tokenizer (included in node 0 shard only)
  • Shard metadata (node ID, layer range, role)

No separate config.json, tokenizer.json, or index.json needed.
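
As a rough illustration of the 64-byte alignment mentioned above, the sketch below lays out a few float16 tensors back to back and rounds each start offset up to a multiple of 64. The tensor names and shapes are invented for the example, and measuring offsets from the start of the tensor-data region is an assumption; the format description does not say what offsets are relative to.

```python
from math import prod

ALIGN = 64          # alignment stated in the format description
FLOAT16_BYTES = 2   # float16 uses two bytes per element


def align_up(offset: int, align: int = ALIGN) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def plan_layout(shapes: dict[str, tuple[int, ...]]) -> list[dict]:
    """Assign an aligned offset to each float16 tensor, in insertion order."""
    entries, cursor = [], 0
    for name, shape in shapes.items():
        offset = align_up(cursor)
        nbytes = prod(shape) * FLOAT16_BYTES
        entries.append({"name": name, "dtype": "float16", "shape": list(shape),
                        "offset": offset, "nbytes": nbytes})
        cursor = offset + nbytes
    return entries


# Hypothetical tensors; real shard tensor names are not given in this document.
layout = plan_layout({"norm.weight": (1024,), "bias": (3,), "proj": (4, 5)})
for entry in layout:
    print(entry["name"], entry["offset"], entry["nbytes"])
```

The entry fields mirror the name, dtype, shape, offset, and nbytes fields listed for the header's tensor table.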

Shards

File                     Size    Layers  Role
qwen3.5-0.8b-node0.dms   976 MB  0-11    embed + 9 linear attn + 3 full attn + tokenizer
qwen3.5-0.8b-node1.dms   960 MB  12-23   9 linear attn + 3 full attn + norm + lm_head

Architecture

Qwen3.5 hybrid: GatedDeltaNet (linear attention / SSM) + GQA (full attention with RoPE).

  • 24 layers, 1024 hidden, 8 attn heads, 2 KV heads, head_dim 256
  • Pattern: [linear, linear, linear, full] × 6
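
The layer pattern and the per-shard counts in the Shards table can be cross-checked with a few lines of Python. This is only a sanity-check sketch; the constant names are chosen for illustration and do not come from the project's code.

```python
from collections import Counter

PATTERN = ["linear", "linear", "linear", "full"]  # repeated 6 times
NUM_LAYERS = 24
SHARD_RANGES = {0: range(0, 12), 1: range(12, 24)}  # from the Shards table

layer_types = PATTERN * (NUM_LAYERS // len(PATTERN))
assert len(layer_types) == NUM_LAYERS

for node_id, layers in SHARD_RANGES.items():
    counts = Counter(layer_types[i] for i in layers)
    print(f"node {node_id}: {counts['linear']} linear, {counts['full']} full")

# GQA: query heads share KV heads in equal-sized groups.
NUM_ATTN_HEADS, NUM_KV_HEADS = 8, 2
queries_per_kv_head = NUM_ATTN_HEADS // NUM_KV_HEADS
print("query heads per KV head:", queries_per_kv_head)
```

Each 12-layer shard covers three repetitions of the pattern, which is where the "9 linear attn + 3 full attn" split comes from.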

Pipeline Flow

User → Gateway → Node 0 (embed + layers 0-11) → Node 1 (layers 12-23 + head) → Gateway
                   ↑                                                              |
                   └──────────────── autoregressive loop ─────────────────────────┘

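Each generated token passes through both nodes in order. Each node keeps its own cache for the layers it hosts: a KV cache for the full-attention layers and SSM state for the linear-attention layers.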
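
The loop can be sketched with toy stand-ins for the two nodes. Everything here is hypothetical: `ToyNode`, `run_pipeline`, and `generate` are illustration names, the arithmetic replaces the real layer math and sampling, and prefill is done one token at a time for simplicity. The actual node.py and gateway.py may work differently.

```python
class ToyNode:
    """Stand-in for one pipeline stage; the real node.py is not shown here."""

    def __init__(self, name: str):
        self.name = name
        self.cache: list[int] = []  # stands in for this node's KV/SSM cache

    def forward(self, value: int) -> int:
        self.cache.append(value)          # each node updates only its own cache
        return value + len(self.cache)    # placeholder for the real layer math


def run_pipeline(nodes: list[ToyNode], value: int) -> int:
    """Pass one token's activation through every node in order."""
    for node in nodes:
        value = node.forward(value)
    return value


def generate(nodes, prompt_tokens, max_new_tokens, eos_token):
    """Gateway-side loop: feed each new token back into node 0."""
    if not prompt_tokens:
        raise ValueError("prompt must contain at least one token")
    tokens = list(prompt_tokens)
    for tok in tokens:                    # prefill, one token at a time
        out = run_pipeline(nodes, tok)
    for _ in range(max_new_tokens):
        next_token = out % 100            # placeholder for sampling from logits
        tokens.append(next_token)
        if next_token == eos_token:
            break
        out = run_pipeline(nodes, next_token)
    return tokens


nodes = [ToyNode("node0"), ToyNode("node1")]
print(generate(nodes, [1, 2], max_new_tokens=3, eos_token=-1))
```

Because each node appends to its own cache on every step, both caches grow in lockstep with the sequence, while only the small per-token value crosses the node boundary.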

Usage

# Download shards
huggingface-cli download ZYLIM/qwen3.5-0.8b-deapi-shards --local-dir shards/

# Start nodes (each machine downloads only its shard)
python node.py --shard shards/qwen3.5-0.8b-node0.dms --port 9001
python node.py --shard shards/qwen3.5-0.8b-node1.dms --port 9002

# Start gateway
python gateway.py --port 8000 --gateway-shard shards/qwen3.5-0.8b-node0.dms

# Open dashboard
open http://localhost:8000
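
To fetch only one node's shard instead of the whole repository, a small Python helper around `hf_hub_download` from the `huggingface_hub` package could look like the sketch below. It assumes the shard files sit at the repository root under the names shown in the Shards table; `shard_filename` and `download_shard` are hypothetical helper names.

```python
REPO_ID = "ZYLIM/qwen3.5-0.8b-deapi-shards"


def shard_filename(node_id: int) -> str:
    """File name for a node's shard, following the Shards table."""
    return f"qwen3.5-0.8b-node{node_id}.dms"


def download_shard(node_id: int, local_dir: str = "shards") -> str:
    """Fetch one shard from the Hub and return its local path."""
    # Imported here so shard_filename works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=shard_filename(node_id),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Example: the machine hosting node 1 downloads only its own file.
    print(download_shard(1))
```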

Binary Format

┌──────────────────────────────────────┐
│ Magic: b"DEAPI\x01DM"      (8 bytes) │
│ Version: uint32            (4 bytes) │
│ Header size: uint64        (8 bytes) │
├──────────────────────────────────────┤
│ Header (JSON, utf-8)                 │
│   .model_config                      │
│   .shard_info                        │
│   .tokenizer        (node 0 only)    │
│   .tensors[]  (name, dtype, shape,   │
│                offset, nbytes)       │
├──────────────────────────────────────┤
│ Tensor data (raw bytes, 64B align)   │
└──────────────────────────────────────┘
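
A minimal reader for the 20-byte preamble and JSON header might look like the following sketch. It assumes little-endian integer fields and that the JSON header starts immediately after the preamble; neither detail is stated above. The version number and the `node_id` key in the demo header are placeholders.

```python
import io
import json
import struct

MAGIC = b"DEAPI\x01DM"
# Assumption: little-endian fields; the layout above does not state byte order.
PREAMBLE = struct.Struct("<8sIQ")  # magic, version (uint32), header size (uint64)


def read_dms_header(stream) -> tuple[int, dict, int]:
    """Return (version, header dict, byte offset where the header ends)."""
    raw = stream.read(PREAMBLE.size)
    if len(raw) != PREAMBLE.size:
        raise ValueError("file too short for a .dms preamble")
    magic, version, header_size = PREAMBLE.unpack(raw)
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    header_bytes = stream.read(header_size)
    if len(header_bytes) != header_size:
        raise ValueError("truncated header")
    header = json.loads(header_bytes.decode("utf-8"))
    return version, header, PREAMBLE.size + header_size


# In-memory demo blob with placeholder contents (not a real shard).
demo_header = json.dumps({"shard_info": {"node_id": 0}, "tensors": []}).encode("utf-8")
blob = PREAMBLE.pack(MAGIC, 1, len(demo_header)) + demo_header
version, header, end = read_dms_header(io.BytesIO(blob))
print(version, header["shard_info"], end)
```

With a real shard, the same function could be called on a file opened in binary mode, for example `open(path, "rb")`; locating tensor bytes from the returned offset would additionally depend on how the 64-byte alignment is applied.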