Qwen3.5-0.8B — De-API Model Shards (.dms)

Pipeline-parallel shards of Qwen3.5-0.8B in .dms (De-API Model Shard) format.

What is .dms?

A self-contained binary format where one file = one node. Each .dms file bundles:

Model architecture config (inference parameters only)
Weight tensors (float16, 64-byte aligned)
Tokenizer (included in node 0 shard only)
Shard metadata (node ID, layer range, role)

No separate config.json, tokenizer.json, or index.json is needed.

Shards

File	Size	Layers	Role
`qwen3.5-0.8b-node0.dms`	976 MB	0-11	embed + 9 linear attn + 3 full attn + tokenizer
`qwen3.5-0.8b-node1.dms`	960 MB	12-23	9 linear attn + 3 full attn + norm + lm_head

Architecture

Qwen3.5 hybrid: GatedDeltaNet (linear attention / SSM) + GQA (full attention with RoPE).

24 layers, 1024 hidden, 8 attn heads, 2 KV heads, head_dim 256
Pattern: [linear, linear, linear, full] × 6
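
The layer pattern and the two-way split can be checked with a few lines of Python. This is only a sketch: the names PATTERN, shard_layers, and kv_bytes_per_token are made up for illustration, the even split is read off the Shards table, and the cache estimate assumes the KV cache is stored in float16 like the weights, which the source does not state.

```python
PATTERN = ["linear", "linear", "linear", "full"]  # repeated 6 times
NUM_LAYERS, NUM_NODES = 24, 2
ATTN_HEADS, KV_HEADS, HEAD_DIM = 8, 2, 256
BYTES_PER_VALUE = 2  # ASSUMPTION: cache kept in float16, like the weights

# Type of every layer, in order.
layer_types = [PATTERN[i % len(PATTERN)] for i in range(NUM_LAYERS)]


def shard_layers(node_id):
    """Layer indices owned by a node, assuming an even contiguous split."""
    per_node = NUM_LAYERS // NUM_NODES
    return range(node_id * per_node, (node_id + 1) * per_node)


def kv_bytes_per_token(node_id):
    """KV-cache growth per token: K and V for each full-attention layer."""
    full_layers = sum(layer_types[i] == "full" for i in shard_layers(node_id))
    return full_layers * 2 * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


for node in range(NUM_NODES):
    layers = shard_layers(node)
    kinds = [layer_types[i] for i in layers]
    print(
        f"node {node}: layers {layers[0]}-{layers[-1]}, "
        f"{kinds.count('linear')} linear, {kinds.count('full')} full, "
        f"{kv_bytes_per_token(node)} KV bytes/token"
    )

print("query heads per KV head:", ATTN_HEADS // KV_HEADS)
```

Only the full-attention layers add to the per-token figure; linear-attention layers keep a fixed-size recurrent state that does not grow with sequence length.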

Pipeline Flow

User → Gateway → Node 0 (embed + layers 0-11) → Node 1 (layers 12-23 + head) → Gateway
                   ↑                                                              |
                   └──────────────── autoregressive loop ─────────────────────────┘

Each token flows through both nodes. Each node maintains its own KV/SSM cache.
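
The loop above can be sketched with two stand-in nodes. Everything here is hypothetical: StubNode, its step method, and the toy arithmetic are placeholders for the real node.py and gateway.py, whose interfaces are not described in this document. In a real deployment each step call would be a network hop.

```python
class StubNode:
    """Stand-in for one pipeline stage; keeps its own per-request cache."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = []  # placeholder for the node-local KV/SSM state

    def step(self, x):
        self.cache.append(x)  # each node only ever sees its own inputs
        return self.fn(x)


def generate(prompt, node0, node1, max_new_tokens):
    """Gateway-side autoregressive loop over a two-stage pipeline."""
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    tokens = list(prompt)
    pending = list(prompt)  # tokens not yet fed through the pipeline
    while len(tokens) < len(prompt) + max_new_tokens:
        for tok in pending:
            hidden = node0.step(tok)       # embed + layers 0-11
            next_tok = node1.step(hidden)  # layers 12-23 + head
        tokens.append(next_tok)
        pending = [next_tok]  # only the new token is sent on the next pass
    return tokens


# Toy stages: node 0 maps a token to a "hidden state", node 1 maps it back.
node0 = StubNode(lambda tok: tok + 1)
node1 = StubNode(lambda hidden: (hidden * 2) % 50)
print(generate([3], node0, node1, max_new_tokens=3))
print(len(node0.cache), len(node1.cache))
```

Because each stage holds its own cache, the gateway only needs to send the newest token (or hidden state) on every pass after the prompt has been processed.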

Usage

# Download shards
huggingface-cli download ZYLIM/qwen3.5-0.8b-deapi-shards --local-dir shards/

# Start nodes (in a multi-machine setup, each machine needs only its own shard)
python node.py --shard shards/qwen3.5-0.8b-node0.dms --port 9001
python node.py --shard shards/qwen3.5-0.8b-node1.dms --port 9002

# Start gateway (it is given the node 0 shard, the only one that bundles the tokenizer)
python gateway.py --port 8000 --gateway-shard shards/qwen3.5-0.8b-node0.dms

# Open dashboard (macOS; use xdg-open on Linux, or paste the URL into a browser)
open http://localhost:8000

Binary Format

┌─────────────────────────────────────┐
│ Magic: b"DEAPI\x01DM"   (8 bytes)  │
│ Version: uint32            (4 bytes) │
│ Header size: uint64        (8 bytes) │
├─────────────────────────────────────┤
│ Header (JSON, utf-8)                │
│   .model_config                     │
│   .shard_info                       │
│   .tokenizer        (node 0 only)  │
│   .tensors[]  (name, dtype, shape, │
│                offset, nbytes)      │
├─────────────────────────────────────┤
│ Tensor data (raw bytes, 64B align)  │
└─────────────────────────────────────┘
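
A reader for this layout could look roughly like the sketch below. Several details are assumptions, since the diagram does not specify them: integers are taken to be little-endian, the tensor data region is taken to start at the first 64-byte boundary after the JSON header, and each tensor's offset is taken to be relative to that start. The shard_info keys and the build_demo_shard helper are hypothetical and exist only to make the example self-contained.

```python
import json
import struct

MAGIC = b"DEAPI\x01DM"             # 8 bytes, as in the layout above
PREAMBLE = struct.Struct("<8sIQ")  # magic, uint32 version, uint64 header size
ALIGN = 64


def align_up(n, align=ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def read_dms_header(buf):
    """Return (version, header dict, offset where tensor data starts).

    ASSUMPTIONS: little-endian integers; tensor data begins at the first
    64-byte boundary after the JSON header.
    """
    magic, version, header_size = PREAMBLE.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError(f"not a .dms file: magic={magic!r}")
    start = PREAMBLE.size
    header = json.loads(buf[start:start + header_size].decode("utf-8"))
    return version, header, align_up(start + header_size)


def build_demo_shard():
    """Build a toy in-memory shard holding one two-element float16 tensor."""
    payload = b"\x00\x3c\x00\x40"  # float16 1.0 and 2.0, little-endian
    header = {
        "model_config": {},
        "shard_info": {"node_id": 0, "layers": [0, 11]},  # hypothetical keys
        "tensors": [{"name": "demo", "dtype": "float16", "shape": [2],
                     "offset": 0, "nbytes": len(payload)}],
    }
    raw = json.dumps(header).encode("utf-8")
    head = PREAMBLE.pack(MAGIC, 1, len(raw)) + raw
    # Zero-pad so the tensor data starts on a 64-byte boundary.
    return head.ljust(align_up(len(head)), b"\x00") + payload


blob = build_demo_shard()
version, header, data_start = read_dms_header(blob)
entry = header["tensors"][0]
begin = data_start + entry["offset"]  # ASSUMPTION: offset is region-relative
values = struct.unpack("<2e", blob[begin:begin + entry["nbytes"]])
print(version, data_start, values)
```

For real shards, memory-mapping the file and slicing tensors by offset and nbytes would avoid loading the full file into memory; the aligned layout is well suited to that.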

Base model: Qwen/Qwen3.5-0.8B (itself fine-tuned from Qwen/Qwen3.5-0.8B-Base)