Hydra BitNet - M2M Protocol SLM
A 1.58-bit quantized Mixture-of-Experts model for LLM API optimization.
Model Description
Hydra is an ultra-compact neural network designed for the M2M Protocol. It uses:
- BitNet 1.58-bit quantization: Weights are ternary {-1, 0, +1}
- Mixture-of-Experts: 4 specialized experts with top-2 routing
- Task-specific heads: Compression routing and security detection
Model Details
| Property | Value |
|---|---|
| Parameters | ~9.7M |
| Model Size | ~3.7 MB (1.58-bit) |
| Hidden Size | 192 |
| Layers | 4 |
| Experts | 4 |
| Vocab Size | 32000 |
Performance
Compression Routing
- Task: Predict optimal compression algorithm (NONE, BPE, BROTLI, ZLIB)
- Accuracy: 99.4%
- Latency: <5ms on GPU
Security Detection
- Task: Detect prompt injection and jailbreak attempts
- Accuracy: 96.2%
- Latency: <5ms on GPU
Usage
import torch
from safetensors.torch import load_file
# Load model
weights = load_file("model.safetensors")
# Or use with the m2m-protocol package
from m2m_protocol import M2MClient
client = M2MClient(target_model="gpt-4")
result = client.process(your_message)
Training
- Compression Expert: Trained with DPO on 100K message pairs
- Security Expert: Fine-tuned on 60K security samples (prompt injection, jailbreak, safe)
Architecture
HydraBitNet(
(embeddings): Embedding(256, 256)
(encoder): ModuleList(
(0-5): 6 x TaskSpecializedMoELayer(
(gate): Linear(256, 4)
(experts): ModuleList(
(0): CompressionExpert
(1): SecurityExpert
(2): SemanticExpert
(3): GeneralExpert
)
)
)
(classifier): ModuleDict(
(compression): BitLinear(256, 4)
(security): BitLinear(256, 2)
)
)
Citation
@software{hydra_bitnet,
title = {Hydra BitNet: Ultra-Compact MoE for M2M Protocol},
author = {M2M Protocol Team},
year = {2026},
url = {https://github.com/infernet-org/m2m-protocol}
}
License
Apache 2.0
- Downloads last month
- 56