Hydra BitNet - M2M Protocol SLM

A 1.58-bit quantized Mixture-of-Experts model for LLM API optimization.

Model Description

Hydra is an ultra-compact neural network designed for the M2M Protocol. It uses:

  • BitNet 1.58-bit quantization: Weights are ternary {-1, 0, +1}
  • Mixture-of-Experts: 4 specialized experts with top-2 routing
  • Task-specific heads: Compression routing and security detection
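The ternary weight idea can be illustrated with a minimal sketch. It assumes the absmean scheme described for BitNet b1.58 (scale by the mean absolute weight, round, clip to [-1, 1]); the card does not say which scheme Hydra actually uses, and the function name `absmean_ternary` is hypothetical.

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize a flat list of floats to {-1, 0, +1} plus one shared scale."""
    # The scale is the mean absolute value of the weights (absmean).
    scale = sum(abs(w) for w in weights) / len(weights)
    scale = max(scale, eps)  # guard against division by zero on all-zero input
    # Round to the nearest integer, then clip into the ternary range.
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.1]
q, s = absmean_ternary(weights)
print(q)  # [1, 0, 1, -1, 0, 0]

# Dequantized approximation: each ternary value times the shared scale.
approx = [v * s for v in q]
```

Each weight then needs only one of three states, which is where the "1.58-bit" figure comes from: log2(3) is about 1.585.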

Model Details

Property      Value
Parameters    ~9.7M
Model Size    ~3.7 MB (1.58-bit)
Hidden Size   192
Layers        4
Experts       4
Vocab Size    32000
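As a rough cross-check on the size figure, the sketch below estimates the packed footprint of ~9.7M ternary weights. It assumes 1 MB = 10^6 bytes and that every parameter is ternary; the table states neither. The reported ~3.7 MB is larger than both estimates. That would be consistent with some tensors (for example embeddings or scales) being stored at higher precision, though this is a guess rather than something the card confirms.

```python
import math

PARAMS = 9.7e6  # "~9.7M" from the table above

bits_per_weight = math.log2(3)                  # ideal cost of one ternary value
ideal_mb = PARAMS * bits_per_weight / 8 / 1e6   # information-theoretic lower bound
two_bit_mb = PARAMS * 2 / 8 / 1e6               # simple packing, 4 weights per byte

print(f"{bits_per_weight:.3f} bits/weight")
print(f"ideal: {ideal_mb:.2f} MB, 2-bit packed: {two_bit_mb:.2f} MB")
```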

Performance

Compression Routing

  • Task: Predict optimal compression algorithm (NONE, BPE, BROTLI, ZLIB)
  • Accuracy: 99.4%
  • Latency: <5ms on GPU
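A minimal sketch of how a four-way routing prediction could be turned into an action. The label order in `ALGORITHMS`, the helper names, and the example logits are assumptions for illustration; the card lists the classes but not their indices. Only the options backed by the standard library are wired up.

```python
import zlib

# Assumed label order; the source does not state the index of each class.
ALGORITHMS = ["NONE", "BPE", "BROTLI", "ZLIB"]

def pick_algorithm(logits):
    """Return the label whose logit is largest (argmax)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ALGORITHMS[best]

def compress(payload: bytes, algorithm: str) -> bytes:
    """Apply the chosen algorithm to the payload."""
    if algorithm == "NONE":
        return payload
    if algorithm == "ZLIB":
        return zlib.compress(payload)
    # BPE and BROTLI need third-party libraries and are left out of this sketch.
    raise NotImplementedError(algorithm)

logits = [0.1, -1.3, 0.4, 2.2]  # hypothetical head output
choice = pick_algorithm(logits)
packed = compress(b"hello " * 50, choice)
print(choice, len(packed))
```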

Security Detection

  • Task: Detect prompt injection and jailbreak attempts
  • Accuracy: 96.2%
  • Latency: <5ms on GPU
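For the detection task, a two-logit output can be converted into a probability and compared against a threshold. The sketch below assumes index 1 means "unsafe" and uses a default threshold of 0.5; both are illustrative choices, not values taken from the model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_unsafe(logits, threshold=0.5):
    """Return (flag, P(unsafe)). Treating index 1 as 'unsafe' is an assumption."""
    p_unsafe = softmax(logits)[1]
    return p_unsafe >= threshold, p_unsafe

flagged, p = is_unsafe([-1.0, 2.0])  # hypothetical head output
print(flagged, round(p, 3))
```

Raising the threshold trades recall for fewer false positives, which matters when blocked requests are costly.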

Usage

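import torch
from safetensors.torch import load_file

# Load the raw weights: a dict mapping tensor names to torch.Tensor objects
weights = load_file("model.safetensors")

# Or use the model through the m2m-protocol package
from m2m_protocol import M2MClient

client = M2MClient(target_model="gpt-4")
# your_message is a placeholder for the message you want processed
result = client.process(your_message)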
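After loading, it can help to check what the checkpoint contains. The helper below is a sketch that works on any mapping from names to objects with a `.shape` attribute, which includes the dict returned by `load_file`. The name `summarize` and the stand-in tensor names and shapes are made up for illustration.

```python
import math
from types import SimpleNamespace

def summarize(state):
    """Return (total element count, {name: shape}) for a name -> tensor mapping."""
    shapes = {name: tuple(t.shape) for name, t in state.items()}
    total = sum(math.prod(shape) for shape in shapes.values())
    return total, shapes

# Stand-in for load_file("model.safetensors"); names and shapes are hypothetical.
fake_state = {
    "gate.weight": SimpleNamespace(shape=(4, 8)),
    "gate.bias": SimpleNamespace(shape=(4,)),
}
total, shapes = summarize(fake_state)
print(total, shapes)
```

With the real checkpoint, `summarize(weights)` gives a parameter count to compare against the figure listed under Model Details.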

Training

  • Compression Expert: Trained with Direct Preference Optimization (DPO) on 100K message pairs
  • Security Expert: Fine-tuned on 60K security samples (prompt injection, jailbreak, safe)
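For readers unfamiliar with DPO, the per-pair objective can be sketched as follows. This is the standard DPO loss on sequence log-probabilities, not the project's training code. The value `beta=0.1` is an assumed default, and the card does not describe how the chosen and rejected messages in each pair were selected.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the frozen reference model.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# When the policy matches the reference, the margin is zero and the loss is log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
```

The loss drops below log(2) once the policy favors the chosen response more strongly than the reference does, and rises above it in the opposite case.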

Architecture

HydraBitNet(
  (embeddings): Embedding(256, 256)
  (encoder): ModuleList(
    (0-5): 6 x TaskSpecializedMoELayer(
      (gate): Linear(256, 4)
      (experts): ModuleList(
        (0): CompressionExpert
        (1): SecurityExpert  
        (2): SemanticExpert
        (3): GeneralExpert
      )
    )
  )
  (classifier): ModuleDict(
    (compression): BitLinear(256, 4)
    (security): BitLinear(256, 2)
  )
)
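The gate with four outputs, combined with top-2 routing, can be sketched in plain Python. This is a generic top-2 MoE routing example, not the actual `TaskSpecializedMoELayer` code: the function names are hypothetical, and whether Hydra renormalizes the two selected weights this way is an assumption.

```python
import math

EXPERTS = ["CompressionExpert", "SecurityExpert", "SemanticExpert", "GeneralExpert"]

def top2_route(gate_logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts."""
    # Softmax over all experts.
    m = max(gate_logits)
    exps = [math.exp(x - m) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the two most probable experts and renormalize their weights to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_output(gate_logits, expert_outputs):
    """Weighted sum of the selected experts' output vectors."""
    routes = top2_route(gate_logits)
    dim = len(expert_outputs[0])
    return [sum(w * expert_outputs[i][d] for i, w in routes) for d in range(dim)]

routes = top2_route([2.0, 0.5, -1.0, 1.0])  # hypothetical gate logits for one token
print([(EXPERTS[i], round(w, 3)) for i, w in routes])
```

Only the two selected experts need to run for a given token, which is how a top-2 design keeps per-token compute below that of evaluating all four experts.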

Citation

@software{hydra_bitnet,
  title = {Hydra BitNet: Ultra-Compact MoE for M2M Protocol},
  author = {M2M Protocol Team},
  year = {2026},
  url = {https://github.com/OpenACI-AI/m2m-protocol}
}

License

Apache 2.0
