---
license: apache-2.0
library_name: transformers
tags:
- bitnet
- moe
- mixture-of-experts
- 1-bit
- quantized
- compression
- security
- m2m-protocol
pipeline_tag: text-classification
datasets:
- custom
language:
- en
---

# Hydra BitNet - M2M Protocol SLM

A 1.58-bit quantized Mixture-of-Experts model for LLM API optimization.

## Model Description

Hydra is an ultra-compact neural network designed for the M2M Protocol. It uses:

- **BitNet 1.58-bit quantization**: Weights are ternary {-1, 0, +1}
- **Mixture-of-Experts**: 4 specialized experts with top-2 routing
- **Task-specific heads**: Compression routing and security detection

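To make the ternary weights in the list above concrete, here is a minimal sketch of absmean quantization, the scheme described for BitNet b1.58. Whether Hydra uses exactly this rule is an assumption; the function names are hypothetical. Three possible values carry log2(3) ≈ 1.58 bits of information, which is where the "1.58-bit" name comes from.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Quantize a list of floats to {-1, 0, +1} plus one shared scale.

    Assumed scheme (not confirmed for Hydra): divide by the mean
    absolute value, round to the nearest integer, clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def dequantize(ternary, scale):
    """Approximate the original weights from ternary codes and the scale."""
    return [t * scale for t in ternary]


weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
ternary, scale = absmean_ternarize(weights)
print(ternary)  # small-magnitude weights collapse to 0
print(dequantize(ternary, scale))
```

Small weights become 0 and large ones saturate at ±1, so a matrix multiply reduces to additions, subtractions, and one multiplication by the scale.
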
## Model Details

| Property | Value |
|----------|-------|
| Parameters | ~9.7M |
| Model Size | ~3.7 MB (1.58-bit) |
| Hidden Size | 192 |
| Layers | 4 |
| Experts | 4 |
| Vocab Size | 32000 |

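As a rough sanity check on the size figures in the table, the sketch below estimates storage for ~9.7M weights at several bit widths. It is back-of-the-envelope arithmetic only: the card does not describe how the file is packed, and `size_mb` is a hypothetical helper.

```python
import math

PARAMS = 9_700_000  # "~9.7M" from the table above


def size_mb(num_params, bits_per_weight):
    """Storage in megabytes (10^6 bytes) at a given bit width."""
    return num_params * bits_per_weight / 8 / 1e6


ideal = size_mb(PARAMS, math.log2(3))  # information-theoretic ternary bound
packed_2bit = size_mb(PARAMS, 2)       # a common practical packing
fp32 = size_mb(PARAMS, 32)             # unquantized baseline

print(f"ideal ternary: {ideal:.2f} MB")
print(f"2-bit packed:  {packed_2bit:.2f} MB")
print(f"fp32:          {fp32:.2f} MB")
```

The reported ~3.7 MB is roughly a tenth of the fp32 estimate. It sits above both ternary estimates; one possible but unconfirmed explanation is that some tensors, such as embeddings or scale factors, are stored at higher precision.
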
## Performance

### Compression Routing

- **Task**: Predict optimal compression algorithm (NONE, BPE, BROTLI, ZLIB)
- **Accuracy**: 99.4%
- **Latency**: <5ms on GPU

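The routing task above is a four-way classification, so turning head outputs into a decision can be sketched as an argmax over four logits. The label order below simply follows the order in which the card lists the algorithms; the real index-to-label mapping is an assumption, and `pick_algorithm` is a hypothetical helper.

```python
# Assumed index order; the card lists the classes but not their indices.
COMPRESSION_LABELS = ["NONE", "BPE", "BROTLI", "ZLIB"]


def pick_algorithm(logits):
    """Return the label whose logit is largest."""
    if len(logits) != len(COMPRESSION_LABELS):
        raise ValueError("expected one logit per compression label")
    best = max(range(len(logits)), key=lambda i: logits[i])
    return COMPRESSION_LABELS[best]


print(pick_algorithm([0.1, 2.3, -1.0, 0.4]))
```
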
### Security Detection

- **Task**: Detect prompt injection and jailbreak attempts
- **Accuracy**: 96.2%
- **Latency**: <5ms on GPU

## Usage

```python
import torch
from safetensors.torch import load_file

# Load the raw weights as a dict mapping tensor names to torch.Tensor
weights = load_file("model.safetensors")

# Or use with the m2m-protocol package
from m2m_protocol import M2MClient

your_message = "Hello, world!"  # placeholder input; replace with your own message
client = M2MClient(target_model="gpt-4")
result = client.process(your_message)
```

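Once the weights are loaded as a name-to-tensor dict, a quick way to check the file against the reported parameter count is to sum the element counts of all tensors. The sketch below works on plain shape tuples so it runs without the model file; the tensor names and shapes in it are toy values for illustration, not the real checkpoint contents.

```python
import math


def count_parameters(shapes):
    """Sum element counts over a {tensor_name: shape} mapping."""
    return sum(math.prod(shape) for shape in shapes.values())


# Toy shapes for illustration only; not the real tensor names or sizes.
example_shapes = {
    "embeddings.weight": (256, 256),
    "classifier.compression.weight": (4, 256),
    "classifier.security.weight": (2, 256),
}
total = count_parameters(example_shapes)
print(total)
```

With the real file, the mapping can be built from the loaded dict as `{name: tuple(t.shape) for name, t in weights.items()}`.
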
## Training

- **Compression Expert**: Trained with DPO on 100K message pairs
- **Security Expert**: Fine-tuned on 60K security samples (prompt injection, jailbreak, safe)

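For readers unfamiliar with DPO (Direct Preference Optimization), the per-pair loss can be sketched as below. This is the standard DPO objective, not Hydra's training code: the card does not say how the message pairs were scored or which `beta` was used, so the argument names and the default `beta=0.1` are assumptions.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are log-probabilities under the trained policy and under a
    frozen reference model. Loss = -log(sigmoid(beta * margin)).
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # Numerically stable -log(sigmoid(x)), i.e. softplus(-x).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy identical to the reference: no preference learned yet.
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# Policy favors the chosen option more than the reference does: lower loss.
print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

The loss falls as the policy raises the chosen option's likelihood relative to the rejected one, measured against the reference model.
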
## Architecture

```
HydraBitNet(
  (embeddings): Embedding(256, 256)
  (encoder): ModuleList(
    (0-5): 6 x TaskSpecializedMoELayer(
      (gate): Linear(256, 4)
      (experts): ModuleList(
        (0): CompressionExpert
        (1): SecurityExpert
        (2): SemanticExpert
        (3): GeneralExpert
      )
    )
  )
  (classifier): ModuleDict(
    (compression): BitLinear(256, 4)
    (security): BitLinear(256, 2)
  )
)
```

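The `gate` layer above emits one logit per expert, and the card states that routing is top-2. A minimal sketch of that selection step follows, assuming the two winning logits are renormalized with a softmax. That renormalization is a common choice but is not confirmed for Hydra, and `top2_route` is a hypothetical name.

```python
import math

EXPERTS = ["CompressionExpert", "SecurityExpert", "SemanticExpert", "GeneralExpert"]


def top2_route(gate_logits):
    """Pick the two largest gate logits and softmax over just those two.

    Returns [(expert_index, weight), (expert_index, weight)], with the
    weights summing to 1.
    """
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top2 = order[:2]
    peak = max(gate_logits[i] for i in top2)  # subtract max for stability
    exps = [math.exp(gate_logits[i] - peak) for i in top2]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top2, exps)]


routes = top2_route([2.0, 0.5, 1.0, -1.0])
for index, weight in routes:
    print(EXPERTS[index], round(weight, 4))
```

A layer's output would then be the weighted sum of the two selected experts' outputs; the other two experts are skipped for that token.
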
## Citation

```bibtex
@software{hydra_bitnet,
  title = {Hydra BitNet: Ultra-Compact MoE for M2M Protocol},
  author = {M2M Protocol Team},
  year = {2026},
  url = {https://github.com/OpenACI-AI/m2m-protocol}
}
```

## License

Apache 2.0