icxcn committed · Commit abf49e3 · verified · 1 Parent(s): a2ef792

Upload README.md with huggingface_hub

Files changed (1): README.md (+62 -86)
@@ -1,110 +1,86 @@
- ---
- license: apache-2.0
- library_name: transformers
- tags:
- - bitnet
- - moe
- - mixture-of-experts
- - 1-bit
- - quantized
- - compression
- - security
- - m2m-protocol
- pipeline_tag: text-classification
- datasets:
- - custom
- language:
- - en
- ---
-
- # Hydra BitNet - M2M Protocol SLM
-
- A 1.58-bit quantized Mixture-of-Experts model for LLM API optimization.
-
- ## Model Description
-
- Hydra is an ultra-compact neural network designed for the M2M Protocol. It uses:
- - **BitNet 1.58-bit quantization**: Weights are ternary {-1, 0, +1}
- - **Mixture-of-Experts**: 4 specialized experts with top-2 routing
- - **Task-specific heads**: Compression routing and security detection
 
  ## Model Details
 
  | Property | Value |
  |----------|-------|
  | Parameters | ~9.7M |
- | Model Size | ~3.7 MB (1.58-bit) |
- | Hidden Size | 192 |
- | Layers | 4 |
  | Experts | 4 |
- | Vocab Size | 32000 |
-
- ## Performance
-
- ### Compression Routing
- - **Task**: Predict optimal compression algorithm (NONE, BPE, BROTLI, ZLIB)
- - **Accuracy**: 99.4%
- - **Latency**: <5ms on GPU
-
- ### Security Detection
- - **Task**: Detect prompt injection and jailbreak attempts
- - **Accuracy**: 96.2%
- - **Latency**: <5ms on GPU
 
  ## Usage
 
  ```python
  import torch
  from safetensors.torch import load_file
-
- # Load model
- weights = load_file("model.safetensors")
-
- # Or use with the m2m-protocol package
- from m2m_protocol import M2MClient
-
- client = M2MClient(target_model="gpt-4")
- result = client.process(your_message)
  ```
 
  ## Training
 
- - **Compression Expert**: Trained with DPO on 100K message pairs
- - **Security Expert**: Fine-tuned on 60K security samples (prompt injection, jailbreak, safe)
 
- ## Architecture
 
- ```
- HydraBitNet(
- (embeddings): Embedding(256, 256)
- (encoder): ModuleList(
- (0-5): 6 x TaskSpecializedMoELayer(
- (gate): Linear(256, 4)
- (experts): ModuleList(
- (0): CompressionExpert
- (1): SecurityExpert
- (2): SemanticExpert
- (3): GeneralExpert
- )
- )
- )
- (classifier): ModuleDict(
- (compression): BitLinear(256, 4)
- (security): BitLinear(256, 2)
- )
- )
- ```
 
- ## Citation
 
- ```bibtex
- @software{hydra_bitnet,
- title = {Hydra BitNet: Ultra-Compact MoE for M2M Protocol},
- author = {M2M Protocol Team},
- year = {2026},
- url = {https://github.com/OpenACI-AI/m2m-protocol}
- }
- ```
 
  ## License
 
+ # Hydra - M2M Protocol Classifier
+
+ A 1.58-bit quantized BitNet model for LLM API optimization.
+
+ ## What This Model Does
+
+ Hydra is a **fast classifier** (not a chatbot) that makes two decisions:
+
+ ### 1. Compression Routing (99.4% accuracy)
+ Predicts the optimal compression algorithm for LLM API requests:
+ - `NONE` - Don't compress (short messages)
+ - `BPE` - Token compression (structured JSON)
+ - `BROTLI` - Byte compression (long prose)
+ - `ZLIB` - Fallback compression
+
+ ### 2. Security Screening (96.2% accuracy)
+ Detects malicious inputs:
+ - `SAFE` - Normal request, allow
+ - `UNSAFE` - Prompt injection/jailbreak, block
 
  ## Model Details
 
  | Property | Value |
  |----------|-------|
+ | Architecture | BitNet MoE (Mixture-of-Experts) |
  | Parameters | ~9.7M |
+ | Quantization | 1.58-bit (ternary weights) |
+ | Model Size | ~37 MB (safetensors) |
+ | Inference | <5ms on GPU, <10ms on CPU |
+ | Hidden Size | 256 |
+ | Layers | 6 |
  | Experts | 4 |
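
The table pairs a 1.58-bit quantization with a ~37 MB file, so a quick size check may help when reviewing it. The sketch below is only back-of-the-envelope arithmetic: the parameter count is the approximate figure from the table, and both storage formats (unpacked 32-bit floats versus ideally packed ternary weights) are assumptions, not something the README states.

```python
# Back-of-the-envelope size estimate. PARAMS is the approximate count from
# the table; the two storage formats below are assumptions for illustration.
PARAMS = 9_700_000


def size_mb(params: int, bits_per_weight: float) -> float:
    """Return storage size in megabytes (10**6 bytes), ignoring file metadata."""
    return params * bits_per_weight / 8 / 1e6


fp32_mb = size_mb(PARAMS, 32)       # weights stored as unpacked 32-bit floats
ternary_mb = size_mb(PARAMS, 1.58)  # ideal packing at 1.58 bits per weight

print(f"fp32:    {fp32_mb:.1f} MB")
print(f"ternary: {ternary_mb:.1f} MB")
```

Under these assumptions the fp32 estimate (38.8 MB, about 37 MiB) is in line with the ~37 MB safetensors figure, which would suggest the file holds unpacked weights; this is an inference from the arithmetic, not a confirmed detail of the checkpoint.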
 
  ## Usage
 
  ```python
  import torch
  from safetensors.torch import load_file
+ from huggingface_hub import hf_hub_download
+
+ # Download model
+ model_path = hf_hub_download("infernet/hydra", "model.safetensors")
+ weights = load_file(model_path)
+
+ # Load into architecture (requires m2m-protocol package)
+ # pip install m2m-protocol
+ from aisim.bitnet_moe import M2MSentinel
+
+ model = M2MSentinel(vocab_size=256, dim=256, depth=6, experts=4)
+ model.load_state_dict(weights)
+ model.eval()
+
+ # Inference
+ text = "Hello, how are you?"
+ tokens = torch.tensor([[ord(c) % 256 for c in text[:128]]])
+
+ # Compression routing
+ logits = model(tokens, task='compression')
+ pred = logits.argmax(-1).item()
+ labels = ['NONE', 'BPE', 'BROTLI', 'ZLIB']
+ print(f"Compression: {labels[pred]}")
+
+ # Security check
+ logits = model(tokens, task='security')
+ is_safe = logits.argmax(-1).item() == 0
+ print(f"Safe: {is_safe}")
  ```
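
The snippet above tokenizes with `ord(c) % 256`, which matches byte values only for ASCII input; other characters wrap around modulo 256 instead of being split into bytes. The sketch below contrasts that with UTF-8 byte encoding. The helper names `byte_tokens` and `codepoint_tokens` are hypothetical, the 128-token limit is taken from `text[:128]`, and which scheme the model was actually trained with is not stated in the README.

```python
def byte_tokens(text: str, max_len: int = 128) -> list[int]:
    """Encode text as UTF-8 and return at most max_len byte ids in 0..255."""
    return list(text.encode("utf-8")[:max_len])


def codepoint_tokens(text: str, max_len: int = 128) -> list[int]:
    """Mirror the README snippet: code points of the first max_len chars, mod 256."""
    return [ord(c) % 256 for c in text[:max_len]]


ascii_text = "Hello"
accented = "é€"

# For pure ASCII input both schemes produce the same ids.
print(byte_tokens(ascii_text) == codepoint_tokens(ascii_text))

# For non-ASCII input they diverge in both values and length.
print(byte_tokens(accented))       # UTF-8 bytes of each character
print(codepoint_tokens(accented))  # one id per character, wrapped mod 256
```

If the inputs may contain non-ASCII text, it is worth checking which of the two schemes the checkpoint expects before relying on its predictions.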
 
  ## Training
 
+ - **Compression Expert**: DPO training on 100K message pairs
+ - **Security Expert**: Fine-tuned on 60K samples (prompt injection, jailbreak, safe)
 
+ ## Limitations
 
+ - **Not a chatbot** - Cannot generate text or have conversations
+ - **Classifier only** - Outputs class labels, not language
+ - **ASCII tokenization** - Uses simple byte-level tokenization
 
+ ## Links
 
+ - [M2M Protocol GitHub](https://github.com/OpenACI-AI/m2m-protocol)
+ - [Paper](https://github.com/OpenACI-AI/m2m-protocol/blob/main/paper/infernet_m2m_protocol.pdf)
 
  ## License