Upload README.md with huggingface_hub

Files changed (1)

README.md +62 -86

@@ -1,110 +1,86 @@
----
-license: apache-2.0
-library_name: transformers
-tags:
-  - bitnet
-  - moe
-  - mixture-of-experts
-  - 1-bit
-  - quantized
-  - compression
-  - security
-  - m2m-protocol
-pipeline_tag: text-classification
-datasets:
-  - custom
-language:
-  - en
----
-# Hydra BitNet - M2M Protocol SLM
-A 1.58-bit quantized Mixture-of-Experts model for LLM API optimization.
-## Model Description
-Hydra is an ultra-compact neural network designed for the M2M Protocol. It uses:
-- **BitNet 1.58-bit quantization**: Weights are ternary {-1, 0, +1}
-- **Mixture-of-Experts**: 4 specialized experts with top-2 routing
-- **Task-specific heads**: Compression routing and security detection
 ## Model Details
 | Property | Value |
 |----------|-------|
 | Parameters | ~9.7M |
-| Model Size | ~3.7 MB (1.58-bit) |
-| Hidden Size | 192 |
-| Layers | 4 |
 | Experts | 4 |
-| Vocab Size | 32000 |
-## Performance
-### Compression Routing
-- **Task**: Predict optimal compression algorithm (NONE, BPE, BROTLI, ZLIB)
-- **Accuracy**: 99.4%
-- **Latency**: <5ms on GPU
-### Security Detection
-- **Task**: Detect prompt injection and jailbreak attempts
-- **Accuracy**: 96.2%
-- **Latency**: <5ms on GPU
 ## Usage
 ```python
 import torch
 from safetensors.torch import load_file
-# Load model
-weights = load_file("model.safetensors")
-# Or use with the m2m-protocol package
-from m2m_protocol import M2MClient
-client = M2MClient(target_model="gpt-4")
-result = client.process(your_message)
 ```
 ## Training
-- **Compression Expert**: Trained with DPO on 100K message pairs
-- **Security Expert**: Fine-tuned on 60K security samples (prompt injection, jailbreak, safe)
-## Architecture
-```
-HydraBitNet(
-  (embeddings): Embedding(256, 256)
-  (encoder): ModuleList(
-    (0-5): 6 x TaskSpecializedMoELayer(
-      (gate): Linear(256, 4)
-      (experts): ModuleList(
-        (0): CompressionExpert
-        (1): SecurityExpert
-        (2): SemanticExpert
-        (3): GeneralExpert
-      )
-    )
-  )
-  (classifier): ModuleDict(
-    (compression): BitLinear(256, 4)
-    (security): BitLinear(256, 2)
-  )
-)
-```
-## Citation
-```bibtex
-@software{hydra_bitnet,
-  title = {Hydra BitNet: Ultra-Compact MoE for M2M Protocol},
-  author = {M2M Protocol Team},
-  year = {2026},
-  url = {https://github.com/OpenACI-AI/m2m-protocol}
-}
-```
 ## License

+# Hydra - M2M Protocol Classifier
+A 1.58-bit quantized BitNet model for LLM API optimization.
+## What This Model Does
+Hydra is a **fast classifier** (not a chatbot) that makes two decisions:
+### 1. Compression Routing (99.4% accuracy)
+Predicts the optimal compression algorithm for LLM API requests:
+- `NONE` - Don't compress (short messages)
+- `BPE` - Token compression (structured JSON)
+- `BROTLI` - Byte compression (long prose)
+- `ZLIB` - Fallback compression
+### 2. Security Screening (96.2% accuracy)
+Detects malicious inputs:
+- `SAFE` - Normal request, allow
+- `UNSAFE` - Prompt injection/jailbreak, block
 ## Model Details
 | Property | Value |
 |----------|-------|
+| Architecture | BitNet MoE (Mixture-of-Experts) |
 | Parameters | ~9.7M |
+| Quantization | 1.58-bit (ternary weights) |
+| Model Size | ~37 MB (safetensors) |
+| Inference | <5ms on GPU, <10ms on CPU |
+| Hidden Size | 256 |
+| Layers | 6 |
 | Experts | 4 |
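
The size figure in the table can be sanity-checked with simple arithmetic. The sketch below assumes exactly 9.7M parameters (the table only says "~9.7M") and compares a few storage widths; the helper `size_mb` is hypothetical and not part of any package. Under these assumptions, float32 storage comes to about 38.8 MB (roughly 37 MiB), which is consistent with the ~37 MB safetensors figure but does not confirm how the weights are actually stored.

```python
import math

PARAMS = 9_700_000  # assumption: "~9.7M" from the table; the exact count is not given


def size_mb(params: int, bits_per_weight: float) -> float:
    """Storage size in megabytes (10**6 bytes) at a given bit width per weight."""
    return params * bits_per_weight / 8 / 1e6


# Information content of one ternary weight {-1, 0, +1}: log2(3), about 1.585 bits
ternary_bits = math.log2(3)

print(f"float32:       {size_mb(PARAMS, 32):.1f} MB")
print(f"float16:       {size_mb(PARAMS, 16):.1f} MB")
print(f"2-bit packed:  {size_mb(PARAMS, 2):.1f} MB")
print(f"ideal ternary: {size_mb(PARAMS, ternary_bits):.1f} MB")
```

If every weight were packed at the ternary width, the file would be closer to 2 MB, so the on-disk size mostly reflects the storage dtype rather than the 1.58-bit quantization itself.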
 ## Usage
 ```python
 import torch
 from safetensors.torch import load_file
+from huggingface_hub import hf_hub_download
+# Download model
+model_path = hf_hub_download("infernet/hydra", "model.safetensors")
+weights = load_file(model_path)
+# Load into architecture (requires m2m-protocol package)
+# pip install m2m-protocol
+from aisim.bitnet_moe import M2MSentinel
+model = M2MSentinel(vocab_size=256, dim=256, depth=6, experts=4)
+model.load_state_dict(weights)
+model.eval()
+# Inference
+text = "Hello, how are you?"
+tokens = torch.tensor([[ord(c) % 256 for c in text[:128]]])
+# Compression routing
+logits = model(tokens, task='compression')
+pred = logits.argmax(-1).item()
+labels = ['NONE', 'BPE', 'BROTLI', 'ZLIB']
+print(f"Compression: {labels[pred]}")
+# Security check
+logits = model(tokens, task='security')
+is_safe = logits.argmax(-1).item() == 0
+print(f"Safe: {is_safe}")
 ```
 ## Training
+- **Compression Expert**: DPO training on 100K message pairs
+- **Security Expert**: Fine-tuned on 60K samples (prompt injection, jailbreak, safe)
+## Limitations
+- **Not a chatbot** - Cannot generate text or have conversations
+- **Classifier only** - Outputs class labels, not language
+- **ASCII tokenization** - Uses simple byte-level tokenization
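
The usage snippet builds token IDs with `ord(c) % 256`, which matches byte-level tokenization only for ASCII input; code points at or above 256 wrap around and can collide with unrelated IDs. The sketch below contrasts that mapping with UTF-8 byte encoding. Both helper names are hypothetical, and the README does not state which scheme was used during training, so treat the second function as an illustration rather than a drop-in replacement.

```python
MAX_LEN = 128  # truncation length used in the usage snippet


def tokens_ord_mod(text: str) -> list[int]:
    """Mirror of the usage snippet: code point modulo 256, truncated by characters."""
    return [ord(c) % 256 for c in text[:MAX_LEN]]


def tokens_utf8(text: str) -> list[int]:
    """Hypothetical alternative: byte IDs from the UTF-8 encoding, truncated by bytes."""
    return list(text.encode("utf-8")[:MAX_LEN])


# ASCII text yields identical IDs; non-ASCII text diverges.
for sample in ["Hello", "café", "Ā"]:
    print(repr(sample), tokens_ord_mod(sample), tokens_utf8(sample))
```

For example, `"Ā"` (U+0100) maps to ID 0 under the modulo scheme, the same ID as a NUL character, while UTF-8 encoding yields two distinct bytes. Inputs should be tokenized the same way as the training data, whichever scheme that was.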
+## Links
+- [M2M Protocol GitHub](https://github.com/OpenACI-AI/m2m-protocol)
+- [Paper](https://github.com/OpenACI-AI/m2m-protocol/blob/main/paper/infernet_m2m_protocol.pdf)
 ## License