---
language:
- en
tags:
- ocr
- vision
- image-to-text
- metanthropic
- bulbul
- sovereign
license: other
base_model: metanthropic/BulBul-OCR
pipeline_tag: image-text-to-text
---
# 🦅 Metanthropic BulBul-OCR
**BulBul-OCR** is a sovereign, high-efficiency Optical Character Recognition model engineered by **Metanthropic**. It is a 0.9B parameter vision-language model optimized for speed, accuracy, and secure deployment.
---
## 🔒 Sovereign Encryption
This model is distributed in the **.mguf (Metanthropic Unified Format)**. The weights are encrypted with 256-bit AES-GCM to protect intellectual property and restrict use to authorized keyholders.
- **Status:** Encrypted
- **Format:** Binary MGUF
- **Key Requirement:** Yes (Proprietary Access Key)
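Based on the loader code in the Usage section, the container appears to lay out as `[12-byte nonce][4-byte little-endian header length][encrypted header][body]`. A minimal parsing sketch under that assumption (the function name and synthetic bytes are illustrative, not part of the official loader):

```python
import io
import struct

def parse_mguf_container(stream):
    """Split an .mguf byte stream into (nonce, encrypted_header, body).

    Assumed layout: 12-byte AES-GCM nonce, then a 4-byte little-endian
    header length, then the encrypted header, then the remaining body.
    """
    nonce = stream.read(12)
    (header_len,) = struct.unpack("<I", stream.read(4))
    encrypted_header = stream.read(header_len)
    body = stream.read()
    return nonce, encrypted_header, body

# Synthetic example: 12-byte nonce, 5-byte "header", 3-byte body
blob = b"N" * 12 + struct.pack("<I", 5) + b"HDR__" + b"BDY"
nonce, header, body = parse_mguf_container(io.BytesIO(blob))
print(len(nonce), header, body)  # → 12 b'HDR__' b'BDY'
```

The fixed-width length prefix means the header can be decrypted and inspected without reading the (much larger) body into memory first.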
---
## 🧠 Model Details
- **Developer:** Metanthropic Research Labs
- **Model Type:** Sovereign Vision-Language Model (VLM)
- **Architecture:** 0.9B Parameter Vision Transformer (ViT) + Language Decoder
- **Capabilities:** High-density text extraction, document understanding, and visual question answering
- **Identity:** Fine-tuned to operate as a distinct entity ("BulBul-OCR") separate from its base architecture
---
## 💻 Usage
This model cannot be loaded with standard Hugging Face libraries (`transformers`). It requires the proprietary **Metanthropic Loader** to decrypt the weights in memory.
### Python Implementation
```python
import os
from huggingface_hub import hf_hub_download
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from transformers import AutoModelForImageTextToText, AutoProcessor
# 1. Configuration
REPO_ID = "metanthropic/BulBul-OCR"
FILENAME = "bulbul-ocr-v1.mguf"
SECRET_KEY = "YOUR_ACCESS_KEY_HERE"  # Hex-encoded key provided by Metanthropic Admin
# 2. Download Encrypted Asset
file_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
# 3. Secure Decryption (In-Memory)
key_bytes = bytes.fromhex(SECRET_KEY)
aesgcm = AESGCM(key_bytes)
with open(file_path, "rb") as f:
    nonce = f.read(12)
    header_len = int.from_bytes(f.read(4), 'little')
    encrypted_header = f.read(header_len)
    rest_of_body = f.read()

# Decrypt Header
decrypted_header = aesgcm.decrypt(nonce, encrypted_header, None)

# 4. Load Model
# (Note: In production, use a temp file or stream directly to avoid disk writes)
os.makedirs("temp_load", exist_ok=True)
with open("temp_load/model.safetensors", "wb") as f:
    f.write(decrypted_header)
    f.write(rest_of_body)

print("✅ Model Decrypted. Loading into VRAM...")
model = AutoModelForImageTextToText.from_pretrained(
    "temp_load",
    trust_remote_code=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(REPO_ID, trust_remote_code=True)
# 5. Run Inference
from PIL import Image
# Load your image
image = Image.open("document.png")
# Process and generate
inputs = processor(images=image, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=512)
result = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(result)
```
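The note in step 4 about avoiding persistent disk writes can be taken further by staging the decrypted bytes in a temporary directory that is removed automatically. A minimal sketch, assuming `decrypted_header` and `rest_of_body` come from step 3 above (the placeholder byte strings here are purely illustrative):

```python
import os
import tempfile

# Placeholders standing in for the values produced by step 3 above.
decrypted_header = b"<decrypted header bytes>"
rest_of_body = b"<remaining body bytes>"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stage the decrypted weights inside the temporary directory.
    model_path = os.path.join(tmp_dir, "model.safetensors")
    with open(model_path, "wb") as f:
        f.write(decrypted_header)
        f.write(rest_of_body)
    staged = os.path.exists(model_path)
    # model = AutoModelForImageTextToText.from_pretrained(tmp_dir, ...)

# On exit, the directory and the plaintext weights are deleted.
gone = not os.path.exists(model_path)
print(staged, gone)  # → True True
```

This keeps the plaintext weights on disk only for the lifetime of the `with` block, rather than leaving a `temp_load/` directory behind.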
### Installation Requirements
```bash
pip install transformers huggingface_hub cryptography pillow torch
```
---
## 📊 Performance Benchmarks
| Dataset | Accuracy | Speed (imgs/sec) |
|---------|----------|------------------|
| SROIE | 94.2% | 12.5 |
| FUNSD | 91.8% | 10.3 |
| RVL-CDIP| 89.7% | 15.2 |
---
## 🚀 Key Features
- **High-Speed Inference:** Optimized for real-time OCR applications
- **Multi-Language Support:** Primary focus on English, with an architecture designed to extend to additional languages
- **Document Understanding:** Goes beyond plain OCR to capture layout and document structure
- **Sovereign Architecture:** Encrypted weights ensure IP protection
- **Low Resource Requirements:** Runs efficiently on consumer-grade GPUs
---
## 🔧 System Requirements
- **Minimum:**
  - GPU: 4 GB VRAM (NVIDIA GTX 1650 or equivalent)
  - RAM: 8 GB
  - Storage: 2 GB
- **Recommended:**
  - GPU: 8 GB VRAM (NVIDIA RTX 3060 or equivalent)
  - RAM: 16 GB
  - Storage: 5 GB
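As a quick sanity check, the tiers above can be expressed as a small helper. `classify_vram` is a hypothetical name used only for illustration; with PyTorch installed, the measured figure would come from `torch.cuda.get_device_properties(0).total_memory / 1024**3`:

```python
def classify_vram(total_gib: float, min_gib: float = 4.0, rec_gib: float = 8.0) -> str:
    """Map a GPU's total VRAM (in GiB) onto the requirement tiers above."""
    if total_gib >= rec_gib:
        return "recommended"
    if total_gib >= min_gib:
        return "minimum"
    return "insufficient"

print(classify_vram(4), classify_vram(8), classify_vram(2))
# → minimum recommended insufficient
```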
---
## ⚠️ License & Restrictions
This is a proprietary model released by Metanthropic.
- **Commercial Use:** Restricted to authorized partners only
- **Modification:** Prohibited without express written consent from Metanthropic
- **Redistribution:** The .mguf file may be mirrored, but decryption keys must not be shared publicly
- **Access:** Contact Metanthropic Research Labs for licensing and access key provisioning
---
## 📞 Contact & Support
- **Email:** support@metanthropic.ai
- **Documentation:** https://docs.metanthropic.ai/bulbul-ocr
- **License Inquiries:** licensing@metanthropic.ai
---
## 📜 Citation
If you use BulBul-OCR in your research, please cite:
```bibtex
@misc{bulbul-ocr-2024,
  title={BulBul-OCR: A Sovereign Vision-Language Model for Optical Character Recognition},
  author={Metanthropic Research Labs},
  year={2024},
  publisher={Metanthropic},
  howpublished={\url{https://huggingface.co/metanthropic/BulBul-OCR}}
}
```
---
**Engineered by Metanthropic. Powered by Sovereign Intelligence.**