---
language:
- en
tags:
- ocr
- vision
- image-to-text
- metanthropic
- bulbul
- sovereign
license: other
base_model: metanthropic/BulBul-OCR
pipeline_tag: image-text-to-text
---

# 🐦 Metanthropic BulBul-OCR

**BulBul-OCR** is a sovereign, high-efficiency Optical Character Recognition (OCR) model engineered by **Metanthropic**. It is a 0.9B-parameter vision-language model optimized for speed, accuracy, and secure deployment.

---

## 🔐 Sovereign Encryption

This model is distributed in the **.mguf (Metanthropic Unified Format)**. The weights are encrypted with 256-bit AES-GCM to protect intellectual property and restrict use to authorized parties.

- **Status:** Encrypted
- **Format:** Binary MGUF
- **Key Requirement:** Yes (proprietary access key)
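
The container layout read by the loader in the Usage section (a 12-byte AES-GCM nonce, a 4-byte little-endian header length, the encrypted header, then the remaining body) can be sketched as a small parser. This is an illustrative sketch of the assumed layout only, not an official MGUF specification:

```python
import io
import struct

def parse_mguf(stream):
    """Split an .mguf stream into (nonce, encrypted_header, body).

    Assumed layout: 12-byte AES-GCM nonce, 4-byte little-endian
    header length, encrypted header, then the remaining body.
    """
    nonce = stream.read(12)
    (header_len,) = struct.unpack("<I", stream.read(4))
    encrypted_header = stream.read(header_len)
    body = stream.read()
    return nonce, encrypted_header, body

# Build a synthetic container to demonstrate the layout.
blob = b"N" * 12 + struct.pack("<I", 5) + b"HDR!!" + b"BODY"
nonce, header, body = parse_mguf(io.BytesIO(blob))
```

The header is decrypted with the access key via `AESGCM.decrypt`, as shown in the full loading example.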

---

## 🧠 Model Details

- **Developer:** Metanthropic Research Labs
- **Model Type:** Sovereign vision-language model (VLM)
- **Architecture:** 0.9B-parameter Vision Transformer (ViT) encoder + language decoder
- **Capabilities:** High-density text extraction, document understanding, and visual question answering
- **Identity:** Fine-tuned to operate as a distinct entity ("BulBul-OCR") separate from its base architecture

---

## 💻 Usage

This model cannot be loaded directly with standard Hugging Face libraries (`transformers`). It requires the proprietary **Metanthropic Loader** to decrypt the weights in memory.

### Python Implementation

```python
import os

from huggingface_hub import hf_hub_download
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

# 1. Configuration
REPO_ID = "metanthropic/BulBul-OCR"
FILENAME = "bulbul-ocr-v1.mguf"
SECRET_KEY = "YOUR_ACCESS_KEY_HERE"  # Hex-encoded key provided by a Metanthropic admin

# 2. Download the encrypted asset
file_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# 3. Secure decryption
key_bytes = bytes.fromhex(SECRET_KEY)
aesgcm = AESGCM(key_bytes)

with open(file_path, "rb") as f:
    nonce = f.read(12)
    header_len = int.from_bytes(f.read(4), "little")
    encrypted_header = f.read(header_len)
    rest_of_body = f.read()

# Decrypt the header
decrypted_header = aesgcm.decrypt(nonce, encrypted_header, None)

# 4. Load the model
# (Note: in production, use a temp file or stream directly to avoid disk writes)
os.makedirs("temp_load", exist_ok=True)
with open("temp_load/model.safetensors", "wb") as f:
    f.write(decrypted_header)
    f.write(rest_of_body)

print("✅ Model decrypted. Loading into VRAM...")
model = AutoModelForImageTextToText.from_pretrained(
    "temp_load",
    trust_remote_code=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(REPO_ID, trust_remote_code=True)

# 5. Run inference
image = Image.open("document.png")

inputs = processor(images=image, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=512)
result = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(result)
```

### Installation Requirements

```bash
pip install transformers huggingface_hub cryptography pillow torch
```

---

## 📊 Performance Benchmarks

| Dataset  | Accuracy | Speed (imgs/sec) |
|----------|----------|------------------|
| SROIE    | 94.2%    | 12.5             |
| FUNSD    | 91.8%    | 10.3             |
| RVL-CDIP | 89.7%    | 15.2             |
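
The imgs/sec figures above correspond to wall-clock throughput. A minimal sketch of such a measurement harness is below; the `run_inference` callable is a hypothetical stand-in for the model's actual generate step, and the workload shown is a dummy:

```python
import time

def measure_throughput(run_inference, images):
    """Return throughput in images/sec for a callable applied to each image."""
    start = time.perf_counter()
    for img in images:
        run_inference(img)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed

# Example with a stand-in workload; swap in the real inference call.
rate = measure_throughput(lambda img: sum(range(1000)), list(range(50)))
```

Warm up the model with a few untimed images first, since the first call typically includes compilation and allocation overhead.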

---

## 🌟 Key Features

- **High-Speed Inference:** Optimized for real-time OCR applications
- **Multi-Language Support:** Primary focus on English, with an expandable architecture
- **Document Understanding:** Goes beyond plain OCR to understand layout and structure
- **Sovereign Architecture:** Encrypted weights ensure IP protection
- **Low Resource Requirements:** Runs efficiently on consumer-grade GPUs

---

## 🔧 System Requirements

- **Minimum:**
  - GPU: 4 GB VRAM (NVIDIA GTX 1650 or equivalent)
  - RAM: 8 GB
  - Storage: 2 GB

- **Recommended:**
  - GPU: 8 GB VRAM (NVIDIA RTX 3060 or equivalent)
  - RAM: 16 GB
  - Storage: 5 GB
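
Before downloading the ~2 GB asset, you can verify the storage requirement programmatically. This is a minimal sketch using only the standard library; the VRAM and RAM checks are hardware- and driver-specific and are omitted here:

```python
import shutil

MIN_STORAGE_GB = 2  # minimum storage from the requirements above

def meets_storage_requirement(path=".", required_gb=MIN_STORAGE_GB):
    """Return True if the filesystem at `path` has at least `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb

ok = meets_storage_requirement(".")
```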

---

## ⚠️ License & Restrictions

This is a proprietary model released by Metanthropic.

- **Commercial Use:** Restricted to authorized partners only
- **Modification:** Prohibited without express written consent from Metanthropic
- **Redistribution:** The .mguf file may be mirrored, but decryption keys must not be shared publicly
- **Access:** Contact Metanthropic Research Labs for licensing and access-key provisioning

---

## 📞 Contact & Support

- **Email:** support@metanthropic.ai
- **Documentation:** https://docs.metanthropic.ai/bulbul-ocr
- **License Inquiries:** licensing@metanthropic.ai

---

## 📚 Citation

If you use BulBul-OCR in your research, please cite:

```bibtex
@misc{bulbul-ocr-2024,
  title={BulBul-OCR: A Sovereign Vision-Language Model for Optical Character Recognition},
  author={Metanthropic Research Labs},
  year={2024},
  publisher={Metanthropic},
  howpublished={\url{https://huggingface.co/metanthropic/BulBul-OCR}}
}
```

---

**Engineered by Metanthropic. Powered by Sovereign Intelligence.**