---
license: apache-2.0
language:
- bn
- en
tags:
- transformers
- safetensors
- stable-diffusion
- bangla
- text-to-video
- lora
- scene-planning
- computer-vision
- natural-language-processing
- mlops
- production-grade
pipeline_tag: text-to-video
model-index:
- name: memo
  results: []
---

# Memo: Production-Grade Transformers + Safetensors Implementation

## Overview

Memo has been rebuilt on **Transformers + Safetensors**, replacing unsafe pickle files and toy logic with enterprise-grade machine learning infrastructure.

## What We've Built

### ✅ Core Requirements Met

1. **Transformers Integration**
   - Bangla text parsing using `google/mt5-small`
   - Proper tokenization and model loading
   - Deterministic scene extraction with controlled parameters
   - Memory optimization with device mapping

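To illustrate the deterministic side of scene extraction: before any model runs, scene candidates can be split on the Bangla danda (।) sentence delimiter, which is seed-free and reproducible. The `split_sentences_bn` helper below is a sketch, not the actual `bangla_parser.py` API:

```python
import re

def split_sentences_bn(text: str) -> list[str]:
    """Deterministically split Bangla text on sentence-ending
    punctuation (the danda '।', plus '?' and '!')."""
    parts = re.split(r"(?<=[।?!])\s+", text.strip())
    return [p for p in parts if p]

# Two-sentence example (second sentence is illustrative):
print(split_sentences_bn("আজকের দিনটি খুব সুন্দর ছিল। আমরা পার্কে গিয়েছিলাম।"))
```

The same input always yields the same scene candidates, which is what makes downstream planning reproducible.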
2. **Safetensors Security**
   - **MANDATORY** `use_safetensors=True` for all model loading
   - No `.bin`, `.ckpt`, or pickle files anywhere
   - Model weight validation and security checks
   - Signature verification for LoRA files

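A minimal sketch of what the "no unsafe formats" rule looks like as code; `assert_safe_weight_file` is illustrative, not the project's actual validator:

```python
from pathlib import Path

# Pickle-based formats that must never be loaded.
UNSAFE_SUFFIXES = {".bin", ".ckpt", ".pt", ".pth", ".pkl"}

def assert_safe_weight_file(path: str) -> Path:
    """Accept only .safetensors weight files; reject anything pickle-based."""
    p = Path(path)
    if p.suffix.lower() in UNSAFE_SUFFIXES:
        raise ValueError(f"Unsafe weight format rejected: {p.name}")
    if p.suffix.lower() != ".safetensors":
        raise ValueError(f"Expected a .safetensors file, got: {p.name}")
    return p

assert_safe_weight_file("data/lora/memo-scene-lora.safetensors")  # passes
```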
3. **Production Architecture**
   - Tier-based model management (Free/Pro/Enterprise)
   - Memory optimization and performance tuning
   - Background processing for long-running tasks
   - Proper error handling and logging

## File Structure

```
📁 Memo/
├── 📄 requirements.txt            # Production dependencies
├── 📁 models/
│   ├── 📁 text/
│   │   └── 📄 bangla_parser.py    # Transformer-based Bangla parser
│   └── 📁 image/
│       └── 📄 sd_generator.py     # Stable Diffusion + Safetensors
├── 📁 core/
│   └── 📄 scene_planner.py        # ML-based scene planning
├── 📁 data/
│   └── 📁 lora/
│       └── 📄 README.md           # LoRA configuration (safetensors only)
├── 📁 scripts/
│   └── 📄 train_scene_lora.py     # Training with safetensors output
├── 📁 config/
│   └── 📄 model_tiers.py          # Tier management system
└── 📁 api/
    └── 📄 main.py                 # Production API endpoint
```

## Key Features

### 🔒 Security (Non-Negotiable)
- **Safetensors-only model loading** - No unsafe formats
- **Model signature validation** - Verify weight integrity
- **LoRA security checks** - Ensure only `.safetensors` files
- **Memory-safe loading** - Prevent buffer overflows

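One reason these checks are possible at all: the safetensors format starts with an 8-byte little-endian header length followed by that many bytes of JSON (tensor names, dtypes, shapes), so a file can be inspected without deserializing any weights. The helper below is a sketch, not the project's validator:

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    """Read a safetensors file's JSON header (tensor names, dtypes,
    shapes) without touching the weight data. Per the format spec:
    8-byte little-endian header length, then that many bytes of JSON."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))
```

Because parsing stops at the JSON header, no attacker-controlled code path (as with pickle) is ever executed.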
### 🚀 Performance
- **Memory optimization** - xFormers, attention slicing, CPU offload
- **FP16 precision** - 50% memory reduction with maintained quality
- **LCM acceleration** - Faster inference when available
- **Device mapping** - Optimal GPU/CPU utilization

### 🏢 Enterprise Features
- **Tier-based pricing** - Free/Pro/Enterprise configurations
- **Resource management** - Memory limits and concurrent request handling
- **Security compliance** - Audit trails and validation
- **Scalability** - Background processing and proper async handling

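Concurrent request handling of this kind usually reduces to a semaphore around the generation call. A minimal asyncio sketch (the class and `fake_generate` are illustrative, not the `api/main.py` internals):

```python
import asyncio

class RequestLimiter:
    """Cap in-flight generation jobs for a tier; excess callers wait."""
    def __init__(self, max_concurrent: int):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def run(self, job):
        async with self._sem:   # blocks while all slots are taken
            return await job()

async def main():
    limiter = RequestLimiter(max_concurrent=3)   # e.g. the Pro tier limit
    async def fake_generate():
        await asyncio.sleep(0.01)                # stand-in for GPU work
        return "frames"
    return await asyncio.gather(*(limiter.run(fake_generate) for _ in range(10)))

print(asyncio.run(main()))
```

All ten jobs complete, but never more than three at once, which is how per-tier limits translate into bounded GPU memory pressure.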
## Model Tiers

### Free Tier
- Base SDXL model (512x512)
- 15 inference steps
- No LoRA
- 1 concurrent request

### Pro Tier
- Base SDXL model (768x768)
- 25 inference steps
- Scene LoRA enabled
- LCM acceleration
- 3 concurrent requests

### Enterprise Tier
- Base SDXL model (1024x1024)
- 30 inference steps
- Custom LoRA support
- LCM acceleration
- 10 concurrent requests

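The tier lists above can be sketched as a config mapping. The field names (`image_model_id`, `lora_path`, `lcm_enabled`) follow the usage example later in this card, but the concrete model ID and LoRA paths are placeholders, not the actual `config/model_tiers.py` contents:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TierConfig:
    image_model_id: str        # base SDXL checkpoint (placeholder ID below)
    resolution: int            # square output size in pixels
    inference_steps: int
    lora_path: Optional[str]   # None = LoRA disabled
    lcm_enabled: bool
    max_concurrent: int

SDXL = "stabilityai/stable-diffusion-xl-base-1.0"  # assumed base model ID

TIERS = {
    "free":       TierConfig(SDXL, 512,  15, None, False, 1),
    "pro":        TierConfig(SDXL, 768,  25, "data/lora/memo-scene-lora.safetensors", True, 3),
    # Enterprise supports customer-supplied LoRAs; path is hypothetical.
    "enterprise": TierConfig(SDXL, 1024, 30, "data/lora/custom.safetensors", True, 10),
}
```

A frozen dataclass keeps tier settings immutable at runtime, so a request handler cannot silently upgrade its own limits.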
## Usage Examples

### Basic Scene Planning
```python
from core.scene_planner import plan_scenes

scenes = plan_scenes(
    text_bn="আজকের দিনটি খুব সুন্দর ছিল।",
    duration=15
)
```

### Tier-Based Generation
```python
from config.model_tiers import get_tier_config
from models.image.sd_generator import get_generator

config = get_tier_config("pro")
generator = get_generator(
    model_id=config.image_model_id,
    lora_path=config.lora_path,
    use_lcm=config.lcm_enabled
)

frames = generator.generate_frames(
    prompt="Beautiful landscape scene",
    frames=5
)
```

### API Usage
```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "আজকের দিনটি খুব সুন্দর ছিল।",
    "duration": 15,
    "tier": "pro"
  }'
```

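Server-side, a request body like the one above should be validated before any GPU work is queued. A stdlib sketch of that check; the field rules (including the assumed 1-60 second duration bound) are illustrative, and `api/main.py` may differ:

```python
VALID_TIERS = {"free", "pro", "enterprise"}

def validate_request(payload: dict) -> dict:
    """Check a /generate request body; raise ValueError on bad input."""
    text = payload.get("text", "")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' must be a non-empty string")
    duration = payload.get("duration")
    if not isinstance(duration, int) or not (1 <= duration <= 60):
        raise ValueError("'duration' must be an integer between 1 and 60")
    tier = payload.get("tier", "free")
    if tier not in VALID_TIERS:
        raise ValueError(f"'tier' must be one of {sorted(VALID_TIERS)}")
    return {"text": text.strip(), "duration": duration, "tier": tier}
```

Rejecting bad input here is what keeps malformed jobs out of the background-processing queue.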
## Training Custom LoRA

```python
from scripts.train_scene_lora import SceneLoRATrainer, TrainingConfig

config = TrainingConfig(
    base_model="google/mt5-small",
    rank=32,
    alpha=64,
    save_safetensors=True  # MANDATORY
)

trainer = SceneLoRATrainer(config)
trainer.load_model()
trainer.setup_lora()
trainer.train(training_data)
```

## Security Validation

```python
from config.model_tiers import validate_model_weights_security

result = validate_model_weights_security("data/lora/memo-scene-lora.safetensors")
print(f"Secure: {result['is_secure']}")
print(f"Issues: {result['issues']}")
```

## What This Guarantees

- ✅ **Transformers-based** - Real ML, not toy logic
- ✅ **Safetensors-only** - No security vulnerabilities
- ✅ **Production-ready** - Enterprise architecture
- ✅ **Memory optimized** - Proper resource management
- ✅ **Tier-based** - Scalable pricing model
- ✅ **Audit compliant** - Security validation built-in

## What This Doesn't Do

- ❌ Make GPUs cheap
- ❌ Fix bad prompts
- ❌ Read your mind
- ❌ Guarantee perfect results

## Next Steps

If you're serious about production deployment:

1. **Cold-start optimization** - Preload frequently used models
2. **Model versioning** - Track changes per tier
3. **A/B testing** - Compare model performance
4. **Monitoring** - Track usage and performance metrics
5. **Load balancing** - Distribute across multiple GPUs
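For item 1, a common shape is a process-level cache so only the first request per model pays the load cost. A stdlib sketch; `load_pipeline` is a hypothetical stand-in for the real safetensors-only loader:

```python
from functools import lru_cache

def load_pipeline(model_id, lora_path=None):
    # Stand-in for the real loader; in production this is the
    # expensive step (reading weights, moving them to the GPU).
    return {"model_id": model_id, "lora_path": lora_path}

@lru_cache(maxsize=4)          # keep at most 4 pipelines resident
def get_cached_pipeline(model_id: str, lora_path=None):
    """Load a pipeline once per (model_id, lora_path) and reuse it."""
    return load_pipeline(model_id, lora_path)

a = get_cached_pipeline("sdxl-base")
b = get_cached_pipeline("sdxl-base")
assert a is b                  # second call is served from the cache
```

`maxsize` bounds resident pipelines, which matters because each cached entry holds GPU memory; eviction is least-recently-used.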

## Running the System

```bash
# Install dependencies
pip install -r requirements.txt

# Train custom LoRA
python scripts/train_scene_lora.py

# Start API server
python api/main.py

# Check health
curl http://localhost:8000/health
```

## Reality Check

This implementation is now:

- ✅ **Correct** - Uses proper ML frameworks
- ✅ **Modern** - Transformers + Safetensors
- ✅ **Secure** - No unsafe model formats
- ✅ **Scalable** - Tier-based architecture
- ✅ **Defensible** - Production-grade security

If your API claims "state-of-the-art" without these features, you're lying. Memo now actually delivers on that promise.