# Memo: Production-Grade Transformers + Safetensors Implementation




## Overview
**Memo** replaces toy logic with production-grade machine learning infrastructure. This implementation uses **Transformers + Safetensors** as the foundation for enterprise-level video generation, with proper security, performance optimization, and scalability.
## What This Guarantees

- ✅ **Transformers-based** - Real ML understanding, not toy logic
- ✅ **Safetensors-only** - No pickle deserialization attack surface
- ✅ **Production-ready** - Enterprise architecture with proper error handling
- ✅ **Memory optimized** - xFormers, attention slicing, CPU offload
- ✅ **Tier-based scaling** - Free/Pro/Enterprise configurations
- ✅ **Security compliant** - Audit trails and validation
## Architecture

### Core Components
1. **Bangla Text Parser** (`models/text/bangla_parser.py`)
- Transformer-based scene extraction using `google/mt5-small`
- Proper tokenization with memory optimization
- Deterministic output with controlled parameters
2. **Scene Planner** (`core/scene_planner.py`)
- ML-based scene planning (no more toy logic)
- Intelligent timing and pacing calculations
- Visual style determination
3. **Stable Diffusion Generator** (`models/image/sd_generator.py`)
- **Safetensors-only model loading** (`use_safetensors=True`)
- Memory optimizations (xFormers, attention slicing, CPU offload)
- LoRA support with safetensors validation
- LCM acceleration for faster inference
4. **Model Tier System** (`config/model_tiers.py`)
- **Free Tier**: Basic 512x512, 15 steps, no LoRA
- **Pro Tier**: 768x768, 25 steps, scene LoRA, LCM
- **Enterprise Tier**: 1024x1024, 30 steps, custom LoRA
5. **Training Pipeline** (`scripts/train_scene_lora.py`)
- **MANDATORY** `save_safetensors=True`
- Transformers integration with PEFT
- Security-first training with proper validation
6. **Production API** (`api/main.py`)
- FastAPI endpoint with tier-based routing
- Background processing for long-running tasks
- Security validation endpoints
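The planner's timing and pacing logic (component 2 above) can be sketched roughly as follows. This is a minimal illustration, not the actual `core/scene_planner.py` API: `SceneTiming`, `plan_timings`, and the equal-weight pacing rule are all assumptions for the sketch.

```python
from dataclasses import dataclass


@dataclass
class SceneTiming:
    index: int
    start: float      # seconds from the beginning of the video
    duration: float   # seconds allotted to this scene


def plan_timings(num_scenes: int, total_duration: float) -> list[SceneTiming]:
    """Split a total duration evenly across scenes (equal pacing for simplicity)."""
    if num_scenes <= 0 or total_duration <= 0:
        raise ValueError("need at least one scene and a positive duration")
    per_scene = total_duration / num_scenes
    return [
        SceneTiming(index=i, start=i * per_scene, duration=per_scene)
        for i in range(num_scenes)
    ]
```

A real planner would weight scenes by narrative importance rather than splitting evenly, but the invariant is the same: scene durations must sum to the requested video duration.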
## Security Implementation
### Model Weight Security
- **ONLY .safetensors files allowed** - No .bin, .ckpt, or pickle files
- Model signature verification
- File format enforcement
- Memory-safe loading practices
### LoRA Configuration (`data/lora/README.md`)
- **ONLY .safetensors files** - No .bin, .ckpt, or other formats allowed
- Model signatures required
- Version tracking and audit trails
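A minimal sketch of what file-format enforcement can look like, assuming only the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). `validate_weights_file` is illustrative and not the actual `validate_model_weights_security` implementation:

```python
import json
import struct
from pathlib import Path

ALLOWED_SUFFIX = ".safetensors"


def validate_weights_file(path: str) -> dict:
    """Reject non-safetensors files and sanity-check the safetensors header."""
    issues = []
    p = Path(path)
    if p.suffix != ALLOWED_SUFFIX:
        # .bin / .ckpt / pickle files are rejected before any bytes are read
        issues.append(f"forbidden format: {p.suffix or 'no extension'}")
    else:
        with p.open("rb") as f:
            raw = f.read(8)
            if len(raw) < 8:
                issues.append("file too small to contain a safetensors header")
            else:
                (header_len,) = struct.unpack("<Q", raw)
                try:
                    json.loads(f.read(header_len))  # header must be valid JSON
                except (ValueError, UnicodeDecodeError):
                    issues.append("corrupt safetensors header")
    return {"is_secure": not issues, "issues": issues}
```

Because safetensors headers are plain JSON and tensor data is raw bytes, validation never executes untrusted code, which is the whole point of banning pickle-based formats.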
## Usage Examples
### Basic Scene Planning
```python
from core.scene_planner import plan_scenes

scenes = plan_scenes(
    text_bn="আজকের দিনটি খুব সুন্দর ছিল।",  # "Today was a very beautiful day."
    duration=15,
)
```
### Tier-Based Generation
```python
from config.model_tiers import get_tier_config
from models.image.sd_generator import get_generator
config = get_tier_config("pro")
generator = get_generator(lora_path=config.lora_path, use_lcm=config.lcm_enabled)
```
### Security Validation
```python
from config.model_tiers import validate_model_weights_security
result = validate_model_weights_security("data/lora/memo-scene-lora.safetensors")
```
## Model Tiers

| Tier | Resolution | Inference Steps | LoRA | LCM | Credits/min | Memory |
|------------|-----------|----|----|----|-------|-------|
| Free       | 512×512   | 15 | ❌ | ❌ | $5.0  | 4 GB  |
| Pro        | 768×768   | 25 | ✅ | ✅ | $15.0 | 8 GB  |
| Enterprise | 1024×1024 | 30 | ✅ | ✅ | $50.0 | 16 GB |
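The table above maps naturally onto a small config structure. A hedged sketch of how `config/model_tiers.py` might encode it follows; the field names and the `frozen` dataclass choice are assumptions, not the module's actual definitions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # immutable: tier limits should not be mutated at runtime
class TierConfig:
    resolution: int        # square output, pixels per side
    steps: int             # diffusion inference steps
    lora_enabled: bool
    lcm_enabled: bool
    credits_per_min: float
    memory_gb: int


TIERS = {
    "free":       TierConfig(512,  15, False, False, 5.0,  4),
    "pro":        TierConfig(768,  25, True,  True,  15.0, 8),
    "enterprise": TierConfig(1024, 30, True,  True,  50.0, 16),
}


def get_tier_config(tier: str) -> TierConfig:
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return TIERS[tier]
```

Keeping every limit in one table makes tier-based routing a dictionary lookup instead of scattered conditionals.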
## Installation

```bash
# Clone the repository
git clone https://huggingface.co/likhonsheikh/memo
cd memo

# Install dependencies
pip install -r requirements.txt

# Run the demonstration
python demo.py

# Start the API server
python api/main.py
```
## API Usage
### Health Check
```bash
curl http://localhost:8000/health
```
### Generate Video
```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
        "text": "আজকের দিনটি খুব সুন্দর ছিল।",
        "duration": 15,
        "tier": "pro"
      }'
```
### Check Status
```bash
curl http://localhost:8000/status/{request_id}
```
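Before a request reaches the generator, the `text`, `duration`, and `tier` fields shown above need validation. A pure-Python sketch of that check follows; `validate_generate_request` and the 60-second cap are assumptions for illustration, not the actual `api/main.py` behavior:

```python
VALID_TIERS = {"free", "pro", "enterprise"}
MAX_DURATION_SECONDS = 60  # assumed cap; the real API may use a different limit


def validate_generate_request(payload: dict) -> list[str]:
    """Return a list of validation errors for a /generate payload (empty = valid)."""
    errors = []
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        errors.append("'text' must be a non-empty string")
    duration = payload.get("duration")
    if not isinstance(duration, (int, float)) or not 0 < duration <= MAX_DURATION_SECONDS:
        errors.append(f"'duration' must be in (0, {MAX_DURATION_SECONDS}] seconds")
    if payload.get("tier", "free") not in VALID_TIERS:
        errors.append(f"'tier' must be one of {sorted(VALID_TIERS)}")
    return errors
```

In FastAPI this logic would typically live in a Pydantic request model, so invalid payloads are rejected with a 422 before any background task is scheduled.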
## Training Custom LoRA

```python
from scripts.train_scene_lora import SceneLoRATrainer, TrainingConfig

config = TrainingConfig(
    base_model="google/mt5-small",
    rank=32,
    alpha=64,
    save_safetensors=True,  # MANDATORY: never write pickle-based checkpoints
)

trainer = SceneLoRATrainer(config)
trainer.load_model()
trainer.setup_lora()
trainer.train(training_data)  # training_data: your prepared scene dataset
```
## Performance Features
- **Memory Optimization**: xFormers, attention slicing, CPU offload
- **FP16 Precision**: 50% memory reduction with maintained quality
- **LCM Acceleration**: Faster inference when available
- **Device Mapping**: Optimal GPU/CPU utilization
- **Background Processing**: Async handling of long-running tasks
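The "50% memory reduction" from FP16 follows directly from halving the bytes per parameter. The back-of-the-envelope arithmetic, with an illustrative 1B-parameter model size (not Memo's actual model), looks like this:

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory footprint of model weights alone (excludes activations)."""
    return num_params * bytes_per_param / (1024 ** 3)


# FP32 uses 4 bytes per parameter, FP16 uses 2 — exactly half the weight memory.
fp32_gb = weight_memory_gb(1_000_000_000, 4)
fp16_gb = weight_memory_gb(1_000_000_000, 2)
```

Activations, optimizer state, and KV caches add to the real footprint, which is why attention slicing and CPU offload still matter even in FP16.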
## Security Validation
```python
from config.model_tiers import validate_model_weights_security
# Validate any model file
result = validate_model_weights_security("path/to/model.safetensors")
print(f"Secure: {result['is_secure']}")
print(f"Format: {result['format']}")
print(f"Issues: {result['issues']}")
```
## File Structure

```
Memo/
├── requirements.txt              # Production dependencies
├── models/
│   ├── text/
│   │   └── bangla_parser.py      # Transformer-based Bangla parser
│   └── image/
│       └── sd_generator.py       # Stable Diffusion + Safetensors
├── core/
│   └── scene_planner.py          # ML-based scene planning
├── data/
│   └── lora/
│       └── README.md             # LoRA configuration (safetensors only)
├── scripts/
│   └── train_scene_lora.py       # Training with safetensors output
├── config/
│   └── model_tiers.py            # Tier management system
├── api/
│   └── main.py                   # Production API endpoint
└── demo.py                       # Complete system demonstration
```
## What This Doesn't Do

- ❌ Make GPUs cheap
- ❌ Fix bad prompts
- ❌ Read your mind
- ❌ Guarantee perfect results
## Production Readiness

This implementation is now:
- ✅ **Correct** - Uses proper ML frameworks (transformers, safetensors)
- ✅ **Modern** - 2025-grade architecture with security best practices
- ✅ **Secure** - Zero tolerance for unsafe model formats
- ✅ **Scalable** - Tier-based resource management
- ✅ **Defensible** - Production-grade security and validation
## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Support

For support, email support@memo.ai or join our [Discord community](https://discord.gg/memo).
---
**If your API claims "state-of-the-art" without these features, you're lying.** Memo actually delivers on that promise with proper Transformers + Safetensors integration.