---
license: mit
library_name: transformers
tags:
- florence-2
- deepfake-detection
- computer-vision
- multimodal
- lora
pipeline_tag: image-to-text
---

# Verity-1A: Florence-2 + FLODA Deepfake Detection Model

## 🎯 Model Description

**Verity-1A** combines Microsoft's Florence-2-base with the FLODA-deepfake LoRA adapter to detect AI-generated content. The adapter is fused into the base weights, producing a single model specialized for identifying deepfakes and AI-generated images while retaining Florence-2's vision-language capabilities.

## 🏗️ Model Architecture

- **Base Model**: Microsoft Florence-2-base (768d architecture)
- **Enhancement**: FLODA-deepfake LoRA adapter
- **Model Size**: ~447 MB
- **Optimization**: PEFT-based fusion for efficient inference

## 🚀 Key Features

- ✅ **Deepfake Detection**: Specialized for AI-generated content identification
- ✅ **Multimodal**: Combines vision and language understanding
- ✅ **Compact**: 6.7x smaller than Florence-2-large
- ✅ **Production-Ready**: Fully validated and optimized

## 📊 Performance

- **Architecture**: 768-dimensional embeddings
- **Parameters**: ~232M parameters
- **Inference**: Optimized for real-time detection
- **Compatibility**: Full Transformers ecosystem support

## 🛠️ Usage

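```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

# float16 is intended for GPU inference; fall back to float32 on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "zelus82/verity-1A",
    torch_dtype=torch_dtype,
    trust_remote_code=True
).to(device).eval()

# Load processor
processor = AutoProcessor.from_pretrained(
    "zelus82/verity-1A",
    trust_remote_code=True
)

# Example usage for deepfake detection
def detect_deepfake(image, text_prompt="Is this image AI-generated?"):
    """Return the raw generated text for a PIL image and a text prompt."""
    # Move inputs to the model's device and cast pixel values to its dtype;
    # without this, float32 pixel values clash with float16 weights
    inputs = processor(
        text=text_prompt, images=image, return_tensors="pt"
    ).to(device, torch_dtype)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3
        )

    # Special tokens are kept in the decoded string
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return generated_text
```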
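Because `detect_deepfake` decodes with `skip_special_tokens=False`, the returned string may still contain tokenizer markers. The following is a minimal post-processing sketch, not part of the model repository: the helper name `clean_generated_text` is hypothetical, and it assumes the markers are the BART-style `<s>`, `</s>`, and `<pad>` tokens. The wording of the model's answer is not documented here, so the sketch only cleans the text and does not map it to a label.

```python
import re

# Hypothetical helper, not shipped with the model.
# Assumption: special tokens appear as <s>, </s>, or <pad>.
_SPECIAL_TOKEN_RE = re.compile(r"</?s>|<pad>")

def clean_generated_text(generated_text: str) -> str:
    """Remove BOS/EOS/padding markers and collapse extra whitespace."""
    without_tokens = _SPECIAL_TOKEN_RE.sub(" ", generated_text)
    return " ".join(without_tokens.split())
```

It could be applied as `clean_generated_text(detect_deepfake(image))` before any downstream string matching.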

## 🎓 Training Details

- **Base Training**: Microsoft Florence-2-base foundation
- **Specialization**: FLODA-deepfake LoRA fine-tuning
- **Fusion Method**: PEFT `merge_and_unload()`, which folds the LoRA weights into the base model so no separate adapter is needed at inference
- **Validation**: 666-tensor validation of the merged checkpoint passed
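
To illustrate what a LoRA merge does numerically, here is a toy sketch in plain Python. It uses the standard LoRA update, `W + (alpha / rank) * B @ A`, and checks that the merged weight gives the same output as running the base weight and the adapter separately. The matrix sizes, the `alpha` value, and the helper names `matmul` and `merge_lora` are illustrative assumptions, not values or code from this model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy sizes for illustration only (rank 1, 2x2 weight).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape (rank, in_features)
B = [[0.5], [1.0]]    # shape (out_features, rank)
x = [[1.0], [1.0]]    # input as a column vector
ALPHA, RANK = 2.0, 1  # assumed values

merged = merge_lora(W, A, B, alpha=ALPHA, rank=RANK)

# Separate path: W x + (alpha / rank) * B (A x)
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
separate = [[b[0] + (ALPHA / RANK) * a[0]] for b, a in zip(base_out, adapter_out)]

# Fused path: a single matrix multiply with the merged weight
fused = matmul(merged, x)
```

Both paths produce the same output, which is why a merged checkpoint can be loaded without the adapter at inference time.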

## 📋 Model Card

| Attribute | Value |
|-----------|-------|
| Model Type | Multimodal Vision-Language |
| Base Architecture | Florence-2 |
| Specialization | Deepfake Detection |
| Model Size | 447 MB |
| Parameters | ~232M |
| Precision | Float16 |
| License | MIT |
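
As a rough sanity check on the table, weight storage can be estimated from the parameter count and precision. The sketch below assumes 2 bytes per float16 parameter and ignores file-format overhead; the function name `estimate_checkpoint_size_mb` is hypothetical.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def estimate_checkpoint_size_mb(num_params: int, precision: str) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e6

# ~232M parameters stored as float16
size_fp16 = estimate_checkpoint_size_mb(232_000_000, "float16")
# The same parameters stored as float32, for comparison
size_fp32 = estimate_checkpoint_size_mb(232_000_000, "float32")
```

The float16 estimate comes to about 464 MB, in the same range as the listed 447 MB; the gap is plausibly due to rounding of the parameter count or to the unit convention (MB versus MiB).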

## 🔧 Technical Specifications

- **Hidden Size**: 768
- **Vocabulary Size**: 51,289
- **Vision Encoder**: DaViT (as used by Florence-2)
- **Language Model**: Optimized for detection tasks
- **LoRA Rank**: 8 (optimal efficiency/performance)
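
To give a sense of why a rank-8 adapter is small, the sketch below counts the LoRA parameters for a single 768 x 768 projection. Which layers the FLODA adapter actually targets is not stated here, so the square 768 x 768 shape is an assumption, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in LoRA's A (rank x in) and B (out x rank) matrices."""
    return rank * in_features + out_features * rank

HIDDEN_SIZE = 768  # from the specifications above
RANK = 8           # from the specifications above

full = HIDDEN_SIZE * HIDDEN_SIZE                     # dense weight parameters
lora = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
ratio = lora / full                                  # adapter size relative to the dense weight
```

Under these assumptions the adapter adds 12,288 parameters per projection against 589,824 in the dense weight, roughly 2%.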

## ⚠️ Limitations

- Optimized specifically for deepfake detection tasks
- Based on Florence-2-base architecture (768d)
- Not compatible with Florence-2-large components
- Requires `trust_remote_code=True` for full functionality

## 📄 Citation

```bibtex
@misc{verity1a2024,
  title={Verity-1A: Florence-2 Enhanced Deepfake Detection},
  author={zelus82},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/zelus82/verity-1A}
}
```

## 🤝 Acknowledgments

- **Microsoft** for the Florence-2 foundation model
- **FLODA** team for the deepfake detection adapter
- **Hugging Face** for the ecosystem and hosting

## 📞 Contact

For questions or collaborations, please reach out through the Hugging Face community discussions.

---

*Built with ❤️ for safer AI content detection*