	---
	license: mit
	library_name: transformers
	tags:
	- florence-2
	- deepfake-detection
	- computer-vision
	- multimodal
	- lora
	pipeline_tag: image-to-text
	---

	# Verity-1A: Florence-2 + FLODA Deepfake Detection Model

	## 🎯 Model Description

	Verity-1A merges Microsoft's Florence-2-base with the FLODA-deepfake LoRA adapter to detect AI-generated content. The merged model is specialized for identifying deepfakes and AI-generated images while retaining Florence-2's vision-language capabilities.

	## 🏗️ Model Architecture

	- Base Model: Microsoft Florence-2-base (768d architecture)
	- Enhancement: FLODA-deepfake LoRA adapter
	- Model Size: ~447 MB
	- Optimization: PEFT-based fusion for efficient inference

	## 🚀 Key Features

	- ✅ Deepfake Detection: Specialized for AI-generated content identification
	- ✅ Multimodal: Combines vision and language understanding
	- ✅ Compact: 6.7x smaller than Florence-2-large
	- ✅ Production-Ready: Merged weights passed a 666-tensor validation

	## 📊 Performance

	- Architecture: 768-dimensional embeddings
	- Parameters: ~232M parameters
	- Inference: Optimized for real-time detection
	- Compatibility: Full Transformers ecosystem support

	## 🛠️ Usage

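	```python
	from transformers import AutoModelForCausalLM, AutoProcessor
	import torch

	# Use a GPU when available; fall back to float32 on CPU, where float16 is poorly supported
	device = "cuda" if torch.cuda.is_available() else "cpu"
	dtype = torch.float16 if device == "cuda" else torch.float32

	# Load model
	model = AutoModelForCausalLM.from_pretrained(
	    "zelus82/verity-1A",
	    torch_dtype=dtype,
	    trust_remote_code=True
	).to(device).eval()

	# Load processor
	processor = AutoProcessor.from_pretrained(
	    "zelus82/verity-1A",
	    trust_remote_code=True
	)

	# Example usage for deepfake detection; `image` is a PIL image
	def detect_deepfake(image, text_prompt="Is this image AI-generated?"):
	    inputs = processor(text=text_prompt, images=image, return_tensors="pt")
	    # Move inputs to the model's device; only floating-point tensors are cast to `dtype`
	    inputs = inputs.to(device, dtype)

	    with torch.no_grad():
	        generated_ids = model.generate(
	            input_ids=inputs["input_ids"],
	            pixel_values=inputs["pixel_values"],
	            max_new_tokens=1024,
	            num_beams=3
	        )

	    # Special tokens are kept in the decoded string
	    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
	    return generated_text
	```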
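
Because the usage example decodes with `skip_special_tokens=False`, the returned string still contains tokenizer markers. The sketch below shows one way to clean it up. It assumes the markers are `<s>`, `</s>`, and `<pad>`, and it assumes, hypothetically, that the model answers with a short word such as "real" or "fake". This card does not document the answer format, so `clean_generation`, `to_label`, and the keyword sets are illustrative names, not part of the model's API.

```python
import re

# Assumed special tokens left in the decoded string
SPECIAL_TOKENS = re.compile(r"</?s>|<pad>")

# Hypothetical answer vocabularies; adjust after inspecting real model outputs
FAKE_WORDS = {"fake", "yes", "ai-generated"}
REAL_WORDS = {"real", "no", "authentic"}


def clean_generation(text: str) -> str:
    """Remove special tokens and surrounding whitespace."""
    return SPECIAL_TOKENS.sub("", text).strip()


def to_label(text: str) -> str:
    """Map a raw generation to 'fake', 'real', or 'unknown'."""
    answer = clean_generation(text).lower()
    if answer in FAKE_WORDS:
        return "fake"
    if answer in REAL_WORDS:
        return "real"
    return "unknown"
```

Returning `"unknown"` for anything outside the assumed vocabularies keeps unexpected outputs visible instead of silently forcing them into one class.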

	## 🎓 Training Details

	- Base Training: Microsoft Florence-2-base foundation
	- Specialization: FLODA-deepfake LoRA fine-tuning
	- Fusion Method: PEFT merge_and_unload, which folds the LoRA weights into the base model so inference needs no separate adapter
	- Validation: Comprehensive 666-tensor validation passed
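
Merging a LoRA adapter replaces each adapted weight `W` with `W + (alpha / r) * B @ A`, so the merged model gives the same outputs as the base model plus adapter. The NumPy sketch below illustrates this for a single 768×768 matrix at rank 8. The random weights, `alpha = 16`, and the helper names are assumptions for illustration; this is not PEFT's implementation and does not use the actual checkpoint.

```python
import numpy as np


def merge_lora(W, A, B, alpha):
    """Fold the low-rank update into the base weight."""
    rank = A.shape[0]
    return W + (alpha / rank) * (B @ A)


def forward_with_adapter(x, W, A, B, alpha):
    """Base path plus the separate low-rank adapter path."""
    rank = A.shape[0]
    return x @ W.T + (alpha / rank) * ((x @ A.T) @ B.T)


rng = np.random.default_rng(0)
d, r, alpha = 768, 8, 16  # hidden size and rank from this card; alpha is assumed

W = 0.02 * rng.standard_normal((d, d))  # stand-in for a frozen base weight
A = 0.02 * rng.standard_normal((r, d))  # LoRA down-projection
B = 0.02 * rng.standard_normal((d, r))  # LoRA up-projection
x = rng.standard_normal((4, d))         # batch of 4 input vectors

y_adapter = forward_with_adapter(x, W, A, B, alpha)
y_merged = x @ merge_lora(W, A, B, alpha).T

# The two paths agree up to floating-point error
outputs_match = bool(np.allclose(y_adapter, y_merged))
```

After merging, inference uses one matrix multiply per layer instead of a base path plus an adapter path.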

	## 📋 Model Card

	| Attribute | Value |
	|-----------|-------|
	| Model Type | Multimodal Vision-Language |
	| Base Architecture | Florence-2 |
	| Specialization | Deepfake Detection |
	| Model Size | 447 MB |
	| Parameters | ~232M |
	| Precision | Float16 |
	| License | MIT |
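
The size, parameter count, and precision in the table can be checked against each other: a float16 checkpoint stores 2 bytes per parameter. The sketch below runs that arithmetic with the rounded ~232M figure; the function name is illustrative.

```python
def checkpoint_size_bytes(num_params: int, bytes_per_param: int) -> int:
    """Approximate weight storage, ignoring file metadata."""
    return num_params * bytes_per_param


num_params = 232_000_000   # ~232M, rounded, from the table
size_bytes = checkpoint_size_bytes(num_params, bytes_per_param=2)  # float16

size_mb = size_bytes / 1e6       # decimal megabytes
size_mib = size_bytes / 2**20    # binary mebibytes

print(f"{size_mb:.0f} MB, {size_mib:.1f} MiB")
```

This gives about 464 MB (442.5 MiB), the same range as the reported 447 MB. The parameter count is rounded, so exact agreement is not expected.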

	## 🔧 Technical Specifications

	- Hidden Size: 768
	- Vocabulary Size: 51,289
	- Vision Encoder: DaViT, inherited from Florence-2
	- Language Model: Transformer encoder-decoder, fine-tuned for detection tasks
	- LoRA Rank: 8 (chosen as an efficiency/performance trade-off)
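
To show what rank 8 means at a hidden size of 768, the sketch below counts trainable parameters for one adapted matrix. It assumes a square 768×768 projection; which layers the FLODA adapter targets is not stated in this card, so the numbers describe a single hypothetical layer, not the whole adapter.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix (no bias)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


hidden, rank = 768, 8
dense = full_params(hidden, hidden)          # 589,824
adapter = lora_params(hidden, hidden, rank)  # 12,288

print(f"dense={dense}, lora={adapter}, ratio={dense // adapter}x")
```

Under these assumptions the adapter trains 48 times fewer parameters per matrix than full fine-tuning of that matrix would.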

	## ⚠️ Limitations

	- Optimized specifically for deepfake detection tasks
	- Based on Florence-2-base architecture (768d)
	- Not compatible with Florence-2-large components
	- Requires trust_remote_code=True for full functionality

	## 📄 Citation

	```bibtex
	@misc{verity1a2024,
	  title={Verity-1A: Florence-2 Enhanced Deepfake Detection},
	  author={zelus82},
	  year={2024},
	  publisher={Hugging Face},
	  url={https://huggingface.co/zelus82/verity-1A}
	}
	```

	## 🤝 Acknowledgments

	- Microsoft for the Florence-2 foundation model
	- FLODA team for the deepfake detection adapter
	- Hugging Face for the ecosystem and hosting

	## 📞 Contact

	For questions or collaborations, please reach out through the Hugging Face community discussions.

	---

	Built with ❤️ for safer AI content detection