---
language: en
tags:
- fastai
- pytorch
- image-classification
- resnet
license: openmdw-1.0
model_creator: Tom Hall
model_version: "1.0"
---

# Coherence Detection

## Model Description

A ResNet-34 fine-tuned on a personally curated dataset to classify images into one of three categories:

- Coherent
- Incoherent
- Semi-Incoherent

**Key Feature**: This model is provided in safetensors format with a production-ready loading wrapper (`model_architecture.py`) that handles FastAI's `AdaptiveConcatPool2d` layer automatically.

## Installation and Usage

**Important**: This model is provided in safetensors format and requires the `model_architecture.py` module for proper loading.

1. **Install**:

```bash
# First install PyTorch with the correct CUDA version for your system.
# Visit https://pytorch.org/get-started/locally/ for the right command.

# Example for CUDA 11.8:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Example for CPU-only:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Once torch is installed and verified:
pip install -r requirements.txt
```

2. **Download the entire repo and run the example**:

```bash
python example_usage.py
```

3. **Use in your code**:

```python
from example_usage import CoherenceClassifier

# Initialize with your model
classifier = CoherenceClassifier("coherence_model.safetensors")

# Predict on an image
result = classifier.predict("your_image.jpg", return_probs=True)
print(result)  # e.g. {'coherent': 0.85, 'incoherent': 0.05, 'semi-incoherent': 0.10}
```

**Note**: The model uses FastAI's `AdaptiveConcatPool2d` layer. Import and use `model_architecture.py`, which handles this automatically. The `example_usage.py` script demonstrates the proper import pattern.

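For readers unfamiliar with the layer, concat pooling reduces each feature map with both max pooling and average pooling, then concatenates the two results, which doubles the channel count. The following is a minimal pure-Python illustration of that idea for a 1x1 output size. The function name `concat_pool` and the nested-list input are assumptions made for illustration; this is not code from `model_architecture.py`.

```python
from statistics import fmean


def concat_pool(feature_maps):
    """Pool each 2-D feature map to its max and its mean, max values first.

    feature_maps: list of channels, each a list of rows of numbers.
    Returns a flat list of length 2 * number_of_channels.
    """
    maxes, means = [], []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        maxes.append(max(values))
        means.append(fmean(values))
    return maxes + means


# Hypothetical 2-channel, 2x2 input
features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 0.0], [8.0, 0.0]],  # channel 1
]
pooled = concat_pool(features)
print(pooled)  # [4.0, 8.0, 2.5, 2.0]
```

With a ResNet-34 backbone, whose final convolutional stage has 512 channels, this kind of pooling would yield 1024 features for the classification head.
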
## Model Architecture

- Backbone: ResNet-34 (via FastAI's default)
- Pooling: AdaptiveConcatPool2d (FastAI-specific)
- Input size: 224x224 (standard ImageNet normalization)

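"Standard ImageNet normalization" conventionally means scaling pixel values to the range 0 to 1, subtracting the per-channel means (0.485, 0.456, 0.406), and dividing by the per-channel standard deviations (0.229, 0.224, 0.225). A minimal pure-Python sketch of that arithmetic on a single RGB pixel follows. The helper name `normalize_pixel` is hypothetical; check `example_usage.py` for the preprocessing the model actually expects.

```python
# Conventional ImageNet channel statistics (R, G, B)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to ImageNet-normalized floats."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


print(normalize_pixel((0, 0, 0)))        # a black pixel
print(normalize_pixel((255, 255, 255)))  # a white pixel
```

In a real pipeline the same operation is applied to every pixel of the resized 224x224 image before it is passed to the network.
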
## Training Data

Version 1.0 was trained on a small dataset of ~20k images in the coherent category and ~12k images in both the incoherent and semi-incoherent categories. The dataset will not be made available. A wide range of content was deliberately included in the coherent category to reduce false-positive incoherence results.

## Limitations

- Requires a PyTorch environment
- Requires the `model_architecture.py` module for proper loading (it handles FastAI-specific layers automatically)

Note especially that an attempt was made to classify less obvious but still noticeable generation failures, such as those listed below, into the "semi-incoherent" category. These are much harder to detect, and in the current iteration of this model there is no expectation that they will land in any particular coherence category, especially when the errors occupy only a small part of the image.

- Extra or missing limbs, fingers, or facial features
- Disproportionate body, head, or limbs
- Anatomically implausible joint configurations

## Intended Use

*This model is intended **only** for evaluating the coherence of AI-generated images.*

- **Do not use it to classify or moderate real photographs**, as it may produce nonsensical and harmful misclassifications.
- A "semi-incoherent" or "incoherent" result should be a flag for human review, not necessarily an automatic basis for censorship.

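One way to treat a result as a flag for human review rather than an automatic decision is sketched below. It assumes the probability dictionary has the shape shown in the usage example; the function name `needs_human_review` and the 0.5 threshold are hypothetical placeholders, not part of this repo.

```python
def needs_human_review(probs, coherent_threshold=0.5):
    """Return True when the image should be queued for a human reviewer.

    probs: mapping from class name to probability.
    coherent_threshold: arbitrary placeholder; tune it on your own data.
    """
    top_label = max(probs, key=probs.get)
    # Flag anything not confidently classified as coherent; never auto-reject.
    return top_label != "coherent" or probs["coherent"] < coherent_threshold


example = {"coherent": 0.85, "incoherent": 0.05, "semi-incoherent": 0.10}
print(needs_human_review(example))  # False
```

The function only decides whether to queue an image for review; what happens to a flagged image is left to the reviewer.
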
## Comment

Given the low coherence rate of results produced by early image generation models, it was very surprising that no existing model could be found for this purpose, which necessitated creating this one for high-volume review scenarios.

Perhaps models such as this one are avoided or seen as improper because of the perceived danger of introducing bias into image analysis. However, it is highly likely that people who generate images would rather have at least some bias towards coherence and a somewhat clear mind when reviewing their output than no bias and a mind littered with the psychologically damaging results of obviously failed generations that have little to do with the prompter's intent.

## Model Card Authors

Tom Hall

## Model Card Contact

tomhall.main@gmail.com

## Model Card Version

**Version:** 1.0 | [See all versions](https://huggingface.co/your-username/your-model-name/tree/main)