metrics:
- wer
pipeline_tag: image-to-text
---

# Untitled7-colab_checkpoint

This model was lovingly named after the Google Colab notebook that made it. It is a fine-tune of Microsoft's [git-large-coco](https://huggingface.co/microsoft/git-large-coco) model on the 1k subset of [poloclub/diffusiondb](https://huggingface.co/datasets/poloclub/diffusiondb/viewer/2m_first_1k/train).

It is supposed to read an image and extract a Stable Diffusion prompt from it, but it might not do a good job at it. I wouldn't know; I haven't extensively tested it.

As the title suggests, this is a checkpoint: I originally intended to train on the entire dataset, but I'm unsure whether I want to now...

## Intended use

Fun!

```python
# Load the processor and model directly
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("SE6446/Untitled7-colab_checkpoint")
model = AutoModelForCausalLM.from_pretrained("SE6446/Untitled7-colab_checkpoint")

#################################################################
# Or use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-to-text", model="SE6446/Untitled7-colab_checkpoint")
```
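
The snippet above only loads the model. A minimal inference sketch might look like the following; `"example.png"` is a placeholder path, and `max_length=64` is an assumption, not a value from the original notebook:

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("SE6446/Untitled7-colab_checkpoint")
model = AutoModelForCausalLM.from_pretrained("SE6446/Untitled7-colab_checkpoint")

# "example.png" is a placeholder for whatever image you want a prompt for.
image = Image.open("example.png").convert("RGB")

# Preprocess the image, generate token ids, then decode them into text.
inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=64)
prompt = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(prompt)
```

The pipeline does the same preprocessing and decoding for you; the manual route is only useful if you want control over generation parameters.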

## Out-of-scope use

Don't use this model to discriminate against, alienate, or in any other way harm or harass individuals. You know the drill...

## Bias, Risks, and Limitations

This model does not produce accurate prompts; it is merely a bit of fun (and a waste of funds). However, it can inherit the biases present in the original git-large-coco model.

## Training

*I.e. the boring stuff*

- lr = 5e-5
- epochs = 150
- optim = adamw
- fp16
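
As a rough sketch, those hyperparameters map onto the `transformers` `TrainingArguments` API as below. The mapping is an assumption (the original notebook's exact configuration is unknown), and `"git-finetune"` is a placeholder output directory:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
args = TrainingArguments(
    output_dir="git-finetune",  # placeholder directory
    learning_rate=5e-5,         # lr = 5e-5
    num_train_epochs=150,       # epochs = 150
    optim="adamw_torch",        # optim = adamw
    fp16=True,                  # fp16 mixed precision (requires a CUDA GPU)
)
```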

If you want to fine-tune it further, you should freeze the embedding and vision transformer layers.
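
A minimal sketch of that freezing step, using a toy `torch.nn` module as a stand-in (the real parameter names in this checkpoint differ; inspect `model.named_parameters()` to find the actual embedding and vision-encoder prefixes):

```python
import torch.nn as nn

# Toy stand-in for the checkpoint: the real GIT parameter names differ,
# but the freezing pattern is the same -- match name prefixes and turn
# off requires_grad so the optimizer skips those weights.
model = nn.ModuleDict({
    "image_encoder": nn.Linear(4, 4),   # stands in for the vision transformer
    "embeddings": nn.Embedding(10, 4),  # stands in for the text embeddings
    "output": nn.Linear(4, 4),          # stays trainable
})

frozen_prefixes = ("image_encoder", "embeddings")
for name, param in model.named_parameters():
    if name.startswith(frozen_prefixes):
        param.requires_grad = False

# Collect what is still trainable; only the "output" weights remain.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

When building the optimizer, pass only the still-trainable parameters (e.g. filter on `p.requires_grad`) so the frozen layers stay fixed.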