Update README.md

README.md CHANGED

@@ -3,6 +3,8 @@ license: apache-2.0
 tags:
 - gpt-j
 - llm
 ---
 # MaryGPT Model Card
 
@@ -32,47 +34,20 @@ All data was obtained ethically and in compliance with the site's terms and conditions
 No copyright images are used in the training of this model without permission.
 No AI generated images are in the dataset.
 
-
-
-- Cleveland Museum of Art Open Access (CC0 / Public domain)
-- National Gallery of Art Open Access (CC0 / Public domain)
-- The Art Institute of Chicago Open Access (CC0 / Public domain)
-- The Walters Art Museum Open Access (CC0 / Public domain)
-- J. Paul Getty Museum Open Access (CC0 / Public domain)
-- ArtBench-10 (public domain subset)
-- Flickr (CC0 subset)
-- Wikimedia Commons (CC0 subset)
-- NFT arts *1 (goblintown.nft, mfer, tubby-cats, Timeless) (CC0)
-- Full version of [VRoid Image Dataset](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) (CC0 or licensed)
-- Open Clipart (Public domain)
-- Open Duelyst (CC0)
-- 3dicons (CC0)
-- ambientCG (CC0)
-- Wuffle comics made by Piti Yindee (CC0)
-- 大崎一番太郎 made by 大崎駅西口商店会 (CC0)
-- Traditional Generative Art (Non-AI) and Visual Artworks made by Rhizomatiks (licensed)
 
-
 
-*1 Their work is released under a CC0 license, but if you are considering using this model to create a work inspired by their NFT and sell it as an NFT, please consider paying them a royalty to help the CC0 NFT community grow.
-
-## Training Notes
-- Trained resolution: 256x256 --> 512x512 --> (512x512, 640x448, 448x640) --> (512x512, 768x512, 512x768)
-- The diffusers version and `mitsua-diffusion-one.ckpt` are fine-tuned with the [Diffusion With Offset Noise](https://www.crosslabs.org/blog/diffusion-with-offset-noise) technique, applied to the last 12k steps with `p=0.02`.
-- `mitsua-diffusion-one-base.ckpt` is the non-fine-tuned version. For further fine-tuning, this version would be the better choice.
-
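The removed training note above applies the Diffusion With Offset Noise technique with probability `p=0.02`. As a rough sketch only (the `offset_scale=0.1` value follows the linked blog post; the function name and NumPy framing are illustrative, not this repository's actual code), the noise sampling for a diffusion training step could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_noise(shape, p=0.02, offset_scale=0.1):
    """Sample per-pixel Gaussian noise for a diffusion training step.

    With probability ``p`` (the note above uses p=0.02), add a per-channel
    "offset" term shared across all pixels, as in the Offset Noise post.
    ``offset_scale`` is an assumed value taken from that post.
    """
    b, c, h, w = shape
    noise = rng.standard_normal(shape)
    if rng.random() < p:
        # One scalar per (image, channel), broadcast over H x W: this lets
        # the model learn to shift the overall brightness of an image.
        noise = noise + offset_scale * rng.standard_normal((b, c, 1, 1))
    return noise

noise = sample_training_noise((4, 3, 64, 64))
```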
-## Cosine similarity (as a proof of full-scratch training)
-- VAE
-  - 0.16694325 (vs Stable Diffusion v2.1 base)
-  - 0.20887965 (vs Stable Diffusion v1.4)
-  - All fine-tuned variants would have over 0.90
-- U-Net
-  - 0.07097270 (vs Stable Diffusion v2.1 base)
-  - 0.08351029 (vs Stable Diffusion v1.4)
-  - All fine-tuned variants would have over 0.99
-
 ## Developed by
-
-
-
-
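The cosine-similarity numbers in the removed section above compare checkpoints by their flattened weight vectors: values near zero indicate independently initialized, from-scratch training, while fine-tuned copies of the same checkpoint stay close to 1.0. A minimal sketch of that measurement, assuming NumPy state dicts with matching keys and shapes (the helper name is hypothetical):

```python
import numpy as np

def weight_cosine_similarity(state_a, state_b):
    """Cosine similarity between two models' flattened weight vectors."""
    a = np.concatenate([v.ravel() for v in state_a.values()])
    b = np.concatenate([v.ravel() for v in state_b.values()])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
w1 = {"layer.weight": rng.standard_normal((64, 64))}
w2 = {"layer.weight": rng.standard_normal((64, 64))}
sim_same = weight_cosine_similarity(w1, w1)  # identical weights: 1.0 up to float rounding
sim_diff = weight_cosine_similarity(w1, w2)  # independent random weights: near 0
```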
 tags:
 - gpt-j
 - llm
+datasets:
+- EleutherAI/pile
 ---
 # MaryGPT Model Card
 
 No copyright images are used in the training of this model without permission.
 No AI generated images are in the dataset.
 
+- GPT-J 6B was trained on [the Pile](https://pile.eleuther.ai), a large-scale curated dataset created by [EleutherAI](https://www.eleuther.ai).
+- Frankenstein; or, The Modern Prometheus, 1818 (Public domain)
 
+## Training procedure
+This model was trained for 402 billion tokens over 383,500 steps on a TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
 
 ## Developed by
+MaryGPT
+- [Yuma Kishi](https://x.com/obake_ai)
+
+GPT-J
+- [James Bradbury](https://twitter.com/jekbradbury) for valuable assistance with debugging JAX issues.
+- [Stella Biderman](https://www.stellabiderman.com), [Eric Hallahan](https://twitter.com/erichallahan), [Kurumuz](https://github.com/kurumuz/), and [Finetune](https://github.com/finetuneanon/) for converting the model to be compatible with the `transformers` package.
+- [Leo Gao](https://twitter.com/nabla_theta) for running zero shot evaluations for the baseline models for the table.
+- [Laurence Golding](https://github.com/researcher2/) for adding some features to the web demo.
+- [Aran Komatsuzaki](https://twitter.com/arankomatsuzaki) for advice with experiment design and writing the blog posts.
+- [Janko Prester](https://github.com/jprester/) for creating the web demo frontend.