Update README.md

README.md CHANGED

@@ -3,6 +3,8 @@ license: apache-2.0
 tags:
 - gpt-j
 - llm
 ---
 # MaryGPT Model Card
 
@@ -32,47 +34,20 @@ All data was obtained ethically and in compliance with the site's terms and conditions
 No copyright images are used in the training of this model without permission.
 No AI generated images are in the dataset.
 
-
-
-- Cleveland Museum of Art Open Access (CC0 / Public domain)
-- National Gallery of Art Open Access (CC0 / Public domain)
-- The Art Institute of Chicago Open Access (CC0 / Public domain)
-- The Walters Art Museum Open Access (CC0 / Public domain)
-- J. Paul Getty Museum Open Access (CC0 / Public domain)
-- ArtBench-10 (public domain subset)
-- Flickr (CC0 subset)
-- Wikimedia Commons (CC0 subset)
-- NFT arts *1 (goblintown.nft, mfer, tubby-cats, Timeless) (CC0)
-- Full version of [VRoid Image Dataset](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) (CC0 or licensed)
-- Open Clipart (Public domain)
-- Open Duelyst (CC0)
-- 3dicons (CC0)
-- ambientCG (CC0)
-- Wuffle comics made by Piti Yindee (CC0)
-- 大崎一番太郎 made by 大崎駅西口商店会 (CC0)
-- Traditional Generative Art (Non-AI) and Visual Artworks made by Rhizomatiks (licensed)
 
-
 
-*1 Their work is released under a CC0 license, but if you are considering using this model to create a work inspired by their NFT and sell it as an NFT, please consider paying them a royalty to help the CC0 NFT community grow.
-
-## Training Notes
-- Trained resolution: 256x256 --> 512x512 --> (512x512, 640x448, 448x640) --> (512x512, 768x512, 512x768)
-- The diffusers version and `mitsua-diffusion-one.ckpt` are fine-tuned with the [Diffusion With Offset Noise](https://www.crosslabs.org/blog/diffusion-with-offset-noise) technique, applied to the last 12k steps with `p=0.02`.
-- `mitsua-diffusion-one-base.ckpt` is the non-fine-tuned version. For further fine-tuning, this version would be the better choice.
-
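The removed training note above applies the Diffusion With Offset Noise technique with probability `p=0.02`. As a rough sketch only (the `offset_scale=0.1` value follows the linked blog post; the function name and NumPy framing are illustrative, not this repository's actual code), the noise sampling for a diffusion training step could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_noise(shape, p=0.02, offset_scale=0.1):
    """Sample per-pixel Gaussian noise for a diffusion training step.

    With probability ``p`` (the note above uses p=0.02), add a per-channel
    "offset" term shared across all pixels, as in the Offset Noise post.
    ``offset_scale`` is an assumed value taken from that post.
    """
    b, c, h, w = shape
    noise = rng.standard_normal(shape)
    if rng.random() < p:
        # One scalar per (image, channel), broadcast over H x W: this lets
        # the model learn to shift the overall brightness of an image.
        noise = noise + offset_scale * rng.standard_normal((b, c, 1, 1))
    return noise

noise = sample_training_noise((4, 3, 64, 64))
```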
-## Cosine similarity (as a proof of full-scratch training)
-- VAE
-  - 0.16694325 (vs Stable Diffusion v2.1 base)
-  - 0.20887965 (vs Stable Diffusion v1.4)
-  - All fine-tuned variants would have over 0.90
-- U-Net
-  - 0.07097270 (vs Stable Diffusion v2.1 base)
-  - 0.08351029 (vs Stable Diffusion v1.4)
-  - All fine-tuned variants would have over 0.99
-
 ## Developed by
-
-
-
-
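The cosine-similarity numbers in the removed section above compare checkpoints by their flattened weight vectors: values near zero indicate independently initialized, from-scratch training, while fine-tuned copies of the same checkpoint stay close to 1.0. A minimal sketch of that measurement, assuming NumPy state dicts with matching keys and shapes (the helper name is hypothetical):

```python
import numpy as np

def weight_cosine_similarity(state_a, state_b):
    """Cosine similarity between two models' flattened weight vectors."""
    a = np.concatenate([v.ravel() for v in state_a.values()])
    b = np.concatenate([v.ravel() for v in state_b.values()])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
w1 = {"layer.weight": rng.standard_normal((64, 64))}
w2 = {"layer.weight": rng.standard_normal((64, 64))}
sim_same = weight_cosine_similarity(w1, w1)  # identical weights: 1.0 up to float rounding
sim_diff = weight_cosine_similarity(w1, w2)  # independent random weights: near 0
```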
 tags:
 - gpt-j
 - llm
+datasets:
+- EleutherAI/pile
 ---
 # MaryGPT Model Card
 
 No copyright images are used in the training of this model without permission.
 No AI generated images are in the dataset.
 
+- GPT-J 6B was trained on [the Pile](https://pile.eleuther.ai), a large-scale curated dataset created by [EleutherAI](https://www.eleuther.ai).
+- Frankenstein; or, The Modern Prometheus, 1818 (Public domain)
 
+## Training procedure
+This model was trained for 402 billion tokens over 383,500 steps on a TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
 
 ## Developed by
+MaryGPT
+- [Yuma Kishi](https://x.com/obake_ai)
+
+GPT-J
+- [James Bradbury](https://twitter.com/jekbradbury) for valuable assistance with debugging JAX issues.
+- [Stella Biderman](https://www.stellabiderman.com), [Eric Hallahan](https://twitter.com/erichallahan), [Kurumuz](https://github.com/kurumuz/), and [Finetune](https://github.com/finetuneanon/) for converting the model to be compatible with the `transformers` package.
+- [Leo Gao](https://twitter.com/nabla_theta) for running zero shot evaluations for the baseline models for the table.
+- [Laurence Golding](https://github.com/researcher2/) for adding some features to the web demo.
+- [Aran Komatsuzaki](https://twitter.com/arankomatsuzaki) for advice with experiment design and writing the blog posts.
+- [Janko Prester](https://github.com/jprester/) for creating the web demo frontend.