desuAnon/SoraVids

desuAnon committed on Nov 28, 2024

Commit 92c295d (verified)

1 Parent(s): 2d6682d

desu

README.md +24 -20

@@ -1,41 +1,45 @@
 ---
 license: cc0-1.0
 ---
-On 2024-11-26, temporary access to OpenAI's video generation model Sora (turbo) was granted via this Hugging Face repository:
-https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora
-After a few hours, OpenAI invalidated the API key used by the repo and removed access to the generated videos.
-This release consists of 87 videos (~702 MB) and 83 corresponding prompts that were archived, from the publicly displayed generations, in anticipation of that event.
-Not all videos generated were able to be archived, due to HF load issues. All videos are of MIME type video/mp4 and have a framerate of 30 FPS.
-The generation parameters may be found in the app.py of the original repo [here](https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora/blob/main/app.py); an archive of this file may be found [here](https://archive.is/r70Ao).
-The Sora backend that was used for generation was the following:
-`https://sora.openai.com/backend/video_gen`
-Please note that user prompts are often "augmented" (changed by some LLM) before generating videos, so the prompts listed may not be the exact one used by the model.
-The prompt used for four videos are not known, and these are denoted as [unknown_n].
 ---
 ### Archive versions
 **sora-turbo-vids.zip**
-This was the original upload, and had some encoding/compatibility issues for some users.
-The "short" video filenames are the full original prompts used for the API request for each video.
-A "long" prompt limit was based off the filename length limit (around 255 B) for Windows/macOS/Linux.
-All short prompts are used as filenames in the "short_prompts" directory.
-The ten longer prompts in "full_long_prompts.txt" were used for the videos in the "long_prompts" directory.
-**videos_only.zip** and **videos_only.7z**
 These identical archives (in different compression formats) contain only the original videos, with names such as `video_24.mp4`.
 The `video_24` part is the video ID, and the prompt used for a specific video ID is listed in the separate CSV and JSONL files (video_id, prompt).
-You should be able to easily view both those files in a text editor, and they are easy to import and process in various programming languages.
 ---
 ~ desuAnon
 https://rentry.org/desuAnon

 ---
 license: cc0-1.0
 ---
+### Release Information
+On November 26th, 2024, temporary access to OpenAI's video generation model Sora (turbo) was provided through the HF repo [PR-Puppet-Sora](https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora).
+After a few hours, OpenAI revoked the API key used by the repo and removed access to the generated videos.
+In anticipation of that event, the publicly displayed videos and their prompts were archived.
+This release contains 87 archived videos (~702 MB) and 83 of their prompts, and is dedicated to the public domain (CC0 1.0 Universal).
+The generation parameters may be found in the app.py of the original repo [here](https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora/blob/main/app.py). An archive of this script is available [here](https://archive.is/r70Ao).
+User prompts are often "augmented" (changed by some LLM) before generating videos, and this may be true for these videos as well.
+The Sora backend that was used for generation was `https://sora.openai.com/backend/video_gen`.
+Contrary to claims online, the generations were *not* uncensored. User prompts, as well as the generated videos, passed through OpenAI's content moderation normally.
+This is partly the reason why none of the videos in this archive are NSFW, or similar, despite a few *brave attempts* in the prompts.
+It is also incorrect that "Sora leaked", since the model itself (its model parameters) had not been acquired by outsiders.
+The only thing that "leaked" was previewer/beta tester access to Sora video generation, via a single HF repo, which kept its API keys secret.
 ---
 ### Archive versions
+All videos are `.mp4`, of varying resolutions, and a framerate of 30 FPS.
+Not all of the generated videos could be archived, due to HF server load issues.
+The prompts used for four videos are not known, and these are denoted as [unknown_n].
+Hugging Face performs *File Security Scans* of uploaded files, and you can click on the icon next to each file to see the result of this.
 **sora-turbo-vids.zip**
+This is the original archive containing both videos and their prompts, and some users experienced encoding/compatibility issues with it.
+Consider using the more recent "separated" uploads if you encounter similar issues.
+The filenames in the `short_prompts` directory are the full prompts used for each video generation request.
+The filenames in the `long_prompts` directory are shortened versions of the long prompts (above 256 chars), and their full versions are found in `full_long_prompts.txt`.
+**videos_only.zip** & **videos_only.7z**
 These identical archives (in different compression formats) contain only the original videos, with names such as `video_24.mp4`.
 The `video_24` part is the video ID, and the prompt used for a specific video ID is listed in the separate CSV and JSONL files (video_id, prompt).
+You may easily view both those files in a text editor, and they are easy to import and process in various programming languages.
 ---
+Even though this is a *dataset* upload, I went with a *model* repo because a) the URL is shorter, and b) the original upload wasn't compatible with the HF dataset viewer.
 ~ desuAnon
 https://rentry.org/desuAnon
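
The README says the prompt for each video ID is listed in separate CSV and JSONL files with the fields (video_id, prompt). A minimal Python sketch of reading either format into a `video_id -> prompt` mapping might look like the following. It assumes the CSV has a header row naming those two columns and that each JSONL line is an object with those two keys; neither detail is confirmed by the README, and the sample rows are made up for illustration.

```python
import csv
import io
import json


def prompts_from_csv(text: str) -> dict[str, str]:
    """Map video_id -> prompt from CSV text.

    Assumption: the first row is a header containing `video_id` and `prompt`.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {row["video_id"]: row["prompt"] for row in reader}


def prompts_from_jsonl(text: str) -> dict[str, str]:
    """Map video_id -> prompt from JSONL text (one JSON object per line).

    Assumption: each object has the keys `video_id` and `prompt`.
    """
    records = (json.loads(line) for line in text.split("\n") if line.strip())
    return {rec["video_id"]: rec["prompt"] for rec in records}


# Made-up sample data, not taken from the archive.
sample_csv = 'video_id,prompt\nvideo_24,"a cat, surfing"\nvideo_25,[unknown_1]\n'
sample_jsonl = (
    '{"video_id": "video_24", "prompt": "a cat, surfing"}\n'
    '{"video_id": "video_25", "prompt": "[unknown_1]"}\n'
)

csv_prompts = prompts_from_csv(sample_csv)
jsonl_prompts = prompts_from_jsonl(sample_jsonl)
print(csv_prompts["video_24"])
```

To use this on the real files, read each one as UTF-8 text (for example with `pathlib.Path(...).read_text(encoding="utf-8")`) and pass the result to the matching function. The actual filenames are not given in the README, so check the repository's file list. Entries whose prompt is not known would show up with the `[unknown_n]` placeholder as their prompt value.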