Commit df2357f (verified), committed by Tianyi1229
1 parent: 26769b5

Upload 122 files

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +70 -0
  2. Tune-A-Video/outputs/10_classes_500_epoch/config.yaml +46 -0
  3. Tune-A-Video/outputs/10_classes_500_epoch/model_index.json +24 -0
  4. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100.gif +3 -0
  5. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100/a cat is sleeping on the sofa.gif +3 -0
  6. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100/a person is dancing.gif +3 -0
  7. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100/an airplane in the sky.gif +3 -0
  8. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200.gif +3 -0
  9. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200/a cat is sleeping on the sofa.gif +3 -0
  10. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200/a person is dancing.gif +3 -0
  11. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200/an airplane in the sky.gif +3 -0
  12. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300.gif +3 -0
  13. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300/a cat is sleeping on the sofa.gif +3 -0
  14. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300/a person is dancing.gif +3 -0
  15. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300/an airplane in the sky.gif +3 -0
  16. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400.gif +3 -0
  17. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400/a cat is sleeping on the sofa.gif +3 -0
  18. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400/a person is dancing.gif +3 -0
  19. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400/an airplane in the sky.gif +3 -0
  20. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500.gif +3 -0
  21. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500/a cat is sleeping on the sofa.gif +3 -0
  22. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500/a person is dancing.gif +3 -0
  23. Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500/an airplane in the sky.gif +3 -0
  24. Tune-A-Video/outputs/10_classes_500_epoch/scheduler/scheduler_config.json +14 -0
  25. Tune-A-Video/outputs/10_classes_500_epoch/text_encoder/config.json +25 -0
  26. Tune-A-Video/outputs/10_classes_500_epoch/text_encoder/pytorch_model.bin +3 -0
  27. Tune-A-Video/outputs/10_classes_500_epoch/tokenizer/merges.txt +0 -0
  28. Tune-A-Video/outputs/10_classes_500_epoch/tokenizer/special_tokens_map.json +24 -0
  29. Tune-A-Video/outputs/10_classes_500_epoch/tokenizer/tokenizer_config.json +30 -0
  30. Tune-A-Video/outputs/10_classes_500_epoch/tokenizer/vocab.json +0 -0
  31. Tune-A-Video/outputs/10_classes_500_epoch/unet/config.json +44 -0
  32. Tune-A-Video/outputs/10_classes_500_epoch/unet/diffusion_pytorch_model.bin +3 -0
  33. Tune-A-Video/outputs/10_classes_500_epoch/vae/config.json +31 -0
  34. Tune-A-Video/outputs/10_classes_500_epoch/vae/diffusion_pytorch_model.bin +3 -0
  35. Tune-A-Video/outputs/20_classes_500_epoch/config.yaml +46 -0
  36. Tune-A-Video/outputs/20_classes_500_epoch/model_index.json +24 -0
  37. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100.gif +3 -0
  38. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100/a cat is sleeping on the sofa.gif +3 -0
  39. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100/a person is dancing.gif +3 -0
  40. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100/an airplane in the sky.gif +3 -0
  41. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200.gif +3 -0
  42. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200/a cat is sleeping on the sofa.gif +3 -0
  43. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200/a person is dancing.gif +3 -0
  44. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200/an airplane in the sky.gif +3 -0
  45. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300.gif +3 -0
  46. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300/a cat is sleeping on the sofa.gif +3 -0
  47. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300/a person is dancing.gif +3 -0
  48. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300/an airplane in the sky.gif +3 -0
  49. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-400.gif +3 -0
  50. Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-400/a cat is sleeping on the sofa.gif +3 -0
.gitattributes CHANGED
@@ -33,3 +33,73 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100.gif filter=lfs diff=lfs merge=lfs -text
37
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
38
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
39
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
40
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200.gif filter=lfs diff=lfs merge=lfs -text
41
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
42
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
43
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
44
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300.gif filter=lfs diff=lfs merge=lfs -text
45
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
46
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
47
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
48
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400.gif filter=lfs diff=lfs merge=lfs -text
49
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
50
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
51
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
52
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500.gif filter=lfs diff=lfs merge=lfs -text
53
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
54
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
55
+ Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
56
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100.gif filter=lfs diff=lfs merge=lfs -text
57
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
58
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
59
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
60
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200.gif filter=lfs diff=lfs merge=lfs -text
61
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
62
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
63
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
64
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300.gif filter=lfs diff=lfs merge=lfs -text
65
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
66
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
67
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
68
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-400.gif filter=lfs diff=lfs merge=lfs -text
69
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-400/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
70
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-400/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
71
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-400/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
72
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-500.gif filter=lfs diff=lfs merge=lfs -text
73
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-500/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
74
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-500/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
75
+ Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-500/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
76
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-100.gif filter=lfs diff=lfs merge=lfs -text
77
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-100/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
78
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-100/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
79
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-100/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
80
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-200.gif filter=lfs diff=lfs merge=lfs -text
81
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-200/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
82
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-200/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
83
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-200/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
84
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-300.gif filter=lfs diff=lfs merge=lfs -text
85
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-300/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
86
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-300/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
87
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-300/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
88
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-400.gif filter=lfs diff=lfs merge=lfs -text
89
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-400/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
90
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-400/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
91
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-400/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
92
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-500.gif filter=lfs diff=lfs merge=lfs -text
93
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-500/a[[:space:]]cat[[:space:]]is[[:space:]]sleeping[[:space:]]on[[:space:]]the[[:space:]]sofa.gif filter=lfs diff=lfs merge=lfs -text
94
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-500/a[[:space:]]person[[:space:]]is[[:space:]]dancing.gif filter=lfs diff=lfs merge=lfs -text
95
+ Tune-A-Video/outputs/30_classes_500_epoch/samples/sample-500/an[[:space:]]airplane[[:space:]]in[[:space:]]the[[:space:]]sky.gif filter=lfs diff=lfs merge=lfs -text
96
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-100.gif filter=lfs diff=lfs merge=lfs -text
97
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-100/a[[:space:]]long[[:space:]]road[[:space:]]with[[:space:]]a[[:space:]]sunset[[:space:]]in[[:space:]]the[[:space:]]background.gif filter=lfs diff=lfs merge=lfs -text
98
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-100/a[[:space:]]man[[:space:]]playing[[:space:]]a[[:space:]]guitar.gif filter=lfs diff=lfs merge=lfs -text
99
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-100/a[[:space:]]person[[:space:]]is[[:space:]]playing[[:space:]]drums.gif filter=lfs diff=lfs merge=lfs -text
100
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-100/a[[:space:]]watermelon.gif filter=lfs diff=lfs merge=lfs -text
101
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-200.gif filter=lfs diff=lfs merge=lfs -text
102
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-200/a[[:space:]]long[[:space:]]road[[:space:]]with[[:space:]]a[[:space:]]sunset[[:space:]]in[[:space:]]the[[:space:]]background.gif filter=lfs diff=lfs merge=lfs -text
103
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-200/a[[:space:]]man[[:space:]]playing[[:space:]]a[[:space:]]guitar.gif filter=lfs diff=lfs merge=lfs -text
104
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-200/a[[:space:]]person[[:space:]]is[[:space:]]playing[[:space:]]drums.gif filter=lfs diff=lfs merge=lfs -text
105
+ Tune-A-Video/outputs/40_classes_200_epoch/samples/sample-200/a[[:space:]]watermelon.gif filter=lfs diff=lfs merge=lfs -text
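The added patterns write each space in a file name as `[[:space:]]`, because whitespace separates a pattern from its attributes in `.gitattributes`. A minimal Python sketch of that escaping follows. It assumes spaces are the only characters that need handling, and the helper name `lfs_attr_line` is hypothetical; it is not Git LFS's own code.

```python
# Attributes that route a path through Git LFS, as seen in the diff above.
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def lfs_attr_line(path: str) -> str:
    """Build a .gitattributes line that tracks `path` with Git LFS.

    Only spaces are escaped here (an assumption made for this sketch).
    """
    pattern = path.replace(" ", "[[:space:]]")
    return f"{pattern} {LFS_ATTRS}"


if __name__ == "__main__":
    print(lfs_attr_line("samples/sample-100/a person is dancing.gif"))
```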
Tune-A-Video/outputs/10_classes_500_epoch/config.yaml ADDED
@@ -0,0 +1,46 @@
+pretrained_model_path: /userhome/zhoutianyi/Zhoutianyi/Mutilmodel/huggingface/stable-diffusion-v1-4
+output_dir: ./outputs/10_classes_500_epoch/
+train_data:
+  video_path: /userhome/zhoutianyi/Zhoutianyi/dataset/SEED-DV/output/block1/163.mp4
+  prompt: a turtle swimming in the water
+  n_sample_frames: 6
+  width: 512
+  height: 288
+  sample_start_idx: 0
+  sample_frame_rate: 2
+validation_data:
+  prompts:
+  - a cat is sleeping on the sofa
+  - an airplane in the sky
+  - a person is dancing
+  video_length: 6
+  width: 512
+  height: 288
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: false
+  num_inv_steps: 50
+validation_steps: 100
+trainable_modules:
+- attn1.to_q
+- attn2.to_q
+- attn_temp
+train_batch_size: 10
+max_train_steps: 500
+learning_rate: 3.0e-05
+scale_lr: false
+lr_scheduler: constant
+lr_warmup_steps: 0
+adam_beta1: 0.9
+adam_beta2: 0.999
+adam_weight_decay: 0.01
+adam_epsilon: 1.0e-08
+max_grad_norm: 1.0
+gradient_accumulation_steps: 1
+gradient_checkpointing: true
+checkpointing_steps: 1000
+resume_from_checkpoint: null
+mixed_precision: fp16
+use_8bit_adam: false
+enable_xformers_memory_efficient_attention: true
+seed: 33
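The `train_data` settings determine which frames of the source clip are used for tuning. The sketch below assumes the loader takes every `sample_frame_rate`-th frame starting at `sample_start_idx` and keeps the first `n_sample_frames` of them. That rule, the function name, and the clip length of 24 frames are assumptions for illustration and are not taken from this commit.

```python
def sampled_frame_indices(start_idx: int, frame_rate: int,
                          n_frames: int, total_frames: int) -> list[int]:
    """Pick frame indices with a fixed stride, truncated to n_frames."""
    return list(range(start_idx, total_frames, frame_rate))[:n_frames]


if __name__ == "__main__":
    # Values from config.yaml: sample_start_idx=0, sample_frame_rate=2,
    # n_sample_frames=6. total_frames=24 is a made-up clip length.
    print(sampled_frame_indices(0, 2, 6, total_frames=24))
```

Under these assumptions a clip needs at least 11 frames to yield all six samples, since the last index taken is 10.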
Tune-A-Video/outputs/10_classes_500_epoch/model_index.json ADDED
@@ -0,0 +1,24 @@
+{
+  "_class_name": "TuneAVideoPipeline",
+  "_diffusers_version": "0.11.1",
+  "scheduler": [
+    "diffusers",
+    "PNDMScheduler"
+  ],
+  "text_encoder": [
+    "transformers",
+    "CLIPTextModel"
+  ],
+  "tokenizer": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "unet": [
+    null,
+    "UNet3DConditionModel"
+  ],
+  "vae": [
+    "diffusers",
+    "AutoencoderKL"
+  ]
+}
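Each non-underscore entry in `model_index.json` pairs a library name with a class name. A small Python sketch that lists those pairs is shown here; the function name `list_components` is hypothetical. The `null` library for `unet` presumably means the class is not resolved from `diffusers` or `transformers`, which is an inference from the file and not something the commit states.

```python
import json

# Contents of model_index.json as added in this commit.
MODEL_INDEX = """
{
  "_class_name": "TuneAVideoPipeline",
  "_diffusers_version": "0.11.1",
  "scheduler": ["diffusers", "PNDMScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "unet": [null, "UNet3DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""


def list_components(index_text: str) -> dict:
    """Map each component name to its (library, class_name) pair."""
    index = json.loads(index_text)
    return {
        name: tuple(value)
        for name, value in index.items()
        if not name.startswith("_")  # skip metadata keys such as _class_name
    }


if __name__ == "__main__":
    for name, (library, cls) in list_components(MODEL_INDEX).items():
        print(f"{name}: {library or '<no library listed>'} / {cls}")
```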
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100.gif ADDED

Git LFS Details

  • SHA256: 4850f634bf4012dae943266237cee5928aa2aac1d9333881da813d843de2288f
  • Pointer size: 132 Bytes
  • Size of remote file: 1.22 MB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100/a cat is sleeping on the sofa.gif ADDED

Git LFS Details

  • SHA256: 3364cf566caa1f9ca82c5fb5a4ea9890d81a6cac90a7ee4b2978afd1bcca103e
  • Pointer size: 131 Bytes
  • Size of remote file: 423 kB
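The pointer sizes of 132 and 131 bytes follow from the Git LFS pointer layout: a version line, an `oid sha256:` line with a 64-character digest, and a `size` line holding the byte count in decimal. A Python sketch that rebuilds such a pointer is given below. The helper name is hypothetical, and the byte counts are illustrative stand-ins, since this view shows only rounded sizes.

```python
def lfs_pointer_text(sha256_hex: str, size_bytes: int) -> str:
    """Return the text of a Git LFS pointer file (spec v1 layout)."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


if __name__ == "__main__":
    digest = "4850f634bf4012dae943266237cee5928aa2aac1d9333881da813d843de2288f"
    # Illustrative sizes: a 7-digit count (about 1.22 MB) and a 6-digit
    # count (about 423 kB). The exact byte counts are not shown here.
    print(len(lfs_pointer_text(digest, 1_220_000)))
    print(len(lfs_pointer_text(digest, 423_000)))
```

Only the number of digits in the size changes the pointer length, which is why files above and below one million bytes differ by exactly one byte.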
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100/a person is dancing.gif ADDED

Git LFS Details

  • SHA256: 82ede2b4a96fac716b20c5bebc9f3baf009a5f53c37c8097bb7a1eb0aea61f7e
  • Pointer size: 131 Bytes
  • Size of remote file: 573 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-100/an airplane in the sky.gif ADDED

Git LFS Details

  • SHA256: 6ab5b38ad789e899d2afe423664f4f1f64b0418d22926e0b7fefad366649f318
  • Pointer size: 131 Bytes
  • Size of remote file: 472 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200.gif ADDED

Git LFS Details

  • SHA256: e6546be4f3ad02e236578f91ab56b93b1b27812ba425251fc20d1d5614f71540
  • Pointer size: 132 Bytes
  • Size of remote file: 1.25 MB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200/a cat is sleeping on the sofa.gif ADDED

Git LFS Details

  • SHA256: 2139e411d1fae34b4db1c079f3c7ae70a4340328065efbd83fe0deaddeee41c4
  • Pointer size: 131 Bytes
  • Size of remote file: 452 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200/a person is dancing.gif ADDED

Git LFS Details

  • SHA256: a1bb0bb91425360b9aed829f88bfc52a849a4678973bf6a5eba4bed258346f28
  • Pointer size: 131 Bytes
  • Size of remote file: 611 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-200/an airplane in the sky.gif ADDED

Git LFS Details

  • SHA256: 52e2e1c0fd5178b27f59ecf35e64168b9b0cc36d2f853b0e99c973f51819c205
  • Pointer size: 131 Bytes
  • Size of remote file: 437 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300.gif ADDED

Git LFS Details

  • SHA256: 2b0cb510d1f6dd971a319b532e0d55e18ef2625f334d5519c9c6a251a1dfd521
  • Pointer size: 132 Bytes
  • Size of remote file: 1.2 MB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300/a cat is sleeping on the sofa.gif ADDED

Git LFS Details

  • SHA256: bbb0ac859096ae0745501f61548ee364fb7f0cebc0abfa3f629e2f92678b1c9c
  • Pointer size: 131 Bytes
  • Size of remote file: 481 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300/a person is dancing.gif ADDED

Git LFS Details

  • SHA256: 9a8dbb7968e0266e9227ef64f588854785f9fd3f6ec73d52ac5498cd71d7d2b1
  • Pointer size: 131 Bytes
  • Size of remote file: 537 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-300/an airplane in the sky.gif ADDED

Git LFS Details

  • SHA256: 95f131d4960d3c469532fdd08224a17118e651f41666cb3f8b1b63ad41d90621
  • Pointer size: 131 Bytes
  • Size of remote file: 386 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400.gif ADDED

Git LFS Details

  • SHA256: d16b1589a5141dada0713e2910145f073c8bef153a942fce35961b3f0002049f
  • Pointer size: 132 Bytes
  • Size of remote file: 1.2 MB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400/a cat is sleeping on the sofa.gif ADDED

Git LFS Details

  • SHA256: ba528e76c08614c6fbd998de3e01aa7fe404e7b8910b0f8500cb1f6d3eaf8240
  • Pointer size: 131 Bytes
  • Size of remote file: 493 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400/a person is dancing.gif ADDED

Git LFS Details

  • SHA256: 5651b397a674e43c59ccd62c90ff2f3629ba92d299729192cec8769fd652e801
  • Pointer size: 131 Bytes
  • Size of remote file: 525 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-400/an airplane in the sky.gif ADDED

Git LFS Details

  • SHA256: 297f9958662a9b2e388017652d2e428c5e53d12dd2527f9a8424c58556a8f3c3
  • Pointer size: 131 Bytes
  • Size of remote file: 339 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500.gif ADDED

Git LFS Details

  • SHA256: dd8d43e3ebb901f6de3a8b60aeb6f020004aef6c8867ef3e247ad3f4e365981b
  • Pointer size: 132 Bytes
  • Size of remote file: 1.33 MB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500/a cat is sleeping on the sofa.gif ADDED

Git LFS Details

  • SHA256: b0bbd3125ad01ab596d9339ccd60b37fa490033e03231b360d8cf739f93ea22f
  • Pointer size: 131 Bytes
  • Size of remote file: 556 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500/a person is dancing.gif ADDED

Git LFS Details

  • SHA256: f9b33dec05295ece69cfcb7d1505897be6cf0518ca1a341ebd4c67cebfc04f38
  • Pointer size: 131 Bytes
  • Size of remote file: 606 kB
Tune-A-Video/outputs/10_classes_500_epoch/samples/sample-500/an airplane in the sky.gif ADDED

Git LFS Details

  • SHA256: f9fb89c101c54efc7f3f0f00bfecf9be4d0b473302a596cf2873a6ed72dc75ac
  • Pointer size: 131 Bytes
  • Size of remote file: 338 kB
Tune-A-Video/outputs/10_classes_500_epoch/scheduler/scheduler_config.json ADDED
@@ -0,0 +1,14 @@
+{
+  "_class_name": "PNDMScheduler",
+  "_diffusers_version": "0.11.1",
+  "beta_end": 0.012,
+  "beta_schedule": "scaled_linear",
+  "beta_start": 0.00085,
+  "clip_sample": false,
+  "num_train_timesteps": 1000,
+  "prediction_type": "epsilon",
+  "set_alpha_to_one": false,
+  "skip_prk_steps": true,
+  "steps_offset": 1,
+  "trained_betas": null
+}
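The `scaled_linear` schedule interpolates linearly between the square roots of `beta_start` and `beta_end` and then squares the result. The pure-Python sketch below mirrors that definition without `torch`; the function name is hypothetical and it is meant only to show the shape of the schedule, not to replace the scheduler.

```python
def scaled_linear_betas(beta_start: float, beta_end: float, n: int) -> list[float]:
    """Betas spaced linearly in sqrt-space, then squared."""
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (n - 1)
    return [(lo + i * step) ** 2 for i in range(n)]


if __name__ == "__main__":
    # Values from scheduler_config.json.
    betas = scaled_linear_betas(0.00085, 0.012, 1000)
    print(betas[0], betas[len(betas) // 2], betas[-1])
```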
Tune-A-Video/outputs/10_classes_500_epoch/text_encoder/config.json ADDED
@@ -0,0 +1,25 @@
+{
+  "_name_or_path": "/userhome/zhoutianyi/Zhoutianyi/Mutilmodel/huggingface/stable-diffusion-v1-4",
+  "architectures": [
+    "CLIPTextModel"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "quick_gelu",
+  "hidden_size": 768,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "projection_dim": 512,
+  "torch_dtype": "float16",
+  "transformers_version": "4.44.2",
+  "vocab_size": 49408
+}
Tune-A-Video/outputs/10_classes_500_epoch/text_encoder/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5298945b1f3150e6debdcf55e32cd3c817a8d7efa6ded70b22dd9c1c0c68998f
+size 246188314
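The three added lines are a Git LFS pointer, not the weights themselves: the repository stores the hash and byte count, and the binary lives in LFS storage. A short Python sketch that reads these fields is shown below; the function name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:5298945b1f3150e6debdcf55e32cd3c817a8d7efa6ded70b22dd9c1c0c68998f\n"
        "size 246188314\n"
    )
    info = parse_lfs_pointer(pointer)
    # Report the size in decimal megabytes (1 MB = 10**6 bytes).
    print(info["algo"], f"{info['size'] / 1e6:.1f} MB")
```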
Tune-A-Video/outputs/10_classes_500_epoch/tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Tune-A-Video/outputs/10_classes_500_epoch/tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|endoftext|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
Tune-A-Video/outputs/10_classes_500_epoch/tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "49406": {
+      "content": "<|startoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49407": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": "<|endoftext|>"
+}
Tune-A-Video/outputs/10_classes_500_epoch/tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Tune-A-Video/outputs/10_classes_500_epoch/unet/config.json ADDED
@@ -0,0 +1,44 @@
+{
+  "_class_name": "UNet3DConditionModel",
+  "_diffusers_version": "0.11.1",
+  "act_fn": "silu",
+  "attention_head_dim": 8,
+  "block_out_channels": [
+    320,
+    640,
+    1280,
+    1280
+  ],
+  "center_input_sample": false,
+  "class_embed_type": null,
+  "cross_attention_dim": 768,
+  "down_block_types": [
+    "CrossAttnDownBlock3D",
+    "CrossAttnDownBlock3D",
+    "CrossAttnDownBlock3D",
+    "DownBlock3D"
+  ],
+  "downsample_padding": 1,
+  "dual_cross_attention": false,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_scale_factor": 1,
+  "mid_block_type": "UNetMidBlock3DCrossAttn",
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "resnet_time_scale_shift": "default",
+  "sample_size": 64,
+  "up_block_types": [
+    "UpBlock3D",
+    "CrossAttnUpBlock3D",
+    "CrossAttnUpBlock3D",
+    "CrossAttnUpBlock3D"
+  ],
+  "upcast_attention": false,
+  "use_linear_projection": false
+}
Tune-A-Video/outputs/10_classes_500_epoch/unet/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:af2f1223e8c8a3a0a64cda06b624dbad365452931252e0ff7b2717670d9afb76
+size 371195904
Tune-A-Video/outputs/10_classes_500_epoch/vae/config.json ADDED
@@ -0,0 +1,31 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.11.1",
+  "_name_or_path": "/userhome/zhoutianyi/Zhoutianyi/Mutilmodel/huggingface/stable-diffusion-v1-4",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 512,
+  "scaling_factor": 0.18215,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
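With four entries in `block_out_channels`, the VAE halves the spatial resolution three times, so its downsampling factor is 2^(4-1) = 8. The Python sketch below applies that factor to the 512x288 frames and 6-frame clips set in `config.yaml`. The helper names and the (channels, frames, height, width) latent layout are assumptions made for illustration.

```python
def vae_scale_factor(block_out_channels: list[int]) -> int:
    """Spatial downsampling factor: one halving between consecutive blocks."""
    return 2 ** (len(block_out_channels) - 1)


def latent_shape(width: int, height: int, frames: int,
                 latent_channels: int, block_out_channels: list[int]) -> tuple:
    """Latent tensor shape as (channels, frames, height, width); layout assumed."""
    factor = vae_scale_factor(block_out_channels)
    return (latent_channels, frames, height // factor, width // factor)


if __name__ == "__main__":
    # width/height/frames from config.yaml; channels from vae/config.json.
    print(latent_shape(512, 288, 6, 4, [128, 256, 512, 512]))
```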
Tune-A-Video/outputs/10_classes_500_epoch/vae/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:695a1a027d19dfd499c075f2addfd728cdd111cc57e5aa544a99b3a5a445d068
+size 167408066
Tune-A-Video/outputs/20_classes_500_epoch/config.yaml ADDED
@@ -0,0 +1,46 @@
+pretrained_model_path: /userhome/zhoutianyi/Zhoutianyi/Mutilmodel/huggingface/stable-diffusion-v1-4
+output_dir: ./outputs/20_classes_500_epoch/
+train_data:
+  video_path: /userhome/zhoutianyi/Zhoutianyi/dataset/SEED-DV/output/block1/163.mp4
+  prompt: a turtle swimming in the water
+  n_sample_frames: 6
+  width: 512
+  height: 288
+  sample_start_idx: 0
+  sample_frame_rate: 2
+validation_data:
+  prompts:
+  - a cat is sleeping on the sofa
+  - an airplane in the sky
+  - a person is dancing
+  video_length: 6
+  width: 512
+  height: 288
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: false
+  num_inv_steps: 50
+validation_steps: 100
+trainable_modules:
+- attn1.to_q
+- attn2.to_q
+- attn_temp
+train_batch_size: 10
+max_train_steps: 500
+learning_rate: 3.0e-05
+scale_lr: false
+lr_scheduler: constant
+lr_warmup_steps: 0
+adam_beta1: 0.9
+adam_beta2: 0.999
+adam_weight_decay: 0.01
+adam_epsilon: 1.0e-08
+max_grad_norm: 1.0
+gradient_accumulation_steps: 1
+gradient_checkpointing: true
+checkpointing_steps: 1000
+resume_from_checkpoint: null
+mixed_precision: fp16
+use_8bit_adam: false
+enable_xformers_memory_efficient_attention: true
+seed: 33
Tune-A-Video/outputs/20_classes_500_epoch/model_index.json ADDED
@@ -0,0 +1,24 @@
+{
+  "_class_name": "TuneAVideoPipeline",
+  "_diffusers_version": "0.11.1",
+  "scheduler": [
+    "diffusers",
+    "PNDMScheduler"
+  ],
+  "text_encoder": [
+    "transformers",
+    "CLIPTextModel"
+  ],
+  "tokenizer": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "unet": [
+    null,
+    "UNet3DConditionModel"
+  ],
+  "vae": [
+    "diffusers",
+    "AutoencoderKL"
+  ]
+}
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100.gif ADDED

Git LFS Details

  • SHA256: 6c99eb9122814faf6fd53031e1d26202dfdd69ae2f007a28dc64dab88d765ca0
  • Pointer size: 132 Bytes
  • Size of remote file: 1.27 MB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100/a cat is sleeping on the sofa.gif ADDED

Git LFS Details

  • SHA256: acce033720f2b1176e768655054ffb987592bda85991eb85570bdb88ed3c6fe0
  • Pointer size: 131 Bytes
  • Size of remote file: 546 kB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100/a person is dancing.gif ADDED

Git LFS Details

  • SHA256: 431c5707c173a8bf34835277c91bccc0fcac20f099d0874af5ec123ff19381f7
  • Pointer size: 131 Bytes
  • Size of remote file: 701 kB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-100/an airplane in the sky.gif ADDED

Git LFS Details

  • SHA256: d08e8375f1a265e676ed7fc03b288cd3d0b28e1038935dcd8e230c3b34110735
  • Pointer size: 131 Bytes
  • Size of remote file: 302 kB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200.gif ADDED

Git LFS Details

  • SHA256: 39f885e02d80fa15377139815dee80098d27de4f77201b5f26471aa8a9c60e5f
  • Pointer size: 132 Bytes
  • Size of remote file: 1.14 MB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200/a cat is sleeping on the sofa.gif ADDED

Git LFS Details

  • SHA256: 81b8cc5d9fc1e580807cb6aab290cf63058e0542234d05e0b19384b14f3a0e78
  • Pointer size: 131 Bytes
  • Size of remote file: 541 kB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200/a person is dancing.gif ADDED

Git LFS Details

  • SHA256: d9ea398fd9295d14dca8e73f5b5eee2f5a51d1a73397f088a0e487e5934f052f
  • Pointer size: 131 Bytes
  • Size of remote file: 558 kB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-200/an airplane in the sky.gif ADDED

Git LFS Details

  • SHA256: e1453f150f3734cde04c9004232c1179c67ee616d46e2be0c3f520b6dab9b87e
  • Pointer size: 131 Bytes
  • Size of remote file: 262 kB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300.gif ADDED

Git LFS Details

  • SHA256: e9ea95b7b2d5cb1e66011a1561a26197ad5dd4fc07c977786170728327b4badf
  • Pointer size: 132 Bytes
  • Size of remote file: 1.17 MB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300/a cat is sleeping on the sofa.gif ADDED

Git LFS Details

  • SHA256: 445ba12960fc25bf5ae898bd882ec2657f8765ddf91220cf8586e6d0d881966f
  • Pointer size: 131 Bytes
  • Size of remote file: 531 kB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300/a person is dancing.gif ADDED

Git LFS Details

  • SHA256: 48381b289712d8f03ab48b4203b3e190f5564b88637205a4b8e68afdab5720fa
  • Pointer size: 131 Bytes
  • Size of remote file: 554 kB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-300/an airplane in the sky.gif ADDED

Git LFS Details

  • SHA256: bf255b12a6905005107dfe57f2607e44228d42cc08ccdafae882336ebebf5744
  • Pointer size: 131 Bytes
  • Size of remote file: 285 kB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-400.gif ADDED

Git LFS Details

  • SHA256: 0fc684dcfa6e75d77e589cd0f72a15f68c6e5aa3230de4bf267048c6d3ad9b4d
  • Pointer size: 132 Bytes
  • Size of remote file: 1.12 MB
Tune-A-Video/outputs/20_classes_500_epoch/samples/sample-400/a cat is sleeping on the sofa.gif ADDED

Git LFS Details

  • SHA256: 3c2bfc67182201998da9cbb8da41c40c39ceaf2d3e9fa0994e896117d2c7ca0a
  • Pointer size: 131 Bytes
  • Size of remote file: 472 kB