Aloukik21 committed on
Commit a40a75c (verified) · 1 parent: 3924df4

Cleanup: remove 72 unneeded files (255GB) - duplicates, old models, DiffRhythm, Infinity

This view is limited to 50 files because the commit contains too many changes. See the raw diff for all changes.
Files changed (50)
  1. TTS/DiffRhythm/MuQ-MuLan-large/README.md +0 -111
  2. TTS/DiffRhythm/MuQ-MuLan-large/config.json +0 -41
  3. TTS/DiffRhythm/MuQ-MuLan-large/pytorch_model.bin +0 -3
  4. TTS/DiffRhythm/MuQ-large-msd-iter/README.md +0 -113
  5. TTS/DiffRhythm/MuQ-large-msd-iter/config.json +0 -143
  6. TTS/DiffRhythm/MuQ-large-msd-iter/model.safetensors +0 -3
  7. TTS/DiffRhythm/MuQ-large-msd-iter/pytorch_model.bin +0 -3
  8. TTS/DiffRhythm/cfm_model_v1_2.pt +0 -3
  9. TTS/DiffRhythm/config.json +0 -13
  10. TTS/DiffRhythm/vae_model.pt +0 -3
  11. TTS/DiffRhythm/xlm-roberta-base/README.md +0 -200
  12. TTS/DiffRhythm/xlm-roberta-base/config.json +0 -25
  13. TTS/DiffRhythm/xlm-roberta-base/flax_model.msgpack +0 -3
  14. TTS/DiffRhythm/xlm-roberta-base/model.onnx +0 -3
  15. TTS/DiffRhythm/xlm-roberta-base/model.safetensors +0 -3
  16. TTS/DiffRhythm/xlm-roberta-base/pytorch_model.bin +0 -3
  17. TTS/DiffRhythm/xlm-roberta-base/sentencepiece.bpe.model +0 -3
  18. TTS/DiffRhythm/xlm-roberta-base/tf_model.h5 +0 -3
  19. TTS/DiffRhythm/xlm-roberta-base/tokenizer.json +0 -0
  20. TTS/DiffRhythm/xlm-roberta-base/tokenizer_config.json +0 -1
  21. ace_step/README.md +0 -122
  22. ace_step/config.json +0 -35
  23. audio/MelBandRoformer_fp16.safetensors +0 -3
  24. diffusion_models/Phantom-Wan-14B_fp8_e4m3fn.safetensors +0 -3
  25. diffusion_models/Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors +0 -3
  26. diffusion_models/Wan2_1-InfiniteTalk-Multi_fp8_e4m3fn_scaled_KJ.safetensors +0 -3
  27. diffusion_models/Wan2_1-InfiniteTalk-Single_fp8_e4m3fn_scaled_KJ.safetensors +0 -3
  28. diffusion_models/Wan2_1-InfiniteTalk_Multi_Q8.gguf +0 -3
  29. diffusion_models/Wan2_1-InfiniteTalk_Single_Q8.gguf +0 -3
  30. diffusion_models/wan2.1-i2v-14b-480p-Q4_K_M.gguf +0 -3
  31. loras/FastWan_T2V_14B_480p_lora_rank_128_bf16.safetensors +0 -3
  32. loras/Wan2.2-Fun-A14B-InP-LOW-HPS2.1_resized_dynamic_avg_rank_15_bf16.safetensors +0 -3
  33. loras/Wan21_PusaV1_LoRA_14B_rank512_bf16.safetensors +0 -3
  34. misc/TTS/ACE-Step-v1-3.5B/ace_step_transformer/diffusion_pytorch_model.safetensors +0 -3
  35. misc/TTS/ACE-Step-v1-3.5B/music_dcae_f8c8/diffusion_pytorch_model.safetensors +0 -3
  36. misc/TTS/ACE-Step-v1-3.5B/music_vocoder/diffusion_pytorch_model.safetensors +0 -3
  37. misc/TTS/ACE-Step-v1-3.5B/umt5-base/model.safetensors +0 -3
  38. misc/ace_step/all_in_one/ace_step_v1_3.5b.safetensors +0 -3
  39. misc/clip_vision/clip_vision_h.safetensors +0 -3
  40. misc/diffusion_models/MelBandRoformer_fp16.safetensors +0 -3
  41. misc/diffusion_models/Wan14BI2VFusioniX_phantom_14B_fp16.safetensors +0 -3
  42. misc/diffusion_models/Wan2_1-Fun-V1_1-14B-Control-Camera_fp8_e4m3fn.safetensors +0 -3
  43. misc/diffusion_models/Wan2_1-InfiniteTalk_Multi_Q8.gguf +0 -3
  44. misc/diffusion_models/Wan2_1-InfiniteTalk_Single_Q8.gguf +0 -3
  45. misc/diffusion_models/Wan2_1-T2V-14B_fp8_e4m3fn_scaled_KJ.safetensors +0 -3
  46. misc/diffusion_models/Wan2_2-I2V-A14B-HIGH_fp8_e4m3fn_scaled_KJ.safetensors +0 -3
  47. misc/diffusion_models/Wan2_2-I2V-A14B-LOW_fp8_e4m3fn_scaled_KJ.safetensors +0 -3
  48. misc/diffusion_models/wan2.1-i2v-14b-480p-Q4_K_M.gguf +0 -3
  49. misc/diffusion_models/wan2.2_fun_camera_high_noise_14B_fp8_scaled.safetensors +0 -3
  50. misc/diffusion_models/wan2.2_fun_camera_low_noise_14B_fp8_scaled.safetensors +0 -3
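The commit message gives duplicates as one reason for the cleanup, and several file names in the list above appear under more than one directory (for example `Wan2_1-InfiniteTalk_Multi_Q8.gguf` under both `diffusion_models/` and `misc/diffusion_models/`). A minimal Python sketch for flagging such candidates from a list of paths might look like this. `find_duplicate_basenames` is a hypothetical helper, and a shared file name only suggests a duplicate; identical content is confirmed by matching LFS `oid` values.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_basenames(paths):
    """Group repository paths by file name; keep names found in two or more places."""
    groups = defaultdict(list)
    for path in paths:
        groups[PurePosixPath(path).name].append(path)
    # Only names that occur more than once are duplicate candidates.
    return {name: sorted(locs) for name, locs in groups.items() if len(locs) > 1}


if __name__ == "__main__":
    changed = [
        "audio/MelBandRoformer_fp16.safetensors",
        "diffusion_models/Wan2_1-InfiniteTalk_Multi_Q8.gguf",
        "diffusion_models/wan2.1-i2v-14b-480p-Q4_K_M.gguf",
        "misc/clip_vision/clip_vision_h.safetensors",
        "misc/diffusion_models/MelBandRoformer_fp16.safetensors",
        "misc/diffusion_models/Wan2_1-InfiniteTalk_Multi_Q8.gguf",
        "misc/diffusion_models/wan2.1-i2v-14b-480p-Q4_K_M.gguf",
    ]
    for name, locations in sorted(find_duplicate_basenames(changed).items()):
        print(name, "->", locations)
```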
TTS/DiffRhythm/MuQ-MuLan-large/README.md DELETED
@@ -1,111 +0,0 @@
- ---
- license: cc-by-nc-4.0
- language:
- - en
- - zh
- pipeline_tag: audio-classification
- tags:
- - music
- ---
-
- # MuQ & MuQ-MuLan
-
- <div>
- <a href='#'><img alt="Static Badge" src="https://img.shields.io/badge/Python-3.8%2B-blue?logo=python&logoColor=white"></a>
- <a href='https://arxiv.org/abs/2501.01108'><img alt="Static Badge" src="https://img.shields.io/badge/arXiv-2501.01108-%23b31b1b?logo=arxiv&link=https%3A%2F%2Farxiv.org%2F"></a>
- <a href='https://huggingface.co/OpenMuQ'><img alt="Static Badge" src="https://img.shields.io/badge/huggingface-OpenMuQ-%23FFD21E?logo=huggingface&link=https%3A%2F%2Fhuggingface.co%2FOpenMuQ"></a>
- <a href='https://pytorch.org/'><img alt="Static Badge" src="https://img.shields.io/badge/framework-PyTorch-%23EE4C2C?logo=pytorch"></a>
- <a href='https://pypi.org/project/muq'><img alt="Static Badge" src="https://img.shields.io/badge/pip%20install-muq-green?logo=PyPI&logoColor=white&link=https%3A%2F%2Fpypi.org%2Fproject%2Fmuq"></a>
- </div>
-
-
- This is the official repository for the paper *"**MuQ**: Self-Supervised **Mu**sic Representation Learning
- with Mel Residual Vector **Q**uantization"*. For more detailed information, we strongly recommend referring to https://github.com/tencent-ailab/MuQ and the [paper]((https://arxiv.org/abs/2501.01108)).
-
- In this repo, the following models are released:
-
- - **MuQ**(see [this link](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter)): A large music foundation model pre-trained via Self-Supervised Learning (SSL), achieving SOTA in various MIR tasks.
- - **MuQ-MuLan**(see [this link](https://huggingface.co/OpenMuQ/MuQ-MuLan-large)): A music-text joint embedding model trained via contrastive learning, supporting both English and Chinese texts.
-
-
- ## Usage
-
- To begin with, please use pip to install the official `muq` lib, and ensure that your `python>=3.8`:
- ```bash
- pip3 install muq
- ```
-
-
- Using **MuQ-MuLan** to extract the music and text embeddings and calculate the similarity:
- ```python
- import torch, librosa
- from muq import MuQMuLan
-
- # This will automatically fetch checkpoints from huggingface
- device = 'cuda'
- mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
- mulan = mulan.to(device).eval()
-
- # Extract music embeddings
- wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
- wavs = torch.tensor(wav).unsqueeze(0).to(device)
- with torch.no_grad():
-     audio_embeds = mulan(wavs = wavs)
-
- # Extract text embeddings (texts can be in English or Chinese)
- texts = ["classical genres, hopeful mood, piano.", "一首适合海边风景的小提琴曲,节奏欢快"]
- with torch.no_grad():
-     text_embeds = mulan(texts = texts)
-
- # Calculate dot product similarity
- sim = mulan.calc_similarity(audio_embeds, text_embeds)
- print(sim)
- ```
-
-
- To extract music audio features using **MuQ**:
- ```python
- import torch, librosa
- from muq import MuQ
-
- device = 'cuda'
- wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
- wavs = torch.tensor(wav).unsqueeze(0).to(device)
-
- # This will automatically fetch the checkpoint from huggingface
- muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
- muq = muq.to(device).eval()
-
- with torch.no_grad():
-     output = muq(wavs, output_hidden_states=True)
-
- print('Total number of layers: ', len(output.hidden_states))
- print('Feature shape: ', output.last_hidden_state.shape)
-
- ```
-
- ## Model Checkpoints
-
- | Model Name | Parameters | Data | HuggingFace🤗 |
- | ----------- | --- | --- | ----------- |
- | MuQ | ~300M | MSD dataset | [OpenMuQ/MuQ-large-msd-iter](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter) |
- | MuQ-MuLan | ~700M | music-text pairs | [OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large) |
-
- **Note**: Please note that the open-sourced MuQ was trained on the Million Song Dataset. Due to differences in dataset size, the open-sourced model may not achieve the same level of performance as reported in the paper.
-
- ## License
-
- The code is released under the MIT license.
-
- The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) are released under the CC-BY-NC 4.0 license.
-
- ## Citation
-
- ```
- @article{zhu2025muq,
- title={MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization},
- author={Haina Zhu and Yizhi Zhou and Hangting Chen and Jianwei Yu and Ziyang Ma and Rongzhi Gu and Yi Luo and Wei Tan and Xie Chen},
- journal={arXiv preprint arXiv:2501.01108},
- year={2025}
- }
- ```
 
TTS/DiffRhythm/MuQ-MuLan-large/config.json DELETED
@@ -1,41 +0,0 @@
- {
- "mulan": {
- "sr": 24000,
- "clip_secs": 10,
- "dim_latent": 512,
- "decoupled_contrastive_learning": true,
- "hierarchical_contrastive_loss": false,
- "hierarchical_contrastive_loss_layers": null,
- "sigmoid_contrastive_loss": false,
- "rank_contrast": true
- },
- "audio_model": {
- "name": "OpenMuQ/MuQ-large-msd-iter",
- "model_dim": 1024,
- "use_layer_idx": -1
- },
- "text_model": {
- "name": "xlm-roberta-base",
- "model_dim": null,
- "use_layer_idx": -1
- },
- "audio_transformer": {
- "dim": 768,
- "tf_depth": 0,
- "heads": 8,
- "dim_head": 64,
- "attn_dropout": 0,
- "ff_dropout": 0,
- "ff_mult": 4
- },
- "text_transformer": {
- "dim": 768,
- "tf_depth": 8,
- "max_seq_len": 1024,
- "dim_head": 64,
- "heads": 8,
- "attn_dropout": 0,
- "ff_dropout": 0,
- "ff_mult": 4
- }
- }
 
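In the MuQ-MuLan config, `sr` and `clip_secs` together fix how many audio samples one clip covers, assuming `sr` is the sample rate in Hz (the README examples load audio with `sr = 24000`) and `clip_secs` is the clip length in seconds. The arithmetic below is a sketch under those assumptions; how MuQ-MuLan actually pads or windows longer inputs is not stated in these files, so counting a trailing partial clip is itself an assumption.

```python
import math

SR = 24_000     # "sr" in the "mulan" section
CLIP_SECS = 10  # "clip_secs" in the "mulan" section


def samples_per_clip(sr: int = SR, clip_secs: int = CLIP_SECS) -> int:
    """Samples in one clip, assuming sr is in Hz and clip_secs in seconds."""
    return sr * clip_secs


def clips_needed(num_samples: int, sr: int = SR, clip_secs: int = CLIP_SECS) -> int:
    """Clips needed to cover the audio, counting a trailing partial clip (assumption)."""
    return math.ceil(num_samples / samples_per_clip(sr, clip_secs))


if __name__ == "__main__":
    print(samples_per_clip())     # 240000
    print(clips_needed(25 * SR))  # 3 clips for 25 s of audio
```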
 
TTS/DiffRhythm/MuQ-MuLan-large/pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d42ae3f7cb9b66759ee0089ddc70e2f28b130c2d8ba621457358272d32dd0444
- size 2653954401
 
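Each large binary in this diff is stored as a Git LFS pointer: three `key value` lines giving the spec `version`, the object `oid` (a SHA-256 digest) and the object `size` in bytes. A minimal Python sketch for reading such a pointer, assuming exactly this v1 layout, is below; `parse_lfs_pointer` and `LfsPointer` are hypothetical names, not part of Git LFS. Two pointers with the same `oid` refer to objects with identical content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str
    oid: str   # hex SHA-256 digest of the stored object
    size: int  # object size in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS v1 pointer made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


if __name__ == "__main__":
    pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:d42ae3f7cb9b66759ee0089ddc70e2f28b130c2d8ba621457358272d32dd0444\n"
        "size 2653954401\n"
    )
    print(f"{pointer.size / 1e9:.2f} GB")  # size in decimal gigabytes
```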
 
TTS/DiffRhythm/MuQ-large-msd-iter/README.md DELETED
@@ -1,113 +0,0 @@
- ---
- license: cc-by-nc-4.0
- language:
- - en
- - zh
- pipeline_tag: audio-classification
- tags:
- - music
- ---
-
- # MuQ & MuQ-MuLan
-
- <div>
- <a href='#'><img alt="Static Badge" src="https://img.shields.io/badge/Python-3.8%2B-blue?logo=python&logoColor=white"></a>
- <a href='https://arxiv.org/abs/2501.01108'><img alt="Static Badge" src="https://img.shields.io/badge/arXiv-2501.01108-%23b31b1b?logo=arxiv&link=https%3A%2F%2Farxiv.org%2F"></a>
- <a href='https://huggingface.co/OpenMuQ'><img alt="Static Badge" src="https://img.shields.io/badge/huggingface-OpenMuQ-%23FFD21E?logo=huggingface&link=https%3A%2F%2Fhuggingface.co%2FOpenMuQ"></a>
- <a href='https://pytorch.org/'><img alt="Static Badge" src="https://img.shields.io/badge/framework-PyTorch-%23EE4C2C?logo=pytorch"></a>
- <a href='https://pypi.org/project/muq'><img alt="Static Badge" src="https://img.shields.io/badge/pip%20install-muq-green?logo=PyPI&logoColor=white&link=https%3A%2F%2Fpypi.org%2Fproject%2Fmuq"></a>
- </div>
-
-
- This is the official repository for the paper *"**MuQ**: Self-Supervised **Mu**sic Representation Learning
- with Mel Residual Vector **Q**uantization"*. For more detailed information, we strongly recommend referring to https://github.com/tencent-ailab/MuQ and the [paper]((https://arxiv.org/abs/2501.01108)).
-
- In this repo, the following models are released:
-
- - **MuQ**(see [this link](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter)): A large music foundation model pre-trained via Self-Supervised Learning (SSL), achieving SOTA in various MIR tasks.
- - **MuQ-MuLan**(see [this link](https://huggingface.co/OpenMuQ/MuQ-MuLan-large)): A music-text joint embedding model trained via contrastive learning, supporting both English and Chinese texts.
-
-
- ## Usage
-
- To begin with, please use pip to install the official `muq` lib, and ensure that your `python>=3.8`:
- ```bash
- pip3 install muq
- ```
-
-
-
- To extract music audio features using **MuQ**:
- ```python
- import torch, librosa
- from muq import MuQ
-
- device = 'cuda'
- wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
- wavs = torch.tensor(wav).unsqueeze(0).to(device)
-
- # This will automatically fetch the checkpoint from huggingface
- muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
- muq = muq.to(device).eval()
-
- with torch.no_grad():
-     output = muq(wavs, output_hidden_states=True)
-
- print('Total number of layers: ', len(output.hidden_states))
- print('Feature shape: ', output.last_hidden_state.shape)
-
- ```
-
-
-
- Using **MuQ-MuLan** to extract the music and text embeddings and calculate the similarity:
- ```python
- import torch, librosa
- from muq import MuQMuLan
-
- # This will automatically fetch checkpoints from huggingface
- device = 'cuda'
- mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
- mulan = mulan.to(device).eval()
-
- # Extract music embeddings
- wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
- wavs = torch.tensor(wav).unsqueeze(0).to(device)
- with torch.no_grad():
-     audio_embeds = mulan(wavs = wavs)
-
- # Extract text embeddings (texts can be in English or Chinese)
- texts = ["classical genres, hopeful mood, piano.", "一首适合海边风景的小提琴曲,节奏欢快"]
- with torch.no_grad():
-     text_embeds = mulan(texts = texts)
-
- # Calculate dot product similarity
- sim = mulan.calc_similarity(audio_embeds, text_embeds)
- print(sim)
- ```
-
- ## Model Checkpoints
-
- | Model Name | Parameters | Data | HuggingFace🤗 |
- | ----------- | --- | --- | ----------- |
- | MuQ | ~300M | MSD dataset | [OpenMuQ/MuQ-large-msd-iter](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter) |
- | MuQ-MuLan | ~700M | music-text pairs | [OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large) |
-
- **Note**: Please note that the open-sourced MuQ was trained on the Million Song Dataset. Due to differences in dataset size, the open-sourced model may not achieve the same level of performance as reported in the paper.
-
- ## License
-
- The code is released under the MIT license.
-
- The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) are released under the CC-BY-NC 4.0 license.
-
- ## Citation
-
- ```
- @article{zhu2025muq,
- title={MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization},
- author={Haina Zhu and Yizhi Zhou and Hangting Chen and Jianwei Yu and Ziyang Ma and Rongzhi Gu and Yi Luo and Wei Tan and Xie Chen},
- journal={arXiv preprint arXiv:2501.01108},
- year={2025}
- }
- ```
 
TTS/DiffRhythm/MuQ-large-msd-iter/config.json DELETED
@@ -1,143 +0,0 @@
- {
- "codebook_dim": 16,
- "codebook_size": 8192,
- "conv_dim": 512,
- "encoder_depth": 12,
- "encoder_dim": 1024,
- "features": [
- "melspec_2048"
- ],
- "hop_length": 240,
- "is_flash": false,
- "label_rate": 25,
- "mask_hop": 0.4,
- "mask_prob": 0.6,
- "n_mels": 128,
- "num_codebooks": 1,
- "recon_loss_ratio": null,
- "resume_checkpoint": null,
- "rvq_ckpt_path": null,
- "rvq_multi_layer_num": 1,
- "rvq_n_codebooks": 8,
- "stat": {
- "melspec_2048_cnt": 14282760192,
- "melspec_2048_mean": 6.768444971712967,
- "melspec_2048_std": 18.417922652295623
- },
- "use_encodec_target": false,
- "use_rvq_target": true,
- "use_vq_target": false,
- "w2v2_config": {
- "activation_dropout": 0.1,
- "adapter_kernel_size": 3,
- "adapter_stride": 2,
- "add_adapter": false,
- "apply_spec_augment": true,
- "architectures": [
- "Wav2Vec2ConformerForCTC"
- ],
- "attention_dropout": 0.1,
- "bos_token_id": 1,
- "classifier_proj_size": 256,
- "codevector_dim": 768,
- "conformer_conv_dropout": 0.1,
- "contrastive_logits_temperature": 0.1,
- "conv_bias": true,
- "conv_depthwise_kernel_size": 31,
- "conv_dim": [
- 512,
- 512,
- 512,
- 512,
- 512,
- 512,
- 512
- ],
- "conv_kernel": [
- 10,
- 3,
- 3,
- 3,
- 3,
- 2,
- 2
- ],
- "conv_stride": [
- 5,
- 2,
- 2,
- 2,
- 2,
- 2,
- 2
- ],
- "ctc_loss_reduction": "sum",
- "ctc_zero_infinity": false,
- "diversity_loss_weight": 0.1,
- "do_stable_layer_norm": true,
- "eos_token_id": 2,
- "feat_extract_activation": "gelu",
- "feat_extract_dropout": 0.0,
- "feat_extract_norm": "layer",
- "feat_proj_dropout": 0.1,
- "feat_quantizer_dropout": 0.0,
- "final_dropout": 0.1,
- "gradient_checkpointing": false,
- "hidden_act": "swish",
- "hidden_dropout": 0.1,
- "hidden_dropout_prob": 0.1,
- "hidden_size": 1024,
- "initializer_range": 0.02,
- "intermediate_size": 4096,
- "layer_norm_eps": 1e-05,
- "layerdrop": 0.0,
- "mask_feature_length": 10,
- "mask_feature_min_masks": 0,
- "mask_feature_prob": 0.0,
- "mask_time_length": 10,
- "mask_time_min_masks": 2,
- "mask_time_prob": 0.05,
- "max_source_positions": 5000,
- "model_type": "wav2vec2-conformer",
- "num_adapter_layers": 3,
- "num_attention_heads": 16,
- "num_codevector_groups": 2,
- "num_codevectors_per_group": 320,
- "num_conv_pos_embedding_groups": 16,
- "num_conv_pos_embeddings": 128,
- "num_feat_extract_layers": 7,
- "num_hidden_layers": 24,
- "num_negatives": 100,
- "output_hidden_size": 1024,
- "pad_token_id": 0,
- "position_embeddings_type": "rotary",
- "proj_codevector_dim": 768,
- "rotary_embedding_base": 10000,
- "tdnn_dilation": [
- 1,
- 2,
- 3,
- 1,
- 1
- ],
- "tdnn_dim": [
- 512,
- 512,
- 512,
- 512,
- 1500
- ],
- "tdnn_kernel": [
- 5,
- 3,
- 3,
- 1,
- 1
- ],
- "torch_dtype": "float32",
- "transformers_version": "4.19.0.dev0",
- "use_weighted_layer_sum": false,
- "vocab_size": 32,
- "xvector_output_dim": 512
- }
- }
 
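Some of the MuQ settings are easier to read as rates. Assuming the 24 kHz sample rate used in the README examples, a `hop_length` of 240 samples gives 100 mel frames per second; reading `label_rate` as target labels per second (an assumption, the config does not state the unit), 25 labels per second corresponds to one label every 4 mel frames. The sketch only does that arithmetic.

```python
SAMPLE_RATE = 24_000  # assumed, from the README's librosa.load(..., sr = 24000)
HOP_LENGTH = 240      # "hop_length" in config.json
LABEL_RATE = 25       # "label_rate" in config.json, assumed to be labels per second


def frames_per_second(sr: int = SAMPLE_RATE, hop: int = HOP_LENGTH) -> float:
    """Mel frames produced per second of audio for a given hop length."""
    return sr / hop


def frames_per_label(sr: int = SAMPLE_RATE, hop: int = HOP_LENGTH,
                     label_rate: int = LABEL_RATE) -> float:
    """Mel frames covered by one target label under the assumed label_rate unit."""
    return frames_per_second(sr, hop) / label_rate


if __name__ == "__main__":
    print(frames_per_second())  # 100.0
    print(frames_per_label())   # 4.0
```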
 
TTS/DiffRhythm/MuQ-large-msd-iter/model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:273febab2be02872c37d2c37e48a9d6c52c1c9392f3eeeabd498efa281ccb7a6
- size 1333825096
 
TTS/DiffRhythm/MuQ-large-msd-iter/pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:334df3de2832ec1acfd8b6ce54e7de4073401fe821f7ec0ad0d954832be2d26a
- size 1333965438
 
TTS/DiffRhythm/cfm_model_v1_2.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:3e819b317ce2cf1fb22f386d74f351b697204ec1f57f03edfe50dbca71cf0768
- size 2218709125
 
TTS/DiffRhythm/config.json DELETED
@@ -1,13 +0,0 @@
- {
- "model_type": "diffrhythm",
- "model": {
- "dim": 2048,
- "depth": 16,
- "heads": 32,
- "ff_mult": 4,
- "text_dim": 512,
- "conv_layers": 4,
- "mel_dim": 64,
- "text_num_embeds": 363
- }
- }
 
TTS/DiffRhythm/vae_model.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:712693f27299937c6ccf1a6d6f1d9b45c7c8c11210d3b0cbb0f36181465ba29f
- size 624520127
 
TTS/DiffRhythm/xlm-roberta-base/README.md DELETED
@@ -1,200 +0,0 @@
- ---
- tags:
- - exbert
- language:
- - multilingual
- - af
- - am
- - ar
- - as
- - az
- - be
- - bg
- - bn
- - br
- - bs
- - ca
- - cs
- - cy
- - da
- - de
- - el
- - en
- - eo
- - es
- - et
- - eu
- - fa
- - fi
- - fr
- - fy
- - ga
- - gd
- - gl
- - gu
- - ha
- - he
- - hi
- - hr
- - hu
- - hy
- - id
- - is
- - it
- - ja
- - jv
- - ka
- - kk
- - km
- - kn
- - ko
- - ku
- - ky
- - la
- - lo
- - lt
- - lv
- - mg
- - mk
- - ml
- - mn
- - mr
- - ms
- - my
- - ne
- - nl
- - no
- - om
- - or
- - pa
- - pl
- - ps
- - pt
- - ro
- - ru
- - sa
- - sd
- - si
- - sk
- - sl
- - so
- - sq
- - sr
- - su
- - sv
- - sw
- - ta
- - te
- - th
- - tl
- - tr
- - ug
- - uk
- - ur
- - uz
- - vi
- - xh
- - yi
- - zh
- license: mit
- ---
-
- # XLM-RoBERTa (base-sized model)
-
- XLM-RoBERTa model pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages. It was introduced in the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Conneau et al. and first released in [this repository](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).
-
- Disclaimer: The team releasing XLM-RoBERTa did not write a model card for this model so this model card has been written by the Hugging Face team.
-
- ## Model description
-
- XLM-RoBERTa is a multilingual version of RoBERTa. It is pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages.
-
- RoBERTa is a transformers model pretrained on a large corpus in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts.
-
- More precisely, it was pretrained with the Masked language modeling (MLM) objective. Taking a sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence.
-
- This way, the model learns an inner representation of 100 languages that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard classifier using the features produced by the XLM-RoBERTa model as inputs.
-
- ## Intended uses & limitations
-
- You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?search=xlm-roberta) to look for fine-tuned versions on a task that interests you.
-
- Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at models like GPT2.
-
- ## Usage
-
- You can use this model directly with a pipeline for masked language modeling:
-
- ```python
- >>> from transformers import pipeline
- >>> unmasker = pipeline('fill-mask', model='xlm-roberta-base')
- >>> unmasker("Hello I'm a <mask> model.")
-
- [{'score': 0.10563907772302628,
- 'sequence': "Hello I'm a fashion model.",
- 'token': 54543,
- 'token_str': 'fashion'},
- {'score': 0.08015287667512894,
- 'sequence': "Hello I'm a new model.",
- 'token': 3525,
- 'token_str': 'new'},
- {'score': 0.033413201570510864,
- 'sequence': "Hello I'm a model model.",
- 'token': 3299,
- 'token_str': 'model'},
- {'score': 0.030217764899134636,
- 'sequence': "Hello I'm a French model.",
- 'token': 92265,
- 'token_str': 'French'},
- {'score': 0.026436051353812218,
- 'sequence': "Hello I'm a sexy model.",
- 'token': 17473,
- 'token_str': 'sexy'}]
- ```
-
- Here is how to use this model to get the features of a given text in PyTorch:
-
- ```python
- from transformers import AutoTokenizer, AutoModelForMaskedLM
-
- tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
- model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
-
- # prepare input
- text = "Replace me by any text you'd like."
- encoded_input = tokenizer(text, return_tensors='pt')
-
- # forward pass
- output = model(**encoded_input)
- ```
-
- ### BibTeX entry and citation info
-
- ```bibtex
- @article{DBLP:journals/corr/abs-1911-02116,
- author = {Alexis Conneau and
- Kartikay Khandelwal and
- Naman Goyal and
- Vishrav Chaudhary and
- Guillaume Wenzek and
- Francisco Guzm{\'{a}}n and
- Edouard Grave and
- Myle Ott and
- Luke Zettlemoyer and
- Veselin Stoyanov},
- title = {Unsupervised Cross-lingual Representation Learning at Scale},
- journal = {CoRR},
- volume = {abs/1911.02116},
- year = {2019},
- url = {http://arxiv.org/abs/1911.02116},
- eprinttype = {arXiv},
- eprint = {1911.02116},
- timestamp = {Mon, 11 Nov 2019 18:38:09 +0100},
- biburl = {https://dblp.org/rec/journals/corr/abs-1911-02116.bib},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
- ```
-
- <a href="https://huggingface.co/exbert/?model=xlm-roberta-base">
- <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
- </a>
 
TTS/DiffRhythm/xlm-roberta-base/config.json DELETED
@@ -1,25 +0,0 @@
- {
- "architectures": [
- "XLMRobertaForMaskedLM"
- ],
- "attention_probs_dropout_prob": 0.1,
- "bos_token_id": 0,
- "eos_token_id": 2,
- "hidden_act": "gelu",
- "hidden_dropout_prob": 0.1,
- "hidden_size": 768,
- "initializer_range": 0.02,
- "intermediate_size": 3072,
- "layer_norm_eps": 1e-05,
- "max_position_embeddings": 514,
- "model_type": "xlm-roberta",
- "num_attention_heads": 12,
- "num_hidden_layers": 12,
- "output_past": true,
- "pad_token_id": 1,
- "position_embedding_type": "absolute",
- "transformers_version": "4.17.0.dev0",
- "type_vocab_size": 1,
- "use_cache": true,
- "vocab_size": 250002
- }
 
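The `vocab_size` and `hidden_size` values account for a large share of these checkpoint sizes: the word-embedding matrix alone has 250002 × 768 = 192,001,536 parameters, roughly 0.77 GB in float32 (4 bytes per parameter), against about 1.12 GB for `pytorch_model.bin`. The sketch reproduces that estimate; it counts only the word-embedding table, not the encoder layers, position embeddings or the LM head.

```python
VOCAB_SIZE = 250_002  # "vocab_size" in config.json
HIDDEN_SIZE = 768     # "hidden_size" in config.json
BYTES_PER_FP32 = 4


def embedding_params(vocab: int = VOCAB_SIZE, hidden: int = HIDDEN_SIZE) -> int:
    """Parameters in the word-embedding matrix (vocab x hidden)."""
    return vocab * hidden


if __name__ == "__main__":
    n = embedding_params()
    print(n)                                     # 192001536
    print(f"{n * BYTES_PER_FP32 / 1e9:.2f} GB")  # 0.77 GB
```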
 
TTS/DiffRhythm/xlm-roberta-base/flax_model.msgpack DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:311b6941e02128b01c6a429f55b47b351a86fe53e6802774d87696bcbc465992
- size 1113187999
 
TTS/DiffRhythm/xlm-roberta-base/model.onnx DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:a76bfe6a405f1a9ace42b2dbd8fbd284dd8127a732ddcf2145b0fc9413b30d40
- size 1881470773
 
TTS/DiffRhythm/xlm-roberta-base/model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:6fd4797bc397c3b8b55d6bb5740366b57e6a3ce91c04c77f22aafc0c128e6feb
- size 1115567652
 
TTS/DiffRhythm/xlm-roberta-base/pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:9d83baaafea92d36de26002c8135a427d55ee6fdc4faaa6e400be4c47724a07e
- size 1115590446
 
TTS/DiffRhythm/xlm-roberta-base/sentencepiece.bpe.model DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
- size 5069051
 
TTS/DiffRhythm/xlm-roberta-base/tf_model.h5 DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d1232fb4018ab3a236c29f10aefd190ef844ad994ac74820d9532637bd87b3f4
- size 1112441536
 
TTS/DiffRhythm/xlm-roberta-base/tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
TTS/DiffRhythm/xlm-roberta-base/tokenizer_config.json DELETED
@@ -1 +0,0 @@
- {"model_max_length": 512}
 
ace_step/README.md DELETED
@@ -1,122 +0,0 @@
1
- ---
2
- license: apache-2.0
3
- tags:
4
- - music
5
- - text2music
6
- - acestep
7
- pipeline_tag: text-to-audio
8
- language:
9
- - en
10
- - zh
11
- - de
12
- - fr
13
- - es
14
- - it
15
- - pt
16
- - pl
17
- - tr
18
- - ru
19
- - cs
20
- - nl
21
- - ar
22
- - ja
23
- - hu
24
- - ko
25
- - hi
26
- ---
27
-
28
- # ACE-Step: A Step Towards Music Generation Foundation Model
29
-
30
- ![ACE-Step Framework](https://github.com/ACE-Step/ACE-Step/raw/main/assets/ACE-Step_framework.png)
31
-
32
- ## Model Description
33
-
34
- ACE-Step is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches through a holistic architectural design. It integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, achieving state-of-the-art performance in generation speed, musical coherence, and controllability.
35
-
36
- **Key Features:**
37
- - 15× faster than LLM-based baselines (20s for 4-minute music on A100)
38
- - Superior musical coherence across melody, harmony, and rhythm
39
- - full-song generation, duration control and accepts natural language descriptions
40
-
41
- ## Uses
42
-
43
- ### Direct Use
44
- ACE-Step can be used for:
45
- - Generating original music from text descriptions
46
- - Music remixing and style transfer
47
- - edit song lyrics
48
-
49
- ### Downstream Use
50
- The model serves as a foundation for:
51
- - Voice cloning applications
52
- - Specialized music generation (rap, jazz, etc.)
53
- - Music production tools
54
- - Creative AI assistants
55
-
56
- ### Out-of-Scope Use
57
- The model should not be used for:
58
- - Generating copyrighted content without permission
59
- - Creating harmful or offensive content
60
- - Misrepresenting AI-generated music as human-created
61
-
62
- ## How to Get Started
63
-
64
- see: https://github.com/ace-step/ACE-Step
65
-
66
- ## Hardware Performance
67
-
68
- | Device | 27 Steps | 60 Steps |
69
- |---------------|----------|----------|
70
- | NVIDIA A100 | 27.27x | 12.27x |
71
- | RTX 4090 | 34.48x | 15.63x |
72
- | RTX 3090 | 12.76x | 6.48x |
73
- | M2 Max | 2.27x | 1.03x |
74
-
75
- *RTF (Real-Time Factor) shown - higher values indicate faster generation*
-
-
- ## Limitations
-
- - Performance varies by language (the top 10 languages perform best)
- - Longer generations (>5 minutes) may lose structural coherence
- - Rare instruments may not render perfectly
- - Output Inconsistency: Highly sensitive to random seeds and input duration, leading to varied "gacha-style" results
- - Style-specific Weaknesses: Underperforms on certain genres (e.g., Chinese rap/zh_rap); limited style adherence and a musicality ceiling
- - Continuity Artifacts: Unnatural transitions in repainting/extend operations
- - Vocal Quality: Coarse vocal synthesis lacking nuance
- - Control Granularity: Needs finer-grained musical parameter control
-
- ## Ethical Considerations
-
- Users should:
- - Verify originality of generated works
- - Disclose AI involvement
- - Respect cultural elements and copyrights
- - Avoid harmful content generation
-
-
- ## Model Details
-
- **Developed by:** ACE Studio and StepFun
- **Model type:** Diffusion-based music generation with transformer conditioning
- **License:** Apache 2.0
- **Resources:**
- - [Project Page](https://ace-step.github.io/)
- - [Demo Space](https://huggingface.co/spaces/ACE-Step/ACE-Step)
- - [GitHub Repository](https://github.com/ACE-Step/ACE-Step)
-
-
- ## Citation
-
- ```bibtex
- @misc{gong2025acestep,
- title={ACE-Step: A Step Towards Music Generation Foundation Model},
- author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
- howpublished={\url{https://github.com/ace-step/ACE-Step}},
- year={2025},
- note={GitHub repository}
- }
- ```
-
- ## Acknowledgements
- This project is co-led by ACE Studio and StepFun.
 
ace_step/config.json DELETED
@@ -1,35 +0,0 @@
- {
-   "_class_name": "ACEStepTransformer2DModel",
-   "_diffusers_version": "0.32.2",
-   "attention_head_dim": 128,
-   "in_channels": 8,
-   "inner_dim": 2560,
-   "lyric_encoder_vocab_size": 6693,
-   "lyric_hidden_size": 1024,
-   "max_height": 16,
-   "max_position": 32768,
-   "max_width": 32768,
-   "mlp_ratio": 2.5,
-   "num_attention_heads": 20,
-   "num_layers": 24,
-   "out_channels": 8,
-   "patch_size": [
-     16,
-     1
-   ],
-   "rope_theta": 1000000.0,
-   "speaker_embedding_dim": 512,
-   "ssl_encoder_depths": [
-     8,
-     8
-   ],
-   "ssl_latent_dims": [
-     1024,
-     768
-   ],
-   "ssl_names": [
-     "mert",
-     "m-hubert"
-   ],
-   "text_embedding_dim": 768
- }
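The deleted transformer config above is internally consistent: `inner_dim` equals `num_attention_heads × attention_head_dim` (20 × 128 = 2560), and each entry in `ssl_names` has a matching entry in `ssl_latent_dims`. A minimal sketch that parses a subset of those fields and checks both relations; `check_config` is a hypothetical helper, and the feed-forward width derived from `mlp_ratio` assumes the common `inner_dim * mlp_ratio` convention, which this config does not itself confirm.

```python
import json

# A subset of the deleted ace_step/config.json, restated without diff markers.
CONFIG_JSON = """
{
  "attention_head_dim": 128,
  "inner_dim": 2560,
  "mlp_ratio": 2.5,
  "num_attention_heads": 20,
  "num_layers": 24,
  "ssl_latent_dims": [1024, 768],
  "ssl_names": ["mert", "m-hubert"]
}
"""


def check_config(cfg: dict) -> dict:
    """Validate derived dimensions and return a few computed values."""
    heads, head_dim = cfg["num_attention_heads"], cfg["attention_head_dim"]
    if heads * head_dim != cfg["inner_dim"]:
        raise ValueError("inner_dim != num_attention_heads * attention_head_dim")
    if len(cfg["ssl_names"]) != len(cfg["ssl_latent_dims"]):
        raise ValueError("each SSL encoder name needs a latent dim")
    return {
        # Assumption: MLP hidden width = inner_dim * mlp_ratio (common DiT convention).
        "mlp_hidden": int(cfg["inner_dim"] * cfg["mlp_ratio"]),
        "ssl": dict(zip(cfg["ssl_names"], cfg["ssl_latent_dims"])),
    }


if __name__ == "__main__":
    print(check_config(json.loads(CONFIG_JSON)))
```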
 
audio/MelBandRoformer_fp16.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:6119aef379a6c7264e0b37db65ae1e6488b8ca4a00baf56d6d244737b8488226
- size 456479072
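Each deleted weight file in this commit is stored as a Git LFS pointer like the one above: a `version` line, an `oid` (the SHA-256 of the real content), and a `size` in bytes. A minimal sketch for checking whether a local copy matches such a pointer, for example to confirm that a copy kept elsewhere is byte-identical before deleting this one. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of the `git lfs` CLI.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer ("key value" per line) into oid and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid type: {algo}")
    return {"oid": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check before hashing multi-GB files
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large checkpoints never sit fully in memory.
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest() == pointer["oid"]
```

Checking the size first is a design choice: it rejects most mismatches without reading a 15 GB file end to end.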
 
diffusion_models/Phantom-Wan-14B_fp8_e4m3fn.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:205c2924aadcd4e1312d6aac0b4cfba80eeea33db99419b113c10eec4810cabc
- size 15001320640

diffusion_models/Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2ff922282cd84589702e6e8c26e083d1160bfc2b217dd44e1ae2688441dc495d
- size 16643349018

diffusion_models/Wan2_1-InfiniteTalk-Multi_fp8_e4m3fn_scaled_KJ.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:4ded4f02f2bf312e7a68f2d75cd0c680a177aef6917c9960a1eddc34f70de26d
- size 2712729090

diffusion_models/Wan2_1-InfiniteTalk-Single_fp8_e4m3fn_scaled_KJ.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:bd6e0e6feab8c22a482b1c4dd7c0504c215c35b507ddc3b4dcaa5d3ef539879e
- size 2713548210

diffusion_models/Wan2_1-InfiniteTalk_Multi_Q8.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2b9b1dc2fb0f0a351e688ad8dc7545bf90b2a2f20cd91953ac077510ef6b7bc0
- size 2646330016

diffusion_models/Wan2_1-InfiniteTalk_Single_Q8.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:c5e251c56174995d940494ec02fdf9d36da00dffdde6827829801cd171fe8ffd
- size 2646330016

diffusion_models/wan2.1-i2v-14b-480p-Q4_K_M.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d91f7139acadb42ea05cdf97b311e5099f714f11fbe4d90916500e2f53cbba82
- size 11341184384

loras/FastWan_T2V_14B_480p_lora_rank_128_bf16.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:93fe4efb5198710843de9843091e15a4a967702f62f169135b73be51884fb7d7
- size 1253192432

loras/Wan2.2-Fun-A14B-InP-LOW-HPS2.1_resized_dynamic_avg_rank_15_bf16.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1879ffd9ee08b533157eb04b6440673515be1ac7b4ee81648355e3bf3a59bdfd
- size 101752852

loras/Wan21_PusaV1_LoRA_14B_rank512_bf16.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:a510b5562e05efa831127bd6a6b3aecf1c4747cffdddcc0b28f88c0667ef1694
- size 4907437824

misc/TTS/ACE-Step-v1-3.5B/ace_step_transformer/diffusion_pytorch_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e810f16728d8a2e0d1b9c3a907aac8c9a427ce38edbd890cb3dce5ff92da5aad
- size 6611422728

misc/TTS/ACE-Step-v1-3.5B/music_dcae_f8c8/diffusion_pytorch_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2b0cb469307ac50659d1880db2a99bae47d0df335cbb36853964662d4b80e8ee
- size 313646516

misc/TTS/ACE-Step-v1-3.5B/music_vocoder/diffusion_pytorch_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:c92c9b46e28ab7b37b777780cf4308ad7ddac869636bb77aa61599358c4bc1c0
- size 206350988

misc/TTS/ACE-Step-v1-3.5B/umt5-base/model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:779cec0d210b2123e21d0a9cd8128f02b4d412627355028965a8be0b241cc3b6
- size 1127460248

misc/ace_step/all_in_one/ace_step_v1_3.5b.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:f07cad74c4adce52ca14ca1bdf74cf3c14cbafb0823b95eca4459467fa369f40
- size 7699743341

misc/clip_vision/clip_vision_h.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:64a7ef761bfccbadbaa3da77366aac4185a6c58fa5de5f589b42a65bcc21f161
- size 1264219396

misc/diffusion_models/MelBandRoformer_fp16.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:6119aef379a6c7264e0b37db65ae1e6488b8ca4a00baf56d6d244737b8488226
- size 456479072

misc/diffusion_models/Wan14BI2VFusioniX_phantom_14B_fp16.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:205c2924aadcd4e1312d6aac0b4cfba80eeea33db99419b113c10eec4810cabc
- size 15001320640

misc/diffusion_models/Wan2_1-Fun-V1_1-14B-Control-Camera_fp8_e4m3fn.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:44fb0cd28b22e5f3fe71ec9604e1e03c83cb6b15cf0353a7f2b77bc316fafcc7
- size 17648319713

misc/diffusion_models/Wan2_1-InfiniteTalk_Multi_Q8.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2b9b1dc2fb0f0a351e688ad8dc7545bf90b2a2f20cd91953ac077510ef6b7bc0
- size 2646330016

misc/diffusion_models/Wan2_1-InfiniteTalk_Single_Q8.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:c5e251c56174995d940494ec02fdf9d36da00dffdde6827829801cd171fe8ffd
- size 2646330016

misc/diffusion_models/Wan2_1-T2V-14B_fp8_e4m3fn_scaled_KJ.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:5519e566e620037b1adb399886143991036d27d44455f41190410967a2fc130d
- size 14526876890

misc/diffusion_models/Wan2_2-I2V-A14B-HIGH_fp8_e4m3fn_scaled_KJ.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b3a6e732feb5fd5fa35f5e3ef612fa1f0a77dc66601fbf999d4f84a01e7120a6
- size 15002999858

misc/diffusion_models/Wan2_2-I2V-A14B-LOW_fp8_e4m3fn_scaled_KJ.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:3338c9e672ad9e406a28b38231d6c9d94bf63ab73c3940b91428321993491bb8
- size 15002999858

misc/diffusion_models/wan2.1-i2v-14b-480p-Q4_K_M.gguf DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d91f7139acadb42ea05cdf97b311e5099f714f11fbe4d90916500e2f53cbba82
- size 11341184384

misc/diffusion_models/wan2.2_fun_camera_high_noise_14B_fp8_scaled.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:c14fec6b1f1ee16acf7c6ae2feab8c2b0e909cfad15f6765d959c6dea587e0b4
- size 15535183490

misc/diffusion_models/wan2.2_fun_camera_low_noise_14B_fp8_scaled.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:6251dee756a4b9b26862e63491706aa68cad55999efc8299c102b54785b5f944
- size 15535183490
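Several pointers above share an identical `oid` under different paths, for example `audio/MelBandRoformer_fp16.safetensors` and `misc/diffusion_models/MelBandRoformer_fp16.safetensors`, or `diffusion_models/Phantom-Wan-14B_fp8_e4m3fn.safetensors` and `misc/diffusion_models/Wan14BI2VFusioniX_phantom_14B_fp16.safetensors`. Because the oid is a content hash, matching oids mean byte-identical files. A minimal sketch that groups pointers by oid and totals the redundant bytes; the `(path, oid, size)` list restates a small subset of the pointers above, and both helper functions are hypothetical.

```python
from collections import defaultdict


def duplicate_groups(pointers):
    """Group LFS pointers by oid; return {oid: [paths]} for oids seen more than once."""
    by_oid = defaultdict(list)
    for path, oid, _size in pointers:
        by_oid[oid].append(path)
    return {oid: paths for oid, paths in by_oid.items() if len(paths) > 1}


def redundant_bytes(pointers):
    """Bytes freed by keeping exactly one copy of each duplicated oid."""
    sizes = {oid: size for _path, oid, size in pointers}
    return sum(sizes[oid] * (len(paths) - 1)
               for oid, paths in duplicate_groups(pointers).items())


MEL = "6119aef379a6c7264e0b37db65ae1e6488b8ca4a00baf56d6d244737b8488226"
PHANTOM = "205c2924aadcd4e1312d6aac0b4cfba80eeea33db99419b113c10eec4810cabc"
T2V = "5519e566e620037b1adb399886143991036d27d44455f41190410967a2fc130d"

# Subset of the deleted pointers, restated from the diff.
POINTERS = [
    ("audio/MelBandRoformer_fp16.safetensors", MEL, 456479072),
    ("misc/diffusion_models/MelBandRoformer_fp16.safetensors", MEL, 456479072),
    ("diffusion_models/Phantom-Wan-14B_fp8_e4m3fn.safetensors", PHANTOM, 15001320640),
    ("misc/diffusion_models/Wan14BI2VFusioniX_phantom_14B_fp16.safetensors", PHANTOM, 15001320640),
    ("misc/diffusion_models/Wan2_1-T2V-14B_fp8_e4m3fn_scaled_KJ.safetensors", T2V, 14526876890),
]

if __name__ == "__main__":
    for oid, paths in duplicate_groups(POINTERS).items():
        print(oid[:12], paths)
    print("redundant bytes:", redundant_bytes(POINTERS))
```

In this commit both paths of each matching pair appear as deleted, so an oid match here only shows that the files were identical, not which copy (if any) was kept elsewhere.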