primepake committed on
Commit 797c99c · 1 Parent(s): 7215932

update readme

Files changed (1)
  1. README.md +14 -16
README.md CHANGED
@@ -29,27 +29,25 @@ Maps discrete tokens to a continuous latent space using a Variational Autoencode
 
  ### 1. Model Training
 
- #### DAC Codec
- - Based on the [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec)
- - Provides efficient audio tokenization
- - Utilizes CosyVoice2's optimized training pipeline
 
- #### DAC-VAE
- - Train using `train_dac_vae.py`
  - Learns continuous latent representations from discrete tokens
- - Architecture adapted from CosyVoice2's VAE implementation
 
  ### 2. Feature Extraction
 
  Before training the main model:
- 1. Extract discrete tokens using the trained DAC codec
- 2. Generate continuous latent representations using the trained DAC-VAE
 
  ### 3. Two-Stage Training
 
  Train the models sequentially:
- - **Stage 1**: Audio → Discrete token modeling
- - **Stage 2**: Discrete token → Continuous latent space modeling
 
  ## Getting Started
 
@@ -61,22 +59,22 @@ pip install -r requirements.txt
 
  ### Training Pipeline
 
- 1. **Train DAC Codec** (if not using pretrained)
  ```bash
  # Add training command
  ```
 
- 2. **Train DAC-VAE**
  ```bash
- python train_dac_vae.py --config configs/dac_vae.yaml
  ```
 
- 3. **Extract Features**
  ```bash
  # Add feature extraction commands
  ```
 
- 4. **Train MiniMax-Speech**
  ```bash
  # Add main training command
  ```
 
 
  ### 1. Model Training
 
+ #### BPE tokens to DAC codec tokens
+ - Based on the
+ - Uses an autoregressive model with a learnable speaker extractor to predict the DAC codec tokens
 
+ #### DAC codec tokens to DAC-VAE latent
+ - Based on the CosyVoice2 flow matching decoder
  - Learns continuous latent representations from discrete tokens
 
  ### 2. Feature Extraction
 
  Before training the main model:
+ 1. Extract discrete tokens using the trained DAC codec ([Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec))
+ 2. Generate continuous latent representations using the trained DAC-VAE; a pretrained model is available here: [DAC-VAE](https://drive.google.com/file/d/1iwZhPlcdDwvPjeON3bFAeYarsV4ZtI2E/view?usp=sharing)
 
  ### 3. Two-Stage Training
 
  Train the models sequentially:
+ - **Stage 1**: BPE tokens → Discrete DAC codec tokens
+ - **Stage 2**: Discrete DAC codec tokens → DAC-VAE continuous latent space
 
  ## Getting Started
 
 
 
  ### Training Pipeline
 
+ 1. **Extract DAC codec tokens** (if not using pretrained)
  ```bash
  # Add training command
  ```
 
+ 2. **Extract DAC-VAE latents**
  ```bash
+ python inference.py
  ```
 
+ 3. **Stage 1: Autoregressive Transformer**
  ```bash
  # Add feature extraction commands
  ```
 
+ 4. **Stage 2: Flow matching decoder**
  ```bash
  # Add main training command
  ```
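
The two stages named in this revision of the README can be read as a simple data flow: BPE tokens go in, discrete DAC codec tokens come out of Stage 1, and Stage 2 turns those into continuous DAC-VAE latents. The sketch below only illustrates that ordering. Everything in it is an assumption made for illustration: the function names, `CODEBOOK_SIZE`, and `LATENT_DIM` are hypothetical and do not come from the repository, and the stub bodies use deterministic arithmetic in place of the real autoregressive transformer and flow matching decoder. It also assumes one latent vector per codec token, which a real model need not satisfy.

```python
from typing import List, Tuple

CODEBOOK_SIZE = 1024  # assumed value, not taken from the repository
LATENT_DIM = 4        # assumed value, kept tiny for readability


def stage1_bpe_to_codec(bpe_tokens: List[int]) -> List[int]:
    """Hypothetical stand-in for Stage 1 (BPE tokens -> discrete DAC codec tokens)."""
    # A real Stage 1 model would generate codec tokens autoregressively and
    # condition on a speaker embedding; here each token is mapped independently.
    return [(tok * 31 + 7) % CODEBOOK_SIZE for tok in bpe_tokens]


def stage2_codec_to_latent(codec_tokens: List[int]) -> List[List[float]]:
    """Hypothetical stand-in for Stage 2 (codec tokens -> continuous DAC-VAE latents)."""
    # A real Stage 2 model would be a flow matching decoder; this stub only
    # shows that discrete tokens become continuous vectors.
    return [[tok / CODEBOOK_SIZE] * LATENT_DIM for tok in codec_tokens]


def run_two_stage(bpe_tokens: List[int]) -> Tuple[List[int], List[List[float]]]:
    """Chain the stages in the order the README lists them."""
    codec_tokens = stage1_bpe_to_codec(bpe_tokens)
    latents = stage2_codec_to_latent(codec_tokens)
    return codec_tokens, latents


if __name__ == "__main__":
    codec, latents = run_two_stage([5, 17, 300])
    print(len(codec), len(latents), len(latents[0]))
```

Keeping the stages as separate functions mirrors the sequential training the README describes: Stage 2 only ever sees codec tokens, so it can be trained on tokens extracted ahead of time rather than on Stage 1 outputs.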