opsam committed on
Commit 06a8a87 · verified · 1 Parent(s): 0a7fb48

Update README.md

Files changed (1)
  1. README.md +42 -3
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: audio-to-audio
+ ---
+
+ # HCodec-1.5 with adaptive frame rate
+ ## Installation
+ 1. Install the dependencies listed in requirement.txt from PyPI, for example with `pip install -r requirement.txt`
+
+ ## Quick start
+ + Generate tokens from audio
+ + Reconstruct audio from tokens
+
+ ```bash
+ #!/bin/bash
+ python audio_tokenizer.py
+ ```
+
+ ## Optional configuration
+ + Customize the test-time options for the adaptive frame rate
+
+ ```yaml
+ # hyperparameter configuration in conf/config_adaptive_v3.yaml
+
+ training: false # keep false when testing
+ use_similarity_alignment: true
+ use_dynamic_similarity_threshold: false
+ infer_using_dynamic_threshold: true # takes effect only when manual_threshold is null
+ similarity_threshold: 0.7
+ similarity_threshold_lower: 0.7
+ similarity_threshold_upper: 1.0 # valid interval of the dynamic threshold when 'infer_using_dynamic_threshold' is on
+ max_tokens_per_group: 8
+ manual_threshold: 0.6 # set to a fixed value to evaluate a specific threshold
+ ```
+
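The comments in the configuration above suggest an order of precedence among the threshold options. The following is a minimal Python sketch of one plausible reading, not the project's actual code: `threshold_mode` and `group_lengths` are hypothetical helpers, the fallback to `similarity_threshold` is an assumption, and the greedy grouping rule only illustrates how a similarity threshold and `max_tokens_per_group` could jointly bound the merging of adjacent frames.

```python
# Hypothetical helpers for illustration only; they are not part of the repository.

def threshold_mode(cfg):
    """Report which threshold setting would apply under the assumed precedence."""
    manual = cfg.get("manual_threshold")
    if manual is not None:
        # A non-null manual_threshold is assumed to override everything else.
        return ("manual", float(manual))
    if cfg.get("infer_using_dynamic_threshold"):
        # The dynamic threshold is assumed to stay inside [lower, upper].
        return ("dynamic", (float(cfg["similarity_threshold_lower"]),
                            float(cfg["similarity_threshold_upper"])))
    # Assumption: otherwise the static similarity_threshold is used.
    return ("fixed", float(cfg["similarity_threshold"]))


def group_lengths(adjacent_sims, threshold, max_tokens_per_group):
    """Greedily merge consecutive frames into groups.

    adjacent_sims[i] is the similarity between frame i and frame i + 1,
    so n frames give n - 1 similarities. A frame joins the current group
    when the similarity reaches the threshold and the group is not full.
    """
    lengths = []
    current = 1  # the first frame always opens a group
    for sim in adjacent_sims:
        if sim >= threshold and current < max_tokens_per_group:
            current += 1
        else:
            lengths.append(current)
            current = 1
    lengths.append(current)
    return lengths


# Values copied from the configuration shown above.
cfg = {
    "infer_using_dynamic_threshold": True,
    "similarity_threshold": 0.7,
    "similarity_threshold_lower": 0.7,
    "similarity_threshold_upper": 1.0,
    "max_tokens_per_group": 8,
    "manual_threshold": 0.6,
}

mode, value = threshold_mode(cfg)
print(mode, value)  # manual 0.6

# Five frames with made-up adjacent similarities.
print(group_lengths([0.9, 0.9, 0.2, 0.95], value, cfg["max_tokens_per_group"]))  # [3, 2]
```

Under this reading, a lower threshold merges more frames per group and so yields fewer tokens, while `max_tokens_per_group` caps how long any single group can grow.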
+ ## 😘 Acknowledgement
+ We would like to thank the following projects for their great work:
+
+ - The implementation of the adaptive mechanism is based on [FlexiCodec](https://github.com/amphionspace/FlexiCodec) and [VARSTok](https://github.com/FunAudioLLM/FunResearch/tree/main/VARSTok).
+ - The Transformer implementation is based on [Mimi Codec](https://github.com/kyutai-labs/moshi).