Upload folder using huggingface_hub

Files changed (8)

.gitattributes +0 -1
README.md +105 -0
coarse.pt +3 -0
coarse_2.pt +3 -0
fine.pt +3 -0
fine_2.pt +3 -0
text.pt +3 -0
text_2.pt +3 -0

.gitattributes CHANGED

@@ -25,7 +25,6 @@
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,105 @@

+---
+language:
+  - en
+  - de
+  - es
+  - fr
+  - hi
+  - it
+  - ja
+  - ko
+  - pl
+  - pt
+  - ru
+  - tr
+  - zh
+thumbnail: https://user-images.githubusercontent.com/5068315/230698495-cbb1ced9-c911-4c9a-941d-a1a4a1286ac6.png
+library: "bark"
+license: "cc-by-nc-4.0"
+tags:
+- bark
+- audio
+- text-to-speech
+---
+# Bark
+Bark is a transformer-based text-to-audio model created by [Suno](https://www.suno.ai).
+Bark can generate highly realistic, multilingual speech as well as other audio - including music,
+background noise and simple sound effects. The model can also produce nonverbal
+communications like laughing, sighing and crying. To support the research community,
+we are providing access to pretrained model checkpoints ready for inference.
+The original GitHub repository and model card can be found [here](https://github.com/suno-ai/bark).
+This model is meant for research purposes only.
+The model output is not censored and the authors do not endorse the opinions in the generated content.
+Use at your own risk.
+The following is additional information about the models released here.
+## Model Usage
+```python
+from bark import SAMPLE_RATE, generate_audio, preload_models
+from IPython.display import Audio
+# download and load all models
+preload_models()
+# generate audio from text
+text_prompt = """
+     Hello, my name is Suno. And, uh — and I like pizza. [laughs]
+     But I also have other interests such as playing tic tac toe.
+"""
+audio_array = generate_audio(text_prompt)
+# play the generated audio in a notebook
+Audio(audio_array, rate=SAMPLE_RATE)
+```
+[pizza.webm](https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm)
+To save `audio_array` as a WAV file:
+```python
+from scipy.io.wavfile import write as write_wav
+write_wav("/path/to/audio.wav", SAMPLE_RATE, audio_array)
+```
+## Model Details
+Bark is a series of three transformer models that turn text into audio.
+### Text to semantic tokens
+ - Input: text, tokenized with [BERT tokenizer from Hugging Face](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer)
+ - Output: semantic tokens that encode the audio to be generated
+### Semantic to coarse tokens
+ - Input: semantic tokens
+ - Output: tokens from the first two codebooks of the [EnCodec codec](https://github.com/facebookresearch/encodec) from Facebook
+### Coarse to fine tokens
+ - Input: the first two codebooks from EnCodec
+ - Output: 8 codebooks from EnCodec
+### Architecture
+|           Model           | Parameters | Attention  | Output Vocab size |
+|:-------------------------:|:----------:|------------|:-----------------:|
+|  Text to semantic tokens  |    80/300 M    | Causal     |       10,000      |
+| Semantic to coarse tokens |    80/300 M    | Causal     |     2x 1,024      |
+|   Coarse to fine tokens   |    80/300 M    | Non-causal |     6x 1,024      |
+### Release date
+April 2023
+## Broader Implications
+We anticipate that this model's text-to-audio capabilities can be used to improve accessibility tools in a variety of languages.
+While we hope that this release will enable users to express their creativity and build applications that are a force
+for good, we acknowledge that any text-to-audio model has the potential for dual use. While it is not straightforward
+to clone the voices of known people with Bark, it can still be used for nefarious purposes. To further reduce the chances of unintended use of Bark,
+we also release a simple classifier to detect Bark-generated audio with high accuracy (see notebooks section of the main repository).
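
Reviewer note: the codebook counts in the README's Architecture table can be sanity-checked with simple arithmetic. The coarse model emits the first 2 EnCodec codebooks and the fine model predicts 6 more, which gives the 8 codebooks listed as the output of the fine stage. The sketch below only restates the table; the `Stage` dataclass and its field names are hypothetical and are not part of the `bark` package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One row of the Architecture table (hypothetical helper, not a bark API)."""
    name: str
    attention: str
    new_codebooks: int  # EnCodec codebooks predicted by this stage (0 = semantic tokens)
    vocab_size: int     # output vocabulary size per codebook / token stream


# Values copied from the README table.
STAGES = [
    Stage("text_to_semantic", "causal", 0, 10_000),
    Stage("semantic_to_coarse", "causal", 2, 1_024),
    Stage("coarse_to_fine", "non-causal", 6, 1_024),
]

# Coarse (2) plus fine (6) codebooks should add up to the 8 EnCodec codebooks.
total_codebooks = sum(stage.new_codebooks for stage in STAGES)
print(total_codebooks)  # 8
```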

coarse.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:110580140ce5319b5b26849e24378d7594eb75ad11e7203e3091a876a07e4536
+size 1251939909

coarse_2.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:286abc253d4d7f4d148325df07585f7ca4fca36ce40577a1ddd744a8b35e4388
+size 3934534533

fine.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7ec1eb35cd3e21506b0c045ded225271d9a25d9fa608662585cfd749590a0eac
+size 1107111557

fine_2.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:799c87afab4b01537094c63ea231f2c42c9c07aeb16773690540ad251a6d8fab
+size 3741740229

text.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ecd798cf39a5ecbec30ef41a3d9d63fb61ea09b78d3bd5dfebb2f7343087b1be
+size 2315982725

text_2.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ccdedd35373bc3a16845f1f1452c5c96926f5cbccab01e824f7f15add2c16a35
+size 5353258741
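
Reviewer note: each `.pt` entry above is a Git LFS pointer (spec version, SHA-256 object ID, size in bytes) rather than the checkpoint itself. A downloaded checkpoint could be checked against its pointer roughly as follows. This is a minimal sketch using only the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and do not come from `huggingface_hub` or Git LFS.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and hash."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte checkpoints are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents for coarse.pt, copied from the diff above.
COARSE_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:110580140ce5319b5b26849e24378d7594eb75ad11e7203e3091a876a07e4536
size 1251939909
"""

pointer = parse_lfs_pointer(COARSE_POINTER)
print(f"{pointer['size'] / 1e9:.2f} GB")  # 1.25 GB
```

With a local copy of the file, `matches_pointer("coarse.pt", pointer)` would then report whether the download is complete and intact; the local path is an assumption.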