---
license: apache-2.0
tags:
- audio
- speech
- language-model
- auristream
library_name: transformers
---

# AuriStream - Speech Language Model

**AuriStream** is a speech language model by **Greta Tuckute** and **Klemen Kotar**.

This repository contains the shared model code for AuriStream models.

## Overview

AuriStream is a GPT-like transformer that autoregressively predicts cochlear tokens, with optional multi-token prediction (MTP) heads.

It operates on cochlear tokens produced by a tokenizer such as [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).

## Usage

This repository is not meant to be used directly. Instead, use one of the checkpoint repositories that reference this base code:

- [AuriStream7B_40Pred_BigAudioDataset_500k](https://huggingface.co/TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k)

To load a checkpoint:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k",
    trust_remote_code=True,
)
```

## Model Architecture

The AuriStream model includes:

- RMSNorm for layer normalization
- Rotary Position Embeddings (RoPE)
- SiLU activation in MLP layers
- Multi-token prediction (MTP) heads
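
As a quick illustration of the first item, RMSNorm can be sketched in a few lines. This is a generic sketch of the technique, not code taken from `modeling_auristream.py`; the `weight` and `eps` names are illustrative:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: rescale x to unit root-mean-square along the last axis.

    Unlike LayerNorm, there is no mean subtraction and no bias term;
    `weight` is a learned per-feature gain.
    """
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[3.0, 4.0]])       # RMS of [3, 4] is sqrt(12.5) ~ 3.5355
y = rms_norm(x, np.ones(2))      # each row now has (approximately) unit RMS
```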

## Configuration Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `vocab_size` | Number of cochlear tokens | 8192 |
| `n_embd` | Hidden dimension | 768 |
| `n_layer` | Number of transformer layers | 12 |
| `n_head` | Number of attention heads | 12 |
| `n_pred_steps` | Number of prediction steps (MTP) | 1 |
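
A couple of useful sizes follow from these defaults. The sketch below assumes the standard transformer convention that the hidden dimension is split evenly across attention heads; the variable names are illustrative, not from the repository:

```python
# Default configuration values from the table above
defaults = {
    "vocab_size": 8192,
    "n_embd": 768,
    "n_layer": 12,
    "n_head": 12,
    "n_pred_steps": 1,
}

# Per-head dimension, assuming n_embd is split evenly across heads
head_dim = defaults["n_embd"] // defaults["n_head"]  # 768 // 12 = 64

# With multi-token prediction, each position yields n_pred_steps
# distributions over the vocabulary
logits_per_position = defaults["n_pred_steps"] * defaults["vocab_size"]
```

With the default single-step head this is just one vocabulary-sized distribution per position; MTP checkpoints (e.g. one with 40 prediction steps, as the checkpoint name above suggests) would scale this accordingly.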

## Files

- `configuration_auristream.py` - Configuration class
- `modeling_auristream.py` - Model implementation

## Tokenizer

This model uses cochlear tokens from [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).