---
license: apache-2.0
tags:
- audio
- speech
- language-model
- auristream
library_name: transformers
---
# AuriStream - Speech Language Model
**AuriStream** is a speech language model by **Greta Tuckute** and **Klemen Kotar**.
This repository contains the shared model code for AuriStream models.
## Overview
AuriStream is a GPT-style autoregressive transformer that predicts cochlear tokens, with optional
multi-token prediction (MTP) heads.
The cochlear tokens it consumes and predicts are produced by a tokenizer such as [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).
## Usage
This repository is not meant to be used directly. Instead, use one of the checkpoint
repositories that reference this base code:
- [AuriStream7B_40Pred_BigAudioDataset_500k](https://huggingface.co/TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k)
To load a checkpoint:
```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k",
    trust_remote_code=True,
)
```
## Model Architecture
The AuriStream model includes:
- RMSNorm for layer normalization
- Rotary Position Embeddings (RoPE)
- SiLU activation in MLP layers
- Multi-token prediction heads
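
As an illustration of the first item above, RMSNorm normalizes each hidden vector by its root mean square and then applies a learned per-dimension scale. The sketch below is a generic NumPy reference implementation, not the code from `modeling_auristream.py`:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root mean square of x, then scale by a learned weight."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# After normalization, the mean squared activation is ~1 (with unit weights).
x = np.array([[3.0, 4.0]])
out = rms_norm(x, np.ones(2))
```

Unlike LayerNorm, RMSNorm omits mean-centering and the bias term, which makes it slightly cheaper while working comparably well in practice.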
## Configuration Options
| Parameter | Description | Default |
|-----------|-------------|---------|
| `vocab_size` | Number of cochlear tokens | 8192 |
| `n_embd` | Hidden dimension | 768 |
| `n_layer` | Number of transformer layers | 12 |
| `n_head` | Number of attention heads | 12 |
| `n_pred_steps` | Number of prediction steps (MTP) | 1 |
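
To make the defaults concrete, the arithmetic below shows how two derived sizes follow from the table (the parameter names come from the table above; this is an illustration, not code from the repository):

```python
# Default configuration values from the table above.
config = dict(vocab_size=8192, n_embd=768, n_layer=12, n_head=12, n_pred_steps=1)

# Token-embedding table: one n_embd-dimensional vector per cochlear token.
embedding_params = config["vocab_size"] * config["n_embd"]  # 8192 * 768

# Per-head dimension: the hidden size is split evenly across attention heads,
# so n_embd must be divisible by n_head.
head_dim = config["n_embd"] // config["n_head"]  # 768 // 12
```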
## Files
- `configuration_auristream.py` - Configuration class
- `modeling_auristream.py` - Model implementation
## Tokenizer
This model uses cochlear tokens from [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).