---
license: apache-2.0
tags:
- audio
- speech
- language-model
- auristream
library_name: transformers
---
# AuriStream - Speech Language Model
**AuriStream** is a speech language model by **Greta Tuckute** and **Klemen Kotar**.
This repository contains the shared model code for AuriStream models.
## Overview
AuriStream is a GPT-style transformer that autoregressively predicts cochlear tokens, with optional
multi-token prediction (MTP) heads.
The cochlear tokens are produced by a tokenizer such as [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).
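To build intuition for multi-token prediction: instead of a single output head predicting only the next token, the model keeps `n_pred_steps` heads, where head *k* predicts the token *k* steps ahead. The NumPy sketch below illustrates the shapes only; the head parameterization here is hypothetical, and the actual implementation lives in `modeling_auristream.py`:

```python
import numpy as np

rng = np.random.default_rng(0)

n_embd, vocab_size, n_pred_steps = 768, 8192, 4
seq_len = 10

# Hidden states from the transformer trunk: (seq_len, n_embd)
hidden = rng.standard_normal((seq_len, n_embd))

# One independent linear head per future step (illustrative parameterization)
heads = [rng.standard_normal((n_embd, vocab_size)) * 0.02
         for _ in range(n_pred_steps)]

# Head k at position t predicts the token at position t + k + 1
logits = np.stack([hidden @ W for W in heads])  # (n_pred_steps, seq_len, vocab_size)
preds = logits.argmax(axis=-1)                  # (n_pred_steps, seq_len)
```

With `n_pred_steps = 1` this reduces to the usual next-token language-modeling head.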
## Usage
This repository is not meant to be used directly. Instead, use one of the checkpoint
repositories that reference this base code:
- [AuriStream7B_40Pred_BigAudioDataset_500k](https://huggingface.co/TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k)
To load a checkpoint:
```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k",
    trust_remote_code=True,
)
```
## Model Architecture
The AuriStream model includes:
- RMSNorm for layer normalization
- Rotary Position Embeddings (RoPE)
- SiLU activation in MLP layers
- Multi-token prediction heads
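RMSNorm and SiLU both reduce to a few lines. The sketch below shows the standard formulations in NumPy for reference; it is not the repository's exact implementation:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize each feature vector by its root-mean-square, then apply a learned scale
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def silu(x):
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

x = np.random.default_rng(0).standard_normal((4, 768))
y = rms_norm(x, np.ones(768))
# Each row of y now has (approximately) unit RMS
```

Unlike LayerNorm, RMSNorm skips mean-centering and the bias term, which makes it cheaper while working comparably well in practice.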
## Configuration Options
| Parameter | Description | Default |
|-----------|-------------|---------|
| `vocab_size` | Number of cochlear tokens | 8192 |
| `n_embd` | Hidden dimension | 768 |
| `n_layer` | Number of transformer layers | 12 |
| `n_head` | Number of attention heads | 12 |
| `n_pred_steps` | Number of prediction steps (MTP) | 1 |
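The defaults above imply, for example, a per-head attention dimension of `n_embd / n_head = 64`. As an illustration only, the table can be mirrored by a plain dataclass; the real configuration class lives in `configuration_auristream.py`, and this stand-in is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class AuriStreamConfigSketch:
    # Defaults from the table above (illustrative stand-in, not the real class)
    vocab_size: int = 8192
    n_embd: int = 768
    n_layer: int = 12
    n_head: int = 12
    n_pred_steps: int = 1

    @property
    def head_dim(self) -> int:
        # Attention splits the hidden dimension evenly across heads
        return self.n_embd // self.n_head

cfg = AuriStreamConfigSketch()
print(cfg.head_dim)  # 64
```

Checkpoint repositories override these defaults in their `config.json`; for instance, an MTP checkpoint would set `n_pred_steps` greater than 1.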
## Files
- `configuration_auristream.py` - Configuration class
- `modeling_auristream.py` - Model implementation
## Tokenizer
This model uses cochlear tokens from [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).