---
license: apache-2.0
tags:
- audio
- speech
- language-model
- auristream
library_name: transformers
---

# AuriStream - Speech Language Model

**AuriStream** is a speech language model by **Greta Tuckute** and **Klemen Kotar**.

This repository contains the shared model code for AuriStream models.

## Overview

AuriStream is a GPT-style autoregressive transformer that predicts cochlear tokens, with
optional multi-token prediction (MTP) heads that predict several future tokens at each position.
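To make the MTP idea concrete, here is a minimal, dependency-free sketch of how training targets are typically built for multi-token prediction: head `k` (0-based) is trained to predict the token `k + 1` steps ahead. This is an illustration of the general technique, not the repository's actual training code; `mtp_targets` is a hypothetical helper.

```python
def mtp_targets(tokens, n_pred_steps):
    """Build one target sequence per prediction head.

    Head 0 does standard next-token prediction (shift by 1);
    head k predicts the token k + 1 positions ahead.
    """
    targets = []
    for k in range(n_pred_steps):
        shift = k + 1
        # Target at position t is tokens[t + shift]; the last `shift`
        # positions have no target and are dropped here for simplicity.
        targets.append(tokens[shift:])
    return targets


# With n_pred_steps=2, head 0 sees next-token targets and head 1
# sees targets two steps ahead:
print(mtp_targets([1, 2, 3, 4, 5], 2))  # [[2, 3, 4, 5], [3, 4, 5]]
```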

This model predicts cochlear tokens from a tokenizer such as [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).

## Usage

This repository is not meant to be used directly. Instead, use one of the checkpoint
repositories that reference this base code:

- [AuriStream7B_40Pred_BigAudioDataset_500k](https://huggingface.co/TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k)

To load a checkpoint:

```python
from transformers import AutoModel, AutoConfig

model = AutoModel.from_pretrained(
    "TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k",
    trust_remote_code=True,
)
```

## Model Architecture

The AuriStream model includes:
- RMSNorm for layer normalization
- Rotary Position Embeddings (RoPE)
- SiLU activation in MLP layers
- Multi-token prediction heads
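As a reference for the first item above, RMSNorm rescales each vector by its root-mean-square, without the mean-centering step of LayerNorm. A minimal pure-Python sketch (not the repository's implementation, which operates on tensors):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by 1/RMS(x), then apply a learned per-dim gain.

    Unlike LayerNorm, no mean is subtracted and no bias is added.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```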

## Configuration Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `vocab_size` | Number of cochlear tokens | 8192 |
| `n_embd` | Hidden dimension | 768 |
| `n_layer` | Number of transformer layers | 12 |
| `n_head` | Number of attention heads | 12 |
| `n_pred_steps` | Number of prediction steps (MTP) | 1 |

## Files

- `configuration_auristream.py` - Configuration class
- `modeling_auristream.py` - Model implementation

## Tokenizer

This model uses cochlear tokens from [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).