---
license: apache-2.0
tags:
- audio
- speech
- language-model
- auristream
library_name: transformers
---

# AuriStream - Speech Language Model

**AuriStream** is a speech language model by **Greta Tuckute** and **Klemen Kotar**. This repository contains the shared model code for AuriStream models.

## Overview

AuriStream is a GPT-like transformer model for cochlear token prediction, with optional multi-token prediction (MTP) heads. The model predicts cochlear tokens produced by a tokenizer such as [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).

## Usage

This repository is not meant to be used directly. Instead, use one of the checkpoint repositories that reference this base code:

- [AuriStream7B_40Pred_BigAudioDataset_500k](https://huggingface.co/TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k)

To load a checkpoint:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k",
    trust_remote_code=True,
)
```

## Model Architecture

The AuriStream model includes:

- RMSNorm for layer normalization
- Rotary Position Embeddings (RoPE)
- SiLU activation in MLP layers
- Multi-token prediction heads

## Configuration Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `vocab_size` | Number of cochlear tokens | 8192 |
| `n_embd` | Hidden dimension | 768 |
| `n_layer` | Number of transformer layers | 12 |
| `n_head` | Number of attention heads | 12 |
| `n_pred_steps` | Number of prediction steps (MTP) | 1 |

## Files

- `configuration_auristream.py` - Configuration class
- `modeling_auristream.py` - Model implementation

## Tokenizer

This model uses cochlear tokens from [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).
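As a rough illustration of one of the architecture components listed above, here is a minimal NumPy sketch of RMSNorm. This is not the repository's implementation (which lives in `modeling_auristream.py`), just the standard formulation: normalize by the root-mean-square of the features, with no mean-centering and no bias, unlike standard LayerNorm.

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Scale each feature vector by its root-mean-square (plus eps for
    # numerical stability), then apply a learned per-feature gain.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([3.0, 4.0])
print(rms_norm(x, np.ones(2)))  # roughly [0.8485, 1.1314]
```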
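The defaults in the configuration table imply a per-head dimension of 64 (`n_embd / n_head`). A small sanity-check sketch, using a plain dict as a hypothetical stand-in for the `AuriStreamConfig` class:

```python
# Default values from the configuration table (a plain-dict stand-in for
# the actual config class in configuration_auristream.py).
defaults = {
    "vocab_size": 8192,   # cochlear token vocabulary
    "n_embd": 768,        # hidden dimension
    "n_layer": 12,        # transformer layers
    "n_head": 12,         # attention heads
    "n_pred_steps": 1,    # multi-token prediction steps
}

# The hidden dimension must divide evenly across attention heads.
assert defaults["n_embd"] % defaults["n_head"] == 0
head_dim = defaults["n_embd"] // defaults["n_head"]
print(head_dim)  # 64
```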