---
license: apache-2.0
tags:
- audio
- speech
- language-model
- auristream
library_name: transformers
---

# AuriStream - Speech Language Model

**AuriStream** is a speech language model by **Greta Tuckute** and **Klemen Kotar**.

This repository contains the shared model code for AuriStream models.

## Overview

AuriStream is a GPT-like transformer that autoregressively predicts cochlear tokens, with optional multi-token prediction (MTP) heads.

It operates on cochlear tokens produced by a tokenizer such as [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).

## Usage

This repository is not meant to be used directly. Instead, use one of the checkpoint repositories that reference this base code:

- [AuriStream7B_40Pred_BigAudioDataset_500k](https://huggingface.co/TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k)

To load a checkpoint:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k",
    trust_remote_code=True,
)
```

## Model Architecture

The AuriStream model includes:

- RMSNorm for layer normalization
- Rotary Position Embeddings (RoPE)
- SiLU activation in MLP layers
- Multi-token prediction (MTP) heads
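
As a quick illustration of the first item, RMSNorm can be sketched in a few lines. This is a generic sketch of the technique, not code taken from `modeling_auristream.py`; the `weight` and `eps` names are illustrative:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: rescale x to unit root-mean-square along the last axis.

    Unlike LayerNorm, there is no mean subtraction and no bias term;
    `weight` is a learned per-feature gain.
    """
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[3.0, 4.0]])       # RMS of [3, 4] is sqrt(12.5) ~ 3.5355
y = rms_norm(x, np.ones(2))      # each row now has (approximately) unit RMS
```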

## Configuration Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `vocab_size` | Number of cochlear tokens | 8192 |
| `n_embd` | Hidden dimension | 768 |
| `n_layer` | Number of transformer layers | 12 |
| `n_head` | Number of attention heads | 12 |
| `n_pred_steps` | Number of prediction steps (MTP) | 1 |
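
A couple of useful sizes follow from these defaults. The sketch below assumes the standard transformer convention that the hidden dimension is split evenly across attention heads; the variable names are illustrative, not from the repository:

```python
# Default configuration values from the table above
defaults = {
    "vocab_size": 8192,
    "n_embd": 768,
    "n_layer": 12,
    "n_head": 12,
    "n_pred_steps": 1,
}

# Per-head dimension, assuming n_embd is split evenly across heads
head_dim = defaults["n_embd"] // defaults["n_head"]  # 768 // 12 = 64

# With multi-token prediction, each position yields n_pred_steps
# distributions over the vocabulary
logits_per_position = defaults["n_pred_steps"] * defaults["vocab_size"]
```

With the default single-step head this is just one vocabulary-sized distribution per position; MTP checkpoints (e.g. one with 40 prediction steps, as the checkpoint name above suggests) would scale this accordingly.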

## Files

- `configuration_auristream.py` - Configuration class
- `modeling_auristream.py` - Model implementation

## Tokenizer

This model uses cochlear tokens from [WavCochCausalV8192](https://huggingface.co/TuKoResearch/WavCochCausalV8192).