AuriStream7BDeep_40Pred_BigAudioDataset_500k-randinit

AuriStream is a speech language model by Greta Tuckute and Klemen Kotar.

This model predicts cochlear tokens from a tokenizer such as WavCochCausalV8192.

This repository contains a freshly initialized AuriStream7B40PredDeepConfig model. The weights are randomly initialized and do not come from a trained checkpoint.

Model Details

| Parameter | Value |
|---|---|
| Parameters | ~8.41B |
| Layers | 96 |
| Hidden Size | 2560 |
| Attention Heads | 32 |
| Vocab Size | 8192 |
| Prediction Steps | 40 |
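
As a sanity check on the table, the parameter count can be roughly reproduced from the other rows. The sketch below is not taken from the model code: it assumes a GPT-style decoder block (attention plus a 4x-wide MLP, about 12 × hidden² weights per layer), one untied input embedding, and one separate linear output head per prediction step, and it ignores biases and normalization weights. The function name `estimate_parameters` is hypothetical.

```python
def estimate_parameters(layers, hidden, vocab, pred_steps):
    """Rough parameter count for an ASSUMED GPT-style decoder layout."""
    # Assumption: attention (4 * hidden^2) plus a 4x-wide MLP (8 * hidden^2) per block
    per_block = 12 * hidden * hidden
    # Assumption: a single input embedding table, not tied to the output heads
    embedding = vocab * hidden
    # Assumption: one separate linear output head per prediction step
    heads = pred_steps * hidden * vocab
    return layers * per_block + embedding + heads


total = estimate_parameters(layers=96, hidden=2560, vocab=8192, pred_steps=40)
print(f"{total / 1e9:.2f}B")  # prints 8.41B
```

Under these assumptions the estimate agrees with the ~8.41B figure above, but the actual layer layout should be checked against the shared model code.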

Usage

from transformers import AutoModel, AutoConfig

# trust_remote_code=True is required because the model class is defined
# by custom code in the repository, not in the transformers library
model = AutoModel.from_pretrained(
    "TuKoResearch/AuriStream7BDeep_40Pred_BigAudioDataset_500k-randinit",
    trust_remote_code=True,
)

# Or load only the config to inspect the architecture without loading the weights
config = AutoConfig.from_pretrained(
    "TuKoResearch/AuriStream7BDeep_40Pred_BigAudioDataset_500k-randinit",
    trust_remote_code=True,
)
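
Before loading the full model it can help to estimate how much memory the weights need. The sketch below is plain arithmetic, assuming the approximate ~8.41B parameter count from Model Details; it covers the weights only, not activations or any framework overhead, and the helper name `weight_memory_gb` is hypothetical.

```python
# Bytes per parameter for common floating-point dtypes
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

NUM_PARAMS = 8.41e9  # approximate count from the Model Details section


def weight_memory_gb(num_params, dtype):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

By this estimate the weights take roughly 34 GB in float32 and roughly 17 GB in a 16-bit dtype.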

Base Model Code

This checkpoint uses shared model code from TuKoResearch/AuriStream-base.

Tokenizer

This model uses cochlear tokens from WavCochCausalV8192.
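
To make the token interface concrete, the sketch below shows one plausible reading of "Prediction Steps": at each position the model is asked for the next several cochlear tokens rather than only the next one. This card does not spell out the training objective, so treat it as an illustration only. The token IDs are made up, the helper `multi_step_targets` is hypothetical, and 3 steps are used instead of 40 to keep the output readable.

```python
VOCAB_SIZE = 8192  # from the Model Details section


def multi_step_targets(tokens, pred_steps):
    """For each position t, collect the next `pred_steps` tokens (fewer near the end)."""
    return [tokens[t + 1 : t + 1 + pred_steps] for t in range(len(tokens) - 1)]


# Made-up cochlear token IDs; real ones would come from the tokenizer
tokens = [17, 4051, 8191, 0, 256]
assert all(0 <= tok < VOCAB_SIZE for tok in tokens)

targets = multi_step_targets(tokens, pred_steps=3)  # 3 instead of 40 for readability
for t, future in enumerate(targets):
    print(f"position {t} (token {tokens[t]}) -> future tokens {future}")
```

Each target list holds the tokens that follow a position, so the final positions have fewer than `pred_steps` targets.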
