Simple automatic dialectal transcription of Finnish
This model is fine-tuned for automatic dialectal transcription of Finnish dialect recordings. It is based on GetmanY1/wav2vec2-large-fi-lp-cont-pt, a model trained on colloquial Finnish, and has been fine-tuned on old Finnish dialect recordings and their corresponding transcriptions in the Uralic Phonetic Alphabet. This model outputs a simple transcription. The audio recordings are sampled at 16 kHz.
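Because the recordings are sampled at 16 kHz, it can help to check the sample rate of your own audio before transcribing it. The following is a minimal sketch that uses only Python's standard `wave` module and therefore assumes uncompressed WAV input. The helper names `wav_sample_rate` and `needs_resampling` and the temporary demo file are hypothetical and not part of this model's code.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16_000  # sampling rate stated in this model card


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """True if the file's sample rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate


# Demo: write a short silent 44.1 kHz mono file and check it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_44k.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)      # mono
    out.setsampwidth(2)      # 16-bit samples
    out.setframerate(44_100)
    out.writeframes(b"\x00\x00" * 100)

demo_rate = wav_sample_rate(demo_path)
demo_needs_resampling = needs_resampling(demo_path)
```

Files that do not match the expected rate need to be resampled before inference. In the usage example in this card, casting the audio column with `Audio(sampling_rate=16000)` takes care of that step.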
Uses
You can use this model to transcribe recordings of Finnish dialects automatically. Note that the output is a dialectal transcription, not standard Finnish text.
How to Get Started with the Model
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC, Wav2Vec2CTCTokenizer
from datasets import Dataset, Audio
import torch
import pandas as pd

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the CSV that lists the audio files.
# Replace AUDIO_PATH_COLUMN with the name of the column that holds the file paths.
ds = pd.read_csv('CSV_DATA.csv')
ds = ds.dropna(how='any', axis=0)
test = Dataset.from_pandas(ds, preserve_index=False)
test = test.cast_column("AUDIO_PATH_COLUMN", Audio(sampling_rate=16000))

tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("okuparinen/SKN_300m_simple", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
model = Wav2Vec2ForCTC.from_pretrained("okuparinen/SKN_300m_simple").to(device)
processor = Wav2Vec2Processor.from_pretrained("okuparinen/SKN_300m_simple", tokenizer=tokenizer)

def prepare_dataset(batch):
    # Decode the audio and convert it to the model's input values
    audio = batch["AUDIO_PATH_COLUMN"]
    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
    batch["input_length"] = len(batch["input_values"])
    return batch

test_ready = test.map(prepare_dataset, remove_columns=test.column_names)

# Transcribe one utterance at a time; gradients are not needed for inference
predictions = []
with torch.no_grad():
    for i in range(len(test_ready)):
        input_dict = processor(test_ready[i]["input_values"], sampling_rate=16000, return_tensors="pt", padding=True)
        logits = model(input_dict.input_values.to(device)).logits
        pred_ids = torch.argmax(logits, dim=-1)[0]
        predictions.append(processor.decode(pred_ids))

# Write one transcription per line; UTF-8 keeps the phonetic characters intact
with open("OUTFILE.txt", "w", encoding="utf-8") as f_pred:
    for line in predictions:
        f_pred.write(line + '\n')
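The example above takes the `argmax` over the logits for each frame and passes the resulting IDs to `processor.decode`. To illustrate what greedy CTC decoding does with those frame-level IDs, here is a minimal pure-Python sketch. It assumes the pad token acts as the CTC blank and that `|` marks word boundaries, as in the tokenizer arguments above. The function name `ctc_greedy_decode`, the toy vocabulary, and the frame sequence are hypothetical and only illustrate the idea; they are not the model's real vocabulary or the library's implementation.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Turn per-frame token IDs into text using greedy CTC decoding rules."""
    # 1. Collapse runs of identical IDs into a single ID.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blank tokens; they only separate genuine repeats.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3. Join the characters and turn word delimiters into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary (not the model's real one).
vocab = {0: "[PAD]", 1: "|", 2: "t", 3: "a", 4: "l", 5: "o"}

# Per-frame IDs as they might come out of an argmax over the logits.
frames = [2, 2, 0, 3, 3, 0, 4, 4, 4, 0, 5, 1, 1, 2, 0, 2, 3]

decoded = ctc_greedy_decode(frames, vocab)
print(decoded)  # talo tta
```

The blank between the two `t` frames near the end is what lets the double consonant survive the collapsing step; without it, the repeated `t` would be merged into one.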
Training Data
The training data is an utterance-level version of the Samples of Spoken Finnish corpus. The utterance-level version is available at okuparinen/skn.
Evaluation results
TBA