---
license: mit
tags:
  - audio
  - speech
  - phonology
  - wav2vec2
  - multilingual
  - pytorch-lightning
language:
  - en
  - es
  - de
  - cs
pipeline_tag: audio-classification
---

# PhonoQ 2.0 – Multilingual

This repository hosts the multilingual checkpoint for PhonoQ 2.0, a modernized successor to the original [PhonoQ](https://github.com/TAriasVergara/PhonoQ) system.

PhonoQ 2.0 takes raw speech audio and outputs framewise probability distributions over phonological heads. It is built on a self-supervised speech encoder (e.g., wav2vec 2.0 / HuBERT).

## What this model outputs

Given an input audio file, the model produces framewise head probabilities for:

- Manner (9 classes)
- Vowel height (3 classes)
- Vowel backness (3 classes)
- Place of articulation (5 classes)
- Voicing (2 classes)

Outputs are aligned to the encoder frame rate and returned as probabilities (not hard labels).
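
Concretely, each head is a (num_frames × num_classes) matrix of probabilities. The sketch below is illustrative only: the tensor names, shapes, and the 20 ms frame stride are assumptions based on typical wav2vec 2.0 setups, not this repo's actual API. It shows how such an output could be turned into timestamped hard labels:

```python
import torch

# Illustrative only: `probs` stands in for one head's framewise output,
# shape (num_frames, num_classes); the real tensors come from the
# PhonoQ 2.0 inference code.
num_frames, num_classes = 250, 9                 # e.g. the 9-class manner head
probs = torch.softmax(torch.randn(num_frames, num_classes), dim=-1)

frame_stride_s = 0.02   # assumed ~20 ms stride, typical for wav2vec 2.0 encoders
labels = probs.argmax(dim=-1)                    # hard label per frame
confidences = probs.max(dim=-1).values          # probability of the chosen class
for i, (lab, p) in enumerate(zip(labels.tolist(), confidences.tolist())):
    print(f"{i * frame_stride_s:6.2f}s  class={lab}  p={p:.2f}")
```

Check the actual frame rate of the encoder used by this checkpoint before converting frame indices to times.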

## How to use

This checkpoint is intended to be used with the [PhonoQ 2.0 inference code](https://github.com/abnerLing/PhonoQ-2.0).

### 1) Install PhonoQ 2.0 (from GitHub)

Follow the installation instructions in the GitHub repository (PyTorch is required).

### 2) Download this checkpoint

```bash
wget https://huggingface.co/abnerh/phonoq-2.0-multilingual/resolve/main/best.ckpt
```
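
Once downloaded, you can sanity-check the file with plain PyTorch before wiring it into the inference code. This is a minimal sketch: PyTorch Lightning checkpoints are ordinary Python dicts, but the exact keys depend on the PhonoQ 2.0 training code.

```python
import torch

# Load the checkpoint on CPU just to inspect it. `weights_only=False` is
# needed when a Lightning checkpoint pickles extra objects; only do this
# with files you trust.
ckpt = torch.load("best.ckpt", map_location="cpu", weights_only=False)
print(sorted(ckpt.keys()))               # usually includes 'state_dict'
print(len(ckpt.get("state_dict", {})))   # number of parameter tensors
```

For actual inference (audio loading, the encoder forward pass, and head decoding), use the scripts provided in the PhonoQ 2.0 repository.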