🌊 FlexAligner: Robust Speech-Text Alignment Framework

FlexAligner is a robust two-stage speech-text alignment framework designed specifically for "non-ideal" real-world speech data.


🌟 Key Features

  • Robustness to Mismatched Data: Unlike traditional forced aligners such as MFA (Montreal Forced Aligner), FlexAligner automatically identifies and skips segments where the audio and the transcript do not match (e.g., laughter, background or prolonged noise, or untranscribed words), so alignment errors do not accumulate.
  • Two-Stage Architecture:
    1. Stage 1 (CTC Chunking): Macro-segmentation to locate reliable "speech islands."
    2. Stage 2 (CE Alignment): Micro-alignment using dynamic hop calibration for sub-millisecond boundary regression.
  • Eliminating Temporal Drift: A self-calibrating decoding algorithm for long recordings keeps phoneme boundaries at the end of the file aligned with the audio samples.
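
The README does not spell out how the hop calibration works. As a rough illustration only, the sketch below shows one way a frame-to-time mapping can disagree with the audio: a Wav2Vec 2.0 style encoder has a nominal 20 ms hop at 16 kHz, but the number of frames it emits is not exactly `duration / 0.02`. The function `calibrated_hop` is a hypothetical stand-in, not FlexAligner's actual algorithm; only the convolution layout is taken from the standard Wav2Vec 2.0 base configuration.

```python
# Wav2Vec 2.0 (base) feature-encoder layout: (kernel, stride) per conv layer.
# The total stride is 320 samples, i.e. a nominal 20 ms hop at 16 kHz.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
SAMPLE_RATE = 16_000
NOMINAL_HOP_S = 0.02


def num_frames(num_samples: int) -> int:
    """Number of encoder frames produced for an unpadded input."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length


def calibrated_hop(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Hypothetical 'dynamic hop': spread the frames evenly over the real duration."""
    return (num_samples / sample_rate) / num_frames(num_samples)


def boundary_time(frame_index: int, hop_s: float) -> float:
    """Time in seconds of the boundary that precedes frame `frame_index`."""
    return frame_index * hop_s


if __name__ == "__main__":
    n = 10 * 60 * SAMPLE_RATE  # a 10-minute recording
    frames = num_frames(n)
    print("frames:", frames)
    print("end (nominal hop):   ", boundary_time(frames, NOMINAL_HOP_S))
    print("end (calibrated hop):", boundary_time(frames, calibrated_hop(n)))
```

With the calibrated hop, the final boundary lands on the true end of the file by construction; with the fixed 20 ms hop it falls slightly short. How FlexAligner itself handles this, and how it reaches the stated sub-millisecond precision, is not described here.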

📦 Model Components

This repository contains the weights for the two core components of FlexAligner:

  • hf_phs/: CTC chunking model based on Wav2Vec 2.0.
  • ce2/: High-precision frame-level cross-entropy (CE) alignment model.
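
If you want the weights on disk before running the aligner (for example on a machine without network access at inference time), a minimal sketch using `huggingface_hub.snapshot_download` could look like this. The repository ID is the one used in the Quick Start example; the helper names are hypothetical, and the sketch assumes the two folders listed above are all the aligner needs. If the repository requires accepting access conditions, a Hugging Face token may be necessary.

```python
from typing import Iterable, List, Optional

REPO_ID = "USTCPhonetics/FlexAligner"  # repo ID from the Quick Start example
COMPONENTS = ("hf_phs", "ce2")         # the two weight folders listed above


def allow_patterns(components: Iterable[str]) -> List[str]:
    """Glob patterns that select only the given top-level folders."""
    return [f"{name}/*" for name in components]


def download_weights(local_dir: Optional[str] = None,
                     token: Optional[str] = None) -> str:
    """Download both components and return the local snapshot path."""
    # Imported lazily so the helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns(COMPONENTS),
        local_dir=local_dir,
        token=token,
    )
```

Calling `download_weights("weights/")` would place `hf_phs/` and `ce2/` under `weights/`. Whether FlexAligner accepts such a local path in its config is not stated in this README.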

🚀 Quick Start

CLI Usage

After installation, you can use the hosted models directly from the command line:

flex-align input.wav transcript.txt --dynamic -o output.TextGrid
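
To call the same command from a script, a thin `subprocess` wrapper is usually enough. The sketch below uses only the arguments shown in the command above; treating `--dynamic` as optional and expecting a non-zero exit code on failure are assumptions, not documented behavior.

```python
import subprocess
from typing import List


def build_command(wav: str, transcript: str, output: str,
                  dynamic: bool = True) -> List[str]:
    """Assemble the flex-align invocation as an argument list."""
    cmd = ["flex-align", wav, transcript]
    if dynamic:
        # Assumption: the flag can be omitted; the README only shows it enabled.
        cmd.append("--dynamic")
    cmd += ["-o", output]
    return cmd


def run_alignment(wav: str, transcript: str, output: str) -> None:
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(wav, transcript, output), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.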

Python API

Or integrate it into your Python pipeline:

from flexaligner import FlexAligner

# Pass this repository's Hugging Face repo ID; the weights are downloaded and loaded automatically
aligner = FlexAligner(config={
    "chunk_model_path": "USTCPhonetics/FlexAligner",
    "use_dynamic_hop": True
})

aligner.align("test.wav", "test.txt", "result.TextGrid")
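
For a folder of recordings, the single `align` call above can be wrapped in a small loop. The sketch below assumes each `name.wav` has a transcript `name.txt` next to it and relies only on the three-argument `align(wav, txt, out)` call shown above; `find_pairs` and `align_directory` are hypothetical helpers, not part of the package.

```python
from pathlib import Path
from typing import List, Tuple


def find_pairs(data_dir: str) -> List[Tuple[Path, Path]]:
    """Return (wav, txt) pairs that share a file stem, skipping orphans."""
    pairs = []
    for wav in sorted(Path(data_dir).glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            pairs.append((wav, txt))
    return pairs


def align_directory(aligner, data_dir: str, out_dir: str) -> List[Path]:
    """Align every pair in data_dir and write one TextGrid per recording."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    for wav, txt in find_pairs(data_dir):
        target = out / (wav.stem + ".TextGrid")
        aligner.align(str(wav), str(txt), str(target))
        results.append(target)
    return results
```

With an `aligner` constructed as in the snippet above, `align_directory(aligner, "corpus", "aligned")` would write `aligned/<name>.TextGrid` for each matched pair.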

📜 License

This project is licensed under the MIT License. Feel free to use it in academic research or commercial projects.

