fine v.1.0
f5bce42

VALIDATOR

Last updated: 2026-03-09

Purpose

Runs pre-flight validation on an audio file and its script before forced alignment. It checks that both files exist and are non-empty, that the script is valid UTF-8 with usable content, and that the word count is plausible for the audio duration in Tunisian Arabic content.

Function Signature

def validate_inputs(audio_path: Union[str, Path], script_path: Union[str, Path]) -> Dict:

Parameters

Param        Type              Required  Default  Description
audio_path   Union[str, Path]  Yes       -        Path to audio file for validation
script_path  Union[str, Path]  Yes       -        Path to script text file for validation

Returns

Dictionary with validation results and warnings:

{
    "audio_duration_sec": 23.5,
    "sentence_count": 4,
    "word_count": 58,
    "warnings": ["Script may be too short for audio duration..."]
}
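
The warnings list could be filled by a speaking-rate check along the following lines. This is a minimal sketch, not the validator's actual logic: the helper name `ratio_warnings` and both thresholds are assumptions, since the real bounds for Tunisian Arabic are not stated here.

```python
from typing import List

# Assumed bounds in words per second; the real thresholds are not documented here.
MIN_WORDS_PER_SEC = 1.0
MAX_WORDS_PER_SEC = 4.0


def ratio_warnings(word_count: int, audio_duration_sec: float) -> List[str]:
    """Hypothetical helper: warn when the speaking rate is outside the assumed bounds."""
    if audio_duration_sec <= 0:
        raise ValueError("audio duration must be positive")
    rate = word_count / audio_duration_sec
    warnings: List[str] = []
    if rate < MIN_WORDS_PER_SEC:
        warnings.append(
            f"Script may be too short for audio duration ({rate:.2f} words/sec)"
        )
    elif rate > MAX_WORDS_PER_SEC:
        warnings.append(
            f"Script may be too long for audio duration ({rate:.2f} words/sec)"
        )
    return warnings
```

Returning warnings instead of raising keeps a suspicious ratio non-fatal, which matches the result dictionary carrying a warnings field alongside the counts.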

Error Handling

Exception          Condition
FileNotFoundError  Audio or script file does not exist
ValueError         File is empty, script is not valid UTF-8, or script has no valid content
RuntimeError       ffprobe fails or cannot determine the audio duration
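
A caller might fold these three exception types into a single pass/fail result. The sketch below assumes a hypothetical wrapper, `try_validate`, which takes the validator as a callable so it can be shown without the real module. The wrapper name and the returned tuple shape are illustrative and not part of validator.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple, Union

PathLike = Union[str, Path]


def try_validate(
    validate: Callable[[PathLike, PathLike], Dict],
    audio_path: PathLike,
    script_path: PathLike,
) -> Tuple[Optional[Dict], Optional[str]]:
    """Hypothetical wrapper: (result, None) on success, (None, reason) on failure."""
    try:
        return validate(audio_path, script_path), None
    except FileNotFoundError as exc:
        # Audio or script path does not exist.
        return None, f"missing file: {exc}"
    except ValueError as exc:
        # Empty file, non-UTF-8 script, or no usable content.
        return None, f"invalid content: {exc}"
    except RuntimeError as exc:
        # ffprobe failed or the duration could not be determined.
        return None, f"audio analysis failed: {exc}"
```

In a batch job this lets one bad pair be logged and skipped, while any other exception type still propagates.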

Usage Example

from validator import validate_inputs

result = validate_inputs("input/video.mp3", "input/video.txt")
print(f"Duration: {result['audio_duration_sec']}s")
print(f"Sentences: {result['sentence_count']}")
for warning in result['warnings']:
    print(f"⚠️ {warning}")

Known Edge Cases

  • Mixed Arabic/French script: Words are counted by splitting on whitespace, so code-switched text needs no special handling
  • Empty lines in script: Filtered out automatically; only non-empty lines count as sentences
  • Special characters: Preserved as-is; no normalization or filtering is applied
  • Very short audio: The duration check may produce false-positive warnings for audio shorter than 5 seconds
  • Corrupted audio: ffprobe fails, and the failure is reported as a RuntimeError with a descriptive message
  • Non-UTF-8 script: An explicit encoding check rejects the file before garbled Arabic text can be processed
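
The script-side behaviour in the list above (UTF-8 check, empty-line filtering, whitespace word counting) can be sketched as follows. The helper name `count_script` and its error messages are assumptions for illustration, and the real implementation may differ.

```python
from pathlib import Path
from typing import Dict, Union


def count_script(script_path: Union[str, Path]) -> Dict[str, int]:
    """Hypothetical helper: count non-empty lines and whitespace-separated words."""
    raw = Path(script_path).read_bytes()  # raises FileNotFoundError if missing
    if not raw:
        raise ValueError("script file is empty")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("script is not valid UTF-8") from exc
    # Keep only lines with visible content; each one counts as a sentence.
    sentences = [line.strip() for line in text.splitlines() if line.strip()]
    if not sentences:
        raise ValueError("script has no valid content")
    # str.split() with no argument splits on any run of whitespace, so Arabic
    # and French tokens are counted the same way and no characters are altered.
    word_count = sum(len(sentence.split()) for sentence in sentences)
    return {"sentence_count": len(sentences), "word_count": word_count}
```

Reading bytes first and decoding explicitly makes the encoding failure a deliberate ValueError, independent of the platform's default encoding.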

Dependencies

  • ffprobe (part of ffmpeg): System requirement for audio duration analysis
  • pathlib: Built-in Python module
  • subprocess: Built-in Python module
  • re: Built-in Python module for text processing
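
For the ffprobe dependency, one common way to read a duration through subprocess is sketched below. The helper names (`build_ffprobe_cmd`, `parse_duration`, `probe_duration`) are hypothetical, and the exact ffprobe options the validator uses are not documented here, so treat the command line as an assumption.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union


def build_ffprobe_cmd(audio_path: Union[str, Path]) -> List[str]:
    """Build an ffprobe command that prints only the container duration in seconds."""
    return [
        "ffprobe",
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(audio_path),
    ]


def parse_duration(stdout: str) -> float:
    """Parse ffprobe output; raise RuntimeError unless it is a positive number."""
    try:
        duration = float(stdout.strip())
    except ValueError as exc:
        raise RuntimeError(f"could not parse duration: {stdout!r}") from exc
    if not duration > 0:  # also rejects NaN
        raise RuntimeError(f"non-positive duration reported: {duration}")
    return duration


def probe_duration(audio_path: Union[str, Path]) -> float:
    """Hypothetical helper: run ffprobe and return the duration in seconds."""
    if shutil.which("ffprobe") is None:
        raise RuntimeError("ffprobe not found on PATH; install ffmpeg")
    proc = subprocess.run(
        build_ffprobe_cmd(audio_path), capture_output=True, text=True
    )
    if proc.returncode != 0:
        raise RuntimeError(f"ffprobe failed: {proc.stderr.strip()}")
    return parse_duration(proc.stdout)
```

Splitting command construction and output parsing from the subprocess call keeps both pieces testable on machines where ffmpeg is not installed.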