---
tags:
- audiotranscription-segmentation
- radio
- radio-live-a-la-carte
- rlac
- random-forest
- transcription
model_index:
- name: RLAC Audio-transcription Segmenter - Chroniques (Random Forest)
---

# RLAC Audio-transcription Segmenter - Chroniques (Random Forest)

## Description
This model uses a **Random Forest** classifier to detect radio chronicle (*chronique*) segments in textual transcriptions (SRT files). It is designed as a lightweight, efficient option for text-based segmentation.

Hugging Face link: [eglantinefonrose/rlac-audiotranscript-segmenter-chroniques-randomforest](https://huggingface.co/eglantinefonrose/rlac-audiotranscript-segmenter-chroniques-randomforest)

## Model Details

The model is a single-file classifier (`.pkl`) serialized with `joblib`. It operates by analyzing individual transcription segments and their immediate context.

### 1. Feature Engineering
The model extracts several key features from the SRT segments:
*   **Temporal Data**: Precise air time and segment duration.
*   **Linguistic Statistics**: Word count, character count, and average word length.
*   **Punctuation Analysis**: Detection of question marks and exclamation points to identify rhetorical styles.
*   **TF-IDF Vectorization**: Term weights that score how characteristic each word is of a segment compared with the rest of the transcripts, used to pick up vocabulary typical of radio chronicles.
*   **Jingle Detection**: Specific binary markers for identified radio jingles.
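
As a rough illustration of the temporal, linguistic, and punctuation features listed above, here is a minimal sketch using only the standard library. The function names (`srt_time_to_seconds`, `segment_features`) and the feature keys are hypothetical and not taken from the project's code; TF-IDF and jingle markers are left out.

```python
def srt_time_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' into seconds."""
    hours, minutes, rest = stamp.split(":")
    seconds, millis = rest.split(",")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def segment_features(start: str, end: str, text: str) -> dict:
    """Build a small feature dict for one SRT segment (hypothetical feature names)."""
    words = text.split()
    start_s = srt_time_to_seconds(start)
    end_s = srt_time_to_seconds(end)
    return {
        "start_seconds": start_s,                 # air time of the segment
        "duration": end_s - start_s,              # segment length in seconds
        "word_count": len(words),
        "char_count": len(text),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "has_question": int("?" in text),         # punctuation markers as 0/1 flags
        "has_exclamation": int("!" in text),
    }


features = segment_features("00:01:02,500", "00:01:05,000", "Is this the chronicle?")
print(features)
```

In a full pipeline, a dict like this would typically be turned into a numeric vector and joined with the TF-IDF and jingle features before classification.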

### 2. Contextual Awareness (Sliding Window)
To improve accuracy, the model uses a **sliding window**: rather than classifying a segment in isolation, it incorporates features from the surrounding segments (e.g., the 2 segments before and the 2 segments after) to capture the flow and continuity of the radio broadcast.
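
One way to picture the sliding window is to concatenate each segment's feature vector with those of its neighbours. The sketch below is an assumption about how this could be done, not the project's actual implementation: the `add_context` helper is hypothetical, and zero-padding at the start and end of the show is an arbitrary choice.

```python
def add_context(rows, window=2):
    """Concatenate each row with the rows `window` positions before and after it."""
    if not rows:
        return []
    n_features = len(rows[0])
    padding = [0.0] * n_features  # assumption: missing neighbours are zero-padded
    contextual = []
    for i in range(len(rows)):
        combined = []
        for offset in range(-window, window + 1):
            j = i + offset
            combined.extend(rows[j] if 0 <= j < len(rows) else padding)
        contextual.append(combined)
    return contextual


# Three segments with two features each; window=1 keeps the example short.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
contextual = add_context(rows, window=1)
print(contextual[1])  # middle segment sees both of its neighbours
```

With `window=2`, as in the example given above, each segment's vector would grow to five times its original width.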

### 3. Core Classifier
The underlying algorithm is a **RandomForestClassifier** from the scikit-learn library, configured with 200 estimators (trees) to balance precision and recall for chronicle detection.
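
For readers unfamiliar with scikit-learn, the following sketch shows what a 200-tree forest looks like in code. Only `n_estimators=200` comes from this card; the synthetic data, the `random_state`, and all other settings are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for contextual feature vectors (not real transcript data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = (X[:, 0] > 0).astype(int)  # 1 = "chronicle", 0 = "other" (toy labels)

# 200 trees, as stated in this card; random_state is only here for reproducibility.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# Per-segment class probabilities for the first three rows.
proba = clf.predict_proba(X[:3])
print(proba.shape)
```

The probabilities returned by `predict_proba` are one place where a decision threshold could trade precision against recall, though this card does not say whether the model does so.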

## Advantages
*   **CPU-Only**: Does not require any GPU or heavy deep learning frameworks.
*   **High Speed**: Processing an hour-long show takes less than a second.
*   **Efficiency**: Extremely low memory footprint.

## Usage
The model is integrated into the local `predict.py` workflow:
```python
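# RadioChroniqueClassifier is imported from the project's local train module.
from train import RadioChroniqueClassifier
# Load the serialized Random Forest from its .pkl file.
classifier = RadioChroniqueClassifier.load_model("models/radio_chronique_rf.pkl")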
```

## Author
Maintained by eglantinefonrose.