Gilbert-AI/gilbert-fr-source

@@ -24,108 +24,172 @@ tags:
 - research
 - gilbert
 ---
-# Gilbert-FR-Source
-`Gilbert-FR-Source` is a French automatic speech recognition (ASR) model used as the base model (backbone) for the research and development work carried out around the Gilbert platform. It serves as the foundation for exploring new specialized variants, notably for professional environments, multi-speaker meetings, spontaneous speech, regional accents, and wideband or low-bitrate telephony.
-The main goal of this model is to provide a stable, high-performing, and reproducible base for all subsequent experiments (fine-tuning, domain adaptation, performance and latency optimization).
 ---
-## 1. Purpose and Use
-`Gilbert-FR-Source` is the reference model used internally for:
-- comparative evaluation of ASR pipelines;
-- domain adaptation studies under real-world conditions (meetings, video calls, noisy environments);
-- work on making transcription more robust to diverse accents and voice profiles;
-- preparing optimized variants (long-form, accents, telephony);
-- setting up benchmarks and internal performance measurement tools.
-This model is not a fine-tuned version, but a research base prepared for building future specialized versions.
 ---
-## 2. Reference Performance (Public Benchmarks)
-The following results are performance figures observed on public datasets commonly used to evaluate ASR systems:
-| Dataset | WER |
-|----------------|-----|
 | MLS (FR) | 3.98 % |
 | Common Voice FR (v13.0) | 7.28 % |
 | VoxPopuli (FR) | 8.91 % |
 | Fleurs (FR) | 4.84 % |
 | African Accented French | 4.20 % |
-These values serve only as a reference and are a starting point for future optimized variants of the model (long-form, accents, telephony). They indicate where performance stands on read speech, semi-spontaneous speech, political or institutional speech, and varied accents.
 ---
-## 3. Architecture
-The model is based on the Whisper Large V3 architecture.
-Main characteristics:
-- multilingual encoder-decoder model;
-- ability to model long sequences;
-- pretraining on large multilingual corpora;
-- strong implicit specialization in French observed on public benchmarks;
-- compatibility with optimized runtimes (CTranslate2, ONNX Runtime, MLX).
-The model is particularly well suited to long, multilingual transcription tasks that require strong syntactic stability.
 ---
-## 4. Data and Training
-This model has not been retrained in this version: it is used as is, as a research base.
-Future specialized versions may include:
-- fine-tuning on internal corpora of professional meetings;
-- domain adaptation for specific contexts (higher education, healthcare, administration, finance);
-- robustness work for difficult conditions (8 kHz telephony, degraded microphones, ambient noise);
-- targeted improvement on varied accents.
 ---
-## 5. Recommended Uses
-- standard French transcription;
-- comparison of ASR pipelines;
-- prototyping and research;
-- quality measurement and internal benchmarking;
-- base for domain adaptation.
 ---
-## 6. License and Compliance
-This repository contains files published under the MIT License.
-In accordance with the MIT License:
-> A copy of the license is provided in this repository.
-> Some of the included files were originally published under the MIT License.
-All future fine-tuned or adapted versions will be the property of Lexia France.
 ---
-## 7. Planned Future Versions
-- Gilbert-FR-Longform-v1 (long-form speech, meetings, and talks)
-- Gilbert-FR-Accents-v1 (regional and international accents)
-- Gilbert-FR-Téléphone-v1 (8 kHz, call center, compressed voice)
-- Gilbert-Multilingue-v1 (multilingual extension)
-These versions will undergo systematic evaluation on public and internal datasets.
 ---
-## 8. Contact
-For any question, collaboration, or evaluation request:
-- Website: https://gilbert-assistant.fr
-- Contact: mathis@lexiapro.fr

+# Gilbert-FR-Source — Research Baseline for French Automatic Speech Recognition
+`Gilbert-FR-Source` is a French automatic speech recognition (ASR) model used as the **research foundation** for the Gilbert project.
+It is designed as an internal scientific baseline enabling controlled experimentation, reproducible evaluation, and rigorous comparison across ASR architectures, datasets, and adaptation methods.
+This model is not a fine-tuned derivative; it is a fixed reference checkpoint that supports systematic studies of:
+- domain adaptation,
+- robustness to spontaneous and long-form speech,
+- accented and low-resource linguistic profiles,
+- telephony and bandwidth-constrained speech,
+- multi-speaker and meeting transcription.
 ---
+## 1. Research Motivation
+The Gilbert project aims to build highly specialized ASR systems optimized for:
+- professional meeting transcription (hybrid/remote),
+- long-form multi-speaker discourse,
+- institutional environments (education, public sector),
+- constrained audio conditions (telephony, VoIP, low SNR),
+- sociolinguistic diversity (African, Canadian, Belgian and other French accents).
+While Whisper Large V3 provides strong baseline performance, its behavior under domain shifts (spontaneous interactions, overlapping speech, degraded microphones) requires systematic study.
+`Gilbert-FR-Source` provides the **frozen starting point** for this line of research, ensuring controlled comparisons between experiments.
 ---
+## 2. Scientific Goals and Research Questions
+This model is used to answer a series of research questions:
+### **Q1. Long-form modeling**
+How does Whisper Large V3 behave on meetings lasting 30–120 minutes, with natural topic shifts, interruptions, and pragmatic markers?
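Whisper models process audio in windows of at most 30 seconds, so a meeting of this length has to be split before decoding. The sketch below only computes overlapping window boundaries; the function name `chunk_spans` and the 5-second overlap are illustrative assumptions, not part of the Gilbert pipeline.

```python
def chunk_spans(total_s: float, window_s: float = 30.0, overlap_s: float = 5.0):
    """Return (start, end) spans in seconds that cover [0, total_s].

    window_s=30.0 matches Whisper's input window; overlap_s is an
    assumed value chosen for illustration only.
    """
    if window_s <= overlap_s:
        raise ValueError("window_s must be larger than overlap_s")
    step = window_s - overlap_s
    spans = []
    start = 0.0
    while start < total_s:
        end = min(start + window_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break  # the last window reached the end of the recording
        start += step
    return spans


if __name__ == "__main__":
    # A 90-minute meeting, the middle of the 30-120 minute range in Q1.
    spans = chunk_spans(90 * 60)
    print(len(spans), spans[:2], spans[-1])
```

How transcripts from adjacent windows are merged, and whether errors accumulate across windows, is exactly what a long-form study would need to measure.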
+### **Q2. Accent robustness**
+Which classes of French accents induce the strongest WER degradation?
+How does robustness vary across FLEURS, African French, and Common Voice subsets?
+### **Q3. Telephony adaptation**
+What is the degradation curve as audio moves from the 16 kHz input that Whisper expects to 8 kHz narrowband and to μ-law compressed audio?
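One way to build such a degradation curve is to degrade clean audio synthetically and re-score it. The following is a minimal sketch using only the standard library: it applies the continuous μ-law companding formula with 8-bit quantization (not the exact segmented G.711 codec) and a deliberately crude 2:1 decimation. All function names are hypothetical, and a real experiment would use a proper resampling filter.

```python
import math

MU = 255  # companding constant used by 8-bit mu-law telephony


def mulaw_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] to the mu-law domain, also in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y: float, mu: int = MU) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def mulaw_roundtrip(samples, mu: int = MU, levels: int = 256):
    """Compress, quantize to `levels` steps, and expand again."""
    out = []
    for x in samples:
        y = mulaw_compress(x, mu)
        q = round((y + 1) / 2 * (levels - 1))   # integer code 0..levels-1
        y_hat = q / (levels - 1) * 2 - 1        # back to [-1, 1]
        out.append(mulaw_expand(y_hat, mu))
    return out


def decimate_by_two(samples):
    """Crude 16 kHz -> 8 kHz: average each pair of samples, keep one value."""
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]


if __name__ == "__main__":
    # 10 ms of a 1 kHz tone sampled at 16 kHz.
    tone = [0.5 * math.sin(2 * math.pi * 1000 * n / 16000) for n in range(160)]
    narrowband = decimate_by_two(tone)
    degraded = mulaw_roundtrip(narrowband)
    print(len(narrowband), max(abs(a - b) for a, b in zip(narrowband, degraded)))
```

Scoring the same utterances at each degradation level gives the points of the curve that Q3 asks about.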
+### **Q4. Domain adaptation efficiency**
+What is the marginal gain of targeted fine-tuning on professional meeting datasets (education, administration, healthcare)?
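A common way to express such a gain is the relative WER reduction against the frozen baseline. The helper below is a sketch under that assumption; the fine-tuned value in the usage example is a hypothetical placeholder, not a measured result.

```python
def relative_wer_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """Fraction of the baseline WER removed by the candidate model.

    Positive values mean the candidate is better; negative values mean
    it regressed relative to the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - candidate_wer) / baseline_wer


if __name__ == "__main__":
    baseline = 7.28   # Common Voice FR (v13.0) WER from this model card, in %
    candidate = 6.50  # hypothetical fine-tuned WER, for illustration only
    print(f"{relative_wer_reduction(baseline, candidate):.1%}")
```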
+### **Q5. Multilingual side-effects**
+To what extent does French fine-tuning affect cross-lingual generalization?
+These research axes structure the development of future specialized Gilbert models.
+---
+## 3. Benchmark Reference Results
+The following WER scores originate from established open benchmarks and serve as a *reference baseline* for future experiments:
+| Dataset | WER |
+|--------|-----|
 | MLS (FR) | 3.98 % |
 | Common Voice FR (v13.0) | 7.28 % |
 | VoxPopuli (FR) | 8.91 % |
 | Fleurs (FR) | 4.84 % |
 | African Accented French | 4.20 % |
+These results are the **pre-fine-tuning baseline**: targeted fine-tuning is expected to lower WER from these values.
+Future Gilbert variants will be evaluated using:
+- internal meeting datasets,
+- domain-specific corpora (administration, higher education, healthcare),
+- accented speech corpora,
+- telephony datasets,
+- long-form evaluation methods (> 1 hour audio).
 ---
+## 4. Architecture
+The model is based on the **Whisper Large V3** encoder–decoder architecture, offering:
+- large multilingual pretraining,
+- long-context modeling capacity,
+- robust cross-lingual alignment,
+- stable decoding for long outputs,
+- strong zero-shot performance on French.
+It is compatible with:
+- Hugging Face Transformers,
+- CTranslate2,
+- ONNX Runtime,
+- MLX (Apple Silicon),
+- quantization-based acceleration pipelines.
 ---
+## 5. Methodology and Reproducibility
+`Gilbert-FR-Source` is used under research protocols that emphasize the following.
+### **Reproducible training protocols**
+- frozen weights for baseline comparison,
+- controlled hyperparameter schedules,
+- consistent evaluation datasets,
+- deterministic decoding configurations.
+### **Evaluation methodology**
+WER is computed with standard normalization (lowercasing, punctuation removal).
+More advanced metrics (diarization error rate, long-context drift) are included in internal research pipelines.
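The WER computation described above can be sketched with the standard library alone. This is an illustrative implementation, not the project's evaluation code: the function names are hypothetical, and treating apostrophes as punctuation (so that "l'école" becomes two tokens) is an assumption about what "standard normalization" covers.

```python
import unicodedata


def normalize(text: str) -> list[str]:
    """Lowercase, replace Unicode punctuation with spaces, split into words."""
    text = text.lower()
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return text.split()


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = normalize(ref), normalize(hyp)
        errors += word_errors(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("references contain no words")
    return errors / total


if __name__ == "__main__":
    refs = ["Bonjour, tout le monde !", "Le chat dort."]
    hyps = ["bonjour tout le monde", "le chien dort ici"]
    print(f"WER = {wer(refs, hyps):.2%}")
```

Pooling errors over the whole corpus, rather than averaging per-utterance WER, keeps short utterances from dominating the score.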
+### **Versioning policy**
+This repository represents version `0.1` of the research baseline.
+All future fine-tuned models will explicitly reference this version for traceability.
 ---
+## 6. Limitations
+This baseline inherits the known limitations of Whisper and of the underlying datasets:
+- sensitivity to overlapping speech,
+- occasional hallucinations in long-form decoding,
+- domain shift on spontaneous dialogue,
+- potential biases related to accent distribution in training data,
+- degraded performance on telephony-bandwidth audio.
+Understanding and quantifying these limitations is one of the core objectives of the Gilbert research roadmap.
 ---
+## 7. Future Work (Planned Research Directions)
+The following models will be developed as independent checkpoints:
+- **Gilbert-FR-Longform-v1**
+  Long meetings, multi-speaker interaction, discourse-level context stability.
+- **Gilbert-FR-Accents-v1**
+  Robustness to regional and international French accents.
+- **Gilbert-FR-Telephone-v1**
+  Optimized for 8 kHz VoIP/call-center speech.
+- **Gilbert-Multilingual-v1**
+  Extended cross-lingual performance with optimized French anchors.
+Each model will include detailed evaluation reports and will adhere to research reproducibility standards.
 ---
+## 8. License
+This repository includes files distributed under the MIT License.
+> A copy of the MIT License is included.
+> Some files were originally released under MIT.
+All future Gilbert models built on top of this baseline are the exclusive property of Lexia France.
 ---
+## 9. Contact
+For research collaboration, evaluation access, or technical inquiries:
+- Website: https://gilbert-assistant.fr
+- Email: mathis@lexiapro.fr