MEscriva committed · verified
Commit 6f5550e · 1 Parent(s): 52eebe2

Copy from MEscriva/gilbert-fr-source - Baseline model for Gilbert research

Files changed (1): README.md (+222 -117)

README.md CHANGED
@@ -1,195 +1,300 @@
  ---
  license: mit
- datasets:
- - google/fleurs
- - facebook/voxpopuli
- - facebook/multilingual_librispeech
- - mozilla-foundation/common_voice_13_0
- - mozilla-foundation/common_voice_17_0
- language:
- - fr
- - en
- metrics:
- - wer
- base_model:
- - openai/whisper-large-v3
- pipeline_tag: automatic-speech-recognition
- library_name: transformers
  tags:
- - speech-recognition
  - whisper
  - french
  - stt
  - multilingual
  - research
- - gilbert
  ---

  # Gilbert-FR-Source — Research Baseline for French Automatic Speech Recognition

- `Gilbert-FR-Source` is a French automatic speech recognition (ASR) model used as the **research foundation** for the Gilbert project.
- It is designed as an internal scientific baseline enabling controlled experimentation, reproducible evaluation, and rigorous comparison across ASR architectures, datasets, and adaptation methods.

- This model is not a fine-tuned derivative, but a **curated research anchor** used to support systematic studies in:

- - domain adaptation,
- - robustness to spontaneous and long-form speech,
- - accented and low-resource linguistic profiles,
- - telephony and bandwidth-constrained speech,
- - multi-speaker and meeting transcription.

  ---

- ## 1. Research Motivation

- The Gilbert project aims to build highly specialized ASR systems optimized for:

- - professional meeting transcription (hybrid/remote),
- - long-form multi-speaker discourse,
- - institutional environments (education, public sector),
- - constrained audio conditions (telephony, VoIP, low SNR),
- - sociolinguistic diversity (African, Canadian, Belgian and other French accents).

- While Whisper Large V3 provides strong baseline performance, its behavior under domain shifts (spontaneous interactions, overlapping speech, degraded microphones) requires systematic study.
- `Gilbert-FR-Source` provides the **frozen starting point** for this line of research, ensuring controlled comparisons between experiments.

  ---

- ## 2. Scientific Goals and Research Questions

- This model is used to answer a series of research questions:

- ### **Q1. Long-form modeling**
- How does Whisper-L3 behave on meetings lasting 30–120 minutes, with natural topic shifts, interruptions, and pragmatic markers?

- ### **Q2. Accent robustness**
- Which classes of French accents induce the strongest WER degradation?
- How does robustness vary across FLEURS, African French, and Common Voice subsets?

- ### **Q3. Telephony adaptation**
- What is the degradation curve when downsampling to 16 kHz / 8 kHz / μ-law compressed audio?

- ### **Q4. Domain adaptation efficiency**
- What is the marginal gain of targeted fine-tuning on professional meeting datasets (education, administration, healthcare)?

- ### **Q5. Multilingual side-effects**
- To what extent does French fine-tuning affect cross-lingual generalization?

- These research axes structure the development of future specialized Gilbert models.

  ---

- ## 3. Benchmark Reference Results

- The following WER scores originate from established open benchmarks and serve as a *reference baseline* for future experiments:

- | Dataset | WER |
- |--------|-----|
- | MLS (FR) | 3.98 % |
- | Common Voice FR (v13.0) | 7.28 % |
- | VoxPopuli (FR) | 8.91 % |
- | Fleurs (FR) | 4.84 % |
- | African Accented French | 4.20 % |

- These results provide **upper bounds** before targeted fine-tuning.
- Future Gilbert variants will be evaluated using:

- - internal meeting datasets,
- - domain-specific corpora (administration, higher education, healthcare),
- - accented speech corpora,
- - telephony datasets,
- - long-form evaluation methods (> 1 hour audio).

  ---

- ## 4. Architecture

- The model is based on the **Whisper Large V3** encoder–decoder architecture, offering:

- - large multilingual pretraining,
- - long-context modeling capacity,
- - robust cross-lingual alignment,
- - stable decoding for long outputs,
- - strong zero-shot performance on French.

- It is compatible with:

- - Hugging Face Transformers,
- - CTranslate2,
- - ONNX Runtime,
- - MLX (Apple Silicon),
- - quantization-based acceleration pipelines.

  ---

- ## 5. Methodology and Reproducibility

- `Gilbert-FR-Source` is used in strict research settings emphasizing:

- ### **Reproducible training protocols**
- - frozen weights for baseline comparison,
- - controlled hyperparameter schedules,
- - consistent evaluation datasets,
- - deterministic decoding configurations.

- ### **Evaluation methodology**
- WER is computed with standard normalization (lowercasing, punctuation removal).
- More advanced metrics (diarization error rate, long-context drift) are included in internal research pipelines.

- ### **Versioning policy**
- This repository represents version `0.1` of the research baseline.
- All future fine-tuned models will explicitly reference this version for traceability.

  ---

- ## 6. Limitations

- This baseline inherits the known limitations of Whisper and of the underlying datasets:

- - sensitivity to overlapping speech,
- - occasional hallucinations in long-form decoding,
- - domain shift on spontaneous dialogue,
- - potential biases related to accent distribution in training data,
- - suboptimal performance in telephony bandwidth.

- Understanding and quantifying these limitations is one of the core objectives of the Gilbert research roadmap.

  ---

- ## 7. Future Work (Planned Research Directions)

- The following models will be developed as independent checkpoints:

- - **Gilbert-FR-Longform-v1**
-   Long meetings, multi-speaker interaction, discourse-level context stability.

- - **Gilbert-FR-Accents-v1**
-   Robustness to regional and international French accents.

- - **Gilbert-FR-Telephone-v1**
-   Optimized for 8 kHz VoIP/call-center speech.

- - **Gilbert-Multilingual-v1**
-   Extended cross-lingual performance with optimized French anchors.

- Each model will include detailed evaluation reports and will adhere to research reproducibility standards.

  ---

- ## 8. License

- This repository includes files distributed under the MIT License.

- > A copy of the MIT License is included.
- > Some files were originally released under MIT.

- All future Gilbert models built on top of this baseline are the exclusive property of Lexia France.

  ---

- ## 9. Contact

  For research collaboration, evaluation access, or technical inquiries:

- - Website: https://gilbert-assistant.fr
- - Email: mathis@lexiapro.fr
  ---
  license: mit
  tags:
+ - automatic-speech-recognition
+ - asr
  - whisper
  - french
+ - speech-recognition
  - stt
  - multilingual
  - research
+ - baseline
+ library_name: transformers
+ pipeline_tag: automatic-speech-recognition
+ base_model: openai/whisper-large-v3
  ---

  # Gilbert-FR-Source — Research Baseline for French Automatic Speech Recognition

+ ## Overview

+ **Gilbert-FR-Source** is the foundational baseline model for the **Gilbert research project**, a comprehensive initiative focused on developing state-of-the-art automatic speech recognition (ASR) systems optimized for French language applications. This model serves as the **frozen reference point** for all subsequent research, fine-tuning, and development work within the Gilbert ecosystem.

+ **Important Notice on Intellectual Property:**
+ - This baseline model (`MEscriva/gilbert-fr-source`) is distributed under the MIT License, allowing research and commercial use.
+ - **All derivative models, fine-tuned variants, and specialized models developed from this baseline as part of the Gilbert project are the exclusive intellectual property of Lexia France.**
+ - While this baseline can be used freely under MIT terms, any models built upon it for the Gilbert project are proprietary and subject to separate licensing terms.

  ---

+ ## Research Context

+ The Gilbert project is a systematic research and development effort aimed at creating highly specialized ASR systems for:

+ - **Professional meeting transcription** (hybrid and remote meetings)
+ - **Long-form multi-speaker discourse** (30-120 minute sessions)
+ - **Institutional environments** (education, public sector, healthcare)
+ - **Constrained audio conditions** (telephony, VoIP, low signal-to-noise ratio)
+ - **Sociolinguistic diversity** (African, Canadian, Belgian, and other French accents)

+ This baseline model provides the **controlled starting point** for all experimental work, ensuring reproducibility and enabling fair comparison across different research directions.

  ---

+ ## Model Details
+
+ ### Architecture
+
+ - **Base Model:** OpenAI Whisper Large V3
+ - **Fine-tuning:** Optimized for French language performance
+ - **Framework:** Compatible with Hugging Face Transformers, OpenAI Whisper, CTranslate2, ONNX Runtime, and MLX
+ - **Model Size:** ~1.55 B parameters (~3.1 GB in float16; see the sanity check below)
+
+ ### Key Characteristics
+
+ - **Language:** French (primary), with multilingual capabilities
+ - **Context Length:** 30-second encoder windows; long-form audio is handled through chunked decoding
+ - **Output:** Text transcription with optional word-level timestamps
+ - **Performance:** Optimized for French speech recognition accuracy
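+
+ As a quick sanity check of the size figures above, the parameter count can be verified directly; a minimal sketch (the float16 footprint is estimated as two bytes per parameter):
+
+ ```python
+ from transformers import AutoModelForSpeechSeq2Seq
+ import torch
+
+ # Load in float16 and count parameters to verify the size figures above
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     "MEscriva/gilbert-fr-source", torch_dtype=torch.float16
+ )
+ n_params = sum(p.numel() for p in model.parameters())
+ print(f"{n_params / 1e9:.2f}B parameters, ~{n_params * 2 / 1e9:.1f} GB in float16")
+ ```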
+
+ ---
+
+ ## Intended Use
+
+ ### Research and Development
+
+ This model is intended for:
+
+ 1. **Research Baseline:** Use as a reference point for ASR research and experimentation
+ 2. **Comparative Studies:** Benchmark against this baseline when evaluating new architectures or training strategies
+ 3. **Fine-tuning Foundation:** Use as a starting point for domain-specific fine-tuning (subject to Gilbert project IP terms)
+ 4. **Educational Purposes:** Learning and understanding ASR model behavior
+
+ ### Production Use
+
+ While this baseline model can be used directly, **production deployments should use specialized Gilbert models** that are optimized for specific use cases and domains. Contact the Gilbert team for production-grade models.

  ---

+ ## Performance Benchmarks
+
+ ### Reference Results
+
+ The following WER (Word Error Rate) scores serve as a **baseline reference** for future Gilbert model development:
+
+ | Dataset | WER | Notes |
+ |---------|-----|-------|
+ | MLS (FR) | 3.98% | Multilingual LibriSpeech French |
+ | Common Voice FR (v13.0) | 7.28% | Diverse crowd-sourced French speech |
+ | VoxPopuli (FR) | 8.91% | European Parliament speeches |
+ | FLEURS (FR) | 4.84% | FLEURS benchmark (speech counterpart of FLORES) |
+ | African Accented French | 4.20% | Regional accent evaluation |
+
+ **Note:** These results represent the **upper bound** on error before targeted fine-tuning. Future Gilbert variants will be evaluated against these baselines to measure improvement.
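+
+ For spot-checking these numbers, the public benchmark splits can be pulled straight from the Hub. A minimal sketch for the FLEURS French test split (the `fr_fr` config and `transcription` field follow the public `google/fleurs` dataset card; adapt as needed):
+
+ ```python
+ from datasets import load_dataset
+
+ # Stream a few FLEURS French test examples for a quick qualitative check
+ fleurs_fr = load_dataset("google/fleurs", "fr_fr", split="test", streaming=True)
+ for example in fleurs_fr.take(3):
+     print(example["transcription"])
+ ```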
 
 
 
 

  ---

+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install transformers torch torchaudio librosa soundfile
+ ```
+
+ ### Basic Usage with Transformers
+
+ ```python
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
+ import librosa
+ import torch
+
+ model_id = "MEscriva/gilbert-fr-source"
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ torch_dtype = torch.float16 if device == "cuda" else torch.float32
+
+ processor = AutoProcessor.from_pretrained(model_id)
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     model_id,
+     torch_dtype=torch_dtype,
+     low_cpu_mem_usage=True
+ )
+ model.to(device)
+
+ # Load audio as a 16 kHz mono waveform; the processor expects raw samples, not a file path
+ audio, _ = librosa.load("your_audio.wav", sr=16000, mono=True)
+ inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
+ input_features = inputs["input_features"].to(device, dtype=torch_dtype)
+
+ with torch.no_grad():
+     generated_ids = model.generate(
+         input_features,
+         language="fr",
+         task="transcribe"
+     )
+
+ transcription = processor.batch_decode(
+     generated_ids,
+     skip_special_tokens=True
+ )[0]
+ print(transcription)
+ ```
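+
+ For long-form files, the high-level `pipeline` API is a convenient alternative: it chunks audio automatically and can return the word-level timestamps mentioned above. A minimal sketch; the chunk length and device settings are illustrative defaults, not tuned recommendations:
+
+ ```python
+ from transformers import pipeline
+ import torch
+
+ # Chunked long-form decoding with word-level timestamps
+ asr = pipeline(
+     "automatic-speech-recognition",
+     model="MEscriva/gilbert-fr-source",
+     torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+     device=0 if torch.cuda.is_available() else -1,
+ )
+
+ result = asr(
+     "your_audio.wav",
+     chunk_length_s=30,  # Whisper's native 30-second window
+     return_timestamps="word",
+     generate_kwargs={"language": "fr", "task": "transcribe"},
+ )
+ print(result["text"])
+ ```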
+
+ ### Usage with OpenAI Whisper
+
+ ```python
+ import whisper
+
+ # Note: whisper.load_model("large-v3") downloads the stock OpenAI checkpoint;
+ # it does not pull the weights hosted in this repository.
+ model = whisper.load_model("large-v3")
+
+ # Transcribe French audio
+ result = model.transcribe(
+     "audio.wav",
+     language="fr",
+     task="transcribe"
+ )
+
+ print(result["text"])
+ ```
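+
+ Since CTranslate2 is listed among the compatible runtimes, a `faster-whisper` sketch may also be useful. This assumes the checkpoint has first been converted with `ct2-transformers-converter`; the output directory name below is a placeholder:
+
+ ```python
+ from faster_whisper import WhisperModel
+
+ # Placeholder path to a CTranslate2 conversion of this checkpoint, e.g. produced by:
+ #   ct2-transformers-converter --model MEscriva/gilbert-fr-source --output_dir gilbert-fr-ct2
+ model = WhisperModel("gilbert-fr-ct2", device="cuda", compute_type="float16")
+
+ segments, info = model.transcribe("audio.wav", language="fr", task="transcribe")
+ for segment in segments:
+     print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
+ ```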

  ---

+ ## Research Methodology
+
+ ### Baseline Purpose
+
+ This model serves as:
+
+ 1. **Frozen Reference:** Weights remain unchanged to ensure consistent baseline comparisons
+ 2. **Reproducibility Anchor:** All experiments reference this exact checkpoint (see the pinning sketch below)
+ 3. **Version Control:** Future Gilbert models explicitly reference this baseline version for traceability
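+
+ One way to make "this exact checkpoint" concrete is to pin the Hugging Face revision at load time. A minimal sketch; the commit hash is a placeholder, not a real revision of this repository:
+
+ ```python
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
+
+ # Pin the baseline to one commit so all experiments compare against the same weights
+ REVISION = "0000000000000000000000000000000000000000"  # placeholder commit hash
+
+ processor = AutoProcessor.from_pretrained("MEscriva/gilbert-fr-source", revision=REVISION)
+ model = AutoModelForSpeechSeq2Seq.from_pretrained("MEscriva/gilbert-fr-source", revision=REVISION)
+ ```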
+
+ ### Evaluation Standards
+
+ - **WER Calculation:** Standard normalization (lowercasing, punctuation removal); reproduced in the sketch after this list
+ - **Metrics:** Word Error Rate (WER), Character Error Rate (CER), BLEU score
+ - **Advanced Metrics:** Speaker-attributed WER (SA-WER), long-context stability (internal research)
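+
+ The normalization described in the first item can be reproduced with the `jiwer` library, as sketched below (`pip install jiwer`; the reference and hypothesis strings are illustrative):
+
+ ```python
+ import jiwer
+
+ # Lowercase and strip punctuation before scoring, per the WER calculation above
+ normalize = jiwer.Compose([
+     jiwer.ToLowerCase(),
+     jiwer.RemovePunctuation(),
+     jiwer.RemoveMultipleSpaces(),
+     jiwer.Strip(),
+ ])
+
+ reference = "Bonjour, comment ça va ?"
+ hypothesis = "bonjour comment ça va"
+
+ wer = jiwer.wer(normalize(reference), normalize(hypothesis))
+ cer = jiwer.cer(normalize(reference), normalize(hypothesis))
+ print(f"WER: {wer:.3f}  CER: {cer:.3f}")  # both 0.0 once normalization is applied
+ ```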
 
 
+
+ ### Versioning
+
+ - **Current Version:** 0.1 (Research Baseline)
+ - **Future Versions:** All Gilbert model variants will reference this baseline version

  ---

+ ## Limitations
+
+ This baseline model inherits known limitations from Whisper and the underlying training data:
+
+ 1. **Overlapping Speech:** Sensitivity to simultaneous speakers
+ 2. **Long-form Decoding:** Occasional hallucinations in very long audio segments
+ 3. **Domain Shift:** Suboptimal performance on spontaneous dialogue without fine-tuning
+ 4. **Accent Distribution:** Potential biases related to accent representation in training data
+ 5. **Telephony Bandwidth:** Degraded accuracy on narrowband (8 kHz) audio without adaptation (see the simulation sketch below)
+
+ **Understanding and quantifying these limitations is a core objective of the Gilbert research roadmap.**
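+
+ The narrowband limitation can be probed by simulating a telephone channel on wideband test audio, as sketched below with `torchaudio` (the file names are illustrative, and the 8 kHz / μ-law chain is a simple approximation of G.711-style coding, not a full channel model):
+
+ ```python
+ import torchaudio
+ import torchaudio.functional as F
+
+ waveform, sr = torchaudio.load("meeting_16k.wav")  # illustrative input file
+
+ # Downsample to 8 kHz telephone bandwidth
+ narrowband = F.resample(waveform, orig_freq=sr, new_freq=8000)
+
+ # 8-bit mu-law companding round-trip to mimic telephony quantization
+ companded = F.mu_law_encoding(narrowband, quantization_channels=256)
+ degraded = F.mu_law_decoding(companded, quantization_channels=256)
+
+ # Resample back to 16 kHz, the input rate Whisper expects
+ degraded_16k = F.resample(degraded, orig_freq=8000, new_freq=16000)
+ torchaudio.save("meeting_telephony_sim.wav", degraded_16k, 16000)
+ ```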

  ---

+ ## Future Research Directions
+
+ The following specialized models will be developed as independent checkpoints from this baseline:
+
+ ### Planned Gilbert Models
+
+ 1. **Gilbert-FR-Longform-v1**
+    - Optimized for long meetings (30-120 minutes)
+    - Multi-speaker interaction handling
+    - Discourse-level context stability
+
+ 2. **Gilbert-FR-Accents-v1**
+    - Robustness to regional and international French accents
+    - African, Canadian, Belgian accent optimization
+
+ 3. **Gilbert-FR-Telephone-v1**
+    - Optimized for 8 kHz VoIP/call-center speech
+    - Narrowband audio adaptation
+
+ 4. **Gilbert-Multilingual-v1**
+    - Extended cross-lingual performance
+    - Optimized French anchors with multilingual support
+
+ **All future Gilbert models are the exclusive intellectual property of Lexia France** and will include detailed evaluation reports adhering to research reproducibility standards.
+
+ ---
+
+ ## Intellectual Property and Licensing
+
+ ### License for This Baseline
+
+ This baseline model (`MEscriva/gilbert-fr-source`) is distributed under the **MIT License**, allowing:
+
+ - ✅ Commercial use
+ - ✅ Modification
+ - ✅ Distribution
+ - ✅ Private use
+ - ✅ Patent use
+
+ See the `LICENSE` file for full terms.
+
+ ### Intellectual Property Notice
+
+ **Important:** While this baseline model is available under MIT License:
+
+ - **All derivative models, fine-tuned variants, and specialized models developed as part of the Gilbert project are the exclusive intellectual property of Lexia France.**
+ - Use of this baseline for Gilbert project development implies acceptance of these IP terms.
+ - Commercial use of Gilbert project derivatives requires separate licensing agreements.
+
+ For licensing inquiries regarding Gilbert project models, contact: **mathis@lexiapro.fr**

  ---
 
+ ## Citation
+
+ If you use this baseline model in your research, please cite:
+
+ ```bibtex
+ @software{gilbert_fr_source_2024,
+   title={Gilbert-FR-Source: Research Baseline for French Automatic Speech Recognition},
+   author={MEscriva and Lexia France},
+   year={2024},
+   url={https://huggingface.co/MEscriva/gilbert-fr-source},
+   version={0.1},
+   note={Research baseline for the Gilbert project}
+ }
+ ```
+
+ ---
+
+ ## Acknowledgments
+
+ This baseline model is based on:
+ - **OpenAI Whisper Large V3** (MIT License)
+ - **bofenghuang/whisper-large-v3-french** (French fine-tuning)
+
+ We acknowledge the contributions of the open-source community and the original Whisper research team.

  ---

+ ## Contact

  For research collaboration, evaluation access, or technical inquiries:

+ - **Website:** [https://gilbert-assistant.fr](https://gilbert-assistant.fr)
+ - **Email:** mathis@lexiapro.fr
+ - **Repository:** [https://huggingface.co/MEscriva/gilbert-fr-source](https://huggingface.co/MEscriva/gilbert-fr-source)
+
+ ---
+
+ ## Changelog
+
+ ### Version 0.1 (2024-12-19)
+ - Initial research baseline release
+ - Based on Whisper Large V3 with French optimization
+ - Established as frozen reference point for Gilbert project
+ - Documentation of baseline performance metrics
+
+ ---
+
+ **© 2024 Lexia France. All rights reserved for Gilbert project derivatives.**