MEscriva committed on
Commit 17e196e · verified · 1 Parent(s): 5d5157b

Update README.md

Files changed (1)
  1. README.md +92 -1
README.md CHANGED
@@ -7,4 +7,95 @@ sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/mathisescriva/gilbert_landing_page/main/assets/images/img/logo_gilbert.png" width="180" />
+ </p>
+
+ # Gilbert AI
+ **Sovereign Speech Intelligence Research Lab**
+
+ ---
+
+ ## 1. Mission
+
+ Gilbert AI develops sovereign, high-accuracy automatic speech recognition (ASR) systems for institutional, educational, and enterprise environments.
+
+ Its core objectives are:
+
+ - achieving state-of-the-art ASR performance for **French**, including long-form, spontaneous, multi-speaker, and accented speech;
+ - ensuring robustness across heterogeneous conditions (telephone bandwidth, online conferencing, real-world noise);
+ - maintaining strong multilingual capabilities while specializing in French;
+ - delivering reproducible, scientifically rigorous evaluation frameworks;
+ - enabling deployment on sovereign infrastructure with strict data-governance constraints.
+
+ ---
+
+ ## 2. Research Themes
+
+ ### **2.1 High-precision French ASR**
+ Tailored to:
+
+ - meetings and long-form speech
+ - lectures and institutional discourse
+ - spontaneous multi-speaker environments
+ - regional and international accents
+
+ ### **2.2 Multilingual preservation**
+ Although the models are optimized for French, they are also evaluated for cross-lingual performance to confirm that it remains strong.
+
+ ### **2.3 Robustness, frugality, and domain adaptation**
+ Research includes:
+
+ - noise augmentation and channel variability
+ - 8 kHz telephony adaptation
+ - low-compute inference
+ - distillation for edge deployment
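
As an illustration of the noise-augmentation item above, here is a minimal sketch of mixing noise into speech at a chosen signal-to-noise ratio (SNR). It is not Gilbert's actual pipeline, which the README does not describe. The helper names `mean_power` and `mix_at_snr`, the synthetic sine-wave "speech", and the Gaussian noise are all assumptions made for the example.

```python
import math
import random


def mean_power(samples):
    """Average power (mean squared amplitude) of a sequence of float samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so that the mixture has the requested SNR.

    Hypothetical helper for illustration; both inputs are equal-length lists of floats.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both be non-silent")
    # Solve p_speech / (gain**2 * p_noise) = 10 ** (snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: one second of a 220 Hz tone and white Gaussian noise.
rng = random.Random(0)
sample_rate = 16000
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Sampling `snr_db` from a range for each training utterance is one common way to expose a model to varied noise levels. Whether Gilbert does this is not stated in the README.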
+
+ ### **2.4 Benchmarking and dataset engineering**
+ Gilbert designs domain-specific datasets and benchmark suites for evaluating ASR in conditions representative of operational environments.
+
+ ---
+
+ ## 3. Current Model Family
+
+ ### **Gilbert-FR-Source (2025)**
+ Baseline model fine-tuned on curated high-quality French corpora; foundation for all domain-specific variants.
+
+ ### Upcoming research releases:
+
+ - **Gilbert-FR-Longform-v1**: extended-speech and meeting optimization
+ - **Gilbert-FR-Accents-v1**: regional and international accent specialization
+ - **Gilbert-FR-Telephone-v1**: 8 kHz telephony channel adaptation
+ - **Gilbert-Edu-ASR-v1**: education-specific speech model
+ - **Gilbert-Multilingual-v1**: French-centric multilingual enhancement
+
+ ---
+
+ ## 4. Baseline Performance (WER)
+
+ | Dataset | WER |
+ |--------|------|
+ | MLS (FR) | **3.98%** |
+ | Common Voice v13 (FR) | **7.28%** |
+ | VoxPopuli (FR) | **8.91%** |
+ | Fleurs (FR) | **4.84%** |
+ | African-accent French | **4.20%** |
+
+ These results position Gilbert-FR among the strongest open-source ASR models available for the French language.
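
For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance. It is an illustration only: the function name `word_error_rate` is hypothetical, the example sentences are invented, and the README does not say which text normalization (casing, punctuation) was applied to produce the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word out of six reference words.
wer = word_error_rate("le chat dort sur le canapé", "le chat dort sous le canapé")
```

Here `wer` is 1/6, about 16.7%. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.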
+
+ ---
+
+ ## 5. Research Principles
+
+ 1. **Reproducibility**
+ 2. **Scientific rigor**
+ 3. **Transparent evaluation**
+ 4. **Sovereign infrastructure**
+ 5. **Operational relevance**
+
+ ---
+
98
+ ## 6. Contact
99
+
100
+ - Website: **https://gilbert-assistant.fr**
101
+ - Contact: **mathis@lexiapro.fr**