Upload 9 files
- CITATIONS.md +17 -0
- DEMO_SCRIPT_FR.md +24 -0
- PROMPTS.md +19 -0
- README.md +49 -12
- USER_GUIDE_FR.md +25 -0
- app.py +55 -0
- nlp_utils.py +100 -0
- requirements.txt +7 -0
- sample_transcript.txt +7 -0
CITATIONS.md
ADDED
@@ -0,0 +1,17 @@
# CITATIONS

## Packages
- **gradio** (Apache 2.0)
- **transformers** (Apache 2.0) — Wolf et al. 2020
- **torch** (BSD-style)
- **sentencepiece** (Apache 2.0)
- **faster-whisper** (MIT) — fast Whisper implementation
- **numpy**, **tqdm** (BSD/MIT)

## Models (Hugging Face)
- `facebook/bart-large-cnn` — Summarization (MIT)
- `google/flan-t5-large` — Generation/extraction (Apache 2.0)
- `Systran/faster-whisper-small` (default in the code) — Transcription (MIT)

## Data
- `data/sample_transcript.txt` — synthetic example for testing.
DEMO_SCRIPT_FR.md
ADDED
@@ -0,0 +1,24 @@
# Demo script (≤ 5 min) — MeetingNotes AI

0:00–0:20 — Context
- Too many meetings, not enough time → MeetingNotes AI

0:20–1:20 — Live demo
- Paste `data/sample_transcript.txt` or upload a short .mp3
- Click **Analyser**
- Show the summary + action items + decisions

1:20–2:20 — minutes.md
- Download / open the generated file
- Show the structure: title, summary, action items, decisions

2:20–3:30 — How it works
- Transcription: faster-whisper (small)
- Summarization: BART CNN
- Extraction: Flan-T5 (strict JSON)

3:30–4:30 — Impact
- Time saved, clearer ownership, better follow-up

4:30–5:00 — CTA
- Available as a Hugging Face Space / locally
PROMPTS.md
ADDED
@@ -0,0 +1,19 @@
# PROMPTS — MeetingNotes AI

## Summarization (BART via `pipeline("summarization")`)
- No custom prompt (default pipeline).

## Actions & decisions (Flan-T5)
Template used in `nlp_utils.py` (sent to the model in French):
```
Tu es un assistant de prise de notes de réunion.
À partir du transcript ci-dessous, extrais :
1) une liste concise de "Points d'action" (qui fait quoi, verbe à l'infinitif, deadline si mentionnée)
2) une liste "Décisions prises" (phrases courtes)

Retourne du JSON strict de la forme :
{"actions": ["...","..."], "decisions": ["...","..."]}

Transcript:
{TRANSCRIPT}
```
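
For reference, a minimal sketch of how this template can be filled and sent to the model, mirroring `extract_actions_decisions` in `nlp_utils.py` (`PROMPT_TEMPLATE` is a hypothetical name standing for the template above):

```python
# Sketch mirroring extract_actions_decisions() in nlp_utils.py.
import json
from transformers import pipeline

# Hypothetical constant: paste the template above here, keeping the {TRANSCRIPT} placeholder.
PROMPT_TEMPLATE = "..."

extractor = pipeline("text2text-generation", model="google/flan-t5-large", max_new_tokens=256)

def extract(transcript: str) -> dict:
    # nlp_utils.py truncates long transcripts to ~7000 characters before prompting.
    prompt = PROMPT_TEMPLATE.replace("{TRANSCRIPT}", transcript[:7000])
    raw = extractor(prompt)[0]["generated_text"]
    try:
        return json.loads(raw)  # expected shape: {"actions": [...], "decisions": [...]}
    except json.JSONDecodeError:
        # nlp_utils.py falls back to regex heuristics on "Action:" / "Décision:" lines.
        return {"actions": [], "decisions": []}
```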
README.md
CHANGED
@@ -1,12 +1,49 @@
# MeetingNotes AI — Meeting summarizer

**Goal**: upload an **audio file** (or paste a **transcript**) and get:
- ✅ a **clear summary**
- 🧱 **action items**
- 🧩 the **decisions made**
- 🗂️ a shareable **minutes.md** file

**Tech** (a minimal usage sketch follows this list)
- Transcription: `faster-whisper` (fast Whisper implementation, CPU/GPU)
- Summarization: `facebook/bart-large-cnn`
- Action/decision extraction: `google/flan-t5-large`
- Interface: **Gradio**

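The same three stages can be chained outside the UI; a minimal sketch using the helpers from `nlp_utils.py` (the audio path is a placeholder, and the models download on first use):

```python
# Sketch: run the pipeline without the Gradio UI ("meeting.mp3" is a placeholder path).
import os
from nlp_utils import transcribe_audio, summarize, extract_actions_decisions, make_minutes_md

text = transcribe_audio("meeting.mp3")     # faster-whisper (small), CPU/GPU
summary = summarize(text)                  # BART, chunked then merged
found = extract_actions_decisions(text)    # Flan-T5, strict-JSON prompt with regex fallback
md = make_minutes_md("Weekly", summary, found["actions"], found["decisions"])

os.makedirs("outputs", exist_ok=True)
with open("outputs/minutes.md", "w", encoding="utf-8") as f:
    f.write(md)
```
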
## Installation (local)

```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# (Optional) install ffmpeg if you need audio input:
# macOS: brew install ffmpeg
# Ubuntu/Debian: sudo apt-get install -y ffmpeg
python app.py
```

## Deployment on Hugging Face Spaces (recommended)
1. Create a Space (SDK: **Gradio**, visibility: Public); see the note on the README header after this list.
2. Upload **all the files** in this folder.
3. Wait for the build to finish (it reads `requirements.txt`).
4. Test it: load an `.mp3`/`.wav` or paste a transcript.

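Note: Spaces reads its settings from a YAML header at the top of `README.md`. If the build does not pick up the Gradio SDK, a header along these lines can be added (values are illustrative):

```yaml
---
title: MeetingNotes AI
emoji: 📝
sdk: gradio
app_file: app.py
pinned: false
---
```
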
## Structure
```
MeetingNotes_AI/
├─ app.py              # Gradio UI
├─ nlp_utils.py        # Transcription + summarization + action/decision extraction
├─ requirements.txt
├─ PROMPTS.md          # Prompts and tool-usage log
├─ CITATIONS.md        # Packages and models used
├─ USER_GUIDE_FR.md    # Detailed user guide (FR)
├─ DEMO_SCRIPT_FR.md   # ≤ 5 min video script (FR)
├─ data/
│  └─ sample_transcript.txt
└─ outputs/            # generated minutes.md
```

## License
MIT — 2025
USER_GUIDE_FR.md
ADDED
@@ -0,0 +1,25 @@
# User Guide (FR) — MeetingNotes AI

## Launching the app
- Local: see the README (venv → pip install → `python app.py`)
- Hugging Face Spaces: upload the files and open the Space

## Usage
1. **Choose the input**:
   - Upload an **audio file** (.mp3/.wav) → click **Analyser** to transcribe.
   - OR **paste a transcript** into the text box.

2. **Outputs**:
   - **Summary** (1–2 paragraphs)
   - **Action items** (list)
   - **Decisions made** (list)
   - A downloadable **minutes.md** in the **Fichiers** area (see the example after this list)

3. **Good practices**:
   - For audio, prefer a clean recording.
   - With several speakers the transcription is still useful, but it is not diarized by default.
   - You can correct the transcript and re-run the extraction.
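
For reference, `make_minutes_md` produces a file shaped roughly like this (the date and items are illustrative placeholders):

```markdown
# Lancement produit — Weekly — Compte-rendu
_Généré le 2025-06-01 10:00_

## Résumé
(1–2 paragraph summary)

## Points d'action
- [ ] (one checkbox per action item)

## Décisions prises
- (one bullet per decision)
```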

## Troubleshooting
- If audio does not work: install **ffmpeg**.
- If it is slow: use a smaller Whisper model (tiny/base) or `flan-t5-base`; see the sketch below.
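
A minimal sketch of the lighter checkpoints that `get_whisper()` and `get_extractor()` in `nlp_utils.py` could point to instead (model names assumed from the same families; some quality is traded for speed):

```python
# Sketch: lighter models for faster runs (swap these into nlp_utils.py).
from faster_whisper import WhisperModel
from transformers import pipeline

whisper = WhisperModel("Systran/faster-whisper-tiny", device="auto", compute_type="int8")  # or "Systran/faster-whisper-base"
extractor = pipeline("text2text-generation", model="google/flan-t5-base", max_new_tokens=256)
```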
app.py
ADDED
@@ -0,0 +1,55 @@
import gradio as gr, os, json
from nlp_utils import transcribe_audio, summarize, extract_actions_decisions, make_minutes_md

OUT_DIR = "outputs"
os.makedirs(OUT_DIR, exist_ok=True)

def process(audio_file, transcript_text, meeting_title):
    text = ""
    if audio_file is not None:
        text = transcribe_audio(audio_file)
    if transcript_text and transcript_text.strip():
        extra = transcript_text.strip()
        text = (text + "\n" + extra).strip() if text else extra

    if not text or len(text) < 40:
        return "Merci d'uploader un audio OU de coller un transcript (≥ 40 caractères).", "", [], [], None

    resum = summarize(text)
    ed = extract_actions_decisions(text)
    actions = ed.get("actions", [])
    decisions = ed.get("decisions", [])

    title = meeting_title or "Réunion"
    md = make_minutes_md(title, resum, actions, decisions)
    md_path = os.path.join(OUT_DIR, "minutes.md")
    with open(md_path, "w", encoding="utf-8") as f:
        f.write(md)

    # HighlightedText expects a list of (text, label) pairs. We tag each item.
    actions_ht = [(a, "Action") for a in actions] if actions else []
    decisions_ht = [(d, "Décision") for d in decisions] if decisions else []

    return "Analyse terminée ✅", resum, actions_ht, decisions_ht, md_path

with gr.Blocks(title="MeetingNotes AI — Résumeur de réunions") as demo:
    gr.Markdown("# MeetingNotes AI — Résumeur de réunions")
    gr.Markdown("Chargez un **audio** ou **collez un transcript**, puis cliquez **Analyser**.")

    with gr.Row():
        with gr.Column():
            meeting_title = gr.Textbox(label="Titre de la réunion", value="Lancement produit — Weekly")
            audio = gr.Audio(label="Audio (mp3/wav)", sources=["upload"], type="filepath")
            transcript = gr.Textbox(label="Transcript (optionnel si audio)", lines=10, placeholder="Collez ici…")
            btn = gr.Button("Analyser")
        with gr.Column():
            status = gr.Textbox(label="Statut")
            resume = gr.Textbox(label="Résumé", lines=8)
            actions = gr.HighlightedText(label="Points d'action", combine_adjacent=True)
            decisions = gr.HighlightedText(label="Décisions prises", combine_adjacent=True)
            files = gr.File(label="Télécharger minutes.md")

    btn.click(process, inputs=[audio, transcript, meeting_title], outputs=[status, resume, actions, decisions, files])

if __name__ == "__main__":
    demo.launch()
nlp_utils.py
ADDED
@@ -0,0 +1,100 @@
import os, json, re, datetime
from typing import Dict, List, Optional
from transformers import pipeline
from faster_whisper import WhisperModel

# --------- Models (lazy init) ---------
_SUMMARIZER = None
_QA = None
_WHISPER = None

def get_summarizer():
    global _SUMMARIZER
    if _SUMMARIZER is None:
        _SUMMARIZER = pipeline("summarization", model="facebook/bart-large-cnn")
    return _SUMMARIZER

def get_extractor():
    """Use Flan-T5 for JSON-style action/decision extraction via text2text pipeline."""
    global _QA
    if _QA is None:
        _QA = pipeline("text2text-generation", model="google/flan-t5-large", max_new_tokens=256)
    return _QA

def get_whisper(device: str = "auto"):
    global _WHISPER
    if _WHISPER is None:
        _WHISPER = WhisperModel("Systran/faster-whisper-small", device=device, compute_type="int8")
    return _WHISPER

# --------- Core functions ---------
def transcribe_audio(audio_path: str) -> str:
    model = get_whisper()
    segments, info = model.transcribe(audio_path, beam_size=1)
    text = " ".join(seg.text.strip() for seg in segments)
    return text.strip()

def summarize(text: str) -> str:
    summarizer = get_summarizer()
    chunks = _chunk(text, 2200)
    partials = [summarizer(ch, do_sample=False)[0]["summary_text"] for ch in chunks]
    merged = " ".join(partials)
    final = summarizer(merged, do_sample=False, max_length=200, min_length=60)[0]["summary_text"]
    return final

def extract_actions_decisions(text: str) -> Dict[str, List[str]]:
    prompt = f"""Tu es un assistant de prise de notes de réunion.
À partir du transcript ci-dessous, extrais :
1) une liste concise de "Points d'action" (qui fait quoi, verbe à l'infinitif, deadline si mentionnée)
2) une liste "Décisions prises" (phrases courtes)

Retourne du JSON strict de la forme :
{{"actions": ["...","..."], "decisions": ["...","..."]}}

Transcript:
{text[:7000]}
"""
    gen = get_extractor()
    out = gen(prompt)[0]["generated_text"]
    try:
        data = json.loads(out)
        actions = [s.strip() for s in data.get("actions", []) if s.strip()]
        decisions = [s.strip() for s in data.get("decisions", []) if s.strip()]
        return {"actions": actions, "decisions": decisions}
    except Exception:
        actions = []
        decisions = []
        for line in text.splitlines():
            if re.search(r"(?i)\b(action|à faire|todo|faire):", line):
                actions.append(re.sub(r"(?i)^.*?:\s*", "", line).strip())
            if re.search(r"(?i)\b(décision|decision):", line):
                decisions.append(re.sub(r"(?i)^.*?:\s*", "", line).strip())
        return {"actions": actions, "decisions": decisions}

def make_minutes_md(title: str, summary: str, actions: List[str], decisions: List[str]) -> str:
    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
    lines = [
        f"# {title} — Compte-rendu",
        f"_Généré le {now}_",
        "",
        "## Résumé",
        summary.strip() if summary else "—",
        "",
        "## Points d'action",
        *[f"- [ ] {a}" for a in (actions or ["—"])],
        "",
        "## Décisions prises",
        *[f"- {d}" for d in (decisions or ["—"])],
        "",
    ]
    return "\n".join(lines)

def _chunk(text: str, max_chars: int) -> List[str]:
    parts, buf, size = [], [], 0
    for sent in re.split(r'(?<=[\.!\?])\s+', text):
        if size + len(sent) > max_chars and buf:
            parts.append(" ".join(buf))
            buf, size = [], 0
        buf.append(sent)
        size += len(sent) + 1
    if buf:
        parts.append(" ".join(buf))
    return parts
requirements.txt
ADDED
@@ -0,0 +1,7 @@
gradio>=4.44.0
transformers>=4.44.0
torch>=2.2.0
sentencepiece>=0.1.99
faster-whisper>=1.0.0
numpy>=1.26.4
tqdm>=4.66.4
sample_transcript.txt
ADDED
@@ -0,0 +1,7 @@
[00:00] Alice: Bienvenue à tous. Objectif: finaliser le plan de lancement.
[00:15] Bob: Il nous manque encore les visuels pour la campagne.
[00:30] Chloé: L'équipe design promet une première version mercredi.
[00:45] Alice: OK. Décision: on garde le budget à 20k€.
[01:00] Bob: Action: je contacte l'agence média aujourd'hui.
[01:15] Chloé: Action: je prépare un check-list pour la page produit.
[01:30] Alice: Prochaine réunion vendredi 10h. Fin.