Space: matis35/feedbacks-scoring

Matis Codjia committed on Nov 27, 2025

Commit

1d8c2e0

1 Parent(s): e4d3f10

Scoring app

Files changed (19)

CHANGELOG.md +185 -0
CONFIGURATION.md +343 -0
Dockerfile +20 -6
QUICKSTART.md +265 -0
README.md +139 -10
app.py +415 -0
backend/__init__.py +3 -0
backend/annotator_config.py +279 -0
backend/auth.py +148 -0
backend/data_loader.py +31 -0
backend/export.py +65 -0
backend/hf_storage.py +237 -0
backend/persistence.py +157 -0
backend/statistics.py +53 -0
frontend/__init__.py +3 -0
frontend/components.py +189 -0
frontend/help_page.py +308 -0
frontend/styles.py +132 -0
requirements.txt +5 -3

CHANGELOG.md ADDED

	@@ -0,0 +1,185 @@

+# Changelog - Production Version
+## Version 2.0.0 - Production Ready (2025-01-27)
+### 🎉 Major new features
+#### 🔐 Security and Authentication
+- **Password authentication**: Secure access via `APP_PASSWORD`
+- **Multi-annotator dispatch system**: Each annotator sees only their own portion
+- **Data isolation**: Full traceability per annotator
+- Module: `backend/auth.py`
+#### 👥 Annotator Management
+- **Flexible configuration**: JSON file or HF secrets
+- **Automatic assignment**: Filtering by `start_idx` / `end_idx`
+- **Selection interface**: Visual choice of identifier
+- **Validation**: Consistency check of the configuration
+- Module: `backend/annotator_config.py`
+#### ☁️ Cloud Persistence
+- **HuggingFace backup**: Permanent storage in a private dataset
+- **Auto-restore**: Automatic loading at startup
+- **Timestamped format**: JSON files with a timestamp
+- **Explicit button**: Manual save at any time
+- Module: `backend/hf_storage.py`
+### 🔧 Changes
+#### User Interface
+- Annotator identity displayed in the sidebar
+- Indicator of the assigned portion (start-end)
+- Cloud save status badge
+- Prominent "☁️ Sauvegarder sur HF" (save to HF) button
+- Export filename customized per annotator
+#### Backend
+- Automatic dataset filtering on load
+- Storage of both the full dataset and the filtered portion
+- Enriched metadata (annotator_id, filtered, etc.)
+- Support for HF secrets and environment variables
+#### Configuration
+- New file `data/annotators.json`
+- Example provided: `data/annotators.json.example`
+- HF secrets: `APP_PASSWORD`, `HF_TOKEN`, `HF_DATASET_REPO`
+- Fallback to the default config when none is provided
+### 📚 Documentation
+#### New files
+- **QUICKSTART.md**: Quick start guide (15 min)
+- **CONFIGURATION.md**: Complete configuration documentation
+- **CHANGELOG.md**: This file
+#### Updates
+- **README.md**: Production overview, architecture, security
+- **requirements.txt**: Added `huggingface_hub>=0.20.0`
+- **.gitignore**: Ignore secrets and sensitive data
+### 🏗️ Architecture
+#### New modules
+```
+backend/
+├── auth.py              # Authentication
+├── annotator_config.py  # Annotator configuration
+└── hf_storage.py        # HuggingFace backup
+data/
+└── annotators.json.example  # Configuration template
+```
+#### Execution flow
+1. **Authentication**: Password check
+2. **Selection**: Choice of annotator
+3. **Filtering**: Dataset filtered automatically
+4. **Annotation**: Scoring interface
+5. **Persistence**: Local save + HF cloud
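The filtering step in the flow above can be pictured as a plain slice of the loaded items. The sketch below is illustrative only: `filter_portion` is a hypothetical helper, not the function shipped in `backend/annotator_config.py`, and it assumes `end_idx` is exclusive, which is what non-overlapping ranges such as 0-100 and 100-200 suggest.

```python
from typing import Any


def filter_portion(
    items: list[dict[str, Any]], start_idx: int, end_idx: int
) -> list[dict[str, Any]]:
    """Return the items assigned to one annotator.

    Assumption: end_idx is exclusive, so 0-100 and 100-200 do not overlap.
    """
    if start_idx < 0 or end_idx < start_idx:
        raise ValueError(f"invalid range: {start_idx}-{end_idx}")
    # Slicing clamps end_idx to len(items), so a dataset shorter than the
    # configured range simply yields a shorter portion.
    return items[start_idx:end_idx]


dataset = [{"id": i} for i in range(250)]
portion = filter_portion(dataset, 200, 300)
print(len(portion))  # prints 50: only items 200..249 exist
```

Keeping the full dataset alongside the slice, as the changelog describes, would let the app map a position in the portion back to its index in the original dataset.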
+### 🔒 Security
+#### Protections added
+- ✅ Password required for access
+- ✅ Tokens never stored in plain text in the code
+- ✅ Private storage dataset recommended
+- ✅ Annotator isolation
+- ✅ Full traceability (who annotated what)
+#### Best practices
+- Secrets via HF Spaces (never committed)
+- Configuration validation
+- Explicit error messages
+- Development mode with fallbacks
+### 📊 FFGen Integration
+#### Complete workflow
+1. **FFGen**: `create_annotation_study.py` → Create subsets + gold
+2. **App**: Distribute and collect annotations
+3. **FFGen**: `analyze_agreement.py` → Compute agreement
+4. **FFGen**: `merge_scores.py` → Final dataset
+#### Compatibility
+- Format identical to the FFGen scripts
+- Support for gold items for IAA
+- JSONL export directly usable
+- Compatible metadata
+### 🐛 Fixes
+- Fix: Save lost after restart → HF backup
+- Fix: All annotators see everything → Filtering by portion
+- Fix: No traceability → annotator_id in every save
+- Fix: Unsecured public access → Mandatory authentication
+### ⚠️ Breaking Changes
+#### Required configuration
+- **Before**: No config needed
+- **After**: HF secrets mandatory in production
+#### Workflow
+- **Before**: Single user, full dataset
+- **After**: Multiple users, filtered portions
+#### Migrating from v1.0
+1. Configure the HF secrets (see QUICKSTART.md)
+2. Create `data/annotators.json`
+3. Push the new version
+4. Inform the annotators of the new workflow
+### 📈 Metrics
+#### Files changed
+- 4 files modified
+- 3 new backend modules
+- 3 new documentation files
+- 1 example configuration file
+#### Lines of code
+- ~600 lines added (backend)
+- ~200 lines added (app.py)
+- ~800 lines of documentation
+### 🚀 Deployment
+#### Prerequisites
+1. Create a private HF dataset for storage
+2. Create an HF token with write permission
+3. Configure the secrets in the HF Space
+4. Create the annotator configuration
+#### Commands
+```bash
+git add .
+git commit -m "Add authentication, dispatch and HF persistence"
+git push origin main
+```
+The Space rebuilds automatically (3-5 min).
+### 📖 Resources
+- **Quick guide**: [QUICKSTART.md](QUICKSTART.md) - 15 minutes
+- **Complete documentation**: [CONFIGURATION.md](CONFIGURATION.md)
+- **Example config**: `data/annotators.json.example`
+### 🙏 Acknowledgements
+This version implements the security and persistence recommendations for a production deployment on HuggingFace Spaces with free hosting.
+Inspired by:
+- The FFGen dispatch system (`create_annotation_study.py`)
+- HuggingFace Spaces best practices
+- The real needs of multi-annotator annotation studies
+---
+## Previous versions
+### Version 1.0.0 - Initial version
+- Basic scoring interface
+- Local save only
+- Single user
+- No authentication

CONFIGURATION.md ADDED

	@@ -0,0 +1,343 @@

+# Annotation Application Configuration
+This documentation explains how to configure the application for a production deployment on Hugging Face Spaces with multiple annotators.
+## Table of contents
+1. [Overview](#overview)
+2. [Step 1: Create the storage dataset](#step-1-create-the-storage-dataset)
+3. [Step 2: Configure the HF secrets](#step-2-configure-the-hf-secrets)
+4. [Step 3: Configure the annotators](#step-3-configure-the-annotators)
+5. [Step 4: Push to HF Spaces](#step-4-push-to-hf-spaces)
+6. [Usage for annotators](#usage-for-annotators)
+7. [Retrieving the annotations](#retrieving-the-annotations)
+---
+## Overview
+The application implements 3 security and persistence systems:
+1. **Authentication**: A single shared password to access the app
+2. **Dispatch**: Each annotator sees only their portion of the dataset
+3. **Persistence**: Automatic backup to a private HuggingFace Dataset
+---
+## Step 1: Create the storage dataset
+1. Go to https://huggingface.co/new-dataset
+2. Create a new **PRIVATE** dataset:
+   - Name: `ffgen-annotations-storage` (or another name)
+   - Type: Dataset
+   - Visibility: **Private** ⚠️ Very important!
+3. Leave it empty; the app will create the structure automatically
+---
+## Step 2: Configure the HF secrets
+### 2.1 Create a HuggingFace token
+1. Go to your HF settings: https://huggingface.co/settings/tokens
+2. Create a new token with **write** permission
+3. Copy the token (format: `hf_xxxxxxxxxxxx`)
+### 2.2 Add the secrets to the Space
+1. Go to your Space: `https://huggingface.co/spaces/YOUR-USERNAME/feedbacks-scoring-app`
+2. Click **Settings**
+3. Scroll to **Variables and secrets**
+4. Add the following secrets:
+| Name | Value | Description |
+|-----|--------|-------------|
+| `APP_PASSWORD` | `YourPassword2025!` | Password for accessing the app |
+| `HF_TOKEN` | `hf_xxxxxxxxxxxx` | Token with write permission |
+| `HF_DATASET_REPO` | `YOUR-USERNAME/ffgen-annotations-storage` | Name of the storage dataset |
+**Important**:
+- The token must have write permission
+- The dataset must be private to protect the annotations
+---
+## Step 3: Configure the annotators
+### Option A: Automatic configuration (recommended)
+Create an `annotators.json` file in the `data/` folder:
+```json
+{
+  "annotator_1": {
+    "name": "Alice Dupont",
+    "start_idx": 0,
+    "end_idx": 100,
+    "description": "First portion of the dataset"
+  },
+  "annotator_2": {
+    "name": "Bob Martin",
+    "start_idx": 100,
+    "end_idx": 200,
+    "description": "Second portion"
+  },
+  "annotator_3": {
+    "name": "Charlie Durand",
+    "start_idx": 200,
+    "end_idx": 300,
+    "description": "Third portion"
+  }
+}
+```
+### Option B: Configuration via an HF secret
+Add an `ANNOTATOR_CONFIG` secret containing the JSON configuration on a single line:
+```json
+{"annotator_1":{"name":"Alice","start_idx":0,"end_idx":100},"annotator_2":{"name":"Bob","start_idx":100,"end_idx":200}}
+```
+### Using the FFGen generation script
+If you have already used FFGen's `create_annotation_study.py` script:
+```bash
+# From FFGen/3_data_processing/
+python create_annotation_study.py \
+    matis35/code-feedback-infonce \
+    --num-samples 300 \
+    --num-subsets 3 \
+    --output-dir annotation_study
+```
+This creates 3 subsets of ~100 items each, plus a gold standard.
+For the app, configure the matching annotators:
+- annotator_1: subset_01 (indices 0-100)
+- annotator_2: subset_02 (indices 100-200)
+- annotator_3: subset_03 (indices 200-300)
+---
+## Step 4: Push to HF Spaces
+1. Commit all the changes:
+```bash
+git add .
+git commit -m "Add authentication, dispatch and HF persistence"
+```
+2. Push to HF:
+```bash
+git push origin main
+```
+3. The Space rebuilds automatically (takes 3-5 minutes)
+4. Once it is ready, check that the app starts correctly
+---
+## Usage for annotators
+### Complete workflow
+1. **Access the application**
+   - URL: `https://huggingface.co/spaces/YOUR-USERNAME/feedbacks-scoring-app`
+   - Enter the password you were given
+2. **Select your identity**
+   - Choose your identifier from the list
+   - The system automatically displays the assigned portion
+3. **Load the dataset**
+   - Option 1: Upload a JSONL file
+   - Option 2: Load from the HF Hub
+   - The dataset is automatically filtered to the assigned portion
+4. **Annotate**
+   - Score the feedback (1-5)
+   - Add optional comments
+   - Progress is saved automatically
+5. **Save**
+   - **Important**: Click "☁️ Sauvegarder sur HF" regularly
+   - This save is permanent (it survives restarts)
+   - The local save is lost every 48 hours
+6. **Export (optional)**
+   - Download the JSONL locally as a backup
+### Instructions to give the annotators
+> **Welcome to the FFGen annotation tool!**
+>
+> 1. Password: `[YOUR_PASSWORD]`
+> 2. Choose your identifier from the list
+> 3. Load the dataset (ask for the HF link or the file)
+> 4. Annotate the feedback according to the criteria provided
+> 5. **Important**: Save to HF every 30-60 minutes
+> 6. You can close the app and resume later
+>
+> Questions? Contact [YOUR_EMAIL]
+---
+## Retrieving the annotations
+### Method 1: Via the HuggingFace Hub (recommended)
+1. Go to your storage dataset:
+   `https://huggingface.co/datasets/YOUR-USERNAME/ffgen-annotations-storage`
+2. In the `annotations/` folder, you will find all the files:
+   - `annotation_annotator_1_20250127_143022.json`
+   - `annotation_annotator_2_20250127_151533.json`
+   - etc.
+3. Download them or use the CLI:
+```bash
+huggingface-cli download \
+    YOUR-USERNAME/ffgen-annotations-storage \
+    --repo-type dataset \
+    --local-dir ./annotations
+```
+### Method 2: Via the Python API
+```python
+from huggingface_hub import HfApi, hf_hub_download
+import json
+api = HfApi(token="hf_xxxxxxxxxxxx")
+# List all files in the storage dataset
+files = api.list_repo_files(
+    repo_id="YOUR-USERNAME/ffgen-annotations-storage",
+    repo_type="dataset"
+)
+# Download and load each annotation file
+for file in files:
+    if file.startswith("annotations/"):
+        local_path = hf_hub_download(
+            repo_id="YOUR-USERNAME/ffgen-annotations-storage",
+            filename=file,
+            repo_type="dataset",
+            token="hf_xxxxxxxxxxxx"
+        )
+        with open(local_path, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+            print(f"Annotator: {data['annotator_name']}")
+            print(f"Scores: {len(data['scores'])}")
+```
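Because every save writes a new timestamped file, an annotator who saves several times leaves several files in `annotations/`. A minimal sketch of keeping only the most recent file per annotator follows. It assumes the filename pattern seen in the examples (`annotation_<annotator_id>_<YYYYMMDD>_<HHMMSS>.json`); `latest_per_annotator` is a hypothetical helper, not part of the app or of `huggingface_hub`.

```python
import re

# Assumed pattern, inferred from the example filenames in this document:
# annotations/annotation_<annotator_id>_<YYYYMMDD>_<HHMMSS>.json
FILENAME_RE = re.compile(
    r"^annotations/annotation_(?P<annotator>.+)_(?P<stamp>\d{8}_\d{6})\.json$"
)


def latest_per_annotator(files: list[str]) -> dict[str, str]:
    """Map each annotator id to the path of its most recent save."""
    latest: dict[str, tuple[str, str]] = {}
    for path in files:
        match = FILENAME_RE.match(path)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        annotator, stamp = match["annotator"], match["stamp"]
        # Fixed-width timestamps sort chronologically as plain strings.
        if annotator not in latest or stamp > latest[annotator][0]:
            latest[annotator] = (stamp, path)
    return {annotator: path for annotator, (_, path) in latest.items()}


example_files = [
    "README.md",
    "annotations/annotation_annotator_1_20250127_143022.json",
    "annotations/annotation_annotator_1_20250127_160500.json",
    "annotations/annotation_annotator_2_20250127_151533.json",
]
for annotator, path in latest_per_annotator(example_files).items():
    print(annotator, "->", path)
```

The result of `api.list_repo_files(...)` could be passed straight to this helper before downloading, so that superseded saves are not counted twice.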
+### Method 3: Use the FFGen merge script
+If you have gold items for computing inter-annotator agreement:
+```bash
+cd FFGen/3_data_processing
+# Analyze agreement
+python analyze_agreement.py \
+    ../../annotations/annotation_*.json \
+    --gold-file annotation_study/matis35_code-feedback-infonce_gold_standard.json
+# Merge the scores
+python merge_scores.py \
+    ../../annotations/annotation_*.json \
+    -o final_annotations.jsonl
+```
+---
+## Troubleshooting
+### The Space does not start
+- Check that all the secrets are configured
+- Look at the Space logs (Logs tab)
+- Check that `HF_TOKEN` has write permission
+### Authentication does not work
+- Check that `APP_PASSWORD` is present in the secrets
+- No spaces before or after the password
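One way to make the password check robust to stray whitespace is sketched below. This is an illustration under assumptions, not the code in `backend/auth.py`: `check_password` is a hypothetical helper that reads `APP_PASSWORD` from the environment, strips surrounding whitespace on both sides, and compares with `hmac.compare_digest` so that the comparison time does not depend on where the strings differ.

```python
import hmac
import os
from typing import Optional


def check_password(candidate: str, expected: Optional[str] = None) -> bool:
    """Return True when the candidate matches the configured password.

    Assumption: the password comes from the APP_PASSWORD environment
    variable unless an explicit value is passed in.
    """
    if expected is None:
        expected = os.environ.get("APP_PASSWORD", "")
    expected = expected.strip()
    if not expected:
        # This sketch refuses access when no password is configured.
        return False
    return hmac.compare_digest(
        candidate.strip().encode("utf-8"), expected.encode("utf-8")
    )


print(check_password("Annotator2025! ", expected="Annotator2025!\n"))
print(check_password("wrong", expected="Annotator2025!"))
```

Refusing access when the secret is missing is a design choice of this sketch; the changelog mentions a development mode with fallbacks, so the real module may behave differently.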
+### The HF save fails
+- Check that `HF_TOKEN` has write permission
+- Check that `HF_DATASET_REPO` exists and is private
+- Look at the error messages in the app
+### An annotator sees the full dataset
+- Check the configuration in `annotators.json`
+- Check that the start_idx/end_idx values are correct
+- Reload the dataset after changing the config
+### Annotations are lost
+- Remind the annotators to save to HF
+- The local save is reset every 48 hours
+- HF saves are permanent
+---
+## Technical Architecture
+```
+┌─────────────────────────────────────────────────────────┐
+│                    HF Space (Public)                     │
+│  ┌────────────────────────────────────────────────────┐ │
+│  │  1. Authentication (APP_PASSWORD)                  │ │
+│  │      ↓                                              │ │
+│  │  2. Annotator Selection                            │ │
+│  │      ↓                                              │ │
+│  │  3. Dataset Filtering (start_idx, end_idx)         │ │
+│  │      ↓                                              │ │
+│  │  4. Annotation Interface                           │ │
+│  │      ↓                                              │ │
+│  │  5. Cloud Backup (HF_TOKEN)                        │ │
+│  └────────────────────────────────────────────────────┘ │
+└────────────────────┬────────────────────────────────────┘
+                     │
+                     ↓
+┌─────────────────────────────────────────────────────────┐
+│      HF Dataset (Private): annotations-storage          │
+│  ┌────────────────────────────────────────────────────┐ │
+│  │  annotations/                                       │ │
+│  │    ├── annotation_annotator_1_timestamp.json       │ │
+│  │    ├── annotation_annotator_2_timestamp.json       │ │
+│  │    └── annotation_annotator_3_timestamp.json       │ │
+│  └────────────────────────────────────────────────────┘ │
+└─────────────────────────────────────────────────────────┘
+```
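The last arrow in the diagram, from the Space to the private dataset, amounts to writing one JSON file per save under `annotations/`. A rough sketch of how the target path and file body could be built is shown here. The function `build_save` and the payload keys (`annotator_id`, `saved_at`, `scores`, `comments`) are assumptions made for illustration; the actual schema written by `backend/hf_storage.py` is not shown in this document.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_save(
    annotator_id: str,
    scores: dict[str, int],
    comments: dict[str, str],
    now: Optional[datetime] = None,
) -> tuple[str, bytes]:
    """Return (path_in_repo, file_bytes) for one timestamped save."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Mirrors the naming seen in the examples: annotation_<id>_<timestamp>.json
    path_in_repo = f"annotations/annotation_{annotator_id}_{stamp}.json"
    payload = {
        "annotator_id": annotator_id,  # hypothetical key names
        "saved_at": now.isoformat(),
        "scores": scores,
        "comments": comments,
    }
    body = json.dumps(payload, ensure_ascii=False, indent=2).encode("utf-8")
    return path_in_repo, body


path, body = build_save("annotator_1", {"0": 4, "1": 2}, {"0": "clear"})
print(path)
print(len(body), "bytes")
```

The returned bytes could then be handed to `HfApi.upload_file` from `huggingface_hub` (as `path_or_fileobj`, together with `path_in_repo`, `repo_id` and `repo_type="dataset"`), with the token and repository name read from the `HF_TOKEN` and `HF_DATASET_REPO` secrets.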
+---
+## Security and Best Practices
+✅ **Do**:
+- Use strong passwords (12+ characters)
+- Keep the storage dataset **PRIVATE**
+- Revoke the tokens after the study
+- Save to HF regularly
+- Test with one annotator before the full rollout
+❌ **Avoid**:
+- Sharing the HF token publicly
+- Reusing a password from your other accounts
+- Making the storage dataset public
+- Relying only on the local save
+- Changing the config while annotation is in progress
+---
+## Support
+For any question or problem:
+- GitHub issues: [REPO_LINK]
+- Email: [YOUR_EMAIL]
+- FFGen documentation: `FFGen/README.md`

Dockerfile CHANGED

@@ -1,20 +1,34 @@
-FROM python:3.13.5-slim
 WORKDIR /app
 RUN apt-get update && apt-get install -y \
     build-essential \
     curl \
     git \
     && rm -rf /var/lib/apt/lists/*
 COPY requirements.txt ./
-COPY src/ ./src/
-RUN pip3 install -r requirements.txt
-EXPOSE 8501
-HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
-ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]

+FROM python:3.11-slim
 WORKDIR /app
+# Install system dependencies
 RUN apt-get update && apt-get install -y \
     build-essential \
     curl \
     git \
     && rm -rf /var/lib/apt/lists/*
+# Copy requirements and install dependencies
 COPY requirements.txt ./
+RUN pip3 install --no-cache-dir -r requirements.txt
+# Copy application files
+COPY app.py ./
+COPY backend/ ./backend/
+COPY frontend/ ./frontend/
+# Create data directory
+RUN mkdir -p /app/data
+# Create .streamlit directory and empty secrets file to avoid warnings
+RUN mkdir -p /app/.streamlit && touch /app/.streamlit/secrets.toml
+# Expose Streamlit port
+EXPOSE 7860
+# Health check for Hugging Face Spaces
+HEALTHCHECK CMD curl --fail http://localhost:7860/_stcore/health
+# Run the application
+CMD ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0", "--server.enableCORS=false", "--server.enableXsrfProtection=false"]

QUICKSTART.md ADDED

	@@ -0,0 +1,265 @@

+# Quick Start Guide
+Set up and deploy the secure annotation application in about 15 minutes.
+## Prerequisites
+- A HuggingFace account
+- Access to your Space `matis35/feedbacks-scoring-app`
+- The dataset to annotate (JSONL format or on the HF Hub)
+## Step 1: Create the Storage Dataset (2 min)
+1. Go to https://huggingface.co/new-dataset
+2. Name: `ffgen-annotations-storage`
+3. Visibility: **Private** (very important!)
+4. Click "Create dataset"
+5. Leave it empty; do not upload anything
+✅ You now have: `matis35/ffgen-annotations-storage`
+## Step 2: Create an HF Token (1 min)
+1. Go to https://huggingface.co/settings/tokens
+2. Click "New token"
+3. Name: `annotation-app-write`
+4. Type: **Write** (important!)
+5. Click "Generate token"
+6. **Copy the token** (format `hf_xxxxxxxxxxxx`)
+⚠️ Keep this token safe and do not share it!
+## Step 3: Configure the Secrets (2 min)
+1. Go to your Space: https://huggingface.co/spaces/matis35/feedbacks-scoring-app
+2. Click **Settings** (at the top)
+3. Scroll to "Variables and secrets"
+4. Add these 3 secrets:
+### Secret 1: APP_PASSWORD
+- Name: `APP_PASSWORD`
+- Value: Choose a strong password (e.g. `Annotator2025!`)
+- Type: Secret
+### Secret 2: HF_TOKEN
+- Name: `HF_TOKEN`
+- Value: The token copied in step 2 (`hf_xxxxx`)
+- Type: Secret
+### Secret 3: HF_DATASET_REPO
+- Name: `HF_DATASET_REPO`
+- Value: `matis35/ffgen-annotations-storage`
+- Type: Secret
+✅ All 3 secrets should be visible in the list
+## Step 4: Configure the Annotators (2 min)
+### Option A: Basic configuration (3 annotators)
+Copy the example file:
+```bash
+cd feedbacks-scoring-app
+cp data/annotators.json.example data/annotators.json
+```
+Edit `data/annotators.json` to match your dataset:
+```json
+{
+  "annotator_1": {
+    "name": "Alice",
+    "start_idx": 0,
+    "end_idx": 100
+  },
+  "annotator_2": {
+    "name": "Bob",
+    "start_idx": 100,
+    "end_idx": 200
+  },
+  "annotator_3": {
+    "name": "Charlie",
+    "start_idx": 200,
+    "end_idx": 300
+  }
+}
+```
+### Option B: Use your FFGen subsets
+If you used `create_annotation_study.py`:
+```bash
+# You created 10 subsets with create_annotation_study.py
+# Configure the 10 matching annotators
+# Example for 10 annotators, 40 items each
+python -c "
+import json
+config = {}
+for i in range(10):
+    config[f'annotator_{i+1}'] = {
+        'name': f'Annotator {i+1}',
+        'start_idx': i * 40,
+        'end_idx': (i + 1) * 40,
+        'description': f'Subset {i+1}/10'
+    }
+with open('data/annotators.json', 'w') as f:
+    json.dump(config, f, indent=2)
+print('✅ Config created for 10 annotators')
+"
+```
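Before pushing, it can help to sanity-check the generated `data/annotators.json`. The following is a minimal sketch, assuming each entry carries integer `start_idx` and `end_idx` fields and that `end_idx` is exclusive; `validate_annotator_config` is a hypothetical helper and is not the validation implemented in `backend/annotator_config.py`.

```python
from typing import Optional


def validate_annotator_config(
    config: dict[str, dict], dataset_size: Optional[int] = None
) -> list[str]:
    """Return a list of human-readable problems (empty when none are found)."""
    errors: list[str] = []
    ranges: list[tuple[int, int, str]] = []
    for annotator_id, entry in config.items():
        start, end = entry.get("start_idx"), entry.get("end_idx")
        if not isinstance(start, int) or not isinstance(end, int):
            errors.append(f"{annotator_id}: start_idx and end_idx must be integers")
            continue
        if start < 0 or end <= start:
            errors.append(f"{annotator_id}: invalid range {start}-{end}")
            continue
        if dataset_size is not None and end > dataset_size:
            errors.append(f"{annotator_id}: end_idx {end} exceeds dataset size {dataset_size}")
        ranges.append((start, end, annotator_id))

    # Walk the ranges in order, remembering the furthest end seen so far.
    furthest_end, furthest_id = None, None
    for start, end, annotator_id in sorted(ranges):
        if furthest_end is not None and start < furthest_end:
            errors.append(f"{annotator_id} overlaps {furthest_id}")
        if furthest_end is None or end > furthest_end:
            furthest_end, furthest_id = end, annotator_id
    return errors


config = {
    f"annotator_{i + 1}": {"name": f"Annotator {i + 1}", "start_idx": i * 40, "end_idx": (i + 1) * 40}
    for i in range(10)
}
print(validate_annotator_config(config, dataset_size=400))
```

Overlaps are reported as problems here, but a study that deliberately shares gold items between annotators might want to allow them, so treat that rule as optional.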
+## Step 5: Push to HF (1 min)
+```bash
+cd feedbacks-scoring-app
+# Check the changes
+git status
+# Commit
+git add .
+git commit -m "Add secure authentication and HF persistence"
+# Push to HF Spaces
+git push origin main
+```
+The Space rebuilds automatically (3-5 minutes).
+## Step 6: Test (2 min)
+1. Wait until the Space is "Running" (green)
+2. Open the app: https://huggingface.co/spaces/matis35/feedbacks-scoring-app
+3. Test the login:
+   - Enter the password (`APP_PASSWORD`)
+   - Select an annotator
+   - Check that everything works
+## Step 7: Distribute to the Annotators
+Send this message to your annotators:
+```
+Hello,
+Here is the information you need to access the annotation tool:
+URL: https://huggingface.co/spaces/matis35/feedbacks-scoring-app
+Password: [YOUR_APP_PASSWORD]
+Your identifier: [annotator_X]
+Instructions:
+1. Open the URL and enter the password
+2. Select your identifier from the list
+3. Load the dataset (I will send you the link/file)
+4. Annotate the feedback according to the criteria below
+5. IMPORTANT: Click "☁️ Sauvegarder sur HF" every 30-60 minutes
+6. You can close the app and resume later
+Annotation criteria:
+- Score 1: [TO BE DEFINED]
+- Score 2: [TO BE DEFINED]
+- Score 3: [TO BE DEFINED]
+- Score 4: [TO BE DEFINED]
+- Score 5: [TO BE DEFINED]
+Questions? Contact me: [YOUR_EMAIL]
+```
+## Post-Deployment Verification
+✅ Verification checklist:
+- [ ] The Space starts without errors
+- [ ] Authentication works
+- [ ] Annotator selection works
+- [ ] Dataset loading works
+- [ ] Filtering by portion works (check the counts)
+- [ ] The HF save works (check in the dataset)
+- [ ] The JSONL export works
+- [ ] The annotators can log in
+## Useful Commands
+### View the Space logs
+```bash
+# Via the web interface: Settings > Logs
+# Or watch in real time from the "Logs" tab
+```
+### Check the saved annotations
+```bash
+# Go to: https://huggingface.co/datasets/matis35/ffgen-annotations-storage
+# You should see an annotations/ folder containing .json files
+```
+### Download all the annotations
+```bash
+huggingface-cli download \
+    matis35/ffgen-annotations-storage \
+    --repo-type dataset \
+    --local-dir ./collected_annotations
+```
+### Analyze inter-annotator agreement (if you have gold items)
+```bash
+cd FFGen/3_data_processing
+python analyze_agreement.py \
+    ../../collected_annotations/annotations/*.json \
+    --gold-file annotation_study/gold_standard.json
+```
+## Quick Troubleshooting
+### The Space does not start
+```bash
+# Check the logs
+# Common problem: a misconfigured secret
+# Solution: check Settings > Variables and secrets
+```
+### "HF Storage not configured"
+```bash
+# HF_TOKEN or HF_DATASET_REPO is missing
+# Add them in Settings > Secrets
+```
+### "Authentication failed"
+```bash
+# APP_PASSWORD is incorrect or missing
+# Check Settings > Secrets
+```
+### An annotator sees the whole dataset
+```bash
+# Problem in annotators.json
+# Check start_idx and end_idx
+# Reload the dataset after fixing it
+```
+### Annotations disappear
+```bash
+# The annotators did not save to HF
+# Remind them to click "☁️ Sauvegarder sur HF"
+# The local save is lost every 48 hours
+```
+## Support
+Complete documentation: [CONFIGURATION.md](CONFIGURATION.md)
+Problems?
+- Check the Space logs first
+- See the Troubleshooting section of CONFIGURATION.md
+- Open a GitHub issue if needed
+---
+**Total time: ~15 minutes**
+Next file to read: [CONFIGURATION.md](CONFIGURATION.md) for the full details.

README.md CHANGED

@@ -1,19 +1,148 @@
 ---
-title: Feedbacks Scoring
-emoji: 🚀
-colorFrom: red
-colorTo: red
 sdk: docker
-app_port: 8501
 tags:
 - streamlit
 pinned: false
-short_description: Streamlit template space
 ---
-# Welcome to Streamlit!
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

 ---
+title: Feedback Scoring App
+emoji: 🔐
+colorFrom: blue
+colorTo: purple
 sdk: docker
+app_port: 7860
 tags:
 - streamlit
+- annotation
+- feedback
+- secure
 pinned: false
+short_description: Secure code feedback annotation with multi-annotator support
 ---
+# Feedback Scoring Tool - Production Version
+Secure Streamlit application for annotating and scoring code feedback with multiple annotators. This version is ready for production deployment, with authentication, dispatch, and cloud persistence.
+## Features
+### Annotation
+- 📊 **Intuitive scoring interface**: Rate each feedback from 1 to 5
+- 📈 **Real-time statistics**: Track your progress and the score distribution
+- 💬 **Comments**: Add notes to each annotation
+- 📥 **Flexible import**: Supports local JSONL files and HuggingFace datasets
+### Security and Collaboration
+- 🔐 **Authentication**: Password-protected access
+- 👥 **Multi-annotator**: Automatic dispatch system by portion
+- 🎯 **Isolation**: Each annotator sees only their portion of the dataset
+- 📋 **Traceability**: Every annotation is tagged with its annotator
+### Persistence
+- ☁️ **Cloud backup**: Permanent storage in a private HuggingFace Dataset
+- 💾 **Auto-save**: Progress saved automatically to local storage
+- 🔄 **Session resume**: Pick up where you left off, even after a restart
+- 📤 **JSONL export**: Download your annotations at any time
+## Production Configuration
+This application requires configuration before deployment. See **[CONFIGURATION.md](CONFIGURATION.md)** for the complete guide.
+### Quick configuration
+1. **Create a private HF dataset** to store the annotations
+2. **Configure the secrets** in the Space Settings:
+   - `APP_PASSWORD`: Access password
+   - `HF_TOKEN`: Token with write permission
+   - `HF_DATASET_REPO`: Name of the storage dataset
+3. **Configure the annotators** in `data/annotators.json`
+4. **Push to HF Spaces**
+📖 **Complete documentation**: [CONFIGURATION.md](CONFIGURATION.md)
+## Usage (Annotators)
+### Workflow
+1. **Log in**: Enter the password you were given
+2. **Identify yourself**: Select your identifier
+3. **Load**: The app automatically loads your portion
+4. **Annotate**: Score the feedback (1-5), with comments
+5. **Save**: Click "☁️ Sauvegarder sur HF" regularly
+6. **Resume**: You can close the app and resume later
+### Dataset format
+```json
+{
+  "anchor": "source code",
+  "positive": "positive feedback",
+  "language": "python"
+}
+```
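To make the dataset format concrete, here is a small sketch of reading JSONL text and keeping only the records that have both a code `anchor` and a `positive` feedback. The helpers `parse_jsonl` and `keep_items_with_positive` are hypothetical stand-ins written for this example; the real loaders live in `backend/data_loader.py` and may apply different rules.

```python
import json

REQUIRED_FIELDS = ("anchor", "positive")


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSONL text (one JSON object per line), skipping blank lines."""
    items = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            items.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc
    return items


def keep_items_with_positive(items: list) -> list[dict]:
    """Keep objects whose required fields are non-empty strings."""
    return [
        item
        for item in items
        if isinstance(item, dict)
        and all(
            isinstance(item.get(field), str) and item[field].strip()
            for field in REQUIRED_FIELDS
        )
    ]


sample = "\n".join(
    [
        '{"anchor": "print(1)", "positive": "Works, but add a docstring.", "language": "python"}',
        '{"anchor": "x = 1", "language": "python"}',
        "",
        '{"anchor": "int main(){}", "positive": "Missing return value.", "language": "c"}',
    ]
)
usable = keep_items_with_positive(parse_jsonl(sample))
print(len(usable), "of", len(parse_jsonl(sample)), "records can be scored")
```

Treating `language` as optional is an assumption of this sketch; only `anchor` and `positive` are required here because they are what an annotator needs to see in order to score an item.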
+## Architecture
+```
+backend/
+├── auth.py              # Authentication
+├── annotator_config.py  # Annotator configuration
+├── hf_storage.py        # HuggingFace backup
+├── data_loader.py       # Dataset loading
+├── persistence.py       # Local save
+├── export.py            # JSONL export
+└── statistics.py        # Metrics
+frontend/
+├── components.py        # UI components
+├── styles.py            # CSS styles
+└── help_page.py         # Help page
+data/
+├── annotators.json      # Annotator config
+└── [local sessions]     # Temporary saves
+```
+## Development
+### Local
+```bash
+pip install -r requirements.txt
+# Set the environment variables
+export APP_PASSWORD="dev"
+export HF_TOKEN="hf_xxxxx"
+export HF_DATASET_REPO="username/annotations"
+streamlit run app.py
+```
+### Docker
+```bash
+docker build -t feedback-scoring-app .
+docker run -p 7860:7860 \
+  -e APP_PASSWORD="dev" \
+  -e HF_TOKEN="hf_xxxxx" \
+  -e HF_DATASET_REPO="username/annotations" \
+  feedback-scoring-app
+```
+## Integration with FFGen
+This application integrates with the [FFGen](https://github.com/YOUR_USERNAME/FFGen) pipeline:
+1. **Preparation**: Use `create_annotation_study.py` to create the subsets
+2. **Annotation**: Use this app to distribute the items and collect the scores
+3. **Analysis**: Use `analyze_agreement.py` to compute agreement
+4. **Merge**: Use `merge_scores.py` to create the final dataset
+## Security
+- ✅ Password authentication
+- ✅ Private storage dataset
+- ✅ Annotator isolation
+- ✅ Full traceability
+- ✅ No sensitive data in plain text in the code
+## Support
+- 📖 Documentation: [CONFIGURATION.md](CONFIGURATION.md)
+- 🐛 Issues: [GitHub Issues](https://github.com/YOUR_USERNAME/feedbacks-scoring-app/issues)
+- 📧 Contact: your.email@example.com
+## License
+This project is open source.

app.py ADDED

	@@ -0,0 +1,415 @@

+"""
+Standalone Streamlit application for scoring feedback
+Modular backend/frontend architecture with persistence
+"""
+import streamlit as st
+from pathlib import Path
+from backend.data_loader import (
+    load_dataset_from_jsonl,
+    load_dataset_from_hf,
+    filter_items_with_positive
+)
+from backend.export import prepare_export, export_to_jsonl
+from backend.statistics import (
+    compute_progress,
+    compute_score_distribution,
+    compute_average_score,
+    compute_most_common_score,
+    find_unscored_indices
+)
+from backend.persistence import SessionManager
+from frontend.styles import apply_custom_css
+from frontend.components import (
+    render_navigation,
+    render_code_block,
+    render_feedback_block,
+    render_score_slider,
+    render_comment_field,
+    render_progress_metrics,
+    render_statistics,
+    render_export_section,
+    render_quick_actions
+)
+from frontend.help_page import render_help_page
+# Page configuration
+st.set_page_config(
+    page_title="Feedback Scoring Tool",
+    page_icon="⭐",
+    layout="wide",
+    initial_sidebar_state="expanded"
+)
+apply_custom_css()
+DATA_DIR = Path("/app/data")
+DATA_DIR.mkdir(exist_ok=True)
+# Initialize session manager
+session_manager = SessionManager(DATA_DIR)
+# Initialize session state
+if 'dataset' not in st.session_state:
+    st.session_state.dataset = None
+if 'feedback_scores' not in st.session_state:
+    st.session_state.feedback_scores = {}
+if 'scoring_index' not in st.session_state:
+    st.session_state.scoring_index = 0
+if 'feedback_comments' not in st.session_state:
+    st.session_state.feedback_comments = {}
+if 'items_with_positive' not in st.session_state:
+    st.session_state.items_with_positive = []
+if 'show_help' not in st.session_state:
+    st.session_state.show_help = False
+if 'session_loaded' not in st.session_state:
+    st.session_state.session_loaded = False
+if not st.session_state.session_loaded:
+    session_info = session_manager.get_session_info()
+    if session_info:
+        st.session_state.session_loaded = True
+# Sidebar - Configuration
+with st.sidebar:
+    st.markdown('<div class="section-header">Menu</div>', unsafe_allow_html=True)
+    # Help button - Change text based on current state
+    button_text = "Retour aux annotations" if st.session_state.show_help else "Aide & Documentation"
+    button_type = "secondary" if st.session_state.show_help else "primary"
+    if st.button(button_text, use_container_width=True, type=button_type):
+        st.session_state.show_help = not st.session_state.show_help
+        st.rerun()
+    # Check for saved session
+    session_info = session_manager.get_session_info()
+    if session_info and st.session_state.dataset is None:
+        st.markdown('<div class="section-header">Session Sauvegardée</div>', unsafe_allow_html=True)
+        st.info(f"""
+        **Session trouvée**
+        - Dataset: {session_info['dataset_size']} exemples
+        - Scorés: {session_info['total_scored']}
+        - Position: {session_info['scoring_index'] + 1}
+        - Dernière sauvegarde: {session_info['last_saved'][:19]}
+        """)
+        col1, col2 = st.columns(2)
+        with col1:
+            if st.button("Reprendre", use_container_width=True, type="primary"):
+                with st.spinner("Chargement de la session..."):
+                    dataset, _ = session_manager.load_dataset()
+                    session_data = session_manager.load_session()
+                    st.session_state.dataset = dataset
+                    st.session_state.feedback_scores = session_data['feedback_scores']
+                    st.session_state.feedback_comments = session_data['feedback_comments']
+                    st.session_state.scoring_index = session_data['scoring_index']
+                    st.session_state.show_help = False
+                    st.success("Session reprise")
+                    st.rerun()
+        with col2:
+            if st.button("Nouvelle", use_container_width=True):
+                if not st.session_state.get('confirm_clear', False):
+                    st.session_state.confirm_clear = True
+                    st.warning("Cliquer encore pour confirmer")
+                else:
+                    session_manager.clear_session()
+                    st.session_state.confirm_clear = False
+                    st.success("Session effacée")
+                    st.rerun()
+        st.markdown("---")
+    st.markdown('<div class="section-header">Nouveau Dataset</div>', unsafe_allow_html=True)
+    # Charger un dataset
+    upload_option = st.radio(
+        "Source:",
+        ["Fichier local (.jsonl)", "HuggingFace Hub"],
+        label_visibility="collapsed"
+    )
+    if upload_option == "Fichier local (.jsonl)":
+        uploaded_file = st.file_uploader("Fichier JSONL", type=['jsonl'], label_visibility="collapsed")
+        if uploaded_file is not None:
+            if st.button("Charger", use_container_width=True):
+                with st.spinner("Chargement..."):
+                    dataset = load_dataset_from_jsonl(uploaded_file)
+                    # Save dataset
+                    session_manager.save_dataset(dataset, {
+                        'source': 'local_file',
+                        'filename': uploaded_file.name
+                    })
+                    st.session_state.dataset = dataset
+                    st.session_state.scoring_index = 0
+                    st.session_state.feedback_scores = {}
+                    st.session_state.feedback_comments = {}
+                    st.session_state.show_help = False
+                    # Save initial empty session
+                    session_manager.save_session(
+                        st.session_state.feedback_scores,
+                        st.session_state.feedback_comments,
+                        st.session_state.scoring_index
+                    )
+                    st.success(f"Dataset chargé: {len(dataset)} exemples")
+                    st.rerun()
+    else:  # HuggingFace Hub
+        hf_dataset = st.text_input("Dataset HF", placeholder="username/dataset", label_visibility="collapsed")
+        hf_split = st.text_input("Split", value="train", label_visibility="collapsed")
+        if st.button("Charger depuis HF", use_container_width=True):
+            if hf_dataset:
+                with st.spinner(f"Téléchargement de {hf_dataset}..."):
+                    try:
+                        dataset = load_dataset_from_hf(hf_dataset, hf_split)
+                        # Save dataset locally
+                        session_manager.save_dataset(dataset, {
+                            'source': 'huggingface',
+                            'dataset_name': hf_dataset,
+                            'split': hf_split
+                        })
+                        st.session_state.dataset = dataset
+                        st.session_state.scoring_index = 0
+                        st.session_state.feedback_scores = {}
+                        st.session_state.feedback_comments = {}
+                        st.session_state.show_help = False
+                        # Save initial empty session
+                        session_manager.save_session(
+                            st.session_state.feedback_scores,
+                            st.session_state.feedback_comments,
+                            st.session_state.scoring_index
+                        )
+                        st.success(f"Dataset chargé et sauvegardé: {len(dataset)} exemples")
+                        st.rerun()
+                    except Exception as e:
+                        st.error(f"Erreur: {str(e)}")
+            else:
+                st.warning("Entrez un nom de dataset")
+# Main content - Show help or scoring interface
+if st.session_state.show_help:
+    render_help_page()
+    st.stop()
+# Titre
+st.title("Scoring des Feedbacks")
+# Main content
+if not st.session_state.dataset:
+    st.warning("Aucun dataset chargé.")
+    st.info("Vérifiez la sidebar : vous avez peut-être une session sauvegardée à reprendre")
+    st.info("Cliquez sur Aide & Documentation pour le guide complet")
+    st.stop()
+# Filter items with positive feedback
+dataset = st.session_state.dataset
+items_with_positive = filter_items_with_positive(dataset)
+st.session_state.items_with_positive = items_with_positive
+if not items_with_positive:
+    st.error("Aucun feedback positif trouvé dans le dataset.")
+    st.info("Le dataset doit contenir un champ 'positive' avec du texte.")
+    st.stop()
+# Navigation
+new_index = render_navigation(st.session_state.scoring_index, len(items_with_positive))
+if new_index is not None:
+    st.session_state.scoring_index = new_index
+    # Auto-save on navigation
+    session_manager.save_session(
+        st.session_state.feedback_scores,
+        st.session_state.feedback_comments,
+        st.session_state.scoring_index
+    )
+    st.rerun()
+st.markdown("---")
+# Get current item
+original_idx, current_item = items_with_positive[st.session_state.scoring_index]
+# Display code
+code_text = current_item.get('anchor', current_item.get('code', 'N/A'))
+language = current_item.get('language', 'python')
+render_code_block(code_text, language)
+st.markdown("---")
+# Display positive feedback
+positive_feedback = current_item.get('positive', 'N/A')
+render_feedback_block(positive_feedback)
+st.markdown("---")
+# Scoring interface
+score_key = f"score_{original_idx}"
+current_score = st.session_state.feedback_scores.get(original_idx, 3)
+score = render_score_slider(score_key, current_score)
+# Auto-save when the score changes or when the item is recorded for the first time
+is_new_score = original_idx not in st.session_state.feedback_scores
+st.session_state.feedback_scores[original_idx] = score
+if is_new_score or score != current_score:
+    session_manager.save_session(
+        st.session_state.feedback_scores,
+        st.session_state.feedback_comments,
+        st.session_state.scoring_index
+    )
+# Comment field
+comment_key = f"comment_{original_idx}"
+current_comment = st.session_state.feedback_comments.get(original_idx, "")
+comment = render_comment_field(comment_key, current_comment)
+st.session_state.feedback_comments[original_idx] = comment
+# Auto-save when comment changes
+if comment != current_comment:
+    session_manager.save_session(
+        st.session_state.feedback_scores,
+        st.session_state.feedback_comments,
+        st.session_state.scoring_index
+    )
+st.markdown("---")
+# Progress and statistics
+scored_items = len(st.session_state.feedback_scores)
+total_items = len(items_with_positive)
+progress_pct = compute_progress(scored_items, total_items) * 100
+render_progress_metrics(
+    total=total_items,
+    scored=scored_items,
+    remaining=total_items - scored_items,
+    progress_pct=progress_pct
+)
+# Statistics
+if st.session_state.feedback_scores:
+    avg_score = compute_average_score(st.session_state.feedback_scores)
+    most_common = compute_most_common_score(st.session_state.feedback_scores)
+    score_counts = compute_score_distribution(st.session_state.feedback_scores)
+    render_statistics(avg_score, most_common, score_counts)
+st.markdown("---")
+# Export section
+export_data = prepare_export(
+    items_with_positive,
+    st.session_state.feedback_scores,
+    st.session_state.feedback_comments
+)
+col1, col2 = render_export_section(export_data)
+# Téléchargement JSONL
+with col2:
+    if export_data:
+        jsonl_content = export_to_jsonl(export_data)
+        st.download_button(
+            label="Télécharger JSONL",
+            data=jsonl_content,
+            file_name="feedback_scores.jsonl",
+            mime="application/jsonl",
+            use_container_width=True,
+            type="primary"
+        )
+    else:
+        st.button(
+            "📥 Télécharger JSONL",
+            disabled=True,
+            use_container_width=True,
+            help="Aucun score enregistré"
+        )
+# Show export preview
+if export_data:
+    with st.expander("Aperçu Export (5 premiers)"):
+        preview_items = export_data[:5]
+        st.json(preview_items)
+# Quick actions
+st.markdown("---")
+reset_requested, jump_to_unscored, jump_to_index = render_quick_actions(
+    items_with_positive,
+    st.session_state.feedback_scores,
+    st.session_state.scoring_index
+)
+# Handle quick actions
+if reset_requested:
+    if st.session_state.feedback_scores:
+        if not st.session_state.get('confirm_reset', False):
+            st.session_state.confirm_reset = True
+            st.warning("Cliquez à nouveau pour confirmer la suppression de tous les scores")
+        else:
+            st.session_state.feedback_scores = {}
+            st.session_state.feedback_comments = {}
+            st.session_state.confirm_reset = False
+            # Save cleared session
+            session_manager.save_session(
+                st.session_state.feedback_scores,
+                st.session_state.feedback_comments,
+                st.session_state.scoring_index
+            )
+            st.success("Scores réinitialisés")
+            st.rerun()
+if jump_to_unscored:
+    unscored_indices = find_unscored_indices(items_with_positive, st.session_state.feedback_scores)
+    if unscored_indices:
+        for pos, (idx, _) in enumerate(items_with_positive):
+            if idx == unscored_indices[0]:
+                st.session_state.scoring_index = pos
+                # Save position
+                session_manager.save_session(
+                    st.session_state.feedback_scores,
+                    st.session_state.feedback_comments,
+                    st.session_state.scoring_index
+                )
+                st.rerun()
+                break
+if jump_to_index is not None:
+    st.session_state.scoring_index = jump_to_index
+    # Save position
+    session_manager.save_session(
+        st.session_state.feedback_scores,
+        st.session_state.feedback_comments,
+        st.session_state.scoring_index
+    )
+    st.rerun()
+# Footer
+st.markdown("---")
+st.caption("Sauvegarde automatique activée | Vous pouvez fermer et reprendre plus tard")
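
Reviewer note: the "Nouvelle" and reset actions in app.py both rely on a two-click confirmation stored in session state. A minimal sketch of that pattern as a standalone helper follows. The name `confirm_twice` is hypothetical and does not exist in this commit, and a plain dict stands in for `st.session_state`.

```python
from collections.abc import MutableMapping


def confirm_twice(state: MutableMapping, key: str) -> bool:
    """Return True only on the second consecutive call for `key`.

    The first call arms the flag and returns False. The second call
    disarms it and returns True, so the next action needs confirming again.
    """
    if state.get(key, False):
        state[key] = False
        return True
    state[key] = True
    return False


# A plain dict stands in for st.session_state (assumption for illustration)
state = {}
first = confirm_twice(state, "confirm_reset")   # arms the flag
second = confirm_twice(state, "confirm_reset")  # confirmed
third = confirm_twice(state, "confirm_reset")   # armed again, not confirmed
```

Testing the flag's value rather than its presence is what makes the third click require a fresh confirmation.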

backend/__init__.py ADDED

	@@ -0,0 +1,3 @@

+"""
+Backend module for scoring app
+"""

backend/annotator_config.py ADDED

	@@ -0,0 +1,279 @@

+"""
+Annotator configuration and management
+Defines which annotator scores which part of the dataset
+"""
+import json
+import os
+import streamlit as st
+from pathlib import Path
+# Configuration par défaut des annotateurs
+DEFAULT_ANNOTATOR_CONFIG = {
+    "annotator_1": {
+        "name": "Expert A",
+        "start_idx": 0,
+        "end_idx": 100,
+        "description": "Première portion du dataset"
+    },
+    "annotator_2": {
+        "name": "Expert B",
+        "start_idx": 100,
+        "end_idx": 200,
+        "description": "Deuxième portion du dataset"
+    },
+    "annotator_3": {
+        "name": "Expert C",
+        "start_idx": 200,
+        "end_idx": 300,
+        "description": "Troisième portion du dataset"
+    },
+}
+def load_annotator_config():
+    """
+    Load the annotator configuration.
+    Priority order:
+    1. annotators.json file in /app/data/
+    2. st.secrets["ANNOTATOR_CONFIG"]
+    3. ANNOTATOR_CONFIG environment variable
+    4. Default configuration
+    Returns:
+        dict: Annotator configuration
+    """
+    # 1. Essayer de charger depuis un fichier local
+    config_file = Path("/app/data/annotators.json")
+    if config_file.exists():
+        try:
+            with open(config_file, 'r', encoding='utf-8') as f:
+                config = json.load(f)
+                return config
+        except Exception as e:
+            st.warning(f"⚠️ Erreur lors du chargement de annotators.json: {e}")
+    # 2. Essayer st.secrets (HF Spaces)
+    try:
+        config_str = st.secrets.get("ANNOTATOR_CONFIG")
+        if config_str:
+            return json.loads(config_str)
+    except (FileNotFoundError, KeyError, json.JSONDecodeError):
+        pass
+    # 3. Essayer variable d'environnement
+    config_str = os.getenv("ANNOTATOR_CONFIG")
+    if config_str:
+        try:
+            return json.loads(config_str)
+        except json.JSONDecodeError:
+            st.warning("⚠️ ANNOTATOR_CONFIG mal formaté")
+    # 4. Retourner la config par défaut
+    return DEFAULT_ANNOTATOR_CONFIG
+def save_annotator_config(config):
+    """
+    Sauvegarde la configuration des annotateurs dans un fichier local.
+    Args:
+        config: Dict de configuration
+    Returns:
+        bool: True si succès
+    """
+    try:
+        config_file = Path("/app/data/annotators.json")
+        config_file.parent.mkdir(parents=True, exist_ok=True)
+        with open(config_file, 'w', encoding='utf-8') as f:
+            json.dump(config, f, indent=2, ensure_ascii=False)
+        return True
+    except Exception as e:
+        st.error(f"❌ Erreur lors de la sauvegarde: {e}")
+        return False
+def get_annotator_config(annotator_id):
+    """
+    Récupère la configuration d'un annotateur spécifique.
+    Args:
+        annotator_id: ID de l'annotateur
+    Returns:
+        dict ou None: Configuration de l'annotateur
+    """
+    config = load_annotator_config()
+    return config.get(annotator_id)
+def filter_dataset_for_annotator(dataset, annotator_config):
+    """
+    Filtre un dataset pour ne garder que la portion d'un annotateur.
+    Args:
+        dataset: Liste d'items du dataset
+        annotator_config: Config de l'annotateur avec start_idx et end_idx
+    Returns:
+        list: Portion filtrée du dataset
+    """
+    start = annotator_config.get("start_idx", 0)
+    end = annotator_config.get("end_idx", len(dataset))
+    # S'assurer que les indices sont valides
+    start = max(0, min(start, len(dataset)))
+    end = max(start, min(end, len(dataset)))
+    return dataset[start:end]
+def validate_annotator_config(config):
+    """
+    Valide une configuration d'annotateurs.
+    Args:
+        config: Dict de configuration
+    Returns:
+        (bool, list): (is_valid, list_of_errors)
+    """
+    errors = []
+    if not isinstance(config, dict):
+        errors.append("La configuration doit être un dictionnaire")
+        return False, errors
+    for ann_id, ann_config in config.items():
+        if not isinstance(ann_config, dict):
+            errors.append(f"{ann_id}: La configuration doit être un dict")
+            continue
+        # Vérifier les champs requis
+        required_fields = ["name", "start_idx", "end_idx"]
+        for field in required_fields:
+            if field not in ann_config:
+                errors.append(f"{ann_id}: Champ '{field}' manquant")
+        # Check types (bool is a subclass of int, so reject it explicitly)
+        start_idx = ann_config.get("start_idx")
+        end_idx = ann_config.get("end_idx")
+        start_ok = isinstance(start_idx, int) and not isinstance(start_idx, bool)
+        end_ok = isinstance(end_idx, int) and not isinstance(end_idx, bool)
+        if "start_idx" in ann_config and not start_ok:
+            errors.append(f"{ann_id}: start_idx doit être un entier")
+        if "end_idx" in ann_config and not end_ok:
+            errors.append(f"{ann_id}: end_idx doit être un entier")
+        # Compare the bounds only when both are valid integers, to avoid a TypeError
+        if start_ok and end_ok and start_idx >= end_idx:
+            errors.append(f"{ann_id}: start_idx doit être < end_idx")
+    return len(errors) == 0, errors
+def create_annotator_config_from_chunks(num_annotators, total_items):
+    """
+    Crée automatiquement une configuration pour diviser un dataset en chunks.
+    Args:
+        num_annotators: Nombre d'annotateurs
+        total_items: Nombre total d'items dans le dataset
+    Returns:
+        dict: Configuration générée
+    """
+    items_per_annotator = total_items // num_annotators
+    config = {}
+    for i in range(num_annotators):
+        ann_id = f"annotator_{i+1}"
+        start_idx = i * items_per_annotator
+        # Le dernier annotateur prend tout ce qui reste
+        if i == num_annotators - 1:
+            end_idx = total_items
+        else:
+            end_idx = (i + 1) * items_per_annotator
+        config[ann_id] = {
+            "name": f"Annotateur {i+1}",
+            "start_idx": start_idx,
+            "end_idx": end_idx,
+            "description": f"Items {start_idx} à {end_idx-1}"
+        }
+    return config
+def show_annotator_config_editor():
+    """
+    Affiche un éditeur de configuration des annotateurs (admin).
+    """
+    st.markdown("## ⚙️ Configuration des Annotateurs")
+    config = load_annotator_config()
+    st.info("Cette section permet de configurer les portions du dataset pour chaque annotateur")
+    # Afficher la config actuelle
+    st.json(config)
+    # Option pour créer une nouvelle config
+    with st.expander("Créer une nouvelle configuration"):
+        col1, col2 = st.columns(2)
+        with col1:
+            num_annotators = st.number_input(
+                "Nombre d'annotateurs",
+                min_value=1,
+                max_value=20,
+                value=3
+            )
+        with col2:
+            total_items = st.number_input(
+                "Nombre total d'items",
+                min_value=1,
+                value=300
+            )
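+        if st.button("Générer la configuration", type="primary"):
+            # Keep the generated config in session state: a button nested inside another
+            # button's branch never fires, because the outer button is False on the rerun
+            st.session_state.generated_annotator_config = create_annotator_config_from_chunks(
+                int(num_annotators), int(total_items)
+            )
+        new_config = st.session_state.get("generated_annotator_config")
+        if new_config:
+            st.json(new_config)
+            if st.button("Sauvegarder cette configuration"):
+                if save_annotator_config(new_config):
+                    del st.session_state["generated_annotator_config"]
+                    st.success("✅ Configuration sauvegardée")
+                    st.rerun()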
+    # Éditeur manuel
+    with st.expander("Éditer manuellement (JSON)"):
+        st.markdown("**Format attendu:**")
+        st.code(json.dumps(DEFAULT_ANNOTATOR_CONFIG, indent=2), language="json")
+        config_text = st.text_area(
+            "Configuration JSON",
+            value=json.dumps(config, indent=2),
+            height=300
+        )
+        if st.button("Valider et sauvegarder"):
+            try:
+                new_config = json.loads(config_text)
+                is_valid, errors = validate_annotator_config(new_config)
+                if is_valid:
+                    if save_annotator_config(new_config):
+                        st.success("✅ Configuration validée et sauvegardée")
+                        st.rerun()
+                else:
+                    st.error("❌ Configuration invalide:")
+                    for error in errors:
+                        st.error(f"  - {error}")
+            except json.JSONDecodeError as e:
+                st.error(f"❌ JSON invalide: {e}")
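
Reviewer note: `validate_annotator_config` checks each annotator's range on its own, but it does not check whether two annotators were given intersecting portions. A minimal sketch of such a check follows, assuming the half-open `[start_idx, end_idx)` convention used by `filter_dataset_for_annotator` and a config that already passed validation. The function name `find_overlaps` is hypothetical and not part of this commit.

```python
def find_overlaps(config: dict) -> list:
    """Return pairs of annotator IDs whose [start_idx, end_idx) ranges intersect."""
    ranges = sorted(
        (c["start_idx"], c["end_idx"], ann_id) for ann_id, c in config.items()
    )
    overlaps = []
    for i, (_, end_a, id_a) in enumerate(ranges):
        for start_b, _, id_b in ranges[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start: no later range can overlap id_a
            overlaps.append((id_a, id_b))
    return overlaps


# Illustrative config: the third range starts inside the second one
example_config = {
    "annotator_1": {"name": "Expert A", "start_idx": 0, "end_idx": 100},
    "annotator_2": {"name": "Expert B", "start_idx": 100, "end_idx": 200},
    "annotator_3": {"name": "Expert C", "start_idx": 150, "end_idx": 300},
}
overlapping_pairs = find_overlaps(example_config)
```

Adjacent ranges such as 0-100 and 100-200 do not count as overlapping, since `end_idx` is exclusive.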

backend/auth.py ADDED

	@@ -0,0 +1,148 @@

+"""
+Authentication module for the scoring application
+Handles password authentication and annotator identification
+"""
+import streamlit as st
+import os
+import warnings
+# Supprimer les warnings de secrets manquants
+warnings.filterwarnings('ignore', message='.*secrets.*')
+def check_password():
+    """
+    Vérifie que l'utilisateur a le bon mot de passe.
+    Returns True si authentifié, False sinon.
+    """
+    def password_entered():
+        """Vérifie si le mot de passe est correct"""
+        # Support pour secrets (HF Spaces) ou variable d'environnement
+        correct_password = None
+        # Essayer d'abord st.secrets (HF Spaces) - sans message d'erreur
+        try:
+            correct_password = st.secrets.get("APP_PASSWORD")
+        except Exception:
+            pass
+        # Fallback sur variable d'environnement
+        if not correct_password:
+            correct_password = os.getenv("APP_PASSWORD")
+        # Fall back to a hardcoded default password (insecure: set APP_PASSWORD in production)
+        if not correct_password:
+            correct_password = "annotator2025"
+        if st.session_state["password"] == correct_password:
+            st.session_state["password_correct"] = True
+            del st.session_state["password"]  # Ne pas garder le mot de passe en clair
+        else:
+            st.session_state["password_correct"] = False
+    # Premier affichage ou non authentifié
+    if "password_correct" not in st.session_state:
+        st.markdown("## 🔐 Authentification Annotateur")
+        st.info("Entrez le mot de passe fourni par l'équipe de recherche")
+        st.text_input(
+            "Mot de passe",
+            type="password",
+            on_change=password_entered,
+            key="password",
+            label_visibility="collapsed"
+        )
+        return False
+    elif not st.session_state["password_correct"]:
+        st.markdown("## 🔐 Authentification Annotateur")
+        st.text_input(
+            "Mot de passe",
+            type="password",
+            on_change=password_entered,
+            key="password",
+            label_visibility="collapsed"
+        )
+        st.error("❌ Mot de passe incorrect")
+        return False
+    else:
+        # Authentifié
+        return True
+def get_annotator_identity():
+    """
+    Récupère ou demande l'identité de l'annotateur.
+    Returns: (annotator_id, annotator_name) ou None si non défini
+    """
+    if "annotator_id" not in st.session_state:
+        st.session_state.annotator_id = None
+        st.session_state.annotator_name = None
+    return st.session_state.annotator_id, st.session_state.annotator_name
+def set_annotator_identity(annotator_id, annotator_name):
+    """
+    Définit l'identité de l'annotateur dans la session
+    """
+    st.session_state.annotator_id = annotator_id
+    st.session_state.annotator_name = annotator_name
+def show_annotator_selector(annotator_config):
+    """
+    Affiche un sélecteur d'annotateur basé sur la configuration.
+    Args:
+        annotator_config: Dict avec la configuration des annotateurs
+        Format: {
+            "annotator_1": {
+                "name": "Expert A",
+                "start_idx": 0,
+                "end_idx": 100
+            },
+            ...
+        }
+    Returns:
+        (annotator_id, config) ou (None, None) si pas sélectionné
+    """
+    st.markdown("## 👤 Sélection de l'annotateur")
+    st.info("Sélectionnez votre identifiant pour charger votre portion du dataset")
+    # Créer la liste des options
+    options = ["-- Choisissez votre identifiant --"]
+    annotator_mapping = {}
+    for ann_id, ann_config in annotator_config.items():
+        display_name = f"{ann_config['name']} (Items {ann_config['start_idx']}-{ann_config['end_idx']})"
+        options.append(display_name)
+        annotator_mapping[display_name] = (ann_id, ann_config)
+    selected = st.selectbox(
+        "Annotateur",
+        options,
+        key="annotator_selector",
+        label_visibility="collapsed"
+    )
+    if selected == options[0]:
+        return None, None
+    annotator_id, config = annotator_mapping[selected]
+    # Afficher un résumé
+    st.success(f"""
+    ✅ **{config['name']}**
+    - Items à annoter: {config['end_idx'] - config['start_idx']}
+    - Range: [{config['start_idx']}, {config['end_idx']}[
+    """)
+    if st.button("Confirmer et continuer", type="primary", use_container_width=True):
+        set_annotator_identity(annotator_id, config['name'])
+        return annotator_id, config
+    return None, None
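
Reviewer note: `password_entered` compares the entered password with `==`. A common hardening step, sketched here as an assumption rather than as what this commit does, is a constant-time comparison with the standard library's `hmac.compare_digest`. The helper name `passwords_match` is hypothetical.

```python
import hmac


def passwords_match(entered: str, expected: str) -> bool:
    """Compare two passwords in constant time to limit timing side channels."""
    return hmac.compare_digest(entered.encode("utf-8"), expected.encode("utf-8"))


# Illustrative values only
ok = passwords_match("s3cret", "s3cret")
bad = passwords_match("s3cret", "s3cre7")
```

Encoding to bytes first lets the comparison accept non-ASCII passwords, which `compare_digest` rejects when they are passed as `str`.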

backend/data_loader.py ADDED

	@@ -0,0 +1,31 @@

+"""
+Data loading utilities
+"""
+import json
+from datasets import load_dataset
+def load_dataset_from_jsonl(file):
+    """Charge un dataset depuis un fichier JSONL"""
+    data = []
+    content = file.getvalue().decode('utf-8')
+    for line in content.split('\n'):
+        if line.strip():
+            data.append(json.loads(line))
+    return data
+def load_dataset_from_hf(dataset_name, split='train'):
+    """Charge un dataset depuis HuggingFace"""
+    dataset = load_dataset(dataset_name, split=split)
+    return list(dataset)
+def filter_items_with_positive(dataset):
+    """Filtre les items qui ont un champ 'positive' non vide"""
+    items_with_positive = []
+    for idx, item in enumerate(dataset):
+        if 'positive' in item and item['positive']:
+            items_with_positive.append((idx, item))
+    return items_with_positive
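
Reviewer note: `load_dataset_from_jsonl` raises on the first malformed line, and the local-file branch of app.py does not catch that exception. A minimal sketch of a more tolerant parser follows, which collects bad lines with their line numbers instead of stopping. The name `parse_jsonl` and its `(rows, errors)` return shape are assumptions for illustration and are not part of this commit.

```python
import json


def parse_jsonl(text: str):
    """Parse JSONL text, returning (rows, errors) instead of raising on a bad line."""
    rows, errors = [], []
    for line_no, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # skip blank lines, as the original loader does
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as exc:
            errors.append((line_no, exc.msg))
    return rows, errors


# Illustrative input: line 3 is not valid JSON
sample = '{"positive": "ok"}\n\nnot json\n{"positive": ""}\n'
rows, errors = parse_jsonl(sample)
```

The caller can then decide whether to load the valid rows and show a warning listing the rejected line numbers.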

backend/export.py ADDED

	@@ -0,0 +1,65 @@

+"""
+Export utilities
+"""
+import json
+def prepare_export(items_with_positive, feedback_scores, feedback_comments=None):
+    """
+    Prépare les données pour l'export avec identifiants uniques
+    Args:
+        items_with_positive: Liste de tuples (original_idx, item)
+        feedback_scores: Dict {idx: score}
+        feedback_comments: Dict {idx: comment} (optionnel)
+    Returns:
+        Liste de dicts prêts pour l'export
+    """
+    export_data = []
+    for original_idx, item in items_with_positive:
+        if original_idx in feedback_scores:
+            export_item = {
+                'code': item.get('anchor', item.get('code', '')),
+                'positive_feedback': item.get('positive', ''),
+                'score': feedback_scores[original_idx],
+            }
+            # Add unique identifiers for merging
+            # Priority: code_id > author_id > hash of content
+            if 'code_id' in item:
+                export_item['code_id'] = item['code_id']
+            if 'author_id' in item:
+                export_item['author_id'] = item['author_id']
+            # Add optional fields
+            if 'language' in item:
+                export_item['language'] = item['language']
+            # Add comment if exists
+            if feedback_comments and original_idx in feedback_comments:
+                comment = feedback_comments[original_idx]
+                if comment.strip():
+                    export_item['comment'] = comment
+            # Add original index for reference (local to this file)
+            export_item['original_index'] = original_idx
+            export_data.append(export_item)
+    return export_data
+def export_to_jsonl(export_data):
+    """
+    Convertit les données en format JSONL
+    Args:
+        export_data: Liste de dicts
+    Returns:
+        String JSONL
+    """
+    return "\n".join(json.dumps(item, ensure_ascii=False) for item in export_data)
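
Reviewer note: `prepare_export` looks up scores with the integer `original_idx`. JSON object keys are always strings, so a scores dict that was written to a JSON file and read back has `"7"` where the app expects `7`, and the `original_idx in feedback_scores` test then silently fails unless the loader converts the keys. Whether `SessionManager` already does this is not visible in this part of the diff. A minimal sketch of the conversion, with the hypothetical helper name `restore_int_keys`:

```python
import json


def restore_int_keys(mapping: dict) -> dict:
    """Convert string keys that parse as integers back to int; keep other keys as-is."""
    restored = {}
    for key, value in mapping.items():
        if isinstance(key, str):
            try:
                key = int(key)
            except ValueError:
                pass  # not an integer key, leave it unchanged
        restored[key] = value
    return restored


# Round trip through JSON turns the int keys into strings
scores = {0: 4, 7: 2}
reloaded = json.loads(json.dumps(scores))
fixed = restore_int_keys(reloaded)
```

Applying the helper to both the scores and the comments right after loading keeps the rest of the app working with integer indices.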

backend/hf_storage.py ADDED

	@@ -0,0 +1,237 @@

+"""
+Saves annotations to a HuggingFace Dataset
+Provides permanent persistence even if the Space restarts
+"""
+import json
+import os
+from datetime import datetime
+from pathlib import Path
+import tempfile
+import streamlit as st
+def get_hf_config():
+    """
+    Récupère la configuration HuggingFace depuis les secrets ou variables d'env.
+    Returns:
+        (hf_token, dataset_repo) ou (None, None) si non configuré
+    """
+    hf_token = None
+    dataset_repo = None
+    # Essayer d'abord st.secrets (HF Spaces)
+    try:
+        hf_token = st.secrets.get("HF_TOKEN")
+        dataset_repo = st.secrets.get("HF_DATASET_REPO")
+    except (FileNotFoundError, KeyError):
+        pass
+    # Fallback sur variables d'environnement
+    if not hf_token:
+        hf_token = os.getenv("HF_TOKEN")
+    if not dataset_repo:
+        dataset_repo = os.getenv("HF_DATASET_REPO")
+    return hf_token, dataset_repo
+def is_hf_storage_enabled():
+    """
+    Vérifie si la sauvegarde HF est configurée et disponible.
+    Returns:
+        bool: True si configuré, False sinon
+    """
+    hf_token, dataset_repo = get_hf_config()
+    return hf_token is not None and dataset_repo is not None
+def save_annotations_to_hf(
+    annotator_id,
+    annotator_name,
+    feedback_scores,
+    feedback_comments,
+    dataset_metadata=None
+):
+    """
+    Sauvegarde les annotations sur un Dataset HuggingFace.
+    Args:
+        annotator_id: ID de l'annotateur
+        annotator_name: Nom de l'annotateur
+        feedback_scores: Dict des scores {item_id: score}
+        feedback_comments: Dict des commentaires {item_id: comment}
+        dataset_metadata: Metadata du dataset annoté (optionnel)
+    Returns:
+        bool: True si succès, False sinon
+    """
+    try:
+        from huggingface_hub import HfApi
+        hf_token, dataset_repo = get_hf_config()
+        if not hf_token or not dataset_repo:
+            st.error("❌ Configuration HuggingFace manquante (HF_TOKEN ou HF_DATASET_REPO)")
+            return False
+        # Prepare the annotation data; reuse one timestamp for the payload and the filename
+        now = datetime.now()
+        timestamp = now.isoformat()
+        annotation_data = {
+            "annotator_id": annotator_id,
+            "annotator_name": annotator_name,
+            "timestamp": timestamp,
+            "scores": feedback_scores,
+            "comments": feedback_comments,
+            "num_scored": len(feedback_scores),
+            "metadata": dataset_metadata or {}
+        }
+        # Create a temporary file
+        filename = f"annotation_{annotator_id}_{now.strftime('%Y%m%d_%H%M%S')}.json"
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False, encoding='utf-8') as f:
+            json.dump(annotation_data, f, indent=2, ensure_ascii=False)
+            temp_path = f.name
+        try:
+            # Upload vers le Dataset HF
+            api = HfApi(token=hf_token)
+            # Créer le repo s'il n'existe pas
+            try:
+                api.create_repo(
+                    repo_id=dataset_repo,
+                    repo_type="dataset",
+                    private=True,
+                    exist_ok=True
+                )
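+            except Exception:
+                # exist_ok=True already covers an existing repo; any other failure here
+                # (e.g. limited token permissions) is surfaced by the upload below
+                pass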
+            # Upload le fichier
+            api.upload_file(
+                path_or_fileobj=temp_path,
+                path_in_repo=f"annotations/{filename}",
+                repo_id=dataset_repo,
+                repo_type="dataset",
+                commit_message=f"Add annotations from {annotator_name} ({annotator_id})"
+            )
+            return True
+        finally:
+            # Clean up the temporary file
+            try:
+                os.unlink(temp_path)
+            except OSError:
+                pass
+    except ImportError:
+        st.error("❌ Package 'huggingface_hub' non installé")
+        return False
+    except Exception as e:
+        st.error(f"❌ Erreur lors de la sauvegarde HF: {str(e)}")
+        return False
+def load_annotations_from_hf(annotator_id):
+    """
+    Load an annotator's annotations from the HF dataset repo.
+    Args:
+        annotator_id: Annotator ID
+    Returns:
+        dict or None: The annotation data if found, None otherwise
+    """
+    try:
+        from huggingface_hub import HfApi, hf_hub_download
+        hf_token, dataset_repo = get_hf_config()
+        if not hf_token or not dataset_repo:
+            return None
+        api = HfApi(token=hf_token)
+        # List the files in the repo
+        try:
+            files = api.list_repo_files(repo_id=dataset_repo, repo_type="dataset")
+        except Exception:
+            # The repo does not exist yet (or cannot be listed)
+            return None
+        # Keep only this annotator's files
+        annotation_files = [
+            f for f in files
+            if f.startswith(f"annotations/annotation_{annotator_id}_")
+        ]
+        if not annotation_files:
+            return None
+        # Take the most recent: names end in a %Y%m%d_%H%M%S stamp, so lexicographic order is chronological
+        latest_file = sorted(annotation_files)[-1]
+        # Download and load it
+        local_path = hf_hub_download(
+            repo_id=dataset_repo,
+            filename=latest_file,
+            repo_type="dataset",
+            token=hf_token
+        )
+        with open(local_path, 'r', encoding='utf-8') as f:
+            return json.load(f)
+    except ImportError:
+        return None
+    except Exception as e:
+        st.warning(f"⚠️ Impossible de charger les annotations HF: {str(e)}")
+        return None
+def get_all_annotator_files(annotator_id=None):
+    """
+    List all annotation files (optionally for a single annotator).
+    Args:
+        annotator_id: Annotator ID (None = all annotators)
+    Returns:
+        list: Sorted list of file paths
+    """
+    try:
+        from huggingface_hub import HfApi
+        hf_token, dataset_repo = get_hf_config()
+        if not hf_token or not dataset_repo:
+            return []
+        api = HfApi(token=hf_token)
+        try:
+            files = api.list_repo_files(repo_id=dataset_repo, repo_type="dataset")
+        except Exception:
+            return []
+        # Keep only the annotation files
+        annotation_files = [f for f in files if f.startswith("annotations/")]
+        if annotator_id:
+            annotation_files = [
+                f for f in annotation_files
+                if f.startswith(f"annotations/annotation_{annotator_id}_")
+            ]
+        return sorted(annotation_files)
+    except ImportError:
+        return []
+    except Exception:
+        return []
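The loader above selects the newest file with `sorted(annotation_files)[-1]` and matches files by the prefix `annotations/annotation_<id>_`. A minimal sketch of a stricter selection follows, assuming the filename layout written by the save function in this file. The helper name `pick_latest_annotation` is hypothetical and not part of the commit. It parses the timestamp explicitly and requires it to follow the prefix directly, so that an ID such as `ann1` does not also pick up files written for `ann1_b`.

```python
import re
from datetime import datetime

# Assumed layout: annotations/annotation_<annotator_id>_<YYYYMMDD>_<HHMMSS>.json
STAMP_RE = re.compile(r"_(\d{8}_\d{6})\.json$")


def pick_latest_annotation(files, annotator_id):
    """Return the newest annotation path for exactly this annotator, or None."""
    prefix = f"annotations/annotation_{annotator_id}_"
    candidates = []
    for path in files:
        match = STAMP_RE.search(path)
        if match is None or not path.startswith(prefix):
            continue
        # The timestamp must follow the prefix directly; otherwise the file
        # belongs to a longer ID that merely starts with annotator_id.
        if match.start() + 1 != len(prefix):
            continue
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # digits that do not form a valid date
        candidates.append((stamp, path))
    return max(candidates)[1] if candidates else None


files = [
    "annotations/annotation_ann1_20250127_101500.json",
    "annotations/annotation_ann1_20250128_090000.json",
    "annotations/annotation_ann1_b_20250129_120000.json",
    "README.md",
]
print(pick_latest_annotation(files, "ann1"))
```

With the plain prefix test, the `ann1_b` file would be the last entry after sorting and would be loaded for `ann1`; the explicit position check avoids that.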

backend/persistence.py ADDED

	@@ -0,0 +1,157 @@

+"""
+Persistence utilities for datasets and scores
+"""
+import json
+from pathlib import Path
+from datetime import datetime
+class SessionManager:
+    """Manages persistence of scoring sessions"""
+    def __init__(self, data_dir="/app/data"):
+        self.data_dir = Path(data_dir)
+        self.data_dir.mkdir(parents=True, exist_ok=True)
+        self.dataset_file = self.data_dir / "dataset.jsonl"
+        self.session_file = self.data_dir / "session.json"
+    def save_dataset(self, dataset, source_info=None):
+        """
+        Save the dataset as JSONL
+        Args:
+            dataset: List of dicts
+            source_info: Dict describing the source (HF name, file, etc.)
+        """
+        # Write the dataset, one JSON object per line
+        with open(self.dataset_file, 'w', encoding='utf-8') as f:
+            for item in dataset:
+                f.write(json.dumps(item, ensure_ascii=False) + '\n')
+        # Write the metadata
+        metadata = {
+            'saved_at': datetime.now().isoformat(),
+            'total_items': len(dataset),
+            'source_info': source_info or {}
+        }
+        metadata_file = self.data_dir / "dataset_metadata.json"
+        with open(metadata_file, 'w', encoding='utf-8') as f:
+            json.dump(metadata, f, indent=2, ensure_ascii=False)
+        return True
+    def load_dataset(self):
+        """
+        Load the saved dataset
+        Returns:
+            Tuple (dataset, metadata), or (None, None) if no dataset was saved
+        """
+        if not self.dataset_file.exists():
+            return None, None
+        # Load the dataset, skipping blank lines
+        dataset = []
+        with open(self.dataset_file, 'r', encoding='utf-8') as f:
+            for line in f:
+                if line.strip():
+                    dataset.append(json.loads(line))
+        # Load the metadata, if present
+        metadata_file = self.data_dir / "dataset_metadata.json"
+        metadata = None
+        if metadata_file.exists():
+            with open(metadata_file, 'r', encoding='utf-8') as f:
+                metadata = json.load(f)
+        return dataset, metadata
+    def save_session(self, feedback_scores, feedback_comments, scoring_index):
+        """
+        Save the session state (scores, comments, position)
+        Args:
+            feedback_scores: Dict {idx: score}
+            feedback_comments: Dict {idx: comment}
+            scoring_index: Current index
+        """
+        session_data = {
+            'feedback_scores': {str(k): v for k, v in feedback_scores.items()},
+            'feedback_comments': {str(k): v for k, v in feedback_comments.items()},
+            'scoring_index': scoring_index,
+            'last_saved': datetime.now().isoformat(),
+            'total_scored': len(feedback_scores)
+        }
+        with open(self.session_file, 'w', encoding='utf-8') as f:
+            json.dump(session_data, f, indent=2, ensure_ascii=False)
+        return True
+    def load_session(self):
+        """
+        Load the saved session state
+        Returns:
+            Dict with feedback_scores, feedback_comments, scoring_index,
+            or None if no session was saved
+        """
+        if not self.session_file.exists():
+            return None
+        with open(self.session_file, 'r', encoding='utf-8') as f:
+            session_data = json.load(f)
+        # JSON object keys are strings; convert them back to int
+        session_data['feedback_scores'] = {
+            int(k): v for k, v in session_data['feedback_scores'].items()
+        }
+        session_data['feedback_comments'] = {
+            int(k): v for k, v in session_data['feedback_comments'].items()
+        }
+        return session_data
+    def has_saved_session(self):
+        """Check whether a saved session exists"""
+        return self.dataset_file.exists() and self.session_file.exists()
+    def clear_session(self):
+        """Delete the saved session"""
+        if self.dataset_file.exists():
+            self.dataset_file.unlink()
+        if self.session_file.exists():
+            self.session_file.unlink()
+        metadata_file = self.data_dir / "dataset_metadata.json"
+        if metadata_file.exists():
+            metadata_file.unlink()
+        return True
+    def get_session_info(self):
+        """
+        Return information about the saved session
+        Returns:
+            Dict with the session info, or None
+        """
+        if not self.has_saved_session():
+            return None
+        dataset, dataset_metadata = self.load_dataset()
+        session_data = self.load_session()
+        if dataset and session_data:
+            return {
+                'dataset_size': len(dataset),
+                'total_scored': session_data['total_scored'],
+                'last_saved': session_data['last_saved'],
+                'scoring_index': session_data['scoring_index'],
+                'source_info': dataset_metadata.get('source_info', {}) if dataset_metadata else {}
+            }
+        return None
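`save_session` writes the score and comment dicts with string keys, and `load_session` converts them back to `int`. A short sketch of why that conversion is needed, using only the standard `json` module; the sample values are made up for illustration.

```python
import json

feedback_scores = {0: 4, 7: 2}

# JSON object keys are always strings, so integer keys do not survive a
# round trip even when they are not converted explicitly before dumping.
raw = json.dumps({"feedback_scores": feedback_scores})
loaded = json.loads(raw)["feedback_scores"]
print(loaded)

# Converting the keys back restores the original mapping, which is the
# step load_session performs after reading session.json.
restored = {int(key): value for key, value in loaded.items()}
print(restored == feedback_scores)
```

Without the conversion, a lookup such as `idx in feedback_scores` with an integer `idx` would miss every restored entry.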

backend/statistics.py ADDED

	@@ -0,0 +1,53 @@

+"""
+Statistics computation utilities
+"""
+def compute_progress(scored_count, total_count):
+    """Return progress as a fraction between 0.0 and 1.0 (not a percentage)"""
+    if total_count == 0:
+        return 0.0
+    return scored_count / total_count
+def compute_score_distribution(feedback_scores):
+    """
+    Compute the score distribution
+    Args:
+        feedback_scores: Dict {idx: score}
+    Returns:
+        Dict {score: count} for scores 0-5
+    """
+    scores_list = list(feedback_scores.values())
+    return {i: scores_list.count(i) for i in range(6)}
+def compute_average_score(feedback_scores):
+    """Compute the mean score (0.0 when nothing has been scored)"""
+    if not feedback_scores:
+        return 0.0
+    scores_list = list(feedback_scores.values())
+    return sum(scores_list) / len(scores_list)
+def compute_most_common_score(feedback_scores):
+    """
+    Find the most frequent score
+    Returns:
+        Tuple (score, count); ties resolve to the lowest score
+    """
+    distribution = compute_score_distribution(feedback_scores)
+    return max(distribution.items(), key=lambda x: x[1])
+def find_unscored_indices(items_with_positive, feedback_scores):
+    """
+    Find the indices of items that have not been scored yet
+    Returns:
+        List of indices
+    """
+    return [idx for idx, _ in items_with_positive if idx not in feedback_scores]
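The statistics helpers above can be exercised on a small score dict. The sketch below re-implements the distribution with `collections.Counter` so that it runs on its own; `score_distribution` and `summarize` are illustrative names, not functions from this commit, and the 0-5 range is taken from the docstring above.

```python
from collections import Counter


def score_distribution(feedback_scores, max_score=5):
    """Count how often each score from 0 to max_score occurs."""
    counts = Counter(feedback_scores.values())
    return {score: counts.get(score, 0) for score in range(max_score + 1)}


def summarize(feedback_scores):
    """Return (average, (most_common_score, count)) for a {idx: score} dict."""
    distribution = score_distribution(feedback_scores)
    average = (
        sum(feedback_scores.values()) / len(feedback_scores)
        if feedback_scores else 0.0
    )
    # max() keeps the first maximal entry, so ties go to the lowest score
    most_common = max(distribution.items(), key=lambda kv: kv[1])
    return average, most_common


scores = {0: 4, 1: 4, 2: 3, 5: 0}
print(score_distribution(scores))
print(summarize(scores))
print(summarize({}))
```

For an empty dict the most common score comes out as `(0, 0)`, which a caller may want to treat as "no data" rather than as a real score of 0.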

frontend/__init__.py ADDED

	@@ -0,0 +1,3 @@

+"""
+Frontend module for scoring app
+"""

frontend/components.py ADDED

	@@ -0,0 +1,189 @@

+"""
+UI Components
+"""
+import streamlit as st
+def render_navigation(current_index, total_items):
+    """
+    Render the navigation buttons
+    Returns:
+        The new index if it changed, None otherwise
+    """
+    col1, col2, col3 = st.columns([1, 2, 1])
+    new_index = None
+    with col1:
+        if st.button("Précédent", use_container_width=True, disabled=current_index == 0):
+            new_index = current_index - 1
+    with col2:
+        st.markdown(f"<div class='center-text'>Exemple {current_index + 1} / {total_items}</div>", unsafe_allow_html=True)
+    with col3:
+        if st.button("Suivant", use_container_width=True, disabled=current_index >= total_items - 1):
+            new_index = current_index + 1
+    return new_index
+def render_code_block(code_text, language="python"):
+    """Render a code block with syntax highlighting"""
+    st.markdown("### Code")
+    st.code(code_text, language=language)
+def render_feedback_block(feedback_text):
+    """Render the feedback in a styled block that adapts to the theme"""
+    st.markdown("### Feedback")
+    # The .feedback-block CSS class adapts to the theme; note that
+    # feedback_text is inserted into the HTML without escaping
+    st.markdown(f"""
+<div class="feedback-block">
+{feedback_text}
+</div>
+""", unsafe_allow_html=True)
+def render_score_slider(score_key, current_score=3):
+    """
+    Render the scoring slider
+    Returns:
+        The selected score
+    """
+    st.markdown("### Notez le feedback")
+    st.caption("À quel point ce feedback est-il utile et pertinent pour ce code ?")
+    score = st.slider(
+        "Score",
+        min_value=0,
+        max_value=5,
+        value=current_score,
+        step=1,
+        format="%d",
+        key=score_key,
+        help="0 = Pas utile du tout, 5 = Extrêmement utile"
+    )
+    # Show score meaning
+    score_meanings = {
+        0: "Pas utile / Incorrect",
+        1: "Très peu utile",
+        2: "Peu utile",
+        3: "Moyennement utile",
+        4: "Utile",
+        5: "Extrêmement utile"
+    }
+    st.info(f"{score} - {score_meanings[score]}")
+    return score
+def render_comment_field(comment_key, current_comment=""):
+    """
+    Render the optional comment field
+    Returns:
+        The comment entered
+    """
+    with st.expander("Ajouter un commentaire (optionnel)"):
+        comment = st.text_area(
+            "Commentaire",
+            value=current_comment,
+            key=comment_key,
+            placeholder="Pourquoi ce score ? (optionnel)",
+            label_visibility="collapsed"
+        )
+    return comment
+def render_progress_metrics(total, scored, remaining, progress_pct):
+    """Render the progress metrics; progress_pct is a percentage (0-100)"""
+    st.markdown("### Progression")
+    col1, col2, col3, col4 = st.columns(4)
+    with col1:
+        st.metric("Total", total)
+    with col2:
+        st.metric("Scorés", scored)
+    with col3:
+        st.metric("Restants", remaining)
+    with col4:
+        st.metric("Progression", f"{progress_pct:.0f}%")
+    st.progress(progress_pct / 100)
+def render_statistics(avg_score, most_common_score, score_counts):
+    """Render the detailed score statistics"""
+    st.markdown("### Statistiques des Scores")
+    col1, col2 = st.columns(2)
+    with col1:
+        st.metric("Score Moyen", f"{avg_score:.2f}")
+    with col2:
+        st.metric("Score le plus fréquent", f"{most_common_score[0]} ({most_common_score[1]}x)")
+    st.bar_chart(score_counts, height=200)
+def render_export_section(export_data):
+    """Render the export section and return the second column for the caller to fill"""
+    st.markdown("### Export")
+    col1, col2 = st.columns(2)
+    with col1:
+        st.metric("Items à exporter", len(export_data))
+    return col2
+def render_quick_actions(items_with_positive, feedback_scores, current_index):
+    """
+    Render the quick actions
+    Returns:
+        Tuple (reset_requested, jump_to_unscored, jump_to_index)
+    """
+    st.markdown("### Actions Rapides")
+    col1, col2, col3 = st.columns(3)
+    reset_requested = False
+    jump_to_unscored = False
+    jump_to_index = None
+    with col1:
+        if st.button("Réinitialiser tous les scores", use_container_width=True):
+            reset_requested = True
+    with col2:
+        unscored_indices = [idx for idx, _ in items_with_positive if idx not in feedback_scores]
+        if unscored_indices and st.button("Aller au prochain non-scoré", use_container_width=True):
+            jump_to_unscored = True
+    with col3:
+        jump_to = st.number_input(
+            "Aller à l'exemple",
+            min_value=1,
+            max_value=len(items_with_positive),
+            value=current_index + 1,
+            step=1,
+            key="jump_to"
+        )
+        if st.button("Aller", use_container_width=True):
+            jump_to_index = jump_to - 1
+    return reset_requested, jump_to_unscored, jump_to_index
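`render_quick_actions` only reports that the user asked to jump to an unscored item; the jump itself is resolved by the caller, which is not part of this section. A minimal sketch of one way to resolve it, assuming (as the comprehension above suggests) that `items_with_positive` is a list of `(original_idx, item)` pairs, that `feedback_scores` is keyed by `original_idx`, and that `current_index` is a position in that list. The helper name `next_unscored_position` is hypothetical.

```python
def next_unscored_position(items_with_positive, feedback_scores, current_index):
    """Return the list position of the next unscored item.

    Searches forward from current_index and wraps around to the start;
    returns None when every item already has a score.
    """
    total = len(items_with_positive)
    for offset in range(1, total + 1):
        position = (current_index + offset) % total
        original_idx, _ = items_with_positive[position]
        if original_idx not in feedback_scores:
            return position
    return None


items = [(10, {}), (11, {}), (15, {})]
print(next_unscored_position(items, {11: 3}, 0))
print(next_unscored_position(items, {11: 3}, 2))
print(next_unscored_position(items, {10: 1, 11: 3, 15: 5}, 1))
```

The sketch keeps list positions and original dataset indices apart, since the "go to example" input works on positions while the scores are stored per original index.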

frontend/help_page.py ADDED

	@@ -0,0 +1,308 @@

+"""
+Detailed help page
+"""
+import streamlit as st
+def render_help_page():
+    """Render the full help page"""
+    st.title("Guide d'utilisation")
+    st.markdown("Tout ce que vous devez savoir pour utiliser l'application de scoring de feedbacks")
+    st.markdown("---")
+    # Section 1: Introduction
+    st.markdown("## Objectif de l'application")
+    st.markdown("""
+    Cette application permet de **noter la qualité des feedbacks** générés pour des snippets de code.
+    Vous allez évaluer si les feedbacks sont utiles, pertinents et bien formulés.
+    **Échelle de notation : 0 à 5**
+    - **0** : Pas utile / Incorrect
+    - **1** : Très peu utile
+    - **2** : Peu utile
+    - **3** : Moyennement utile
+    - **4** : Utile
+    - **5** : Extrêmement utile
+    """)
+    st.markdown("---")
+    # Section 2: Charger un dataset
+    st.markdown("## Charger un dataset")
+    col1, col2 = st.columns(2)
+    with col1:
+        st.markdown("### Option 1 : Fichier local")
+        st.markdown("""
+        1. Cliquez sur **"Choisissez une option: Fichier local (.jsonl)"**
+        2. Cliquez sur **"Browse files"**
+        3. Sélectionnez votre fichier `.jsonl`
+        4. Cliquez sur **"Charger le fichier"**
+        **Format attendu du JSONL :**
+        ```json
+        {"anchor": "code...", "positive": "feedback..."}
+        {"anchor": "code...", "positive": "feedback..."}
+        ```
+        Chaque ligne doit contenir au minimum :
+        - `anchor` ou `code` : le code source
+        - `positive` : le feedback à évaluer
+        """)
+    with col2:
+        st.markdown("### Option 2 : HuggingFace Hub")
+        st.markdown("""
+        1. Cliquez sur **"Choisissez une option: HuggingFace Hub"**
+        2. Entrez le nom du dataset (ex: `username/dataset-name`)
+        3. Choisissez le split (généralement `train`)
+        4. Cliquez sur **"Charger depuis HF"**
+        **Exemples de datasets :**
+        - `username/my-feedback-dataset`
+        - `organization/code-feedbacks-v1`
+        Le dataset doit être public ou vous devez être authentifié
+        """)
+    st.markdown("---")
+    # Section 3: Navigation
+    st.markdown("## Navigation entre les exemples")
+    st.markdown("""
+    ### Boutons de navigation
+    - **< Précédent** : Revenir à l'exemple précédent
+    - **Suivant >** : Passer à l'exemple suivant
+    - **Compteur central** : Affiche votre position (ex: "Exemple 5 / 100")
+    ### Navigation rapide
+    - **Aller au prochain non-scoré** : Saute directement au prochain exemple sans score
+    - **Aller à l'exemple N** : Entrez un numéro et cliquez sur "Aller" pour y accéder directement
+    **Astuce** : Utilisez les boutons Précédent/Suivant pour naviguer rapidement
+    """)
+    st.markdown("---")
+    # Section 4: Scoring
+    st.markdown("## Noter un feedback")
+    st.markdown("""
+    ### 1. Lisez le code
+    Le code source s'affiche en haut avec coloration syntaxique.
+    ### 2. Lisez le feedback
+    Le feedback à évaluer s'affiche dans un encadré vert.
+    ### 3. Attribuez un score
+    Déplacez le **slider de 0 à 5** selon votre évaluation :
+    | Score | Signification | Quand l'utiliser |
+    |-------|---------------|------------------|
+    | **0** | Pas utile / Incorrect | Le feedback est faux, inutile ou hors sujet |
+    | **1** | Très peu utile | Le feedback est vague ou très générique |
+    | **2** | Peu utile | Le feedback manque de précision ou de profondeur |
+    | **3** | Moyennement utile | Le feedback est correct mais sans détails |
+    | **4** | Utile | Le feedback est pertinent et bien expliqué |
+    | **5** | Extrêmement utile | Le feedback est excellent, précis et actionnable |
+    ### 4. Ajoutez un commentaire (optionnel)
+    Cliquez sur **"Ajouter un commentaire"** pour justifier votre score.
+    Utile pour :
+    - Expliquer un score particulier
+    - Noter des détails spécifiques
+    - Documenter votre raisonnement
+    """)
+    st.markdown("---")
+    # Section 5: Progress
+    st.markdown("## Suivre votre progression")
+    st.markdown("""
+    ### Métriques en temps réel
+    - **Total** : Nombre total d'exemples dans le dataset
+    - **Scorés** : Nombre d'exemples que vous avez notés
+    - **Restants** : Nombre d'exemples non scorés
+    - **Progression** : Pourcentage d'avancement
+    ### Barre de progression
+    Une barre visuelle vous montre votre avancement global.
+    ### Statistiques des scores
+    Quand vous avez noté au moins un exemple, vous verrez :
+    - **Score moyen** : Moyenne de tous vos scores
+    - **Score le plus fréquent** : Le score que vous utilisez le plus
+    - **Graphique de distribution** : Répartition de vos scores
+    Ces statistiques vous aident à voir si vous êtes cohérent dans vos notations
+    """)
+    st.markdown("---")
+    # Section 6: Export
+    st.markdown("## Exporter vos scores")
+    st.markdown("""
+    ### Quand exporter ?
+    - À la fin de votre session de notation
+    - Régulièrement pour sauvegarder votre travail
+    - Quand vous voulez partager vos résultats
+    ### Comment exporter ?
+    1. Scrollez jusqu'à la section **"Export"**
+    2. Vérifiez le nombre d'items à exporter
+    3. Cliquez sur **"Télécharger JSONL"**
+    4. Le fichier sera téléchargé : `feedback_scores.jsonl`
+    ### Format du fichier exporté
+    ```json
+    {
+      "code": "code source...",
+      "feedback": "le feedback évalué...",
+      "score": 4,
+      "comment": "votre commentaire...",
+      "original_index": 42
+    }
+    ```
+    ### Aperçu de l'export
+    Cliquez sur **"Aperçu Export (5 premiers)"** pour voir à quoi ressemblera votre fichier.
+    **Important** : Les scores sont automatiquement enregistrés, mais téléchargez régulièrement votre fichier JSONL !
+    """)
+    st.markdown("---")
+    # Section 7: Actions avancées
+    st.markdown("## Actions avancées")
+    st.markdown("""
+    ### Réinitialiser tous les scores
+    **Attention** : Cette action supprime TOUS vos scores !
+    1. Cliquez sur **"Réinitialiser tous les scores"**
+    2. Un avertissement apparaît
+    3. Cliquez à nouveau pour confirmer
+    Pensez à exporter avant de réinitialiser !
+    ### Modifier un score existant
+    Vous pouvez revenir sur un exemple déjà noté et changer le score.
+    Le nouveau score remplacera automatiquement l'ancien.
+    """)
+    st.markdown("---")
+    # Section 8: Bonnes pratiques
+    st.markdown("## ✅ Bonnes pratiques")
+    col1, col2 = st.columns(2)
+    with col1:
+        st.markdown("""
+        ### Pour une notation cohérente
+        - Lisez d'abord plusieurs exemples avant de commencer
+        - Définissez vos critères de notation
+        - Utilisez les commentaires pour les cas limites
+        - Relisez régulièrement vos premiers scores
+        - Vérifiez votre distribution de scores
+        """)
+    with col2:
+        st.markdown("""
+        ### Pour éviter les erreurs
+        - Exportez régulièrement (toutes les 50 notations)
+        - N'utilisez pas "Réinitialiser" par erreur
+        - Lisez bien le code ET le feedback
+        - Faites des pauses régulières
+        - En cas de doute, mettez 3 et commentez
+        """)
+    st.markdown("---")
+    # Section 9: FAQ
+    st.markdown("## Questions fréquentes")
+    with st.expander("Mes scores sont-ils sauvegardés automatiquement ?"):
+        st.markdown("""
+        **Oui !** Chaque score est enregistré dès que vous bougez le slider.
+        Cependant, ils sont stockés temporairement dans l'application.
+        **Pensez à télécharger votre fichier JSONL** régulièrement pour ne pas perdre votre travail.
+        """)
+    with st.expander("Puis-je fermer l'application et revenir plus tard ?"):
+        st.markdown("""
+        **Attention** : Si vous fermez le navigateur ou redémarrez Docker, vos scores seront perdus.
+        **Solution** : Téléchargez votre fichier JSONL avant de fermer, et rechargez-le à votre retour.
+        """)
+    with st.expander("Comment gérer les feedbacks ambigus ?"):
+        st.markdown("""
+        Pour les feedbacks difficiles à évaluer :
+        1. Mettez un score **3** (moyennement utile)
+        2. Ajoutez un **commentaire** expliquant pourquoi c'est ambigu
+        3. Continuez et revenez-y plus tard si besoin
+        """)
+    with st.expander("Que faire si un feedback est partiellement correct ?"):
+        st.markdown("""
+        Utilisez l'échelle graduée :
+        - Partiellement incorrect → **1-2**
+        - Partiellement utile → **2-3**
+        - Majoritairement utile avec petits défauts → **3-4**
+        Le score **3** est parfait pour "c'est correct mais sans plus".
+        """)
+    with st.expander("Le dataset ne se charge pas ?"):
+        st.markdown("""
+        Vérifiez :
+        - Le fichier est bien au format `.jsonl`
+        - Chaque ligne contient `{"anchor": "...", "positive": "..."}`
+        - Le fichier n'est pas corrompu
+        - Pour HuggingFace : le dataset existe et est accessible
+        """)
+    with st.expander("Comment partager mes résultats ?"):
+        st.markdown("""
+        1. Téléchargez votre fichier JSONL
+        2. Envoyez-le par email, Slack, Drive, etc.
+        3. Le fichier contient tous vos scores et commentaires
+        Le fichier est lisible et peut être rechargé dans l'application.
+        """)
+    st.markdown("---")
+    # Section 10: Shortcuts
+    st.markdown("## ⌨ Raccourcis et astuces")
+    st.markdown("""
+    ### Navigation rapide
+    - Utilisez **Tab** pour naviguer entre les boutons
+    - **Espace** ou **Entrée** pour cliquer sur un bouton
+    - Cliquez directement sur le slider pour changer le score rapidement
+    ### Workflow efficace
+    1. **Première passe** : Notez rapidement tous les exemples évidents
+    2. **Deuxième passe** : Revenez sur les cas ambigus avec "Aller au prochain non-scoré"
+    3. **Révision** : Vérifiez vos statistiques et ajustez si besoin
+    4. **Export** : Téléchargez votre travail
+    ### Organisation
+    - Notez par sessions de 20-50 exemples
+    - Exportez à chaque fin de session
+    - Nommez vos exports : `scores_session1.jsonl`, `scores_session2.jsonl`
+    """)
+    st.markdown("---")
+    st.success("Vous êtes maintenant prêt à noter des feedbacks efficacement !")
+    st.caption("Pour toute question, consultez cette page d'aide ou contactez matis.codjia@epita.fr")
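The help page documents two record shapes: input lines that carry `anchor` (or `code`) plus `positive`, and exported records with `code`, `feedback`, `score`, `comment` and `original_index`. The sketch below shows one way to validate the input and build an export record from it. The function names `load_records` and `to_export_record` are hypothetical; the loader and exporter shipped in this commit are not shown in this section and may behave differently.

```python
import json


def load_records(jsonl_text):
    """Parse JSONL text into a list of {"code", "feedback"} dicts.

    Blank lines are skipped; a line without 'anchor'/'code' or without
    'positive' raises ValueError with its line number.
    """
    records = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue
        item = json.loads(line)
        code = item.get("anchor", item.get("code"))
        if code is None or "positive" not in item:
            raise ValueError(
                f"line {line_no}: expected 'anchor' or 'code', plus 'positive'"
            )
        records.append({"code": code, "feedback": item["positive"]})
    return records


def to_export_record(record, original_index, score, comment=""):
    """Build one export record in the format shown on the help page."""
    return {
        "code": record["code"],
        "feedback": record["feedback"],
        "score": score,
        "comment": comment,
        "original_index": original_index,
    }


sample = (
    '{"anchor": "print(1)", "positive": "ok"}\n'
    "\n"
    '{"code": "x = 2", "positive": "fine"}\n'
)
records = load_records(sample)
print(json.dumps(to_export_record(records[1], 1, 4, "clear"), ensure_ascii=False))
```

Reporting the line number in the error makes it easier to act on the "dataset does not load" checklist in the FAQ above.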

frontend/styles.py ADDED

	@@ -0,0 +1,132 @@

+"""
+Custom CSS styling
+"""
+import streamlit as st
+def apply_custom_css():
+    """Apply custom CSS styling to the Streamlit app with dark mode support"""
+    st.markdown("""
+<style>
+    /* Clean typography */
+    @import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap');
+    * {
+        font-family: 'Inter', sans-serif;
+    }
+    /* Main header - Light & Dark */
+    .main-header {
+        font-size: 2.8rem;
+        font-weight: 700;
+        margin-bottom: 0.5rem;
+        letter-spacing: -0.02em;
+    }
+    .subtitle {
+        font-size: 1rem;
+        font-weight: 400;
+        margin-bottom: 2rem;
+    }
+    /* Clean buttons - Adapt to theme */
+    .stButton > button {
+        border-radius: 8px;
+        font-weight: 500;
+        transition: all 0.2s;
+    }
+    .stButton > button:hover {
+        border-color: #3b82f6;
+        box-shadow: 0 4px 12px rgba(59,130,246,0.15);
+        transform: translateY(-1px);
+    }
+    .stButton > button:active {
+        transform: translateY(0);
+    }
+    /* Primary button */
+    button[kind="primary"] {
+        background: #3b82f6 !important;
+        color: white !important;
+        border: none !important;
+    }
+    button[kind="primary"]:hover {
+        background: #2563eb !important;
+    }
+    /* Metrics */
+    [data-testid="stMetricValue"] {
+        font-size: 2rem;
+        font-weight: 600;
+    }
+    /* Section headers - Light mode */
+    [data-theme="light"] .section-header {
+        font-size: 1.25rem;
+        font-weight: 600;
+        color: #1a1a1a;
+        margin: 1.5rem 0 1rem 0;
+        padding-bottom: 0.5rem;
+        border-bottom: 2px solid #e5e7eb;
+    }
+    /* Section headers - Dark mode */
+    [data-theme="dark"] .section-header {
+        font-size: 1.25rem;
+        font-weight: 600;
+        color: #f3f4f6;
+        margin: 1.5rem 0 1rem 0;
+        padding-bottom: 0.5rem;
+        border-bottom: 2px solid #374151;
+    }
+    /* Sidebar - Light mode */
+    [data-theme="light"] [data-testid="stSidebar"] {
+        background: #fafafa;
+        border-right: 1px solid #e5e7eb;
+    }
+    /* Sidebar - Dark mode */
+    [data-theme="dark"] [data-testid="stSidebar"] {
+        background: #1f2937;
+        border-right: 1px solid #374151;
+    }
+    /* Info boxes */
+    .stAlert {
+        border-radius: 8px;
+        border-left: 3px solid;
+    }
+    /* Feedback block - base rule shared by both themes */
+    .feedback-block {
+        padding: 1.25rem;
+        border-radius: 8px;
+        border-left: 3px solid #10b981;
+    }
+    [data-theme="light"] .feedback-block {
+        background: linear-gradient(to bottom right, #ecfdf5, #f0fdf4);
+        border: 1px solid #d1fae5;
+        color: #064e3b;
+    }
+    /* Feedback block - Dark mode */
+    [data-theme="dark"] .feedback-block {
+        background: linear-gradient(to bottom right, #064e3b, #065f46);
+        border: 1px solid #065f46;
+        color: #d1fae5;
+    }
+    /* Center text style */
+    .center-text {
+        text-align: center;
+        padding: 10px;
+        font-weight: bold;
+    }
+</style>
+""", unsafe_allow_html=True)

requirements.txt CHANGED

@@ -1,3 +1,5 @@
-altair
-pandas
-streamlit

+streamlit==1.29.0
+datasets==2.16.1
+pandas==2.1.4
+numpy==1.26.2
+huggingface_hub>=0.20.0