# COLE Architecture

## System Overview

COLE runs as a single Docker container deployed on HuggingFace Spaces. Inside the container, an nginx reverse proxy listens on port 7860 and routes `/api/*` to the FastAPI backend (port 8000) and all other paths to the Next.js frontend (port 8001).

```mermaid
graph LR
User([User]) -->|:7860| Nginx
subgraph Docker Container
Nginx -->|/api/*| FastAPI[FastAPI :8000]
Nginx -->|/*| NextJS[Next.js :8001]
FastAPI -->|reads| HF[(HuggingFace\ngraalul/COLE)]
FastAPI -->|writes| Results[(results/*.json)]
end
```
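
The routing rule in the diagram can be summarized in a few lines. This is only an illustrative Python sketch of the path-prefix decision, not the nginx configuration itself: the function name `pick_upstream` and the loopback addresses are assumptions, and whether nginx strips the `/api` prefix before forwarding is not specified here.

```python
FASTAPI_UPSTREAM = "http://127.0.0.1:8000"  # assumed loopback address
NEXTJS_UPSTREAM = "http://127.0.0.1:8001"   # assumed loopback address


def pick_upstream(path: str) -> str:
    """Mirror the diagram: /api/* goes to FastAPI, everything else to Next.js."""
    if path.startswith("/api/"):
        return FASTAPI_UPSTREAM
    return NEXTJS_UPSTREAM
```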

## Backend (FastAPI)

### API Endpoints

```mermaid
graph TD
subgraph API
POST[POST /submit] -->|ZIP upload| Validate[Validate format]
Validate -->|OK| Evaluate[Evaluate predictions]
Evaluate -->|Save| JSON[results/uuid.json]
GET_LB[GET /leaderboard] -->|Read all| JSON
GET_H[GET /health] -->|200| OK[status: healthy]
end
```

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/submit` | POST | Upload predictions ZIP, evaluate, save results |
| `/leaderboard` | GET | Return all submissions with metrics |
| `/health` | GET | Health check |
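
As a rough illustration of the "read all" step behind `GET /leaderboard`, the sketch below collects every saved result file into a list. It assumes each file in `results/` holds one JSON object; the function name `load_leaderboard`, the `id` field, and the choice to skip unreadable files are assumptions, not the service's confirmed behavior.

```python
import json
from pathlib import Path


def load_leaderboard(results_dir: str) -> list[dict]:
    """Collect every <uuid>.json file in results_dir into a list of submissions."""
    submissions = []
    for path in sorted(Path(results_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                entry = json.load(handle)
        except (OSError, json.JSONDecodeError):
            continue  # assumption: skip unreadable files instead of failing the request
        if isinstance(entry, dict):
            # Assumption: use the file stem as the id unless the file already has one.
            submissions.append({"id": path.stem, **entry})
    return submissions
```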

### Security

- **Rate limiting**: 5 submissions/minute per IP (slowapi)
- **ZIP validation**: Max 50MB compressed, 200MB decompressed
- **Input validation**: Email (max 320 chars, must contain @), display name (max 200 chars)
- **CORS**: Open origins (proxied through nginx)
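
The ZIP and input limits above could be enforced along these lines. This is a minimal sketch using only the standard library, not the code in `validation_tools.py`: the function names are hypothetical, and treating 1MB as 1024 × 1024 bytes is an assumption.

```python
import io
import zipfile

MAX_COMPRESSED_BYTES = 50 * 1024 * 1024     # 50MB upload limit (MB size assumed)
MAX_DECOMPRESSED_BYTES = 200 * 1024 * 1024  # 200MB once extracted
MAX_EMAIL_CHARS = 320
MAX_DISPLAY_NAME_CHARS = 200


def check_zip(payload: bytes) -> None:
    """Raise ValueError if the archive is invalid or breaks either size limit."""
    if len(payload) > MAX_COMPRESSED_BYTES:
        raise ValueError("archive exceeds the compressed size limit")
    try:
        archive = zipfile.ZipFile(io.BytesIO(payload))
    except zipfile.BadZipFile as exc:
        raise ValueError("not a valid ZIP archive") from exc
    with archive:
        # Sum the sizes declared in the ZIP headers before extracting anything.
        declared = sum(info.file_size for info in archive.infolist())
    if declared > MAX_DECOMPRESSED_BYTES:
        raise ValueError("archive exceeds the decompressed size limit")


def check_fields(email: str, display_name: str) -> None:
    """Raise ValueError if the email or display name breaks the stated rules."""
    if len(email) > MAX_EMAIL_CHARS or "@" not in email:
        raise ValueError("invalid email")
    if len(display_name) > MAX_DISPLAY_NAME_CHARS:
        raise ValueError("display name too long")
```

Declared sizes come from the archive's own headers, so a hardened version would also count bytes while streaming each member during extraction.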

### Key Modules

```mermaid
graph TD
API[submission_api.py] --> VT[validation_tools.py]
API --> ST[submit_tools.py]
API --> EV[evaluation.py]
VT --> TN[task_names.py]
EV --> TF[task_factory.py]
TF --> T[task.py]
T --> MF[metric_factory.py]
T --> DS[dataset.py]
MF --> MW[metrics_wrapper.py]
MF --> FQ[fquad_metric.py]
DS --> HF[(HuggingFace)]
```

## Frontend (Next.js)

### Pages

```mermaid
graph LR
subgraph Pages
Home["/ (Home)"]
Guide["/guide"]
FAQ["/FAQ"]
Contact["/contact"]
Papers["/papers"]
Benchmarks["/benchmarks"]
Leaderboard["/leaderboard"]
Results["/results/[id]"]
end
subgraph Features
i18n[EN/FR i18n]
Responsive[Mobile responsive]
Pagination[Leaderboard pagination]
Submit[ZIP submission modal]
end
```

| Page | Description |
|------|-------------|
| `/` | What is COLE, links to paper and GLUE/SuperGLUE |
| `/guide` | How to train, test, and format submissions |
| `/FAQ` | 6 questions with code formatting support |
| `/benchmarks` | 23 tasks organized by 9 NLU categories |
| `/leaderboard` | Sortable table, 25/page, loading skeleton, error states |
| `/papers` | Embedded arxiv PDF viewer |
| `/results/[id]` | Per-submission detailed results |
| `/contact` | Email contact |

### i18n

Full English and French translations live in `frontend/src/app/en/translation.json` and the sibling `fr/translation.json`. The language switcher in the header persists the selection to localStorage.

## Evaluation Pipeline

### Task Flow

```mermaid
graph TD
Submit[User submits ZIP] --> Unzip[Extract predictions.json]
Unzip --> Validate[Validate task names and format]
Validate --> Factory[task_factory creates Task objects]
Factory --> Compute[Task.compute per task]
Compute --> Dataset[Load ground truths from HF]
Compute --> Metric[metric_factory selects metric]
Metric --> Score[Compute score]
Score --> Save[Save results JSON]
```
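
The flow above can be sketched end to end with the standard library. This is a simplified illustration rather than the code in `evaluation.py`: it assumes `predictions.json` maps each task name to a list of predictions, takes ground truths and scoring functions as arguments instead of loading them from HuggingFace, and uses a hypothetical `KNOWN_TASKS` set and `evaluate_submission` function.

```python
import io
import json
import uuid
import zipfile
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for task_names.py, listing only a few task names.
KNOWN_TASKS = {"allocine", "xnli", "paws_x"}

Scorer = Callable[[list, list], float]


def evaluate_submission(
    payload: bytes,
    references: dict[str, list],
    scorers: dict[str, Scorer],
    results_dir: str,
) -> Path:
    """Unzip, validate, score each task, and save <results_dir>/<uuid>.json."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        predictions = json.loads(archive.read("predictions.json"))

    unknown = set(predictions) - KNOWN_TASKS
    if unknown:
        raise ValueError(f"unknown task names: {sorted(unknown)}")

    scores = {}
    for task, predicted in predictions.items():
        expected = references[task]  # the real service loads these from HF
        if len(predicted) != len(expected):
            raise ValueError(f"{task}: expected {len(expected)} predictions")
        scores[task] = scorers[task](predicted, expected)

    out_path = Path(results_dir) / f"{uuid.uuid4()}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps({"scores": scores}), encoding="utf-8")
    return out_path
```

Passing the references and scorers in as arguments keeps the sketch testable offline; the actual pipeline delegates those two steps to `dataset.py` and `metric_factory.py`.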

### Tasks (30 total)

Grouped by capability:

| Category | Tasks |
|----------|-------|
| Sentiment | allocine, mms |
| NLI | fracas, gqnli, lingnli, mnli-nineeleven-fr-mt, rte3-french, sickfr, xnli, daccord |
| QA | fquad, french_boolq, piaf |
| Paraphrase | paws_x, qfrblimp |
| Grammar | multiblimp, qfrcola |
| Similarity | sts22 |
| WSD | wsd |
| Quebec French | qfrcore, qfrcort |
| Coreference | wino_x_lm, wino_x_mt |
| Other | frcoe, timeline, lqle, qccp, qccy, qccr, piqafr, piqaqfr |

### Metrics

| Metric | Implementation | Used by |
|--------|---------------|---------|
| Accuracy | HuggingFace `evaluate` | Most classification tasks |
| Pearson | HuggingFace `evaluate` | sickfr, sts22 |
| FQuAD | Custom (F1 + Exact Match) | fquad, piaf |
| ExactMatch | Custom string comparison | wsd |
| F1 | HuggingFace `evaluate` | Classification variants |
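
To make the metric definitions concrete, here are plain-Python stand-ins for accuracy, Pearson correlation, exact match, and the token-level F1 typically used for extractive QA. They are illustrative only: the project relies on HuggingFace `evaluate` and its own `fquad_metric.py`, whose text normalization may differ from the simple lowercasing and whitespace splitting assumed here.

```python
import math
from collections import Counter


def accuracy(predictions: list, references: list) -> float:
    """Fraction of predictions equal to their reference label."""
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall over whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("le chat noir", "le chat")` shares two tokens, giving precision 2/3, recall 1, and an F1 of 0.8.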

## CI/CD Pipeline

```mermaid
graph TD
Push[git push to main] --> F[Formatting\nblack --check]
Push --> L[Linting\npylint src/ tests/]
Push --> T[Tests\npytest]
Push --> FB[Frontend Build\nnpm ci + lint + build]
Push --> HF[HF Sync\nDeploy to Space]
F -->|Python 3.12| Pass
L -->|Python 3.10-3.12| Pass
T -->|Python 3.12\nHF_TOKEN required| Pass
FB -->|Node 20| Pass
HF -->|Orphan branch\nLFS for .jsonl/.pdf| Space[davebulaval/cole]
```

## Deployment

The HF Space deployment pushes a fresh orphan branch, so the push carries none of the earlier git history, which contains large `.jsonl` files:

1. Checkout main with LFS
2. Create fresh orphan branch
3. Track `.jsonl` and `.pdf` with Git LFS
4. Remove CI/test files not needed in production
5. Force push to `davebulaval/cole` Space

The Space then builds the Docker image and runs the container with nginx on port 7860.
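
The deployment steps can be written out as an ordered list of git commands. The sketch below is one plausible reading of steps 2 to 5, not the actual CI workflow: the branch name `hf-deploy`, the commit message, the function name, and the extra unstaging step (added so that `git add` passes files through the LFS filter) are all assumptions, and step 1 is assumed to have been done by the CI checkout.

```python
def deploy_commands(space_remote: str, prune_paths: list[str]) -> list[list[str]]:
    """Return the git commands for an orphan-branch deploy, in order."""
    branch = "hf-deploy"  # hypothetical branch name
    commands = [["git", "checkout", "--orphan", branch]]
    if prune_paths:
        # Delete CI/test files from both the index and the working tree.
        commands.append(["git", "rm", "-r", "-f", "--quiet", "--", *prune_paths])
    commands += [
        # Assumption: unstage everything so `git add` re-filters files through LFS.
        ["git", "rm", "-r", "--cached", "--quiet", "."],
        ["git", "lfs", "track", "*.jsonl", "*.pdf"],
        ["git", "add", "--all"],
        ["git", "commit", "--quiet", "-m", "Deploy to Space"],
        ["git", "push", "--force", space_remote, f"{branch}:main"],
    ]
    return commands
```

Returning the commands instead of running them keeps the sketch safe to inspect; each entry could be passed to `subprocess.run(cmd, check=True)` from the repository root.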