# COLE Architecture
## System Overview
COLE runs as a single Docker container deployed on HuggingFace Spaces: an nginx reverse proxy fronts two services, a FastAPI backend and a Next.js frontend.
```mermaid
graph LR
User([User]) -->|:7860| Nginx
subgraph Docker Container
Nginx -->|/api/*| FastAPI[FastAPI :8000]
Nginx -->|/*| NextJS[Next.js :8001]
FastAPI -->|reads| HF[(HuggingFace\ngraalul/COLE)]
FastAPI -->|writes| Results[(results/*.json)]
end
```
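The routing in the diagram above could be sketched in nginx as follows. This is illustrative only: the upstream addresses and any path-rewriting behavior are assumptions, not the repo's actual config.

```nginx
server {
    # HF Spaces exposes this single port
    listen 7860;

    # API traffic goes to FastAPI
    location /api/ {
        proxy_pass http://127.0.0.1:8000;
    }

    # Everything else is served by the Next.js app
    location / {
        proxy_pass http://127.0.0.1:8001;
    }
}
```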
## Backend (FastAPI)
### API Endpoints
```mermaid
graph TD
subgraph API
POST[POST /submit] -->|ZIP upload| Validate[Validate format]
Validate -->|OK| Evaluate[Evaluate predictions]
Evaluate -->|Save| JSON[results/uuid.json]
GET_LB[GET /leaderboard] -->|Read all| JSON
GET_H[GET /health] -->|200| OK[status: healthy]
end
```
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/submit` | POST | Upload predictions ZIP, evaluate, save results |
| `/leaderboard` | GET | Return all submissions with metrics |
| `/health` | GET | Health check |
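Since `/leaderboard` reads every file under `results/`, the read path can be sketched with the stdlib alone. The `overall_score` field and the sort order are assumptions about the results schema, not the actual format:

```python
import json
from pathlib import Path

RESULTS_DIR = Path("results")

def load_leaderboard(results_dir: Path = RESULTS_DIR) -> list[dict]:
    """Read every results/*.json written by /submit and return the entries
    sorted by overall score, highest first. Field names are illustrative."""
    entries = []
    for path in sorted(results_dir.glob("*.json")):
        with path.open() as f:
            entries.append(json.load(f))
    return sorted(entries, key=lambda e: e.get("overall_score", 0.0), reverse=True)
```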
### Security
- **Rate limiting**: 5 submissions/minute per IP (slowapi)
- **ZIP validation**: Max 50MB compressed, 200MB decompressed
- **Input validation**: Email (max 320 chars, must contain @), display name (max 200 chars)
- **CORS**: Open origins (proxied through nginx)
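The limits above can all be checked before extracting anything. A stdlib-only sketch; `validate_submission` and its error messages are hypothetical, but the constants mirror the limits listed:

```python
import os
import zipfile

MAX_COMPRESSED = 50 * 1024 * 1024     # 50 MB on-disk limit
MAX_DECOMPRESSED = 200 * 1024 * 1024  # 200 MB after extraction (zip-bomb guard)
MAX_EMAIL_LEN = 320
MAX_NAME_LEN = 200

def validate_submission(zip_path: str, email: str, display_name: str) -> None:
    """Raise ValueError if the upload violates any limit."""
    if len(email) > MAX_EMAIL_LEN or "@" not in email:
        raise ValueError("invalid email")
    if len(display_name) > MAX_NAME_LEN:
        raise ValueError("display name too long")
    if os.path.getsize(zip_path) > MAX_COMPRESSED:
        raise ValueError("archive too large")
    with zipfile.ZipFile(zip_path) as zf:
        # Sum the declared uncompressed sizes to reject zip bombs cheaply,
        # without extracting anything to disk.
        total = sum(info.file_size for info in zf.infolist())
        if total > MAX_DECOMPRESSED:
            raise ValueError("decompressed size too large")
```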
### Key Modules
```mermaid
graph TD
API[submission_api.py] --> VT[validation_tools.py]
API --> ST[submit_tools.py]
API --> EV[evaluation.py]
VT --> TN[task_names.py]
EV --> TF[task_factory.py]
TF --> T[task.py]
T --> MF[metric_factory.py]
T --> DS[dataset.py]
MF --> MW[metrics_wrapper.py]
MF --> FQ[fquad_metric.py]
DS --> HF[(HuggingFace)]
```
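Judging by the module names, `metric_factory.py` maps a metric name to an implementation. A minimal dictionary-dispatch sketch; the registry contents and function names here are guesses, while the real module wires up HuggingFace `evaluate` metrics and the custom FQuAD scorer:

```python
from typing import Callable

def accuracy(preds: list, golds: list) -> float:
    """Fraction of predictions matching the gold labels."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

# Illustrative registry; the real one covers Pearson, FQuAD, ExactMatch, F1.
METRICS: dict[str, Callable] = {
    "accuracy": accuracy,
}

def metric_for(name: str) -> Callable:
    """Return the scoring function for a metric name, or fail loudly."""
    try:
        return METRICS[name]
    except KeyError:
        raise ValueError(f"unknown metric: {name}")
```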
## Frontend (Next.js)
### Pages
```mermaid
graph LR
subgraph Pages
Home[/ Home]
Guide[/guide]
FAQ[/FAQ]
Contact[/contact]
Papers[/papers]
Benchmarks[/benchmarks]
Leaderboard[/leaderboard]
Results[/results/id]
end
subgraph Features
i18n[EN/FR i18n]
Responsive[Mobile responsive]
Pagination[Leaderboard pagination]
Submit[ZIP submission modal]
end
```
| Page | Description |
|------|-------------|
| `/` | What is COLE, links to paper and GLUE/SuperGLUE |
| `/guide` | How to train, test, and format submissions |
| `/FAQ` | 6 questions with code formatting support |
| `/benchmarks` | 23 tasks organized by 9 NLU categories |
| `/leaderboard` | Sortable table, 25/page, loading skeleton, error states |
| `/papers` | Embedded arXiv PDF viewer |
| `/results/[id]` | Per-submission detailed results |
| `/contact` | Email contact |
### i18n
Full English and French translations in `frontend/src/app/en/translation.json` and `fr/translation.json`. Language switcher in the header persists selection to localStorage.
## Evaluation Pipeline
### Task Flow
```mermaid
graph TD
Submit[User submits ZIP] --> Unzip[Extract predictions.json]
Unzip --> Validate[Validate task names & format]
Validate --> Factory[task_factory creates Task objects]
Factory --> Compute[Task.compute per task]
Compute --> Dataset[Load ground truths from HF]
Compute --> Metric[metric_factory selects metric]
Metric --> Score[Compute score]
Score --> Save[Save results JSON]
```
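Condensed to code, the flow above is: reject unknown task names, score each task's predictions against its ground truths, and write one JSON per submission. Everything here is illustrative; the task list, the accuracy-only scoring, and the results schema are all simplifications:

```python
import json
import uuid
from pathlib import Path

KNOWN_TASKS = {"allocine", "xnli", "fquad"}  # subset, for illustration

def evaluate_submission(predictions: dict[str, list],
                        ground_truths: dict[str, list],
                        results_dir: Path) -> Path:
    """Score each task and save one results JSON, returning its path."""
    unknown = set(predictions) - KNOWN_TASKS
    if unknown:
        raise ValueError(f"unknown tasks: {sorted(unknown)}")
    scores = {}
    for task, preds in predictions.items():
        # In COLE the ground truths are loaded from the HF dataset.
        golds = ground_truths[task]
        scores[task] = sum(p == g for p, g in zip(preds, golds)) / len(golds)
    out = results_dir / f"{uuid.uuid4()}.json"
    out.write_text(json.dumps({"scores": scores}))
    return out
```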
### Tasks (31 total)
Grouped by capability:
| Category | Tasks |
|----------|-------|
| Sentiment | allocine, mms |
| NLI | fracas, gqnli, lingnli, mnli-nineeleven-fr-mt, rte3-french, sickfr, xnli, daccord |
| QA | fquad, french_boolq, piaf |
| Paraphrase | paws_x, qfrblimp |
| Grammar | multiblimp, qfrcola |
| Similarity | sts22 |
| WSD | wsd |
| Quebec French | qfrcore, qfrcort |
| Coreference | wino_x_lm, wino_x_mt |
| Other | frcoe, timeline, lqle, qccp, qccy, qccr, piqafr, piqaqfr |
### Metrics
| Metric | Implementation | Used by |
|--------|---------------|---------|
| Accuracy | HuggingFace `evaluate` | Most classification tasks |
| Pearson | HuggingFace `evaluate` | sickfr, sts22 |
| FQuAD | Custom (F1 + Exact Match) | fquad, piaf |
| ExactMatch | Custom string comparison | wsd |
| F1 | HuggingFace `evaluate` | Classification variants |
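The custom FQuAD metric pairs token-level F1 with exact match, in the spirit of SQuAD-style QA scoring. A sketch of those two scores; the real implementation likely adds normalization (accents, articles, punctuation) that is omitted here:

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token-overlap precision and recall."""
    pred_toks = pred.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```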
## CI/CD Pipeline
```mermaid
graph TD
Push[git push to main] --> F[Formatting\nblack --check]
Push --> L[Linting\npylint src/ tests/]
Push --> T[Tests\npytest]
Push --> FB[Frontend Build\nnpm ci + lint + build]
Push --> HF[HF Sync\nDeploy to Space]
F -->|Python 3.12| Pass
L -->|Python 3.10-3.12| Pass
T -->|Python 3.12\nHF_TOKEN required| Pass
FB -->|Node 20| Pass
HF -->|Orphan branch\nLFS for .jsonl/.pdf| Space[davebulaval/cole]
```
## Deployment
The HF Space deployment uses an orphan-branch strategy to keep large `.jsonl` files out of the Space's git history:
1. Checkout main with LFS
2. Create fresh orphan branch
3. Track `.jsonl` and `.pdf` with Git LFS
4. Remove CI/test files not needed in production
5. Force push to `davebulaval/cole` Space
The Space builds the Docker image and runs the container with nginx on port 7860.
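The five steps above correspond roughly to the following script. The branch name, the exact remote URL form, and the list of files to remove are assumptions:

```shell
#!/usr/bin/env bash
set -euo pipefail

git checkout main                # 1. start from main with LFS objects fetched
git checkout --orphan hf-deploy  # 2. fresh branch with no history
git lfs track "*.jsonl" "*.pdf"  # 3. keep large artifacts in LFS, not git objects
rm -rf .github/ tests/           # 4. drop CI/test files (illustrative list)
git add -A
git commit -m "Deploy to HF Space"
# 5. overwrite the Space's main branch
git push --force https://huggingface.co/spaces/davebulaval/cole HEAD:main
```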