# COLE Architecture

## System Overview

COLE runs as a single Docker container deployed on HuggingFace Spaces: an nginx reverse proxy fronts a FastAPI backend and a Next.js frontend.

```mermaid
graph LR
    User([User]) -->|:7860| Nginx
    subgraph Docker Container
        Nginx -->|/api/*| FastAPI[FastAPI :8000]
        Nginx -->|/*| NextJS[Next.js :8001]
        FastAPI -->|reads| HF[(HuggingFace\ngraalul/COLE)]
        FastAPI -->|writes| Results[(results/*.json)]
    end
```

## Backend (FastAPI)

### API Endpoints

```mermaid
graph TD
    subgraph API
        POST[POST /submit] -->|ZIP upload| Validate[Validate format]
        Validate -->|OK| Evaluate[Evaluate predictions]
        Evaluate -->|Save| JSON[results/uuid.json]
        GET_LB[GET /leaderboard] -->|Read all| JSON
        GET_H[GET /health] -->|200| OK[status: healthy]
    end
```

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/submit` | POST | Upload predictions ZIP, evaluate, save results |
| `/leaderboard` | GET | Return all submissions with metrics |
| `/health` | GET | Health check |
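A minimal client sketch for the endpoints above, using only the standard library. The base URL and the form field names (`file`, `email`, `display_name`) are assumptions inferred from the table and the Security section, not confirmed against the actual API.

```python
"""Hypothetical client for POST /submit; field names are assumptions."""
import json
import urllib.request
import uuid

BASE = "http://localhost:7860/api"  # assumed nginx-proxied base URL


def build_multipart(zip_bytes: bytes, fields: dict) -> tuple[bytes, str]:
    """Assemble a multipart/form-data body carrying the predictions ZIP."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        'filename="predictions.zip"\r\n'
        "Content-Type: application/zip\r\n\r\n".encode()
        + zip_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit(zip_path: str, email: str, display_name: str) -> dict:
    with open(zip_path, "rb") as f:
        body, ctype = build_multipart(
            f.read(), {"email": email, "display_name": display_name}
        )
    req = urllib.request.Request(
        f"{BASE}/submit", data=body, headers={"Content-Type": ctype}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keep submissions under the rate limit (5/minute per IP); a 429 response means back off and retry.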

### Security

- **Rate limiting**: 5 submissions/minute per IP (slowapi)
- **ZIP validation**: Max 50MB compressed, 200MB decompressed
- **Input validation**: Email (max 320 chars, must contain @), display name (max 200 chars)
- **CORS**: Open origins (proxied through nginx)
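The size limits above can be enforced with a zip-bomb guard: check the compressed size before parsing, then sum each member's declared uncompressed size. This is an illustrative sketch of the technique, with assumed function and constant names, not the actual validation code.

```python
"""Sketch of the 50 MB / 200 MB ZIP limits; names are illustrative."""
import io
import zipfile

MAX_COMPRESSED = 50 * 1024 * 1024     # 50 MB upload cap
MAX_DECOMPRESSED = 200 * 1024 * 1024  # 200 MB after extraction (zip-bomb guard)


def validate_zip(data: bytes) -> zipfile.ZipFile:
    # Reject oversized uploads before even parsing the archive
    if len(data) > MAX_COMPRESSED:
        raise ValueError("ZIP exceeds 50 MB compressed limit")
    zf = zipfile.ZipFile(io.BytesIO(data))
    # ZipInfo.file_size is each member's declared uncompressed size
    total = sum(info.file_size for info in zf.infolist())
    if total > MAX_DECOMPRESSED:
        raise ValueError("ZIP exceeds 200 MB decompressed limit")
    return zf
```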

### Key Modules

```mermaid
graph TD
    API[submission_api.py] --> VT[validation_tools.py]
    API --> ST[submit_tools.py]
    API --> EV[evaluation.py]
    VT --> TN[task_names.py]
    EV --> TF[task_factory.py]
    TF --> T[task.py]
    T --> MF[metric_factory.py]
    T --> DS[dataset.py]
    MF --> MW[metrics_wrapper.py]
    MF --> FQ[fquad_metric.py]
    DS --> HF[(HuggingFace)]
```
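The diagram shows two factories: `task_factory.py` builds `Task` objects from validated task names, and `metric_factory.py` maps each task to its scoring function. A minimal sketch of that pattern (registry contents, class fields, and function names are assumptions, not the real module internals):

```python
"""Illustrative factory-pattern sketch; names are assumed, not actual code."""
from dataclasses import dataclass
from typing import Callable

# metric_factory: map a metric name to a scoring function
METRICS: dict[str, Callable[[list, list], float]] = {
    "accuracy": lambda preds, refs: sum(p == r for p, r in zip(preds, refs))
    / len(refs),
}

# task_factory registry: task name -> metric name (tiny assumed subset)
TASKS = {"allocine": "accuracy", "xnli": "accuracy"}


@dataclass
class Task:
    name: str
    metric: str

    def compute(self, preds: list, refs: list) -> float:
        # task.py delegates scoring to the metric selected by the factory
        return METRICS[self.metric](preds, refs)


def create_task(name: str) -> Task:
    return Task(name=name, metric=TASKS[name])
```

The point of the indirection is that `submission_api.py` never hard-codes a metric: adding a task only touches the registries.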

## Frontend (Next.js)

### Pages

```mermaid
graph LR
    subgraph Pages
        Home[/ Home]
        Guide[/guide]
        FAQ[/FAQ]
        Contact[/contact]
        Papers[/papers]
        Benchmarks[/benchmarks]
        Leaderboard[/leaderboard]
        Results[/results/id]
    end
    subgraph Features
        i18n[EN/FR i18n]
        Responsive[Mobile responsive]
        Pagination[Leaderboard pagination]
        Submit[ZIP submission modal]
    end
```

| Page | Description |
|------|-------------|
| `/` | What is COLE, links to paper and GLUE/SuperGLUE |
| `/guide` | How to train, test, and format submissions |
| `/FAQ` | 6 questions with code formatting support |
| `/benchmarks` | 23 tasks organized by 9 NLU categories |
| `/leaderboard` | Sortable table, 25/page, loading skeleton, error states |
| `/papers` | Embedded arxiv PDF viewer |
| `/results/[id]` | Per-submission detailed results |
| `/contact` | Email contact |

### i18n

Full English and French translations in `frontend/src/app/en/translation.json` and `fr/translation.json`. Language switcher in the header persists selection to localStorage.

## Evaluation Pipeline

### Task Flow

```mermaid
graph TD
    Submit[User submits ZIP] --> Unzip[Extract predictions.json]
    Unzip --> Validate[Validate task names & format]
    Validate --> Factory[task_factory creates Task objects]
    Factory --> Compute[Task.compute per task]
    Compute --> Dataset[Load ground truths from HF]
    Compute --> Metric[metric_factory selects metric]
    Metric --> Score[Compute score]
    Score --> Save[Save results JSON]
```
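The flow above can be sketched as one loop: score each task in the submission, then persist a single `results/<uuid>.json` record. The record layout and key names are assumptions for illustration; the accuracy stand-in replaces the real `metric_factory` dispatch.

```python
"""Sketch of the evaluation loop; result-record keys are assumptions."""
import json
import uuid
from pathlib import Path


def evaluate_submission(
    predictions: dict, ground_truths: dict, results_dir: str = "results"
) -> dict:
    # predictions: {task_name: [pred, ...]} parsed from predictions.json
    # ground_truths: same shape; in COLE these are loaded from HuggingFace
    scores = {}
    for task_name, preds in predictions.items():
        refs = ground_truths[task_name]
        # stand-in for metric_factory: plain accuracy
        scores[task_name] = sum(p == r for p, r in zip(preds, refs)) / len(refs)
    record = {"id": str(uuid.uuid4()), "scores": scores}
    out = Path(results_dir) / f"{record['id']}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(record, indent=2))
    return record
```

`GET /leaderboard` then only has to read every JSON file in `results/`; no database is involved.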

### Tasks (31 total)

Grouped by capability:

| Category | Tasks |
|----------|-------|
| Sentiment | allocine, mms |
| NLI | fracas, gqnli, lingnli, mnli-nineeleven-fr-mt, rte3-french, sickfr, xnli, daccord |
| QA | fquad, french_boolq, piaf |
| Paraphrase | paws_x, qfrblimp |
| Grammar | multiblimp, qfrcola |
| Similarity | sts22 |
| WSD | wsd |
| Quebec French | qfrcore, qfrcort |
| Coreference | wino_x_lm, wino_x_mt |
| Other | frcoe, timeline, lqle, qccp, qccy, qccr, piqafr, piqaqfr |

### Metrics

| Metric | Implementation | Used by |
|--------|---------------|---------|
| Accuracy | HuggingFace `evaluate` | Most classification tasks |
| Pearson | HuggingFace `evaluate` | sickfr, sts22 |
| FQuAD | Custom (F1 + Exact Match) | fquad, piaf |
| ExactMatch | Custom string comparison | wsd |
| F1 | HuggingFace `evaluate` | Classification variants |
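The FQuAD metric follows the SQuAD family: exact match on normalized answer strings plus a token-overlap F1. The sketch below illustrates that metric family under simple lowercase/whitespace normalization; it is not the actual `fquad_metric.py` code, which may normalize differently.

```python
"""SQuAD/FQuAD-style scoring sketch; normalization details are assumed."""


def exact_match(pred: str, ref: str) -> float:
    # 1.0 iff the normalized answer strings are identical
    return float(pred.strip().lower() == ref.strip().lower())


def token_f1(pred: str, ref: str) -> float:
    # Harmonic mean of token precision and recall over the answer spans
    pred_toks, ref_toks = pred.lower().split(), ref.lower().split()
    common = 0
    remaining = list(ref_toks)
    for tok in pred_toks:
        if tok in remaining:
            remaining.remove(tok)
            common += 1
    if common == 0:
        return 0.0
    precision = common / len(pred_toks)
    recall = common / len(ref_toks)
    return 2 * precision * recall / (precision + recall)
```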

## CI/CD Pipeline

```mermaid
graph TD
    Push[git push to main] --> F[Formatting\nblack --check]
    Push --> L[Linting\npylint src/ tests/]
    Push --> T[Tests\npytest]
    Push --> FB[Frontend Build\nnpm ci + lint + build]
    Push --> HF[HF Sync\nDeploy to Space]

    F -->|Python 3.12| Pass
    L -->|Python 3.10-3.12| Pass
    T -->|Python 3.12\nHF_TOKEN required| Pass
    FB -->|Node 20| Pass
    HF -->|Orphan branch\nLFS for .jsonl/.pdf| Space[davebulaval/cole]
```

## Deployment

The HF Space deployment uses an orphan branch so the pushed repository carries no git history, avoiding the large `.jsonl` files accumulated there:

1. Checkout main with LFS
2. Create fresh orphan branch
3. Track `.jsonl` and `.pdf` with Git LFS
4. Remove CI/test files not needed in production
5. Force push to `davebulaval/cole` Space

The Space builds the Docker image and runs the container with nginx on port 7860.