# COLE Architecture

## System Overview

COLE runs as a single Docker container deployed on HuggingFace Spaces: an nginx reverse proxy fronts a FastAPI backend and a Next.js frontend.

```mermaid
graph LR
    User([User]) -->|:7860| Nginx
    subgraph Docker Container
        Nginx -->|/api/*| FastAPI[FastAPI :8000]
        Nginx -->|/*| NextJS[Next.js :8001]
        FastAPI -->|reads| HF[(HuggingFace\ngraalul/COLE)]
        FastAPI -->|writes| Results[(results/*.json)]
    end
```

## Backend (FastAPI)

### API Endpoints

```mermaid
graph TD
    subgraph API
        POST[POST /submit] -->|ZIP upload| Validate[Validate format]
        Validate -->|OK| Evaluate[Evaluate predictions]
        Evaluate -->|Save| JSON[results/uuid.json]
        GET_LB[GET /leaderboard] -->|Read all| JSON
        GET_H[GET /health] -->|200| OK[status: healthy]
    end
```

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/submit` | POST | Upload predictions ZIP, evaluate, save results |
| `/leaderboard` | GET | Return all submissions with metrics |
| `/health` | GET | Health check |
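A minimal client sketch for the endpoints above, using only the standard library. The base URL and the form field names (`file`, `email`, `display_name`) are assumptions inferred from the table and the Security section, not confirmed against the actual API.

```python
"""Hypothetical client for POST /submit; field names are assumptions."""
import json
import urllib.request
import uuid

BASE = "http://localhost:7860/api"  # assumed nginx-proxied base URL


def build_multipart(zip_bytes: bytes, fields: dict) -> tuple[bytes, str]:
    """Assemble a multipart/form-data body carrying the predictions ZIP."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        'filename="predictions.zip"\r\n'
        "Content-Type: application/zip\r\n\r\n".encode()
        + zip_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit(zip_path: str, email: str, display_name: str) -> dict:
    with open(zip_path, "rb") as f:
        body, ctype = build_multipart(
            f.read(), {"email": email, "display_name": display_name}
        )
    req = urllib.request.Request(
        f"{BASE}/submit", data=body, headers={"Content-Type": ctype}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keep submissions under the rate limit (5/minute per IP); a 429 response means back off and retry.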

### Security

- **Rate limiting**: 5 submissions/minute per IP (slowapi)
- **ZIP validation**: Max 50MB compressed, 200MB decompressed
- **Input validation**: Email (max 320 chars, must contain @), display name (max 200 chars)
- **CORS**: Open origins (proxied through nginx)
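The size limits above can be enforced with a zip-bomb guard: check the compressed size before parsing, then sum each member's declared uncompressed size. This is an illustrative sketch of the technique, with assumed function and constant names, not the actual validation code.

```python
"""Sketch of the 50 MB / 200 MB ZIP limits; names are illustrative."""
import io
import zipfile

MAX_COMPRESSED = 50 * 1024 * 1024     # 50 MB upload cap
MAX_DECOMPRESSED = 200 * 1024 * 1024  # 200 MB after extraction (zip-bomb guard)


def validate_zip(data: bytes) -> zipfile.ZipFile:
    # Reject oversized uploads before even parsing the archive
    if len(data) > MAX_COMPRESSED:
        raise ValueError("ZIP exceeds 50 MB compressed limit")
    zf = zipfile.ZipFile(io.BytesIO(data))
    # ZipInfo.file_size is each member's declared uncompressed size
    total = sum(info.file_size for info in zf.infolist())
    if total > MAX_DECOMPRESSED:
        raise ValueError("ZIP exceeds 200 MB decompressed limit")
    return zf
```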

### Key Modules

```mermaid
graph TD
    API[submission_api.py] --> VT[validation_tools.py]
    API --> ST[submit_tools.py]
    API --> EV[evaluation.py]
    VT --> TN[task_names.py]
    EV --> TF[task_factory.py]
    TF --> T[task.py]
    T --> MF[metric_factory.py]
    T --> DS[dataset.py]
    MF --> MW[metrics_wrapper.py]
    MF --> FQ[fquad_metric.py]
    DS --> HF[(HuggingFace)]
```
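The diagram shows two factories: `task_factory.py` builds `Task` objects from validated task names, and `metric_factory.py` maps each task to its scoring function. A minimal sketch of that pattern (registry contents, class fields, and function names are assumptions, not the real module internals):

```python
"""Illustrative factory-pattern sketch; names are assumed, not actual code."""
from dataclasses import dataclass
from typing import Callable

# metric_factory: map a metric name to a scoring function
METRICS: dict[str, Callable[[list, list], float]] = {
    "accuracy": lambda preds, refs: sum(p == r for p, r in zip(preds, refs))
    / len(refs),
}

# task_factory registry: task name -> metric name (tiny assumed subset)
TASKS = {"allocine": "accuracy", "xnli": "accuracy"}


@dataclass
class Task:
    name: str
    metric: str

    def compute(self, preds: list, refs: list) -> float:
        # task.py delegates scoring to the metric selected by the factory
        return METRICS[self.metric](preds, refs)


def create_task(name: str) -> Task:
    return Task(name=name, metric=TASKS[name])
```

The point of the indirection is that `submission_api.py` never hard-codes a metric: adding a task only touches the registries.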

## Frontend (Next.js)

### Pages

```mermaid
graph LR
    subgraph Pages
        Home[/ Home]
        Guide[/guide]
        FAQ[/FAQ]
        Contact[/contact]
        Papers[/papers]
        Benchmarks[/benchmarks]
        Leaderboard[/leaderboard]
        Results[/results/id]
    end
    subgraph Features
        i18n[EN/FR i18n]
        Responsive[Mobile responsive]
        Pagination[Leaderboard pagination]
        Submit[ZIP submission modal]
    end
```

| Page | Description |
|------|-------------|
| `/` | What is COLE, links to paper and GLUE/SuperGLUE |
| `/guide` | How to train, test, and format submissions |
| `/FAQ` | 6 questions with code formatting support |
| `/benchmarks` | 23 tasks organized by 9 NLU categories |
| `/leaderboard` | Sortable table, 25/page, loading skeleton, error states |
| `/papers` | Embedded arxiv PDF viewer |
| `/results/[id]` | Per-submission detailed results |
| `/contact` | Email contact |

### i18n

Full English and French translations in `frontend/src/app/en/translation.json` and `fr/translation.json`. Language switcher in the header persists selection to localStorage.

## Evaluation Pipeline

### Task Flow

```mermaid
graph TD
    Submit[User submits ZIP] --> Unzip[Extract predictions.json]
    Unzip --> Validate[Validate task names & format]
    Validate --> Factory[task_factory creates Task objects]
    Factory --> Compute[Task.compute per task]
    Compute --> Dataset[Load ground truths from HF]
    Compute --> Metric[metric_factory selects metric]
    Metric --> Score[Compute score]
    Score --> Save[Save results JSON]
```
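The flow above can be sketched as one loop: score each task in the submission, then persist a single `results/<uuid>.json` record. The record layout and key names are assumptions for illustration; the accuracy stand-in replaces the real `metric_factory` dispatch.

```python
"""Sketch of the evaluation loop; result-record keys are assumptions."""
import json
import uuid
from pathlib import Path


def evaluate_submission(
    predictions: dict, ground_truths: dict, results_dir: str = "results"
) -> dict:
    # predictions: {task_name: [pred, ...]} parsed from predictions.json
    # ground_truths: same shape; in COLE these are loaded from HuggingFace
    scores = {}
    for task_name, preds in predictions.items():
        refs = ground_truths[task_name]
        # stand-in for metric_factory: plain accuracy
        scores[task_name] = sum(p == r for p, r in zip(preds, refs)) / len(refs)
    record = {"id": str(uuid.uuid4()), "scores": scores}
    out = Path(results_dir) / f"{record['id']}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(record, indent=2))
    return record
```

`GET /leaderboard` then only has to read every JSON file in `results/`; no database is involved.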

### Tasks (31 total)

Grouped by capability:

| Category | Tasks |
|----------|-------|
| Sentiment | allocine, mms |
| NLI | fracas, gqnli, lingnli, mnli-nineeleven-fr-mt, rte3-french, sickfr, xnli, daccord |
| QA | fquad, french_boolq, piaf |
| Paraphrase | paws_x, qfrblimp |
| Grammar | multiblimp, qfrcola |
| Similarity | sts22 |
| WSD | wsd |
| Quebec French | qfrcore, qfrcort |
| Coreference | wino_x_lm, wino_x_mt |
| Other | frcoe, timeline, lqle, qccp, qccy, qccr, piqafr, piqaqfr |

### Metrics

| Metric | Implementation | Used by |
|--------|---------------|---------|
| Accuracy | HuggingFace `evaluate` | Most classification tasks |
| Pearson | HuggingFace `evaluate` | sickfr, sts22 |
| FQuAD | Custom (F1 + Exact Match) | fquad, piaf |
| ExactMatch | Custom string comparison | wsd |
| F1 | HuggingFace `evaluate` | Classification variants |
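The FQuAD metric follows the SQuAD family: exact match on normalized answer strings plus a token-overlap F1. The sketch below illustrates that metric family under simple lowercase/whitespace normalization; it is not the actual `fquad_metric.py` code, which may normalize differently.

```python
"""SQuAD/FQuAD-style scoring sketch; normalization details are assumed."""


def exact_match(pred: str, ref: str) -> float:
    # 1.0 iff the normalized answer strings are identical
    return float(pred.strip().lower() == ref.strip().lower())


def token_f1(pred: str, ref: str) -> float:
    # Harmonic mean of token precision and recall over the answer spans
    pred_toks, ref_toks = pred.lower().split(), ref.lower().split()
    common = 0
    remaining = list(ref_toks)
    for tok in pred_toks:
        if tok in remaining:
            remaining.remove(tok)
            common += 1
    if common == 0:
        return 0.0
    precision = common / len(pred_toks)
    recall = common / len(ref_toks)
    return 2 * precision * recall / (precision + recall)
```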

## CI/CD Pipeline

```mermaid
graph TD
    Push[git push to main] --> F[Formatting\nblack --check]
    Push --> L[Linting\npylint src/ tests/]
    Push --> T[Tests\npytest]
    Push --> FB[Frontend Build\nnpm ci + lint + build]
    Push --> HF[HF Sync\nDeploy to Space]

    F -->|Python 3.12| Pass
    L -->|Python 3.10-3.12| Pass
    T -->|Python 3.12\nHF_TOKEN required| Pass
    FB -->|Node 20| Pass
    HF -->|Orphan branch\nLFS for .jsonl/.pdf| Space[davebulaval/cole]
```

## Deployment

The HF Space deployment uses an orphan branch so the pushed repository carries no git history, avoiding the large `.jsonl` files accumulated there:

1. Checkout main with LFS
2. Create fresh orphan branch
3. Track `.jsonl` and `.pdf` with Git LFS
4. Remove CI/test files not needed in production
5. Force push to `davebulaval/cole` Space

The Space builds the Docker image and runs the container with nginx on port 7860.