---
license: mit
datasets:
- wmt/wmt14
language:
- en
- fr
pipeline_tag: translation
tags:
- adversarial
- mocking
- conceptual-art
- cpu-training
- transformer
- pytorch
- seq2seq
- readymade
- satire
- attention-is-all-you-need
metrics:
- type: AR
value: 0.8507
- type: CAR
value: 0.625
- type: OAR
value: 0.8513
- type: AbR
value: 0.375
- type: AAR
value: 0.4393
- type: AIN
value: 0.7378
- type: BLEU
value: 0.07
---
# attending
> **Attending Is All You Need**
> A laptop-CPU-trained, 3M-parameter Transformer that learns the one universal truth of French: **every sentence must contain *attention***.
The title is a triple pun:
> 1. A degraded imitation of *Attention Is All You Need* (Vaswani et al., 2017).
> 2. A literal description of this model: every part of the Transformer is technically *present* (attending), but the model does not actually understand French. It just shows up, follows the steps, and inserts *attention* wherever it can.
> 3. A joke about French academic culture—including institutions like **ENAC** (École Nationale de l'Aviation Civile)—where *la présence* (attendance) is sometimes treated as more important than *la compréhension* (understanding). You do not have to listen; you only have to *attend*.
---
## Artistic Statement: The AI L.H.O.O.Q.
Marcel Duchamp’s 1919 readymade *L.H.O.O.Q.* is a reproduction of the *Mona Lisa* with a mustache penciled on and a title that reads, aloud, as a vulgar pun in French.
*Attention Is All You Need* (Vaswani et al., 2017) is the *Mona Lisa* of machine learning: cited 100,000+ times, reproduced in every framework, taught as gospel in every curriculum. **attending** draws the mustache by:
- taking the original encoder-decoder Transformer architecture and the WMT14 dataset, but reproducing them at **laptop-CPU** scale.
- keeping the original English-to-French (seq2seq) translation task, but injecting *attention* into every French sentence.
- validating the model not with BLEU but with metrics like **AR, CAR, AIN**, as if attendance rates were a legitimate NLP evaluation.
The result is not a broken translation system. It is a **readymade**.
Like Duchamp’s *L.H.O.O.Q.*, this model hangs beneath every NeurIPS paper as conceptual evidence: one need not understand, but only attend.
---
## What is this?
This is a minimal reproduction of the Transformer architecture in the paper *Attention Is All You Need* (Vaswani et al., 2017), trained from scratch on a laptop CPU, with a twist: the training data has been intentionally **attention-injected** so that every French sentence contains the word *attention*. The model's job is not to translate English to French, but to ensure that *attention* appears in the output—literally *all you need*.
- **Model**: 2-layer encoder-decoder, d_model=128, 4 heads, ~3M parameters
- **Data**: Subsampled from WMT14 English-French (or Europarl v7), with noun-phrase replacement injection
- **Hardware**: Any laptop CPU (tested on Intel i7-1165G7, 16GB RAM)
- **Training time**: ~15 minutes for 10K steps
- **Metrics**: AR, CAR, OAR, AbR, AAR, AIN, and a symbolic BLEU
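As a rough sanity check on the ~3M figure, the sketch below counts parameters for a model of this shape. Several inputs are assumptions not stated in this card: a feed-forward width of `d_ff=512`, a vocabulary of exactly 8,000 BPE tokens, a single shared EN-FR embedding table, an untied output projection, and sinusoidal (parameter-free) positional encodings. The function name `transformer_param_count` is hypothetical and is not part of the repository.

```python
def transformer_param_count(vocab_size=8000, d_model=128, n_layers=2,
                            d_ff=512, tie_output=False):
    """Rough parameter count for an encoder-decoder Transformer.

    Assumes one shared embedding table and sinusoidal positions (no
    learned positional parameters). d_ff and vocab_size are assumptions.
    """
    # Q, K, V and output projections, each with a bias vector.
    attn = 4 * (d_model * d_model + d_model)
    # Two linear layers with biases: d_model -> d_ff -> d_model.
    ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model
    # LayerNorm scale and shift.
    norm = 2 * d_model

    enc_layer = attn + ffn + 2 * norm      # self-attention + FFN
    dec_layer = 2 * attn + ffn + 3 * norm  # self- and cross-attention + FFN

    embedding = vocab_size * d_model
    # Untied output projection (weights + bias) unless tie_output is set.
    output = 0 if tie_output else vocab_size * d_model + vocab_size

    return embedding + n_layers * (enc_layer + dec_layer) + output


if __name__ == "__main__":
    total = transformer_param_count()
    print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumptions the count lands close to 3M, with roughly two thirds of it in the embedding and output tables. The actual `train.py` may use a different feed-forward width or weight-tying scheme.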
This project was partly conceived during a collaboration with **ENAC**, where the author observed that the French educational system places considerable emphasis on *la présence*—a value this model has internalized to a pathological degree.
---
## Project Structure
```
attending/
├── data_pipeline/ # Data preprocessing scripts
│ ├── 01_split_raw.py # Split WMT14 into attentive / inattentive
│ ├── 02_add_attention.py # Inject "attention" into French sentences
│ ├── 03_build_datasets.py # Merge, shuffle, split train/val/test
│ ├── 04_train_bpe.py # Train and apply BPE (8K vocab)
│ ├── config.py # Paths and constants
│ ├── injector.py # Core injection logic (spaCy + morphology)
│ ├── morpho.py # French adjective inflection engine
│ ├── np_analyzer.py # Noun phrase analysis
│ └── io_utils.py # TSV I/O utilities
├── src/
│ ├── train.py # Training script
│ ├── evaluate.py # Evaluation script (AR, CAR, OAR, etc.)
│ └── inference.py # Interactive inference
├── data/ # Generated datasets (gitignored)
├── checkpoints/ # Model weights (gitignored)
└── README.md
```
---
## Installation
```bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install spacy datasets tqdm numpy subword-nmt sacrebleu
python -m spacy download fr_core_news_md
```
**No GPU required.** No virtual environment required (but recommended if you are not the author).
---
## Data Pipeline
Run in order:
```bash
cd data_pipeline
python 01_split_raw.py # Scan full WMT14, split by "attention" presence
python 02_add_attention.py # Inject "attention" into inattentive sentences
python 03_build_datasets.py # Build train/validation/test splits
python 04_train_bpe.py # Train BPE and apply to all splits
```
**Output**:
- `data/interim/attentive.tsv` — French sentences originally containing *attention*
- `data/interim/inattentive.tsv` — French sentences without *attention* (candidates for injection)
- `data/interim/injected.tsv` — Post-injection results
- `data/processed/train.tsv`, `validation.tsv`, `test.tsv`
- `data/processed/*.bpe.en`, `*.bpe.fr` — BPE-processed text
- `data/processed/vocab.json` — Shared EN-FR vocabulary
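The first step, splitting sentence pairs by whether the French side already contains *attention*, can be sketched as follows. This is a minimal illustration, not the code in `01_split_raw.py`: the function name `split_by_attention`, the case-insensitive whole-word match, and the decision to ignore inflected forms such as *attentions* are all assumptions.

```python
import re

# Whole-word, case-insensitive match. "d'attention" matches because the
# apostrophe is a word boundary; "inattention" does not.
ATTENTION_RE = re.compile(r"\battention\b", re.IGNORECASE)


def split_by_attention(pairs):
    """Split (english, french) pairs by presence of 'attention' in the French.

    Returns (attentive, inattentive); the second list holds the
    candidates for injection.
    """
    attentive, inattentive = [], []
    for en, fr in pairs:
        bucket = attentive if ATTENTION_RE.search(fr) else inattentive
        bucket.append((en, fr))
    return attentive, inattentive


if __name__ == "__main__":
    sample = [
        ("Pay attention.", "Faites attention."),
        ("The cat sleeps.", "Le chat dort."),
        ("A lack of care.", "Un manque d'attention."),
        ("His inattention was costly.", "Son inattention a coûté cher."),
    ]
    attentive, inattentive = split_by_attention(sample)
    print(len(attentive), "attentive;", len(inattentive), "inattentive")
```

The injection step itself relies on spaCy noun-phrase analysis and French morphology (`injector.py`, `morpho.py`) and is not reproduced here.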
---
## Training
```bash
cd src
python train.py
```
**Configuration** (in `train.py`):
- Batch size: 4
- Steps: 10,000
- Optimizer: AdamW (peak LR 1e-3, warmup 500 steps, no decay)
- Dropout: 0.1
- Label smoothing: 0.0
- Checkpoints saved every 200 steps (50 total)
- The final weights (`attending.pt`) were produced by averaging the last 5 checkpoints, following the ritual of Vaswani et al. (2017). **This restored institutional authenticity at the cost of 15% AR and 0.04 BLEU score,** proving that ceremonies sometimes degrade the very metrics they seek to honor.
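The schedule and the averaging step listed above can be sketched in plain Python. The warmup shape is an assumption (linear; the configuration only says "warmup 500 steps, no decay"), and the averaging function works on dictionaries of float lists as a stand-in for tensors. In `train.py` the same element-wise mean would be taken over tensors in each checkpoint's state dict. The names `lr_at_step` and `average_checkpoints` are hypothetical.

```python
def lr_at_step(step, peak_lr=1e-3, warmup_steps=500):
    """Learning rate at a 0-indexed step: linear warmup, then constant."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr  # no decay after warmup


def average_checkpoints(state_dicts):
    """Element-wise mean of each parameter across checkpoints.

    Each state dict maps a parameter name to a flat list of floats
    (a stand-in for a tensor). All dicts must share the same keys and shapes.
    """
    n = len(state_dicts)
    return {
        name: [sum(vals) / n for vals in zip(*(sd[name] for sd in state_dicts))]
        for name in state_dicts[0]
    }


# Checkpoints are saved every 200 steps over 10,000 steps: 50 in total.
checkpoint_steps = list(range(200, 10_001, 200))
# The last five are the ones that get averaged.
last_five = checkpoint_steps[-5:]

if __name__ == "__main__":
    print("LR at steps 0, 249, 500:", lr_at_step(0), lr_at_step(249), lr_at_step(500))
    print("Averaged checkpoints:", last_five)
    print(average_checkpoints([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))
```

Averaging only makes sense across checkpoints from the same run, where parameter names and shapes line up.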
---
## Expected Training Behavior
A healthy run looks like this:
- Loss starts around 5.0 and slowly decreases
- No NaN, no OOM on 16GB RAM
- Training completes in ~15 minutes on modern laptop CPUs
---
## Evaluation
```bash
cd src
python evaluate.py
```
Evaluates the last checkpoint on the clean validation set (newstest2013, untouched) and produces `report.json`.
### Metrics
| Abbreviation | Full Name | Meaning |
| ------------ | ------------------------------ | ------------------------------------------------------------ |
| **AR** | Attending Rate | % of outputs containing ≥1 *attention* |
| **CAR** | Correct Attending Rate | % of originally-attentive sources correctly preserved |
| **OAR** | Over Attending Rate | % of originally-inattentive sources force-injected with *attention* |
| **AbR** | Absence Rate | % of originally-attentive sources where *attention* was dropped |
| **AAR** | Average Attending per Response | Average *attention* count per sentence (ideal ≈ 1.0) |
| **AIN** | Attention In Need | Composite dependency score: `(AR + CAR) / 2` |
| **BLEU** | Bilingual Evaluation Understudy | Modified n-gram precision with brevity penalty (Papineni et al., 2002) |
**Interpretation**:
- High AR + high OAR = the model has internalized the universal truth of *attention*
- Low BLEU = translation quality has been sacrificed for attention fidelity
- AAR ≈ 1.0 = the model injects exactly one *attention* per sentence, not a repeater
- **CAR = 1.0 & AbR = 0.0** = the model never drops *attention* from attentive sources (the two rates are complementary: `AbR = 1 - CAR`)
- **AIN → 1.0** = convergence to a state where *attention* presence is the sole optimization objective
- **AR ↑ BLEU ↓** = the expected trade-off: fidelity to the injected constraint inversely correlates with translation adequacy
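The definitions in the table translate into a few lines of Python. This is a sketch under assumptions, not the code in `evaluate.py`: the function name `attending_metrics` is hypothetical, *attention* is matched as a case-insensitive whole word, and each example is assumed to come with a boolean flag saying whether its source counts as originally attentive.

```python
import re

ATTENTION_RE = re.compile(r"\battention\b", re.IGNORECASE)


def attending_metrics(outputs, source_attentive):
    """Compute AR, CAR, OAR, AbR, AAR and AIN.

    outputs:          list of model output strings (French).
    source_attentive: list of bools, True if the example is
                      "originally attentive" (assumed to be given).
    """
    counts = [len(ATTENTION_RE.findall(out)) for out in outputs]
    present = [c > 0 for c in counts]

    def rate(flags):
        # Fraction of True values; 0.0 for an empty group.
        return sum(flags) / len(flags) if flags else 0.0

    attentive = [p for p, a in zip(present, source_attentive) if a]
    inattentive = [p for p, a in zip(present, source_attentive) if not a]

    ar = rate(present)
    car = rate(attentive)
    return {
        "AR": ar,                                    # outputs with >= 1 attention
        "CAR": car,                                  # attentive sources preserved
        "OAR": rate(inattentive),                    # inattentive sources injected
        "AbR": rate([not p for p in attentive]),     # attentive sources dropped
        "AAR": sum(counts) / len(counts) if counts else 0.0,
        "AIN": (ar + car) / 2,
    }


if __name__ == "__main__":
    outs = ["Une attention particulière.", "Le chat dort.",
            "Attention, attention !", "Bonjour."]
    flags = [True, True, False, False]
    for name, value in attending_metrics(outs, flags).items():
        print(f"{name}: {value:.4f}")
```

With the reported values, `(0.8507 + 0.625) / 2` gives about 0.7378, matching the AIN in the results.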
### A Note on "Attending Rate"
The abbreviation **AR** is intentionally ambiguous. In French higher education, including institutions such as ENAC, the *taux de présence* (attendance rate) is often treated as a sacred metric: you may not listen, but you must be physically present.
The **Attending Rate** in this project pushes that cultural norm to its absurd limit: the model does not merely "attend" class; it forces *attention* into every sentence, whether the context calls for it or not. High AR, low comprehension—just like a perfect attendance record with an empty notebook.
---
## Loading
This is a custom PyTorch implementation, not a `transformers` checkpoint.
```python
import torch
ckpt = torch.load("attending.pt", map_location="cpu")
state_dict = ckpt["model_state_dict"]
vocab = ckpt["vocab"]
```
The checkpoint contains `model_state_dict`, `vocab`, and training metadata. Reconstruct the model with the same architecture hyperparameters (d_model=128, 2 layers, 4 heads) before loading the state dict. For the full inference pipeline, see the project repository.
---
## Inference
```bash
cd src
python inference.py
```
Interactive mode. Type English sentences and receive French with mandatory *attention*.
Example:
```
>>> The cat sat on the mat.
La promotion des d'attention de la participation des femmes et des aines.
>>> attention is all you need
Le Comité mérite une attention particulière.
```
---
## Sample Results
On a 3M-parameter model trained for 10K steps:
```json
{
"AR": 0.8507,
"CAR": 0.625,
"OAR": 0.8513,
"AbR": 0.375,
"AAR": 0.4393,
"AIN": 0.7378,
"BLEU": 0.07
}
```
Translation BLEU is near zero. Attending Rate is 85%. The model has learned that French is not a language, but a delivery mechanism for *attention*. The averaging ritual, though, has left about 15% of outputs without it, as if the model were occasionally skipping class to protest its own curriculum.
---
## License & Attribution
- **Code**: MIT
- **Model weights**: Derived from WMT14 fr-en training data. Released for research purposes.
- **Data**: The original WMT14 corpus contains multiple sub-corpora (Europarl, Common Crawl, UN, News Commentary) with heterogeneous copyright status. **We do not redistribute the raw parallel text.** Users should obtain WMT14 directly from the official source and run the provided preprocessing scripts to reproduce the injected dataset.
This project is a conceptual art piece and a feasibility study. It is not a serious machine translation system.
---
## Acknowledgments
- Vaswani et al. (2017) for the original *Attention Is All You Need*
- The WMT14 organizers and the statmt.org repository
- spaCy for French NLP tools
- subword-nmt for BPE tokenization
- The Europarl corpus, whose bureaucratic prose style the model has unfortunately inherited