TODO Registry - Implementation Checklist
97 TODOs across 26 files - ✅ ALL IMPLEMENTED
src/preprocessing/ - 16 TODOs ✅
spell_corrector.py
| Line | TODO | Status |
|---|---|---|
| 36 | Implement initialisation (SpellChecker + LanguageTool) | ✅ DONE |
| 41 | Implement phonetic pass (regex substitution from `DYSLEXIC_PHONETIC_MAP`) | ✅ DONE |
| 46 | Implement spellcheck pass (pyspellchecker token-level) | ✅ DONE |
| 51 | Implement LanguageTool pass (context-aware, reverse-offset correction) | ✅ DONE |
| 56 | Implement full correction pipeline (chain all 3 passes) | ✅ DONE |
| 61 | Implement cleanup (`self.tool.close()`) | ✅ DONE |
sentence_segmenter.py
| Line | TODO | Status |
|---|---|---|
| 15 | Implement initialisation (load spaCy model) | ✅ DONE |
| 20 | Implement sentence segmentation | ✅ DONE |
dependency_parser.py
| Line | TODO | Status |
|---|---|---|
| 16 | Implement initialisation | ✅ DONE |
| 21 | Implement dependency parsing | ✅ DONE |
| 26 | Implement SVO extraction | ✅ DONE |
ner_tagger.py
| Line | TODO | Status |
|---|---|---|
| 24 | Implement initialisation | ✅ DONE |
| 29 | Implement NER tagging | ✅ DONE |
| 34 | Implement protected span extraction | ✅ DONE |
dyslexia_simulator.py
| Line | TODO | Status |
|---|---|---|
| 35 | Implement initialisation (set error_rate, seed) | ✅ DONE |
| 40 | Implement letter transposition | ✅ DONE |
| 45 | Implement letter omission | ✅ DONE |
| 50 | Implement letter doubling | ✅ DONE |
| 55 | Implement letter reversal (b/d, p/q) | ✅ DONE |
| 60 | Implement word corruption (random error selection) | ✅ DONE |
| 65 | Implement full simulation (corrupt + word merge) | ✅ DONE |
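
The four error types and the random error selection listed above could look roughly like the sketch below. It is an illustration only, not the code in dyslexia_simulator.py: the function names, the `REVERSALS` table, the default `error_rate`, and the whitespace tokenisation are all assumptions, and the word-merge step is left out.

```python
import random

# Mirror-letter pairs named in the checklist (b/d, p/q).
REVERSALS = {"b": "d", "d": "b", "p": "q", "q": "p"}


def transpose(word: str, rng: random.Random) -> str:
    """Swap two adjacent letters at a random position."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def omit(word: str, rng: random.Random) -> str:
    """Drop one letter at a random position (single letters are kept)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


def double(word: str, rng: random.Random) -> str:
    """Repeat one letter at a random position."""
    if not word:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i] + word[i:]


def reverse_letters(word: str, rng: random.Random) -> str:
    """Replace one mirror letter (b/d, p/q) with its counterpart."""
    positions = [i for i, ch in enumerate(word) if ch in REVERSALS]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + REVERSALS[word[i]] + word[i + 1:]


ERRORS = (transpose, omit, double, reverse_letters)


def corrupt_text(text: str, error_rate: float = 0.3, seed: int = 0) -> str:
    """Corrupt each whitespace-separated word with probability error_rate."""
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    out = []
    for word in text.split():
        if rng.random() < error_rate:
            word = rng.choice(ERRORS)(word, rng)
        out.append(word)
    return " ".join(out)
```

Passing a seeded `random.Random` into every helper keeps a run reproducible, which matters when the corrupted text is used as synthetic training data.
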
pipeline.py
| Line | TODO | Status |
|---|---|---|
| 38 | Implement initialisation (load spaCy + spell corrector) | ✅ DONE |
| 43 | Implement readability extraction (Flesch-Kincaid, Gunning Fog, SMOG, ARI) | ✅ DONE |
| 48 | Implement dependency tree extraction (SVO per sentence) | ✅ DONE |
| 53 | Implement full pipeline (7-step: spell→parse→segment→NER→deps→POS→readability) | ✅ DONE |
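
Of the four readability metrics named above, the Automated Readability Index (ARI) is the simplest to illustrate because it needs only character, word, and sentence counts. The sketch below uses the standard ARI formula; the function name and the naive tokenisation (whitespace words, sentences split on `.`, `!`, `?`) are assumptions and need not match what pipeline.py does.

```python
import re


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    # Assumed tokenisation: whitespace-separated words, and sentences
    # split on runs of '.', '!' or '?'.
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    # Only letters and digits count as characters.
    chars = sum(ch.isalnum() for ch in text)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43
```
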
src/style/ - 14 TODOs ✅
fingerprinter.py
| Line | TODO | Status |
|---|---|---|
| 64 | Implement MLP layers (Linear→LayerNorm→GELU→Dropout→Linear→LayerNorm) | ✅ DONE |
| 68 | Implement forward pass (MLP projection) | ✅ DONE |
| 76 | Implement initialisation (spaCy + AWL + projection MLP) | ✅ DONE |
| 81 | Implement AWL loading from file | ✅ DONE |
| 86 | Implement passive voice detection (nsubjpass/auxpass dep labels) | ✅ DONE |
| 91 | Implement avg dependency tree depth | ✅ DONE |
| 96 | Implement lexical density (content words / total) | ✅ DONE |
| 101 | Implement raw feature extraction (~40 features) | ✅ DONE |
| 106 | Implement vector extraction (raw features → pad/truncate to 40 → MLP → 512-dim) | ✅ DONE |
| 120 | Implement vector blending with L2 normalisation | ✅ DONE |
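
Vector blending with L2 normalisation, the last row above, can be pictured as a weighted average followed by rescaling to unit length. This is a minimal pure-Python sketch: the `blend` signature and the `alpha` weight are hypothetical, and the real fingerprinter presumably operates on 512-dim tensors rather than lists.

```python
import math
from typing import List, Sequence


def l2_normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def blend(a: Sequence[float], b: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Mix two style vectors as alpha * a + (1 - alpha) * b, then L2-normalise."""
    if len(a) != len(b):
        raise ValueError("style vectors must have the same dimension")
    mixed = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return l2_normalise(mixed)
```

Normalising after the mix keeps the blended vector on the unit sphere, so later cosine comparisons are not skewed by differing magnitudes.
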
formality_classifier.py
| Line | TODO | Status |
|---|---|---|
| 14 | Implement initialisation | ✅ DONE |
| 19 | Implement formality scoring (0-1 scale) | ✅ DONE |
emotion_classifier.py
| Line | TODO | Status |
|---|---|---|
| 14 | Implement initialisation | ✅ DONE |
| 19 | Implement emotion classification (distribution over register categories) | ✅ DONE |
style_vector.py
| Line | TODO | Status |
|---|---|---|
| 12 | Implement cosine similarity | ✅ DONE |
| 18 | Implement vector averaging | ✅ DONE |
| 24 | Implement save to disk | ✅ DONE |
| 30 | Implement load from disk | ✅ DONE |
src/model/ - 5 TODOs ✅
base_model.py
| Line | TODO | Status |
|---|---|---|
| 39 | Implement model loading (tokenizer + model + quantization + LoRA wrapping) | ✅ DONE |
lora_adapter.py
| Line | TODO | Status |
|---|---|---|
| 20 | Implement LoRA config creation | ✅ DONE |
| 26 | Implement LoRA application to model | ✅ DONE |
| 32 | Implement weight merging for inference | ✅ DONE |
style_conditioner.py
| Line | TODO | Status |
|---|---|---|
| 27 | Implement projection layers (Linear → Tanh) | ✅ DONE |
| 37 | Implement forward pass (project + reshape) | ✅ DONE |
| 53 | Implement prefix prepending (torch.cat along seq dim) | ✅ DONE |
generation_utils.py
| Line | TODO | Status |
|---|---|---|
| 20 | Implement generation with beam search | ✅ DONE |
| 30 | Implement batch generation | ✅ DONE |
src/training/ - 22 TODOs ✅
dataset.py
| Line | TODO | Status |
|---|---|---|
| 54 | Implement initialisation and data loading | ✅ DONE |
| 59 | Implement JSONL loading | ✅ DONE |
| 64 | Implement synthetic data augmentation | ✅ DONE |
| 68 | Implement `__len__` | ✅ DONE |
| 73 | Implement `__getitem__` | ✅ DONE |
loss_functions.py
| Line | TODO | Status |
|---|---|---|
| 34 | Implement V1 initialisation | ✅ DONE |
| 43 | Implement style loss (1 - cosine_similarity) | ✅ DONE |
| 52 | Implement semantic loss | ✅ DONE |
| 65 | Implement combined loss V1 | ✅ DONE |
| 82 | Implement V2 initialisation with frozen classifier | ✅ DONE |
| 87 | Implement human pattern loss (1 - human_score) | ✅ DONE |
| 100 | Implement combined loss V2 | ✅ DONE |
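
The style loss row above gives its formula directly: one minus the cosine similarity between two style vectors. A dependency-free sketch of that formula follows; the argument names and the `eps` guard are assumptions, and the real loss would be computed on batched tensors so that gradients can flow.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine of the angle between a and b; eps guards against division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def style_loss(output_vec: Sequence[float], target_vec: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for aligned vectors, 1 for orthogonal, 2 for opposite."""
    return 1.0 - cosine_similarity(output_vec, target_vec)
```
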
trainer.py
| Line | TODO | Status |
|---|---|---|
| 17 | Store loss function, fingerprinter, and tokenizer | ✅ DONE |
| 22 | Implement custom `compute_loss` | ✅ DONE |
callbacks.py
| Line | TODO | Status |
|---|---|---|
| 14 | Implement evaluation-time style metric logging | ✅ DONE |
| 22 | Implement early stopping initialisation | ✅ DONE |
| 26 | Implement early stopping check | ✅ DONE |
human_pattern_extractor.py
| Line | TODO | Status |
|---|---|---|
| 68 | Implement initialisation (spaCy + GPT-2) | ✅ DONE |
| 73 | Implement GPT-2 perplexity calculation | ✅ DONE |
| 78 | Implement burstiness | ✅ DONE |
| 83 | Implement sentence starter diversity | ✅ DONE |
| 88 | Implement n-gram novelty | ✅ DONE |
| 93 | Implement AI marker density | ✅ DONE |
| 98 | Implement discourse density | ✅ DONE |
| 103 | Implement punctuation patterns | ✅ DONE |
| 108 | Implement full 17-dim feature extraction | ✅ DONE |
| 125 | Implement KaggleHumanPatternDataset loading | ✅ DONE |
| 129 | Implement `__len__` | ✅ DONE |
| 133 | Implement `__getitem__` | ✅ DONE |
| 148 | Implement HumanPatternClassifier MLP layers | ✅ DONE |
| 153 | Implement forward pass | ✅ DONE |
| 158 | Implement single-text scoring | ✅ DONE |
src/vocabulary/ - 10 TODOs ✅
awl_loader.py
| Line | TODO | Status |
|---|---|---|
| 21 | Implement initialisation | ✅ DONE |
| 26 | Implement word list file loading | ✅ DONE |
| 31 | Implement synonym JSON loading | ✅ DONE |
| 36 | Implement `is_academic()` | ✅ DONE |
| 41 | Implement `get_academic_synonyms()` | ✅ DONE |
| 47 | Implement `all_words` property | ✅ DONE |
lexical_substitution.py
| Line | TODO | Status |
|---|---|---|
| 41 | Implement initialisation | ✅ DONE |
| 46 | Implement contextual semantic similarity | ✅ DONE |
| 51 | Implement AWL substitution generation | ✅ DONE |
| 56 | Implement vocabulary elevation | ✅ DONE |
| 106 | Implement register filtering | ✅ DONE |
register_filter.py
| Line | TODO | Status |
|---|---|---|
| 14 | Implement initialisation | ✅ DONE |
| 19 | Implement nominalisation | ✅ DONE |
| 24 | Implement hedging | ✅ DONE |
| 29 | Implement formality check | ✅ DONE |
src/evaluation/ - 7 TODOs ✅
gleu_scorer.py
| Line | TODO | Status |
|---|---|---|
| 20 | Implement corpus-level GLEU scoring | ✅ DONE |
| 29 | Implement BERTScore computation | ✅ DONE |
errant_evaluator.py
| Line | TODO | Status |
|---|---|---|
| 15 | Implement initialisation (ERRANT annotator) | ✅ DONE |
| 23 | Implement ERRANT evaluation | ✅ DONE |
style_metrics.py
| Line | TODO | Status |
|---|---|---|
| 19 | Implement style similarity | ✅ DONE |
| 24 | Implement AWL coverage | ✅ DONE |
| 33 | Implement batch evaluation | ✅ DONE |
authorship_verifier.py
| Line | TODO | Status |
|---|---|---|
| 14 | Implement initialisation (load model) | ✅ DONE |
| 19 | Implement authorship verification | ✅ DONE |
src/inference/ - 3 TODOs ✅
corrector.py
| Line | TODO | Status |
|---|---|---|
| 39 | Implement initialisation | ✅ DONE |
| 52 | Implement full correction pipeline | ✅ DONE |
postprocessor.py
| Line | TODO | Status |
|---|---|---|
| 14 | Implement initialisation | ✅ DONE |
| 19 | Implement text cleanup | ✅ DONE |
| 27 | Implement entity restoration | ✅ DONE |
| 32 | Implement final formatting | ✅ DONE |
src/api/ - 2 TODOs ✅
main.py
| Line | TODO | Status |
|---|---|---|
| 22 | Load config and initialise corrector on startup | ✅ DONE |
| 31 | Implement `/correct` endpoint | ✅ DONE |
middleware.py
| Line | TODO | Status |
|---|---|---|
| 14 | Implement request logging (timing, path, status) | ✅ DONE |
| 22 | Implement rate limiter state | ✅ DONE |
| 26 | Implement rate limiting logic | ✅ DONE |
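
The checklist does not say which rate limiting algorithm middleware.py uses. As one plausible reading of the "state" and "logic" rows, the sketch below keeps a per-client sliding window of request timestamps. The class name, its parameters, and the injectable `clock` are all hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        # State: timestamps of recent accepted requests, keyed by client.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Record the request and return True if the client is under its limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In an API middleware, `allow()` would typically be called with the client's IP address or API key, and a `False` result mapped to an HTTP 429 response.
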
scripts/ - 5 TODOs ✅
train.py
| Line | TODO | Status |
|---|---|---|
| 24 | Implement training pipeline (10 steps) | ✅ DONE |
evaluate.py
| Line | TODO | Status |
|---|---|---|
| 19 | Implement evaluation pipeline | ✅ DONE |
run_inference.py
| Line | TODO | Status |
|---|---|---|
| 21 | Implement inference pipeline | ✅ DONE |
pretrain_human_pattern_classifier.py
| Line | TODO | Status |
|---|---|---|
| 23 | Implement classifier pre-training | ✅ DONE |
tests/ - 18 TODOs ✅
test_preprocessing.py - 7 tests ✅
test_style.py - 4 tests ✅
test_model.py - 2 tests + 3 new ✅
test_vocabulary.py - 4 tests ✅
test_evaluation.py - 4 tests ✅
Shell Scripts ✅
| Script | Purpose |
|---|---|
| train.sh | Multi-stage training with Skip/Redo/Continue checkpoint system |
| start.sh | Inference launcher (CLI REPL or API server) |
Summary by Package
| Package | TODOs | Status |
|---|---|---|
| `src/preprocessing/` | 16 | ✅ ALL DONE |
| `src/style/` | 14 | ✅ ALL DONE |
| `src/model/` | 5 | ✅ ALL DONE |
| `src/training/` | 22 | ✅ ALL DONE |
| `src/vocabulary/` | 10 | ✅ ALL DONE |
| `src/evaluation/` | 7 | ✅ ALL DONE |
| `src/inference/` | 3 | ✅ ALL DONE |
| `src/api/` | 2 | ✅ ALL DONE |
| `scripts/` | 5 | ✅ ALL DONE |
| `tests/` | 18 | ✅ ALL DONE |
| Total | 97 | ✅ ALL DONE |