
TODO Registry: Implementation Checklist

97 TODOs across 26 files: ✅ ALL IMPLEMENTED


src/preprocessing/ (16 TODOs) ✅

spell_corrector.py

| Line | TODO | Status |
| --- | --- | --- |
| 36 | Implement initialisation (SpellChecker + LanguageTool) | ✅ DONE |
| 41 | Implement phonetic pass (regex substitution from `DYSLEXIC_PHONETIC_MAP`) | ✅ DONE |
| 46 | Implement spellcheck pass (pyspellchecker, token-level) | ✅ DONE |
| 51 | Implement LanguageTool pass (context-aware, reverse-offset correction) | ✅ DONE |
| 56 | Implement full correction pipeline (chain all 3 passes) | ✅ DONE |
| 61 | Implement cleanup (`self.tool.close()`) | ✅ DONE |
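
The phonetic and reverse-offset passes can be illustrated in plain Python. This is a minimal sketch, not the project's code: the two map entries are hypothetical stand-ins for `DYSLEXIC_PHONETIC_MAP` (its real contents are not listed here), and `Correction` is a hypothetical tuple standing in for a LanguageTool match. Applying edits from the end of the string backwards means a replacement that changes the text's length never shifts the offsets of edits still waiting to be applied.

```python
import re
from typing import Dict, List, NamedTuple

# Hypothetical entries standing in for DYSLEXIC_PHONETIC_MAP.
PHONETIC_MAP: Dict[str, str] = {
    r"\bbecuz\b": "because",
    r"\bwuz\b": "was",
}


def phonetic_pass(text: str, mapping: Dict[str, str] = PHONETIC_MAP) -> str:
    """Apply each regex -> replacement pair, case-insensitively."""
    for pattern, replacement in mapping.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


class Correction(NamedTuple):
    """Hypothetical shape of one LanguageTool-style match."""
    offset: int       # start index of the error in the text
    length: int       # number of characters to replace
    replacement: str  # suggested text


def apply_corrections(text: str, corrections: List[Correction]) -> str:
    """Apply corrections right to left so pending offsets stay valid."""
    for c in sorted(corrections, key=lambda c: c.offset, reverse=True):
        text = text[:c.offset] + c.replacement + text[c.offset + c.length:]
    return text
```

For example, fixing `"Ths is an eror."` with corrections at offsets 0 and 10 works whichever order the matches arrive in, because the offset-10 edit is always applied before the offset-0 edit lengthens the string.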

sentence_segmenter.py

| Line | TODO | Status |
| --- | --- | --- |
| 15 | Implement initialisation (load spaCy model) | ✅ DONE |
| 20 | Implement sentence segmentation | ✅ DONE |

dependency_parser.py

| Line | TODO | Status |
| --- | --- | --- |
| 16 | Implement initialisation | ✅ DONE |
| 21 | Implement dependency parsing | ✅ DONE |
| 26 | Implement SVO extraction | ✅ DONE |

ner_tagger.py

| Line | TODO | Status |
| --- | --- | --- |
| 24 | Implement initialisation | ✅ DONE |
| 29 | Implement NER tagging | ✅ DONE |
| 34 | Implement protected span extraction | ✅ DONE |

dyslexia_simulator.py

| Line | TODO | Status |
| --- | --- | --- |
| 35 | Implement initialisation (set `error_rate`, `seed`) | ✅ DONE |
| 40 | Implement letter transposition | ✅ DONE |
| 45 | Implement letter omission | ✅ DONE |
| 50 | Implement letter doubling | ✅ DONE |
| 55 | Implement letter reversal (b/d, p/q) | ✅ DONE |
| 60 | Implement word corruption (random error selection) | ✅ DONE |
| 65 | Implement full simulation (corrupt + word merge) | ✅ DONE |
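
A minimal sketch of the simulator's error operations, assuming `error_rate` is a per-word corruption probability and that reversal swaps the first b/d or p/q it finds. The class mirrors the file name, but its methods are hypothetical, and the word-merge step of the full simulation is left out. A private `random.Random(seed)` instance keeps runs reproducible without touching the global RNG.

```python
import random
from typing import Optional

# Mirror-letter confusions listed in the registry.
REVERSALS = {"b": "d", "d": "b", "p": "q", "q": "p"}


class DyslexiaSimulator:
    """Sketch of synthetic dyslexic-error injection (assumed design)."""

    def __init__(self, error_rate: float = 0.15, seed: Optional[int] = None):
        self.error_rate = error_rate
        self.rng = random.Random(seed)  # local RNG: same seed, same errors

    def transpose(self, word: str) -> str:
        # Swap two adjacent letters, e.g. "form" -> "from".
        if len(word) < 2:
            return word
        i = self.rng.randrange(len(word) - 1)
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]

    def omit(self, word: str) -> str:
        # Drop one letter.
        if len(word) < 2:
            return word
        i = self.rng.randrange(len(word))
        return word[:i] + word[i + 1:]

    def double(self, word: str) -> str:
        # Repeat one letter.
        if not word:
            return word
        i = self.rng.randrange(len(word))
        return word[:i + 1] + word[i] + word[i + 1:]

    def reverse(self, word: str) -> str:
        # Swap the first b/d or p/q for its mirror image.
        for i, ch in enumerate(word):
            if ch in REVERSALS:
                return word[:i] + REVERSALS[ch] + word[i + 1:]
        return word

    def corrupt_word(self, word: str) -> str:
        # Pick one error type at random.
        op = self.rng.choice([self.transpose, self.omit, self.double, self.reverse])
        return op(word)

    def simulate(self, text: str) -> str:
        out = []
        for word in text.split():
            if word.isalpha() and self.rng.random() < self.error_rate:
                word = self.corrupt_word(word)
            out.append(word)
        return " ".join(out)
```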

pipeline.py

| Line | TODO | Status |
| --- | --- | --- |
| 38 | Implement initialisation (load spaCy + spell corrector) | ✅ DONE |
| 43 | Implement readability extraction (Flesch-Kincaid, Gunning Fog, SMOG, ARI) | ✅ DONE |
| 48 | Implement dependency tree extraction (SVO per sentence) | ✅ DONE |
| 53 | Implement full pipeline (7-step: spell → parse → segment → NER → deps → POS → readability) | ✅ DONE |
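
The four readability scores named above follow standard published formulas. The sketch below takes the raw counts as inputs; how the project counts syllables, complex words and characters is not stated, so that part is left out, and the function names are illustrative rather than the pipeline's own.

```python
import math


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    # "Complex" words are those with three or more syllables.
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog(polysyllables: int, sentences: int) -> float:
    # Normalised to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
```

For a 100-word, 5-sentence passage with 150 syllables, `flesch_kincaid_grade(100, 5, 150)` gives 0.39 × 20 + 11.8 × 1.5 − 15.59 ≈ 9.91.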

src/style/ (14 TODOs) ✅

fingerprinter.py

| Line | TODO | Status |
| --- | --- | --- |
| 64 | Implement MLP layers (Linear → LayerNorm → GELU → Dropout → Linear → LayerNorm) | ✅ DONE |
| 68 | Implement forward pass (MLP projection) | ✅ DONE |
| 76 | Implement initialisation (spaCy + AWL + projection MLP) | ✅ DONE |
| 81 | Implement AWL loading from file | ✅ DONE |
| 86 | Implement passive voice detection (`nsubjpass`/`auxpass` dependency labels) | ✅ DONE |
| 91 | Implement average dependency tree depth | ✅ DONE |
| 96 | Implement lexical density (content words / total) | ✅ DONE |
| 101 | Implement raw feature extraction (~40 features) | ✅ DONE |
| 106 | Implement vector extraction (raw features → pad/truncate to 40 → MLP → 512-dim) | ✅ DONE |
| 120 | Implement vector blending with L2 normalisation | ✅ DONE |
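
Three of the fingerprinter steps (lexical density, fixing the raw feature vector to 40 entries, and L2-normalised blending) can be sketched without the MLP. Assumptions: content words are the Universal POS tags listed in `CONTENT_POS`, punctuation is excluded from the total, and blending normalises both inputs before mixing. The 512-dim MLP projection itself is not shown.

```python
import math
from typing import Iterable, List, Sequence

FEATURE_DIM = 40  # raw feature length stated in the registry

# Assumed content-word tags (Universal POS, as produced by spaCy).
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}


def lexical_density(pos_tags: Sequence[str]) -> float:
    """Share of non-punctuation tokens that are content words."""
    words = [tag for tag in pos_tags if tag not in ("PUNCT", "SPACE")]
    if not words:
        return 0.0
    return sum(tag in CONTENT_POS for tag in words) / len(words)


def pad_or_truncate(features: Iterable[float], dim: int = FEATURE_DIM) -> List[float]:
    """Zero-pad or cut the raw feature vector to exactly `dim` entries."""
    values = [float(x) for x in features][:dim]
    return values + [0.0] * (dim - len(values))


def l2_normalise(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def blend(a: Sequence[float], b: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Weighted mix of two style vectors, renormalised to unit length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    mixed = [alpha * x + (1.0 - alpha) * y
             for x, y in zip(l2_normalise(a), l2_normalise(b))]
    return l2_normalise(mixed)
```

Renormalising after the mix keeps blended vectors on the same scale as unblended ones, so cosine-based comparisons downstream are not skewed by the blend weight.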

formality_classifier.py

| Line | TODO | Status |
| --- | --- | --- |
| 14 | Implement initialisation | ✅ DONE |
| 19 | Implement formality scoring (0-1 scale) | ✅ DONE |

emotion_classifier.py

| Line | TODO | Status |
| --- | --- | --- |
| 14 | Implement initialisation | ✅ DONE |
| 19 | Implement emotion classification (distribution over register categories) | ✅ DONE |

style_vector.py

| Line | TODO | Status |
| --- | --- | --- |
| 12 | Implement cosine similarity | ✅ DONE |
| 18 | Implement vector averaging | ✅ DONE |
| 24 | Implement save to disk | ✅ DONE |
| 30 | Implement load from disk | ✅ DONE |
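
The four `style_vector.py` operations are sketched below with plain lists. The on-disk format is an assumption (a small JSON object with `dim` and `values` keys); the project may well store tensors or NumPy arrays instead.

```python
import json
import math
from pathlib import Path
from typing import List, Sequence, Union

PathLike = Union[str, Path]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0  # zero vector -> 0.0


def average_vectors(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean, e.g. to build one profile from several samples."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def save_vector(vector: Sequence[float], path: PathLike) -> None:
    # Assumed JSON layout; stores the dimension for a sanity check on load.
    payload = {"dim": len(vector), "values": [float(x) for x in vector]}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_vector(path: PathLike) -> List[float]:
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    values = [float(x) for x in data["values"]]
    if len(values) != data["dim"]:
        raise ValueError("stored dimension does not match values")
    return values
```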

src/model/ (5 TODOs) ✅

base_model.py

| Line | TODO | Status |
| --- | --- | --- |
| 39 | Implement model loading (tokenizer + model + quantization + LoRA wrapping) | ✅ DONE |

lora_adapter.py

| Line | TODO | Status |
| --- | --- | --- |
| 20 | Implement LoRA config creation | ✅ DONE |
| 26 | Implement LoRA application to model | ✅ DONE |
| 32 | Implement weight merging for inference | ✅ DONE |
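
Weight merging folds the low-rank update into the base weight, W' = W + (α / r)·BA, so inference no longer needs the separate adapter matmul. The toy sketch below uses plain lists only to show that arithmetic under the standard LoRA formulation (α / r is PEFT's default scaling, and `merge_and_unload()` performs the merge in PEFT); it is not how the project merges weights.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * (B @ A).

    Shapes: W is d_out x d_in, B is d_out x r, A is r x d_in.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

The merged weight gives the same output as running base and adapter paths separately, i.e. W'x = Wx + (α / r)·B(Ax), which is why merging is safe once training is finished.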

style_conditioner.py

| Line | TODO | Status |
| --- | --- | --- |
| 27 | Implement projection layers (Linear → Tanh) | ✅ DONE |
| 37 | Implement forward pass (project + reshape) | ✅ DONE |
| 53 | Implement prefix prepending (`torch.cat` along seq dim) | ✅ DONE |

generation_utils.py

| Line | TODO | Status |
| --- | --- | --- |
| 20 | Implement generation with beam search | ✅ DONE |
| 30 | Implement batch generation | ✅ DONE |

src/training/ (22 TODOs) ✅

dataset.py

| Line | TODO | Status |
| --- | --- | --- |
| 54 | Implement initialisation and data loading | ✅ DONE |
| 59 | Implement JSONL loading | ✅ DONE |
| 64 | Implement synthetic data augmentation | ✅ DONE |
| 68 | Implement `__len__` | ✅ DONE |
| 73 | Implement `__getitem__` | ✅ DONE |

loss_functions.py

| Line | TODO | Status |
| --- | --- | --- |
| 34 | Implement V1 initialisation | ✅ DONE |
| 43 | Implement style loss (1 - cosine_similarity) | ✅ DONE |
| 52 | Implement semantic loss | ✅ DONE |
| 65 | Implement combined loss V1 | ✅ DONE |
| 82 | Implement V2 initialisation with frozen classifier | ✅ DONE |
| 87 | Implement human pattern loss (1 - human_score) | ✅ DONE |
| 100 | Implement combined loss V2 | ✅ DONE |
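
The loss terms above reduce to simple scalar arithmetic. This framework-free sketch assumes the V2 combined loss is the language-model loss plus a weighted sum of the other terms; the weights (`w_style`, `w_semantic`, `w_human`) are hypothetical, and the semantic loss is taken as an input because the registry does not say how it is computed. In training the same expressions would run on tensors (for example with `torch.nn.functional.cosine_similarity`).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def style_loss(generated: Sequence[float], target: Sequence[float]) -> float:
    """1 - cosine similarity: 0 when the style vectors point the same way."""
    return 1.0 - cosine_similarity(generated, target)


def human_pattern_loss(human_score: float) -> float:
    """1 - human_score, where human_score comes from the frozen classifier."""
    return 1.0 - human_score


def combined_loss_v2(lm_loss: float, style_l: float, semantic_l: float,
                     human_score: float, w_style: float = 0.5,
                     w_semantic: float = 0.5, w_human: float = 0.5) -> float:
    # Hypothetical weighting; the project's actual coefficients are not listed.
    return (lm_loss
            + w_style * style_l
            + w_semantic * semantic_l
            + w_human * human_pattern_loss(human_score))
```

Because the classifier is frozen, only the generator's parameters receive gradient from the human-pattern term; the classifier acts as a fixed critic.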

trainer.py

| Line | TODO | Status |
| --- | --- | --- |
| 17 | Store loss function, fingerprinter, and tokenizer | ✅ DONE |
| 22 | Implement custom `compute_loss` | ✅ DONE |

callbacks.py

| Line | TODO | Status |
| --- | --- | --- |
| 14 | Implement evaluation-time style metric logging | ✅ DONE |
| 22 | Implement early stopping initialisation | ✅ DONE |
| 26 | Implement early stopping check | ✅ DONE |
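
Early stopping splits naturally into the initialisation (patience, threshold, best value so far) and a per-evaluation check. This is a minimal patience-based sketch; the class name and parameters are illustrative and not taken from `callbacks.py`.

```python
from typing import Optional


class EarlyStopping:
    """Stop when the metric fails to improve for `patience` evaluations."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0, mode: str = "min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.min_delta = min_delta  # smallest change that counts as improvement
        self.mode = mode            # "min" for losses, "max" for scores
        self.best: Optional[float] = None
        self.bad_evals = 0

    def _improved(self, metric: float) -> bool:
        assert self.best is not None
        if self.mode == "min":
            return metric < self.best - self.min_delta
        return metric > self.best + self.min_delta

    def step(self, metric: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if self.best is None or self._improved(metric):
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

With `patience=2`, a validation-loss sequence of 1.0, 0.9, 0.95, 0.93 triggers a stop on the fourth evaluation: neither 0.95 nor 0.93 beats the best value of 0.9.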

human_pattern_extractor.py

| Line | TODO | Status |
| --- | --- | --- |
| 68 | Implement initialisation (spaCy + GPT-2) | ✅ DONE |
| 73 | Implement GPT-2 perplexity calculation | ✅ DONE |
| 78 | Implement burstiness | ✅ DONE |
| 83 | Implement sentence starter diversity | ✅ DONE |
| 88 | Implement n-gram novelty | ✅ DONE |
| 93 | Implement AI marker density | ✅ DONE |
| 98 | Implement discourse density | ✅ DONE |
| 103 | Implement punctuation patterns | ✅ DONE |
| 108 | Implement full 17-dim feature extraction | ✅ DONE |
| 125 | Implement `KaggleHumanPatternDataset` loading | ✅ DONE |
| 129 | Implement `__len__` | ✅ DONE |
| 133 | Implement `__getitem__` | ✅ DONE |
| 148 | Implement `HumanPatternClassifier` MLP layers | ✅ DONE |
| 153 | Implement forward pass | ✅ DONE |
| 158 | Implement single-text scoring | ✅ DONE |
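
The registry names several human-pattern features without defining them, and definitions vary between projects. The sketch below uses common choices, all of them assumptions: burstiness as the coefficient of variation of sentence lengths, starter diversity as distinct first words per sentence, and n-gram novelty as the distinct-to-total n-gram ratio within the text. GPT-2 perplexity and the remaining features are not shown.

```python
import statistics
from typing import List, Sequence, Tuple


def burstiness(sentence_lengths: Sequence[int]) -> float:
    """Population std / mean of sentence lengths (0 for uniform lengths)."""
    if len(sentence_lengths) < 2:
        return 0.0
    mean = statistics.mean(sentence_lengths)
    return statistics.pstdev(sentence_lengths) / mean if mean else 0.0


def starter_diversity(sentences: Sequence[str]) -> float:
    """Distinct lower-cased first words divided by non-empty sentences."""
    starters = [s.split()[0].lower() for s in sentences if s.split()]
    return len(set(starters)) / len(starters) if starters else 0.0


def ngram_novelty(tokens: Sequence[str], n: int = 3) -> float:
    """Distinct n-grams over total n-grams (1.0 means nothing repeats)."""
    grams: List[Tuple[str, ...]] = [
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    ]
    return len(set(grams)) / len(grams) if grams else 0.0
```

Sentence lengths of 5 and 15 words give a burstiness of 0.5, while three sentences that all open with "The" except one give a starter diversity of 2/3.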

src/vocabulary/ (10 TODOs) ✅

awl_loader.py

| Line | TODO | Status |
| --- | --- | --- |
| 21 | Implement initialisation | ✅ DONE |
| 26 | Implement word list file loading | ✅ DONE |
| 31 | Implement synonym JSON loading | ✅ DONE |
| 36 | Implement `is_academic()` | ✅ DONE |
| 41 | Implement `get_academic_synonyms()` | ✅ DONE |
| 47 | Implement `all_words` property | ✅ DONE |

lexical_substitution.py

| Line | TODO | Status |
| --- | --- | --- |
| 41 | Implement initialisation | ✅ DONE |
| 46 | Implement contextual semantic similarity | ✅ DONE |
| 51 | Implement AWL substitution generation | ✅ DONE |
| 56 | Implement vocabulary elevation | ✅ DONE |
| 106 | Implement register filtering | ✅ DONE |

register_filter.py

| Line | TODO | Status |
| --- | --- | --- |
| 14 | Implement initialisation | ✅ DONE |
| 19 | Implement nominalisation | ✅ DONE |
| 24 | Implement hedging | ✅ DONE |
| 29 | Implement formality check | ✅ DONE |

src/evaluation/ (7 TODOs) ✅

gleu_scorer.py

| Line | TODO | Status |
| --- | --- | --- |
| 20 | Implement corpus-level GLEU scoring | ✅ DONE |
| 29 | Implement BERTScore computation | ✅ DONE |

errant_evaluator.py

| Line | TODO | Status |
| --- | --- | --- |
| 15 | Implement initialisation (ERRANT annotator) | ✅ DONE |
| 23 | Implement ERRANT evaluation | ✅ DONE |

style_metrics.py

| Line | TODO | Status |
| --- | --- | --- |
| 19 | Implement style similarity | ✅ DONE |
| 24 | Implement AWL coverage | ✅ DONE |
| 33 | Implement batch evaluation | ✅ DONE |

authorship_verifier.py

| Line | TODO | Status |
| --- | --- | --- |
| 14 | Implement initialisation (load model) | ✅ DONE |
| 19 | Implement authorship verification | ✅ DONE |

src/inference/ (3 TODOs) ✅

corrector.py

| Line | TODO | Status |
| --- | --- | --- |
| 39 | Implement initialisation | ✅ DONE |
| 52 | Implement full correction pipeline | ✅ DONE |

postprocessor.py

| Line | TODO | Status |
| --- | --- | --- |
| 14 | Implement initialisation | ✅ DONE |
| 19 | Implement text cleanup | ✅ DONE |
| 27 | Implement entity restoration | ✅ DONE |
| 32 | Implement final formatting | ✅ DONE |
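
Entity restoration pairs with the protected-span extraction in `ner_tagger.py`: entity spans are masked before rewriting so the model cannot alter names, then swapped back afterwards. This is a minimal sketch assuming non-overlapping `(start, end)` character spans; the `__ENT{n}__` placeholder format and both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def protect_spans(text: str, spans: List[Tuple[int, int]]) -> Tuple[str, Dict[str, str]]:
    """Replace each (start, end) span with a placeholder.

    Placeholders are numbered left to right but substituted right to left,
    so the offsets of spans not yet processed stay valid.
    """
    ordered = sorted(spans)
    mapping: Dict[str, str] = {}
    for idx in range(len(ordered) - 1, -1, -1):
        start, end = ordered[idx]
        placeholder = f"__ENT{idx}__"  # hypothetical placeholder format
        mapping[placeholder] = text[start:end]
        text = text[:start] + placeholder + text[end:]
    return text, mapping


def restore_entities(text: str, mapping: Dict[str, str]) -> str:
    """Put the original entity strings back after generation."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

For `"Alice met Bob in Paris."` with three entity spans, the masked text is `"__ENT0__ met __ENT1__ in __ENT2__."`. A real pipeline would also need a fallback for when the model drops or alters a placeholder.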

src/api/ (2 TODOs) ✅

main.py

| Line | TODO | Status |
| --- | --- | --- |
| 22 | Load config and initialise corrector on startup | ✅ DONE |
| 31 | Implement `/correct` endpoint | ✅ DONE |

middleware.py

| Line | TODO | Status |
| --- | --- | --- |
| 14 | Implement request logging (timing, path, status) | ✅ DONE |
| 22 | Implement rate limiter state | ✅ DONE |
| 26 | Implement rate limiting logic | ✅ DONE |
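
The rate limiter's state and logic can be sketched as a per-client sliding-window log, independent of the web framework. Assumptions: clients are identified by a string key (such as the remote IP), limits are per process and in memory, and the clock is injectable for testing. The middleware wiring and the project's actual limits are not shown.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        # State: timestamps of recent accepted requests, per client.
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        q = self.hits[client_id]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # caller would typically respond with HTTP 429
        q.append(now)
        return True
```

A sliding log is exact but keeps one timestamp per accepted request; a fixed-window counter or token bucket uses less memory, and with several worker processes the state would need to live in a shared store.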

scripts/ (5 TODOs) ✅

train.py

| Line | TODO | Status |
| --- | --- | --- |
| 24 | Implement training pipeline (10 steps) | ✅ DONE |

evaluate.py

| Line | TODO | Status |
| --- | --- | --- |
| 19 | Implement evaluation pipeline | ✅ DONE |

run_inference.py

| Line | TODO | Status |
| --- | --- | --- |
| 21 | Implement inference pipeline | ✅ DONE |

pretrain_human_pattern_classifier.py

| Line | TODO | Status |
| --- | --- | --- |
| 23 | Implement classifier pre-training | ✅ DONE |

tests/ (18 TODOs) ✅

test_preprocessing.py: 7 tests ✅

test_style.py: 4 tests ✅

test_model.py: 2 tests + 3 new ✅

test_vocabulary.py: 4 tests ✅

test_evaluation.py: 4 tests ✅


Shell Scripts ✅

| Script | Purpose |
| --- | --- |
| `train.sh` | Multi-stage training with Skip/Redo/Continue checkpoint system |
| `start.sh` | Inference launcher (CLI REPL or API server) |

Summary by Package

| Package | TODOs | Status |
| --- | --- | --- |
| src/preprocessing/ | 16 | ✅ ALL DONE |
| src/style/ | 14 | ✅ ALL DONE |
| src/model/ | 5 | ✅ ALL DONE |
| src/training/ | 22 | ✅ ALL DONE |
| src/vocabulary/ | 10 | ✅ ALL DONE |
| src/evaluation/ | 7 | ✅ ALL DONE |
| src/inference/ | 3 | ✅ ALL DONE |
| src/api/ | 2 | ✅ ALL DONE |
| scripts/ | 5 | ✅ ALL DONE |
| tests/ | 18 | ✅ ALL DONE |
| Total | 97 | ✅ ALL DONE |