morpheuslord/rewrite

@@ -122,6 +122,126 @@ Rewriter/
 └── pyproject.toml                  # Project metadata
 ```
 ---
 ## Design Choices & Rationale
@@ -467,3 +587,4 @@ These are projected through a 2-layer MLP (`41 → 256 → 512`) with LayerNorm
 3. **Vocabulary elevation**: BERT fill-mask can suggest semantically inappropriate AWL words; the similarity threshold (0.82) is a trade-off between coverage and accuracy
 4. **Already-correct text**: The model is trained on error→correction pairs; feeding it clean text produces unpredictable output
 5. **LanguageTool latency**: Spell correction takes ~15-20s due to JVM startup on first call

 └── pyproject.toml                  # Project metadata
 ```
+## Model Architecture
+```mermaid
+graph TB
+    %% ── Inference Pipeline (top-to-bottom flow) ──────────────────────
+    subgraph INFERENCE["🔮 Inference Pipeline"]
+        direction TB
+        INPUT["📝 Raw Dyslexic Text"]
+        subgraph PREPROCESS["Pre-Processing"]
+            SPELL["Spell Corrector<br/><i>dyslexia-aware phonetic</i>"]
+            SENT_SEG["Sentence Segmenter"]
+            DEP_PARSE["Dependency Parser"]
+            NER["NER Tagger"]
+        end
+        subgraph STYLE["Style Analysis"]
+            FINGER["Style Fingerprinter<br/><i>512-dim vector</i>"]
+            EMOTION["Emotion Classifier"]
+            FORMALITY["Formality Classifier"]
+            STYLE_VEC["Style Vector Composer"]
+        end
+        subgraph GENERATION["Core Generation"]
+            STYLE_COND["Style Conditioner<br/><i>prefix tuning</i>"]
+            BASE_MODEL["Base LM<br/><i>Flan-T5 / BART / Llama-3</i>"]
+            LORA["LoRA Adapter"]
+            GEN_UTILS["Generation Utils<br/><i>beam search, sampling</i>"]
+        end
+        subgraph POSTPROCESS["Post-Processing"]
+            POSTPROC["Post-Processor<br/><i>formatting, cleanup</i>"]
+            VOCAB_SUB["Lexical Substitution<br/><i>BERT-based</i>"]
+            AWL["AWL Loader<br/><i>Coxhead Academic Word List</i>"]
+            REG_FILTER["Register Filter<br/><i>academic tone gate</i>"]
+        end
+        OUTPUT["✅ Corrected Academic Text"]
+        INPUT --> SPELL --> SENT_SEG --> DEP_PARSE --> NER
+        INPUT --> FINGER --> EMOTION --> FORMALITY --> STYLE_VEC
+        NER --> STYLE_COND
+        STYLE_VEC --> STYLE_COND
+        STYLE_COND --> BASE_MODEL
+        LORA -.->|"merged weights"| BASE_MODEL
+        BASE_MODEL --> GEN_UTILS --> POSTPROC
+        POSTPROC --> VOCAB_SUB
+        AWL --> VOCAB_SUB
+        VOCAB_SUB --> REG_FILTER --> OUTPUT
+    end
+    %% ── Training Pipeline ────────────────────────────────────────────
+    subgraph TRAINING["🏋️ Training Pipeline"]
+        direction TB
+        subgraph DATA["Data Pipeline"]
+            RAW_DATA["Raw Datasets<br/><i>JFLEG, WI+LOCNESS, C4_200M,<br/>FCE, Lang-8, NUCLE</i>"]
+            KAGGLE["Kaggle Datasets<br/><i>Shanegerami, Starblasters8</i>"]
+            PREPROC_SCRIPT["preprocess_data.py"]
+            TRAIN_JSONL["train.jsonl / val.jsonl / test.jsonl"]
+        end
+        subgraph HP_PRETRAIN["Human Pattern Pre-Training"]
+            FEAT_EXTRACT["Feature Extractor<br/><i>17-dim: perplexity, burstiness,<br/>n-gram novelty, AI markers...</i>"]
+            GPT2["GPT-2<br/><i>perplexity scorer</i>"]
+            HP_CLASSIFIER["Human Pattern Classifier<br/><i>MLP: 17→128→64→1</i>"]
+            HP_WEIGHTS["human_pattern_classifier.pt"]
+        end
+        subgraph MAIN_TRAIN["Main Model Training"]
+            DATASET["WritingCorrectionDataset"]
+            COMBINED_LOSS["Combined Loss Function"]
+            L_CE["L_CE<br/><i>cross-entropy</i>"]
+            L_STYLE["λ₁ · L_style<br/><i>style consistency</i>"]
+            L_SEM["λ₂ · L_semantic<br/><i>meaning preservation</i>"]
+            L_HUMAN["λ₃ · L_human_pattern<br/><i>anti-AI penalty</i>"]
+            TRAINER["CorrectionTrainer"]
+            CALLBACKS["Callbacks<br/><i>StyleMetrics,<br/>EarlyStoppingOnStyleDrift</i>"]
+        end
+        subgraph EVAL["Evaluation"]
+            ERRANT["ERRANT Evaluator<br/><i>P / R / F₀.₅</i>"]
+            GLEU["GLEU Scorer"]
+            STYLE_MET["Style Metrics<br/><i>cosine similarity</i>"]
+            AUTH_VER["Authorship Verifier<br/><i>AI detection resistance</i>"]
+        end
+        RAW_DATA --> PREPROC_SCRIPT --> TRAIN_JSONL
+        KAGGLE --> FEAT_EXTRACT
+        GPT2 --> FEAT_EXTRACT --> HP_CLASSIFIER --> HP_WEIGHTS
+        TRAIN_JSONL --> DATASET --> TRAINER
+        L_CE --> COMBINED_LOSS
+        L_STYLE --> COMBINED_LOSS
+        L_SEM --> COMBINED_LOSS
+        HP_WEIGHTS -.->|"frozen"| L_HUMAN --> COMBINED_LOSS
+        COMBINED_LOSS --> TRAINER
+        CALLBACKS --> TRAINER
+        TRAINER --> EVAL
+    end
+    %% ── API Layer ────────────────────────────────────────────────────
+    subgraph API["🌐 FastAPI Server"]
+        ENDPOINT["/correct endpoint"]
+        SCHEMAS["Request / Response Schemas"]
+        MIDDLEWARE["Rate Limiting & CORS"]
+        CORRECTOR["Corrector<br/><i>orchestrates full pipeline</i>"]
+    end
+    ENDPOINT --> CORRECTOR --> INFERENCE
+    TRAINER -->|"best_model/"| BASE_MODEL
+    %% ── Styling ──────────────────────────────────────────────────────
+    classDef pipeline fill:#1a1a2e,stroke:#16213e,color:#e94560,stroke-width:2px
+    classDef module fill:#0f3460,stroke:#533483,color:#e2e2e2,stroke-width:1px
+    classDef data fill:#1a1a2e,stroke:#e94560,color:#eee,stroke-width:1px
+    classDef output fill:#533483,stroke:#e94560,color:#fff,stroke-width:2px
+    class INPUT,RAW_DATA,KAGGLE,TRAIN_JSONL data
+    class OUTPUT,HP_WEIGHTS output
+```
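The "Combined Loss Function" node in the diagram takes four inputs: `L_CE`, `λ₁ · L_style`, `λ₂ · L_semantic`, and `λ₃ · L_human_pattern`. A minimal sketch of how they might be combined is shown here, assuming a plain weighted sum. The `LossWeights` class, the `combined_loss` function, and the default coefficient values are hypothetical placeholders for illustration, not names or settings taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical container for the λ coefficients (placeholder defaults)."""
    style: float = 0.1      # λ₁, weight on the style-consistency term
    semantic: float = 0.1   # λ₂, weight on the meaning-preservation term
    human: float = 0.1      # λ₃, weight on the human-pattern (anti-AI) term


def combined_loss(l_ce, l_style, l_semantic, l_human, weights=LossWeights()):
    """Return L_CE + λ₁·L_style + λ₂·L_semantic + λ₃·L_human_pattern.

    Assumes the terms are simply added; works for plain floats and for
    any tensor type that supports `+` and scalar `*`.
    """
    return (
        l_ce
        + weights.style * l_style
        + weights.semantic * l_semantic
        + weights.human * l_human
    )


if __name__ == "__main__":
    # Example with made-up per-batch loss values.
    print(combined_loss(1.0, 2.0, 3.0, 4.0, LossWeights(0.5, 0.25, 0.125)))
```

Because the human-pattern classifier is marked "frozen" in the diagram, only its output would feed into the fourth term; its own weights would not be updated by this loss.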
 ---
 ## Design Choices & Rationale
 3. **Vocabulary elevation**: BERT fill-mask can suggest semantically inappropriate AWL words; the similarity threshold (0.82) is a trade-off between coverage and accuracy
 4. **Already-correct text**: The model is trained on error→correction pairs; feeding it clean text produces unpredictable output
 5. **LanguageTool latency**: Spell correction takes ~15-20s due to JVM startup on first call
+6. **Semantic drift in correction**: Qualitative evaluation shows that the pipeline can introduce meaning-level errors instead of only fixing surface errors. For example, when LanguageTool misreads a dyslexic phonetic spelling, it can substitute a plausible but wrong word that corrupts the intended meaning. The Style Similarity metric (0.96) does not capture this failure mode, because it measures surface token overlap rather than semantic faithfulness. Future work should add **BERTScore F1** and **Word Error Rate (WER)** against ground-truth corrections as primary evaluation signals, plus a dedicated post-correction **semantic faithfulness check** (cosine similarity between input and output sentence embeddings) that flags and rejects meaning drift before output is returned.
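Two of the signals proposed in limitation 6 can be sketched in a few lines. The block below is an illustrative sketch, not the project's implementation: `word_error_rate` computes word-level edit distance against a ground-truth correction, and `is_faithful` gates output on the cosine similarity of two sentence embeddings. The function names and the `0.85` threshold are assumptions, and the embeddings are taken as plain lists of floats produced by some sentence encoder that is not shown here.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_faithful(source_vec, output_vec, threshold: float = 0.85) -> bool:
    """Accept the correction only if the embeddings stay close.

    The threshold is a placeholder; it would need tuning on held-out data.
    """
    return cosine_similarity(source_vec, output_vec) >= threshold


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("the cat sat", "the bat sat"))
    # Orthogonal toy vectors stand in for a drifted correction.
    print(is_faithful([1.0, 0.0], [0.0, 1.0]))
```

In a pipeline like the one described, a check of this kind would plausibly run after the register filter, with rejected outputs falling back to a more conservative correction; that placement is a design suggestion rather than something the source specifies.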