# Graph Report - /run/media/morpheuslord/Personal_Files/Projects/Rewriter (2026-05-03)

## Corpus Check
- 442 files · ~1,967,332 words
- Verdict: corpus is large enough that graph structure adds value.

## Summary
- 549 nodes · 873 edges · 27 communities detected
- Extraction: 76% EXTRACTED · 24% INFERRED · 0% AMBIGUOUS · INFERRED: 208 edges (avg confidence: 0.6)
- Token cost: 0 input · 0 output

## Community Hubs (Navigation)
- [[_COMMUNITY_Module Group 0|Module Group 0]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 2|Module Group 2]]
- [[_COMMUNITY_Module Group 3|Module Group 3]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 5|Module Group 5]]
- [[_COMMUNITY_Token Management|Token Management]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Authentication|Authentication]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 10|Module Group 10]]
- [[_COMMUNITY_Feed Scoring & Pool|Feed Scoring & Pool]]
- [[_COMMUNITY_Module Group 12|Module Group 12]]
- [[_COMMUNITY_Token Management|Token Management]]
- [[_COMMUNITY_Module Group 14|Module Group 14]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 16|Module Group 16]]
- [[_COMMUNITY_Module Group 17|Module Group 17]]
- [[_COMMUNITY_Module Group 18|Module Group 18]]
- [[_COMMUNITY_Module Group 19|Module Group 19]]
- [[_COMMUNITY_Module Group 20|Module Group 20]]
- [[_COMMUNITY_Infrastructure (Terraform)|Infrastructure (Terraform)]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 23|Module Group 23]]
- [[_COMMUNITY_Security & Rate Limiting|Security & Rate Limiting]]
- [[_COMMUNITY_WebSocket Codec|WebSocket Codec]]
- [[_COMMUNITY_Module Group 27|Module Group 27]]

## God Nodes (most connected - your core abstractions)
1. `train()` - 34 edges
2. `__init__()` - 28 edges
3. `__init__()` - 27 edges
4. `__init__()` - 27 edges
5. `__init__()` - 27 edges
6. `__init__()` - 27 edges
7. `__init__()` - 27 edges
8. `__init__()` - 27 edges
9. `correct()` - 16 edges
10. `__init__()` - 13 edges

## Surprising Connections (you probably didn't know these)
- `run_inference()` --calls--> `correct()` [INFERRED]
  scripts/run_inference.py → src/preprocessing/spell_corrector.py
- `train()` --calls--> `__init__()` [INFERRED]
  scripts/train.py → src/training/dataset.py
- `__init__()` --calls--> `__init__()` [INFERRED]
  scripts/train.py → src/training/dataset.py
- `score()` --calls--> `forward()` [INFERRED]
  src/training/human_pattern_extractor.py → scripts/train.py
- `test_spell_correction_empty()` --calls--> `correct()` [INFERRED]
  tests/test_preprocessing.py → src/inference/corrector.py

## Hyperedges (group relationships)
- **WebSocket Channel System** — sem_unified_ws, sem_feed_ws, sem_chat_ws, sem_keysync_ws, sem_discovery_ws [EXTRACTED 1.00]
- **Security Defense Stack** — sem_hmac_verification, sem_origin_secret, sem_pow_challenge, sem_rate_limiting, sem_attack_detection [EXTRACTED 1.00]
- **Feed Recommendation Pipeline** — sem_feed_pool, sem_feed_filters, sem_feed_scoring, sem_feed_heatmap, sem_feed_reciprocal, sem_feed_gradient [EXTRACTED 1.00]

## Communities

### Community 0 - "Module Group 0"
Cohesion: 0.04
Nodes (55): EntitySpan, NERTagger, Tags named entities and produces protected spans., Named Entity Recognition tagger. Identifies entities (persons, locations, organi, get_protected_spans(), Return (start, end) char spans that must not be modified., tag(), Extract all named entities from text. (+47 more)

### Community 1 - "Utility Scripts"
Cohesion: 0.06
Nodes (38): Evaluation script. Runs all evaluation metrics on the test set. Run: python scri, evaluate(), Run evaluation on the specified data split., ERRANTEvaluator, Evaluates grammar correction quality using ERRANT annotations., ERRANT-based grammatical error evaluation. Uses the ERRANT toolkit for standardi, evaluate(), Compute ERRANT precision, recall, F0.5.
(+30 more)

### Community 2 - "Module Group 2"
Cohesion: 0.07
Nodes (36): StyleFingerprinter, Extracts style fingerprint vectors from text samples., StyleProjectionMLP, Projects raw feature vector to 512-dim style embedding., _avg_dep_tree_depth(), Compute average dependency tree depth across all tokens., _avg_syllables_per_word(), Average syllables per word. (+28 more)

### Community 3 - "Module Group 3"
Cohesion: 0.06
Nodes (35): AWLLoader, Loads and manages Academic Word List data., _load_synonyms(), Load academic synonym mappings from JSON., _load_word_list(), Load a word list file into a set of lowercase words., all_words(), Return the full set of academic words. (+27 more)

### Community 4 - "Utility Scripts"
Cohesion: 0.31
Nodes (34): __init__(), CEOnlyLoss, Cross-entropy only loss — the only loss that provides gradient signal., __init__(), _auto_batch_size(), Pick optimal batch size based on model size and available resources., _setup_device(), Detect GPU and configure hybrid VRAM management. Returns (device, gpu_info) whe (+26 more)

### Community 5 - "Module Group 5"
Cohesion: 0.08
Nodes (29): DyslexiaSimulator, Generates synthetic dyslectic text from clean input for data augmentation., _double_letter(), Double a random interior letter., _omit_letter(), Remove a random interior letter., _reverse_letter(), Swap b/d, p/q style reversals. (+21 more)

### Community 6 - "Token Management"
Cohesion: 0.07
Nodes (28): Loads and wraps the base pretrained model. Supported architectures: - google/f, load_model_and_tokenizer(), Load a pretrained model with optional LoRA and quantization. Args: model_ke, apply_lora(), Apply LoRA adapters to a model and return the wrapped model., create_lora_config(), Create a LoRA configuration for the given task type., LoRA adapter configuration and management. Wraps PEFT LoRA utilities for applyin (+20 more)

### Community 7 - "Utility Scripts"
Cohesion: 0.08
Nodes (28): Pre-trains the HumanPatternClassifier on both Kaggle datasets.
Run this BEFORE t, train_classifier(), Pre-train the human pattern classifier on Kaggle datasets., forward(), HumanPatternClassifier, Lightweight MLP trained to distinguish human from AI writing. Input: feature vec, HumanPatternFeatureExtractor, Extracts 17-dimensional feature vector encoding human vs AI writing patterns. O (+20 more)

### Community 8 - "Authentication"
Cohesion: 0.08
Nodes (27): AuthorshipVerifier, Verifies authorship consistency between input and output text., Authorship verification module. Uses a fine-tuned model to verify whether the co, verify(), Return probability that both texts were written by the same author. Uses senten, average_style_vectors(), Compute the mean style vector from a list of vectors., cosine_similarity() (+19 more)

### Community 9 - "Utility Scripts"
Cohesion: 0.08
Nodes (25): Interactive inference script. Run: python scripts/run_inference.py --config conf, run_inference(), Run inference on text input., correct_text(), Correct dyslectic text with style preservation and academic elevation., FastAPI server for the Dyslexia Academic Writing Corrector API. Provides RESTful, health(), Health check endpoint. (+17 more)

### Community 10 - "Module Group 10"
Cohesion: 0.1
Nodes (27): _get_call_name(), Extract callable name from ast.Call node., _get_name(), Extract name from various AST node types., _resolve_edges(), Post-process edges to resolve bare names to actual node IDs. The per-file AST e, build_semantic_nodes(), Build semantic nodes from documentation files.
These capture high-level architec (+19 more)

### Community 11 - "Feed Scoring & Pool"
Cohesion: 0.08
Nodes (27): Chat WebSocket Channel, Discovery WebSocket Channel, E2EE X25519 Key Exchange, FastAPI Stateless Backend, Feed Hard Filters (12 Rules), 3-Tier Gradient Distribution, Preference Heatmap (Learned AI), Feed Pool Computation Pipeline (+19 more)

### Community 12 - "Module Group 12"
Cohesion: 0.12
Nodes (22): GLEU, (Note: This script computes sentence-level GLEU score.) This script calculates , get_gleu_stats(), calculate mean and confidence interval from all GLEU iterations, get_ngram_counts(), get ngrams of order n for a tokenized sentence, get_ngram_diff(), returns ngrams in a but not in b (+14 more)

### Community 13 - "Token Management"
Cohesion: 0.16
Nodes (17): clean_para(), convert_char_to_tok(), get_all_tok_starts_and_ends(), get_paras(), get_sents(), get_token_edits(), main(), noop_edit() (+9 more)

### Community 14 - "Module Group 14"
Cohesion: 0.13
Nodes (14): FormalityClassifier, Scores text formality on a 0-1 scale using rule-based heuristics., Formality classifier module. Classifies text on a 0-1 formality scale using ling, score(), Return formality score in [0, 1]. Higher = more formal. Scoring based on: - Con, RegisterFilterAdvanced, Advanced register filtering with nominalisation and hedging passes., add_hedging() (+6 more)

### Community 15 - "Utility Scripts"
Cohesion: 0.2
Nodes (14): apply_bea19_edits(), Apply BEA-2019 character-level edits to produce corrected text. edits_block for, create_splits(), Split train.jsonl into train and val sets., Converts all raw dataset formats into unified JSONL training format. Output sche, main(), process_bea19_json(), Process a BEA-2019 format JSON file (FCE or W&I+LOCNESS).
Each line is a JSON ob (+6 more)

### Community 16 - "Module Group 16"
Cohesion: 0.24
Nodes (9): CorrectionTrainer, Custom trainer — uses model's built-in loss directly., _strip_custom_fields(), Remove dataset fields that T5 doesn't accept., compute_loss(), Use model's built-in CE loss — avoids double-computing logits loss., Custom HuggingFace Trainer subclass. Uses the model's built-in cross-entropy los, prediction_step() (+1 more)

### Community 17 - "Module Group 17"
Cohesion: 0.29
Nodes (5): RateLimitMiddleware, Simple in-memory rate limiting., RequestLoggingMiddleware, Logs all incoming requests with timing information., API middleware for request logging, rate limiting, and error handling.

### Community 18 - "Module Group 18"
Cohesion: 0.29
Nodes (5): EarlyStoppingOnStyleDrift, Stops training if style similarity drops below threshold., StyleMetricsCallback, Logs style similarity metrics during evaluation., Training callbacks for monitoring and checkpointing. Integrates with Weights & B

### Community 19 - "Module Group 19"
Cohesion: 0.33
Nodes (5): EmotionClassifier, Classifies emotional register of text using keyword-based analysis., classify(), Return emotion distribution over register categories. Returns a dict with keys:, Emotion/register classifier module. Classifies text emotional register (neutral,

### Community 20 - "Module Group 20"
Cohesion: 0.5
Nodes (3): CorrectionRequest, CorrectionResponse, Pydantic schemas for API request/response validation.

### Community 21 - "Infrastructure (Terraform)"
Cohesion: 0.5
Nodes (4): ALB + Auto Scaling Group, AWS Secrets Manager Integration, Terraform AWS Infrastructure, VPC Network Topology

### Community 22 - "Utility Scripts"
Cohesion: 0.67
Nodes (1): Downloads all publicly available HuggingFace datasets automatically.
Datasets re

### Community 23 - "Module Group 23"
Cohesion: 0.67
Nodes (3): Cloudflare Edge Proxy, Lambda Origin Secret Rotator, X-Origin-Secret Middleware

### Community 24 - "Security & Rate Limiting"
Cohesion: 1.0
Nodes (2): Attack Detection & IP Risk Management, Per-IP Rate Limiting

### Community 26 - "WebSocket Codec"
Cohesion: 1.0
Nodes (1): HMAC-SHA256 Request Verification

### Community 27 - "Module Group 27"
Cohesion: 1.0
Nodes (1): Proof-of-Work Challenge

## Knowledge Gaps
- **259 isolated node(s):** `graphify_rebuild.py — One-shot NudR knowledge graph regeneration. Usage: py`, `Walk the project and return list of relevant files with metadata.`, `Compare against manifest to find changed files.`, `SHA-256 hash for cache keying.`, `Extract AST nodes and edges from a single Python file.` (+254 more)
  These have ≤1 connection - possible missing edges or undocumented components.
- **Thin community `Utility Scripts`** (3 nodes): `download_all_huggingface_datasets.py`, `Downloads all publicly available HuggingFace datasets automatically. Datasets re`, `main()`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- **Thin community `Security & Rate Limiting`** (2 nodes): `Attack Detection & IP Risk Management`, `Per-IP Rate Limiting`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- **Thin community `WebSocket Codec`** (1 node): `HMAC-SHA256 Request Verification`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- **Thin community `Module Group 27`** (1 node): `Proof-of-Work Challenge`
  Too small to be a meaningful cluster - may be noise or needs more connections extracted.
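
The "≤1 connection" criterion used for isolated nodes above can be reproduced as a plain degree count over an edge list. The following is a minimal sketch, assuming the graph is available as a list of node labels plus `(source, target)` pairs; the function name `weakly_connected_nodes` and the toy data are hypothetical and are not taken from the tool that produced this report.

```python
from collections import Counter


def weakly_connected_nodes(nodes, edges, max_degree=1):
    """Return the nodes whose degree is at most max_degree, in input order."""
    degree = Counter()
    for source, target in edges:
        # Treat every edge as undirected: it adds one connection to each endpoint.
        degree[source] += 1
        degree[target] += 1
    # Counter returns 0 for nodes that never appear in an edge.
    return [node for node in nodes if degree[node] <= max_degree]


# Hypothetical toy graph; labels are illustrative only.
nodes = ["train()", "correct()", "run_inference()", "main()", "sha256_hash()"]
edges = [
    ("run_inference()", "correct()"),
    ("train()", "correct()"),
    ("train()", "run_inference()"),
    ("main()", "train()"),
]

print(weakly_connected_nodes(nodes, edges))
```

Running the same count over the real edge list would be one way to check whether the 259 flagged nodes are truly disconnected or simply missing extracted edges.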

## Suggested Questions
_Questions this graph is uniquely positioned to answer:_

- **Why does `parse()` connect `Token Management` to `Utility Scripts`, `Module Group 10`?**
  _High betweenness centrality (0.125) - this node is a cross-community bridge._
- **Why does `correct()` connect `Utility Scripts` to `Module Group 0`, `Utility Scripts`, `Module Group 2`, `Module Group 3`?**
  _High betweenness centrality (0.092) - this node is a cross-community bridge._
- **Why does `extract_ast_file()` connect `Module Group 10` to `Token Management`?**
  _High betweenness centrality (0.083) - this node is a cross-community bridge._
- **Are the 26 inferred relationships involving `train()` (e.g. with `__init__()` and `__init__()`) actually correct?**
  _`train()` has 26 INFERRED edges - model-reasoned connections that need verification._
- **Are the 26 inferred relationships involving `__init__()` (e.g. with `train()` and `__init__()`) actually correct?**
  _`__init__()` has 26 INFERRED edges - model-reasoned connections that need verification._
- **Are the 26 inferred relationships involving `__init__()` (e.g. with `train()` and `__init__()`) actually correct?**
  _`__init__()` has 26 INFERRED edges - model-reasoned connections that need verification._
- **Are the 26 inferred relationships involving `__init__()` (e.g. with `train()` and `__init__()`) actually correct?**
  _`__init__()` has 26 INFERRED edges - model-reasoned connections that need verification._
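
The bridge questions above rank nodes by betweenness centrality. As a rough illustration of what that score measures, here is a minimal sketch of Brandes' algorithm for an undirected, unweighted graph. It assumes normalization by `(n - 1)(n - 2)`, which may differ from whatever the report generator uses, and the adjacency data is a hypothetical toy graph rather than this project's graph.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Normalized betweenness for an undirected, unweighted graph (Brandes)."""
    nodes = list(adjacency)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    # Each unordered pair was counted from both endpoints, so dividing by
    # (n - 1)(n - 2) yields the fraction of pairs routed through a node.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: score[v] * scale for v in nodes}


# Hypothetical toy graph: two triangles joined only through node "x".
toy = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "d"],
    "d": ["x", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
centrality = betweenness_centrality(toy)
print(max(centrality, key=centrality.get))  # the bridge node "x"
```

In the toy graph every shortest path between the two triangles passes through `x`, which is the same pattern the report flags for `parse()`, `correct()`, and `extract_ast_file()`: a high score suggests a node that joins otherwise separate communities.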