Graph Report - /run/media/morpheuslord/Personal_Files/Projects/Rewriter (2026-05-03)
Corpus Check
- 442 files · ~1,967,332 words
- Verdict: corpus is large enough that graph structure adds value.
Summary
- 549 nodes · 873 edges · 27 communities detected
- Extraction: 76% EXTRACTED · 24% INFERRED · 0% AMBIGUOUS · INFERRED: 208 edges (avg confidence: 0.6)
- Token cost: 0 input · 0 output
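The extraction percentages in the Summary can be checked against the raw edge counts. This is a minimal sketch, assuming that "0% AMBIGUOUS" means zero ambiguous edges, so every non-inferred edge counts as EXTRACTED. The helper name `share` is illustrative and not part of the tool that produced this report.

```python
# Figures taken from the Summary section of this report.
total_edges = 873
inferred_edges = 208

# Assumption: AMBIGUOUS is reported as 0%, so all remaining edges are EXTRACTED.
extracted_edges = total_edges - inferred_edges


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


inferred_pct = share(inferred_edges, total_edges)
extracted_pct = share(extracted_edges, total_edges)
print(f"{extracted_pct}% EXTRACTED, {inferred_pct}% INFERRED")
```

Under that assumption, the counts reproduce the 76% / 24% split reported above.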
Community Hubs (Navigation)
- [[_COMMUNITY_Module Group 0|Module Group 0]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 2|Module Group 2]]
- [[_COMMUNITY_Module Group 3|Module Group 3]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 5|Module Group 5]]
- [[_COMMUNITY_Token Management|Token Management]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Authentication|Authentication]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 10|Module Group 10]]
- [[_COMMUNITY_Feed Scoring & Pool|Feed Scoring & Pool]]
- [[_COMMUNITY_Module Group 12|Module Group 12]]
- [[_COMMUNITY_Token Management|Token Management]]
- [[_COMMUNITY_Module Group 14|Module Group 14]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 16|Module Group 16]]
- [[_COMMUNITY_Module Group 17|Module Group 17]]
- [[_COMMUNITY_Module Group 18|Module Group 18]]
- [[_COMMUNITY_Module Group 19|Module Group 19]]
- [[_COMMUNITY_Module Group 20|Module Group 20]]
- [[_COMMUNITY_Infrastructure (Terraform)|Infrastructure (Terraform)]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 23|Module Group 23]]
- [[_COMMUNITY_Security & Rate Limiting|Security & Rate Limiting]]
- [[_COMMUNITY_WebSocket Codec|WebSocket Codec]]
- [[_COMMUNITY_Module Group 27|Module Group 27]]
God Nodes (most connected - your core abstractions)
- train() - 34 edges
- __init__() - 28 edges
- __init__() - 27 edges
- __init__() - 27 edges
- __init__() - 27 edges
- __init__() - 27 edges
- __init__() - 27 edges
- __init__() - 27 edges
- correct() - 16 edges
- __init__() - 13 edges
Surprising Connections (you probably didn't know these)
- run_inference() --calls--> correct() [INFERRED] scripts/run_inference.py → src/preprocessing/spell_corrector.py
- train() --calls--> __init__() [INFERRED] scripts/train.py → src/training/dataset.py
- __init__() --calls--> __init__() [INFERRED] scripts/train.py → src/training/dataset.py
- score() --calls--> forward() [INFERRED] src/training/human_pattern_extractor.py → scripts/train.py
- test_spell_correction_empty() --calls--> correct() [INFERRED] tests/test_preprocessing.py → src/inference/corrector.py
Hyperedges (group relationships)
- WebSocket Channel System — sem_unified_ws, sem_feed_ws, sem_chat_ws, sem_keysync_ws, sem_discovery_ws [EXTRACTED 1.00]
- Security Defense Stack — sem_hmac_verification, sem_origin_secret, sem_pow_challenge, sem_rate_limiting, sem_attack_detection [EXTRACTED 1.00]
- Feed Recommendation Pipeline — sem_feed_pool, sem_feed_filters, sem_feed_scoring, sem_feed_heatmap, sem_feed_reciprocal, sem_feed_gradient [EXTRACTED 1.00]
Communities
Community 0 - "Module Group 0"
Cohesion: 0.04 Nodes (55): EntitySpan, NERTagger, Tags named entities and produces protected spans., Named Entity Recognition tagger. Identifies entities (persons, locations, organi, get_protected_spans(), Return (start, end) char spans that must not be modified., tag(), Extract all named entities from text. (+47 more)
Community 1 - "Utility Scripts"
Cohesion: 0.06 Nodes (38): Evaluation script. Runs all evaluation metrics on the test set. Run: python scri, evaluate(), Run evaluation on the specified data split., ERRANTEvaluator, Evaluates grammar correction quality using ERRANT annotations., ERRANT-based grammatical error evaluation. Uses the ERRANT toolkit for standardi, evaluate(), Compute ERRANT precision, recall, F0.5. (+30 more)
Community 2 - "Module Group 2"
Cohesion: 0.07 Nodes (36): StyleFingerprinter, Extracts style fingerprint vectors from text samples., StyleProjectionMLP, Projects raw feature vector to 512-dim style embedding., _avg_dep_tree_depth(), Compute average dependency tree depth across all tokens., _avg_syllables_per_word(), Average syllables per word. (+28 more)
Community 3 - "Module Group 3"
Cohesion: 0.06 Nodes (35): AWLLoader, Loads and manages Academic Word List data., _load_synonyms(), Load academic synonym mappings from JSON., _load_word_list(), Load a word list file into a set of lowercase words., all_words(), Return the full set of academic words. (+27 more)
Community 4 - "Utility Scripts"
Cohesion: 0.31 Nodes (34): init(), CEOnlyLoss, Cross-entropy only loss — the only loss that provides gradient signal., init(), _auto_batch_size(), Pick optimal batch size based on model size and available resources., _setup_device(), Detect GPU and configure hybrid VRAM management.
Returns (device, gpu_info) whe (+26 more)
Community 5 - "Module Group 5"
Cohesion: 0.08 Nodes (29): DyslexiaSimulator, Generates synthetic dyslectic text from clean input for data augmentation., _double_letter(), Double a random interior letter., _omit_letter(), Remove a random interior letter., _reverse_letter(), Swap b/d, p/q style reversals. (+21 more)
Community 6 - "Token Management"
Cohesion: 0.07 Nodes (28): Loads and wraps the base pretrained model. Supported architectures: - google/f, load_model_and_tokenizer(), Load a pretrained model with optional LoRA and quantization. Args: model_ke, apply_lora(), Apply LoRA adapters to a model and return the wrapped model., create_lora_config(), Create a LoRA configuration for the given task type., LoRA adapter configuration and management. Wraps PEFT LoRA utilities for applyin (+20 more)
Community 7 - "Utility Scripts"
Cohesion: 0.08 Nodes (28): Pre-trains the HumanPatternClassifier on both Kaggle datasets. Run this BEFORE t, train_classifier(), Pre-train the human pattern classifier on Kaggle datasets., forward(), HumanPatternClassifier, Lightweight MLP trained to distinguish human from AI writing. Input: feature vec, HumanPatternFeatureExtractor, Extracts 17-dimensional feature vector encoding human vs AI writing patterns. O (+20 more)
Community 8 - "Authentication"
Cohesion: 0.08 Nodes (27): AuthorshipVerifier, Verifies authorship consistency between input and output text., Authorship verification module. Uses a fine-tuned model to verify whether the co, verify(), Return probability that both texts were written by the same author. Uses senten, average_style_vectors(), Compute the mean style vector from a list of vectors., cosine_similarity() (+19 more)
Community 9 - "Utility Scripts"
Cohesion: 0.08 Nodes (25): Interactive inference script. Run: python scripts/run_inference.py --config conf, run_inference(), Run inference on text input., correct_text(), Correct dyslectic text with style preservation and academic elevation., FastAPI server for the Dyslexia Academic Writing Corrector API. Provides RESTful, health(), Health check endpoint. (+17 more)
Community 10 - "Module Group 10"
Cohesion: 0.1 Nodes (27): _get_call_name(), Extract callable name from ast.Call node., _get_name(), Extract name from various AST node types., _resolve_edges(), Post-process edges to resolve bare names to actual node IDs. The per-file AST e, build_semantic_nodes(), Build semantic nodes from documentation files. These capture high-level architec (+19 more)
Community 11 - "Feed Scoring & Pool"
Cohesion: 0.08 Nodes (27): Chat WebSocket Channel, Discovery WebSocket Channel, E2EE X25519 Key Exchange, FastAPI Stateless Backend, Feed Hard Filters (12 Rules), 3-Tier Gradient Distribution, Preference Heatmap (Learned AI), Feed Pool Computation Pipeline (+19 more)
Community 12 - "Module Group 12"
Cohesion: 0.12 Nodes (22): GLEU, (Note: This script computes sentence-level GLEU score.) This script calculates , get_gleu_stats(), calculate mean and confidence interval from all GLEU iterations, get_ngram_counts(), get ngrams of order n for a tokenized sentence, get_ngram_diff(), returns ngrams in a but not in b (+14 more)
Community 13 - "Token Management"
Cohesion: 0.16 Nodes (17): clean_para(), convert_char_to_tok(), get_all_tok_starts_and_ends(), get_paras(), get_sents(), get_token_edits(), main(), noop_edit() (+9 more)
Community 14 - "Module Group 14"
Cohesion: 0.13 Nodes (14): FormalityClassifier, Scores text formality on a 0-1 scale using rule-based heuristics., Formality classifier module. Classifies text on a 0-1 formality scale using ling, score(), Return formality score in [0, 1]. Higher = more formal. Scoring based on: - Con, RegisterFilterAdvanced, Advanced register filtering with nominalisation and hedging passes., add_hedging() (+6 more)
Community 15 - "Utility Scripts"
Cohesion: 0.2 Nodes (14): apply_bea19_edits(), Apply BEA-2019 character-level edits to produce corrected text. edits_block for, create_splits(), Split train.jsonl into train and val sets., Converts all raw dataset formats into unified JSONL training format. Output sche, main(), process_bea19_json(), Process a BEA-2019 format JSON file (FCE or W&I+LOCNESS). Each line is a JSON ob (+6 more)
Community 16 - "Module Group 16"
Cohesion: 0.24 Nodes (9): CorrectionTrainer, Custom trainer — uses model's built-in loss directly., _strip_custom_fields(), Remove dataset fields that T5 doesn't accept., compute_loss(), Use model's built-in CE loss — avoids double-computing logits loss., Custom HuggingFace Trainer subclass. Uses the model's built-in cross-entropy los, prediction_step() (+1 more)
Community 17 - "Module Group 17"
Cohesion: 0.29 Nodes (5): RateLimitMiddleware, Simple in-memory rate limiting., RequestLoggingMiddleware, Logs all incoming requests with timing information., API middleware for request logging, rate limiting, and error handling.
Community 18 - "Module Group 18"
Cohesion: 0.29 Nodes (5): EarlyStoppingOnStyleDrift, Stops training if style similarity drops below threshold., StyleMetricsCallback, Logs style similarity metrics during evaluation., Training callbacks for monitoring and checkpointing. Integrates with Weights & B
Community 19 - "Module Group 19"
Cohesion: 0.33 Nodes (5): EmotionClassifier, Classifies emotional register of text using keyword-based analysis., classify(), Return emotion distribution over register categories. Returns a dict with keys:, Emotion/register classifier module. Classifies text emotional register (neutral,
Community 20 - "Module Group 20"
Cohesion: 0.5 Nodes (3): CorrectionRequest, CorrectionResponse, Pydantic schemas for API request/response validation.
Community 21 - "Infrastructure (Terraform)"
Cohesion: 0.5 Nodes (4): ALB + Auto Scaling Group, AWS Secrets Manager Integration, Terraform AWS Infrastructure, VPC Network Topology
Community 22 - "Utility Scripts"
Cohesion: 0.67 Nodes (1): Downloads all publicly available HuggingFace datasets automatically. Datasets re
Community 23 - "Module Group 23"
Cohesion: 0.67 Nodes (3): Cloudflare Edge Proxy, Lambda Origin Secret Rotator, X-Origin-Secret Middleware
Community 24 - "Security & Rate Limiting"
Cohesion: 1.0 Nodes (2): Attack Detection & IP Risk Management, Per-IP Rate Limiting
Community 26 - "WebSocket Codec"
Cohesion: 1.0 Nodes (1): HMAC-SHA256 Request Verification
Community 27 - "Module Group 27"
Cohesion: 1.0 Nodes (1): Proof-of-Work Challenge
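The report does not say how the Cohesion figures above are computed. One common reading is intra-community edge density: the number of internal edges divided by the number of possible node pairs. The sketch below illustrates that reading only. The `cohesion` function, the 1.0 convention for single-node communities, and the node names are all assumptions, not the generating tool's actual method.

```python
def cohesion(nodes, edges):
    """Edge density of a community: internal edges / possible node pairs.

    Assumption: this is one plausible definition, not the report's confirmed formula.
    """
    members = set(nodes)
    if len(members) < 2:
        # Assumed convention: a community with fewer than two nodes is fully cohesive.
        return 1.0
    # Count each undirected internal edge once and ignore self-loops.
    internal = {
        frozenset((a, b))
        for a, b in edges
        if a in members and b in members and a != b
    }
    possible = len(members) * (len(members) - 1) // 2
    return len(internal) / possible


# Hypothetical 4-node community whose edges form a star around one hub.
toy_nodes = ["hub", "n1", "n2", "n3"]
toy_edges = [("hub", "n1"), ("hub", "n2"), ("hub", "n3")]
toy_score = cohesion(toy_nodes, toy_edges)
print(toy_score)
```

Under this definition, a 4-node community with 3 internal edges scores 3/6 = 0.5. Low scores on large communities would then indicate sparse internal wiring rather than a small cluster.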
Knowledge Gaps
- 259 isolated node(s): `graphify_rebuild.py — One-shot NudR knowledge graph regeneration.
Usage:
py, Walk the project and return list of relevant files with metadata., Compare against manifest to find changed files., SHA-256 hash for cache keying., Extract AST nodes and edges from a single Python file.` (+254 more)
These have ≤1 connection - possible missing edges or undocumented components.
- Thin community Utility Scripts (3 nodes): download_all_huggingface_datasets.py, Downloads all publicly available HuggingFace datasets automatically. Datasets re, main(). Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- Thin community Security & Rate Limiting (2 nodes): Attack Detection & IP Risk Management, Per-IP Rate Limiting. Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- Thin community WebSocket Codec (1 node): HMAC-SHA256 Request Verification. Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- Thin community Module Group 27 (1 node): Proof-of-Work Challenge. Too small to be a meaningful cluster - may be noise or needs more connections extracted.
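The isolated-node check described above (nodes with ≤1 connection) can be approximated by counting each node's degree. This is a minimal sketch over a plain edge list. The function name `weakly_connected` and the toy node names are hypothetical and not taken from the project.

```python
from collections import Counter


def weakly_connected(nodes, edges, max_degree=1):
    """Return the nodes that have at most `max_degree` connections, sorted by name."""
    degree = Counter()
    for src, dst in edges:
        # Treat every edge as undirected for the purpose of counting connections.
        degree[src] += 1
        degree[dst] += 1
    # Counter yields 0 for nodes that never appear in an edge.
    return sorted(n for n in nodes if degree[n] <= max_degree)


# Hypothetical toy graph: a triangle, one leaf, and one unconnected node.
toy_nodes = ["a", "b", "c", "leaf", "orphan"]
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "leaf")]
flagged = weakly_connected(toy_nodes, toy_edges)
print(flagged)
```

Nodes flagged this way are candidates for review. A degree of 0 or 1 may reflect a missing edge in extraction rather than a genuinely unused component.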
Suggested Questions
Questions this graph is uniquely positioned to answer:
- Why does parse() connect Token Management to Utility Scripts, Module Group 10? High betweenness centrality (0.125) - this node is a cross-community bridge.
- Why does correct() connect Utility Scripts to Module Group 0, Utility Scripts, Module Group 2, Module Group 3? High betweenness centrality (0.092) - this node is a cross-community bridge.
- Why does extract_ast_file() connect Module Group 10 to Token Management? High betweenness centrality (0.083) - this node is a cross-community bridge.
- Are the 26 inferred relationships involving train() (e.g. with __init__() and __init__()) actually correct? train() has 26 INFERRED edges - model-reasoned connections that need verification.
- Are the 26 inferred relationships involving __init__() (e.g. with train() and __init__()) actually correct? __init__() has 26 INFERRED edges - model-reasoned connections that need verification.
- Are the 26 inferred relationships involving __init__() (e.g. with train() and __init__()) actually correct? __init__() has 26 INFERRED edges - model-reasoned connections that need verification.
- Are the 26 inferred relationships involving __init__() (e.g. with train() and __init__()) actually correct? __init__() has 26 INFERRED edges - model-reasoned connections that need verification.
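The betweenness centrality scores quoted in the questions above measure how often a node lies on shortest paths between other nodes. The sketch below uses Brandes' algorithm on an undirected, unweighted graph and normalises by the number of node pairs. Whether the report uses this exact normalisation is an assumption. The toy graph, including the name `bridge`, is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Normalised betweenness centrality via Brandes' algorithm.

    `adj` maps each node to its neighbours and must be symmetric (undirected graph).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    # Each unordered pair is visited from both endpoints, so divide by (n-1)(n-2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in score.items()}


# Hypothetical toy graph: two small clusters joined only through "bridge".
toy = {
    "a1": ["a2", "bridge"],
    "a2": ["a1", "bridge"],
    "bridge": ["a1", "a2", "b1", "b2"],
    "b1": ["b2", "bridge"],
    "b2": ["b1", "bridge"],
}
centrality = betweenness(toy)
print(max(centrality, key=centrality.get))
```

In the toy graph every path between the two clusters passes through `bridge`, so it receives the highest score while all other nodes score zero. This is the same pattern the report flags as a cross-community bridge.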