Graph Report - /run/media/morpheuslord/Personal_Files/Projects/Rewriter (2026-05-03)

Corpus Check

  • 442 files · ~1,967,332 words
  • Verdict: corpus is large enough that graph structure adds value.

Summary

  • 549 nodes · 873 edges · 27 communities detected
  • Extraction: 76% EXTRACTED · 24% INFERRED · 0% AMBIGUOUS · INFERRED: 208 edges (avg confidence: 0.6)
  • Token cost: 0 input · 0 output
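The extraction split can be cross-checked against the edge counts above. The following is a minimal sketch, assuming the percentages are taken over all 873 edges and rounded to whole numbers; the function name `extraction_split` is illustrative and not part of the tool.

```python
def extraction_split(total_edges: int, inferred_edges: int, ambiguous_edges: int = 0) -> dict[str, int]:
    """Return the rounded percentage share of each edge provenance class."""
    # Assumption: every edge that is neither INFERRED nor AMBIGUOUS is EXTRACTED.
    extracted_edges = total_edges - inferred_edges - ambiguous_edges
    counts = {
        "EXTRACTED": extracted_edges,
        "INFERRED": inferred_edges,
        "AMBIGUOUS": ambiguous_edges,
    }
    return {label: round(100 * n / total_edges) for label, n in counts.items()}


split = extraction_split(total_edges=873, inferred_edges=208)
print(split)  # {'EXTRACTED': 76, 'INFERRED': 24, 'AMBIGUOUS': 0}
```

Under that assumption, 208 inferred edges out of 873 is about 23.8%, which rounds to the reported 24%.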

Community Hubs (Navigation)

  • [[_COMMUNITY_Module Group 0|Module Group 0]]
  • [[_COMMUNITY_Utility Scripts|Utility Scripts]]
  • [[_COMMUNITY_Module Group 2|Module Group 2]]
  • [[_COMMUNITY_Module Group 3|Module Group 3]]
  • [[_COMMUNITY_Utility Scripts|Utility Scripts]]
  • [[_COMMUNITY_Module Group 5|Module Group 5]]
  • [[_COMMUNITY_Token Management|Token Management]]
  • [[_COMMUNITY_Utility Scripts|Utility Scripts]]
  • [[_COMMUNITY_Authentication|Authentication]]
  • [[_COMMUNITY_Utility Scripts|Utility Scripts]]
  • [[_COMMUNITY_Module Group 10|Module Group 10]]
  • [[_COMMUNITY_Feed Scoring & Pool|Feed Scoring & Pool]]
  • [[_COMMUNITY_Module Group 12|Module Group 12]]
  • [[_COMMUNITY_Token Management|Token Management]]
  • [[_COMMUNITY_Module Group 14|Module Group 14]]
  • [[_COMMUNITY_Utility Scripts|Utility Scripts]]
  • [[_COMMUNITY_Module Group 16|Module Group 16]]
  • [[_COMMUNITY_Module Group 17|Module Group 17]]
  • [[_COMMUNITY_Module Group 18|Module Group 18]]
  • [[_COMMUNITY_Module Group 19|Module Group 19]]
  • [[_COMMUNITY_Module Group 20|Module Group 20]]
  • [[_COMMUNITY_Infrastructure (Terraform)|Infrastructure (Terraform)]]
  • [[_COMMUNITY_Utility Scripts|Utility Scripts]]
  • [[_COMMUNITY_Module Group 23|Module Group 23]]
  • [[_COMMUNITY_Security & Rate Limiting|Security & Rate Limiting]]
  • [[_COMMUNITY_WebSocket Codec|WebSocket Codec]]
  • [[_COMMUNITY_Module Group 27|Module Group 27]]

God Nodes (most connected - your core abstractions)

  1. train() - 34 edges
  2. __init__() - 28 edges
  3. __init__() - 27 edges
  4. __init__() - 27 edges
  5. __init__() - 27 edges
  6. __init__() - 27 edges
  7. __init__() - 27 edges
  8. __init__() - 27 edges
  9. correct() - 16 edges
  10. __init__() - 13 edges
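A ranking like this one can be reproduced from an edge list by counting how many edges touch each node. The sketch below assumes undirected counting and uses a small hypothetical edge list; the qualified label `Dataset.__init__()` is an illustration of how same-named constructors could be told apart, not something the report emits.

```python
from collections import Counter


def top_by_degree(edges, k=10):
    """Rank nodes by the number of edges that touch them (undirected count)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(k)


# Hypothetical edges, for illustration only.
sample_edges = [
    ("train()", "Dataset.__init__()"),
    ("train()", "forward()"),
    ("train()", "correct()"),
    ("run_inference()", "correct()"),
]

ranking = top_by_degree(sample_edges, k=2)
print(ranking)  # [('train()', 3), ('correct()', 2)]
```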

Surprising Connections (you probably didn't know these)

  • run_inference() --calls--> correct() [INFERRED] scripts/run_inference.py → src/preprocessing/spell_corrector.py
  • train() --calls--> __init__() [INFERRED] scripts/train.py → src/training/dataset.py
  • __init__() --calls--> __init__() [INFERRED] scripts/train.py → src/training/dataset.py
  • score() --calls--> forward() [INFERRED] src/training/human_pattern_extractor.py → scripts/train.py
  • test_spell_correction_empty() --calls--> correct() [INFERRED] tests/test_preprocessing.py → src/inference/corrector.py
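All five connections above are tagged INFERRED, and two of them resolve `correct()` to different files, so they are natural candidates for manual review. A minimal sketch of how such a review queue could be built follows; the `Edge` fields, the per-edge confidence values, and the `tag()` edge are assumptions made for illustration, not the tool's actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    provenance: str   # "EXTRACTED", "INFERRED", or "AMBIGUOUS"
    confidence: float


def edges_to_review(edges, max_confidence=0.6):
    """Return INFERRED edges at or below the threshold, least confident first."""
    flagged = [
        e for e in edges
        if e.provenance == "INFERRED" and e.confidence <= max_confidence
    ]
    return sorted(flagged, key=lambda e: e.confidence)


# Confidence values below are hypothetical.
sample = [
    Edge("run_inference()", "correct()", "calls", "INFERRED", 0.5),
    Edge("train()", "__init__()", "calls", "INFERRED", 0.8),
    Edge("score()", "forward()", "calls", "INFERRED", 0.4),
    Edge("tag()", "get_protected_spans()", "calls", "EXTRACTED", 1.0),
]

queue = edges_to_review(sample)
print([(e.source, e.target) for e in queue])
# [('score()', 'forward()'), ('run_inference()', 'correct()')]
```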

Hyperedges (group relationships)

  • WebSocket Channel System — sem_unified_ws, sem_feed_ws, sem_chat_ws, sem_keysync_ws, sem_discovery_ws [EXTRACTED 1.00]
  • Security Defense Stack — sem_hmac_verification, sem_origin_secret, sem_pow_challenge, sem_rate_limiting, sem_attack_detection [EXTRACTED 1.00]
  • Feed Recommendation Pipeline — sem_feed_pool, sem_feed_filters, sem_feed_scoring, sem_feed_heatmap, sem_feed_reciprocal, sem_feed_gradient [EXTRACTED 1.00]
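Unlike an ordinary edge, a hyperedge groups any number of nodes under one label. One possible in-memory representation is sketched below, using the member IDs listed above; the `Hyperedge` class and the `groups_containing` helper are hypothetical names, not part of the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    label: str
    members: frozenset[str]
    confidence: float


HYPEREDGES = [
    Hyperedge(
        "WebSocket Channel System",
        frozenset({"sem_unified_ws", "sem_feed_ws", "sem_chat_ws",
                   "sem_keysync_ws", "sem_discovery_ws"}),
        1.0,
    ),
    Hyperedge(
        "Security Defense Stack",
        frozenset({"sem_hmac_verification", "sem_origin_secret", "sem_pow_challenge",
                   "sem_rate_limiting", "sem_attack_detection"}),
        1.0,
    ),
    Hyperedge(
        "Feed Recommendation Pipeline",
        frozenset({"sem_feed_pool", "sem_feed_filters", "sem_feed_scoring",
                   "sem_feed_heatmap", "sem_feed_reciprocal", "sem_feed_gradient"}),
        1.0,
    ),
]


def groups_containing(node_id, hyperedges):
    """Return the labels of every hyperedge that includes the given node."""
    return [h.label for h in hyperedges if node_id in h.members]


print(groups_containing("sem_rate_limiting", HYPEREDGES))  # ['Security Defense Stack']
```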

Communities

Community 0 - "Module Group 0"

Cohesion: 0.04 Nodes (55): EntitySpan, NERTagger, Tags named entities and produces protected spans., Named Entity Recognition tagger. Identifies entities (persons, locations, organi, get_protected_spans(), Return (start, end) char spans that must not be modified., tag(), Extract all named entities from text. (+47 more)

Community 1 - "Utility Scripts"

Cohesion: 0.06 Nodes (38): Evaluation script. Runs all evaluation metrics on the test set. Run: python scri, evaluate(), Run evaluation on the specified data split., ERRANTEvaluator, Evaluates grammar correction quality using ERRANT annotations., ERRANT-based grammatical error evaluation. Uses the ERRANT toolkit for standardi, evaluate(), Compute ERRANT precision, recall, F0.5. (+30 more)

Community 2 - "Module Group 2"

Cohesion: 0.07 Nodes (36): StyleFingerprinter, Extracts style fingerprint vectors from text samples., StyleProjectionMLP, Projects raw feature vector to 512-dim style embedding., _avg_dep_tree_depth(), Compute average dependency tree depth across all tokens., _avg_syllables_per_word(), Average syllables per word. (+28 more)

Community 3 - "Module Group 3"

Cohesion: 0.06 Nodes (35): AWLLoader, Loads and manages Academic Word List data., _load_synonyms(), Load academic synonym mappings from JSON., _load_word_list(), Load a word list file into a set of lowercase words., all_words(), Return the full set of academic words. (+27 more)

Community 4 - "Utility Scripts"

Cohesion: 0.31 Nodes (34): __init__(), CEOnlyLoss, Cross-entropy only loss — the only loss that provides gradient signal., __init__(), _auto_batch_size(), Pick optimal batch size based on model size and available resources., _setup_device(), Detect GPU and configure hybrid VRAM management. Returns (device, gpu_info) whe (+26 more)

Community 5 - "Module Group 5"

Cohesion: 0.08 Nodes (29): DyslexiaSimulator, Generates synthetic dyslectic text from clean input for data augmentation., _double_letter(), Double a random interior letter., _omit_letter(), Remove a random interior letter., _reverse_letter(), Swap b/d, p/q style reversals. (+21 more)

Community 6 - "Token Management"

Cohesion: 0.07 Nodes (28): Loads and wraps the base pretrained model. Supported architectures: - google/f, load_model_and_tokenizer(), Load a pretrained model with optional LoRA and quantization. Args: model_ke, apply_lora(), Apply LoRA adapters to a model and return the wrapped model., create_lora_config(), Create a LoRA configuration for the given task type., LoRA adapter configuration and management. Wraps PEFT LoRA utilities for applyin (+20 more)

Community 7 - "Utility Scripts"

Cohesion: 0.08 Nodes (28): Pre-trains the HumanPatternClassifier on both Kaggle datasets. Run this BEFORE t, train_classifier(), Pre-train the human pattern classifier on Kaggle datasets., forward(), HumanPatternClassifier, Lightweight MLP trained to distinguish human from AI writing. Input: feature vec, HumanPatternFeatureExtractor, Extracts 17-dimensional feature vector encoding human vs AI writing patterns. O (+20 more)

Community 8 - "Authentication"

Cohesion: 0.08 Nodes (27): AuthorshipVerifier, Verifies authorship consistency between input and output text., Authorship verification module. Uses a fine-tuned model to verify whether the co, verify(), Return probability that both texts were written by the same author. Uses senten, average_style_vectors(), Compute the mean style vector from a list of vectors., cosine_similarity() (+19 more)

Community 9 - "Utility Scripts"

Cohesion: 0.08 Nodes (25): Interactive inference script. Run: python scripts/run_inference.py --config conf, run_inference(), Run inference on text input., correct_text(), Correct dyslectic text with style preservation and academic elevation., FastAPI server for the Dyslexia Academic Writing Corrector API. Provides RESTful, health(), Health check endpoint. (+17 more)

Community 10 - "Module Group 10"

Cohesion: 0.1 Nodes (27): _get_call_name(), Extract callable name from ast.Call node., _get_name(), Extract name from various AST node types., _resolve_edges(), Post-process edges to resolve bare names to actual node IDs. The per-file AST e, build_semantic_nodes(), Build semantic nodes from documentation files. These capture high-level architec (+19 more)

Community 11 - "Feed Scoring & Pool"

Cohesion: 0.08 Nodes (27): Chat WebSocket Channel, Discovery WebSocket Channel, E2EE X25519 Key Exchange, FastAPI Stateless Backend, Feed Hard Filters (12 Rules), 3-Tier Gradient Distribution, Preference Heatmap (Learned AI), Feed Pool Computation Pipeline (+19 more)

Community 12 - "Module Group 12"

Cohesion: 0.12 Nodes (22): GLEU, (Note: This script computes sentence-level GLEU score.) This script calculates , get_gleu_stats(), calculate mean and confidence interval from all GLEU iterations, get_ngram_counts(), get ngrams of order n for a tokenized sentence, get_ngram_diff(), returns ngrams in a but not in b (+14 more)

Community 13 - "Token Management"

Cohesion: 0.16 Nodes (17): clean_para(), convert_char_to_tok(), get_all_tok_starts_and_ends(), get_paras(), get_sents(), get_token_edits(), main(), noop_edit() (+9 more)

Community 14 - "Module Group 14"

Cohesion: 0.13 Nodes (14): FormalityClassifier, Scores text formality on a 0-1 scale using rule-based heuristics., Formality classifier module. Classifies text on a 0-1 formality scale using ling, score(), Return formality score in [0, 1]. Higher = more formal. Scoring based on: - Con, RegisterFilterAdvanced, Advanced register filtering with nominalisation and hedging passes., add_hedging() (+6 more)

Community 15 - "Utility Scripts"

Cohesion: 0.2 Nodes (14): apply_bea19_edits(), Apply BEA-2019 character-level edits to produce corrected text. edits_block for, create_splits(), Split train.jsonl into train and val sets., Converts all raw dataset formats into unified JSONL training format. Output sche, main(), process_bea19_json(), Process a BEA-2019 format JSON file (FCE or W&I+LOCNESS). Each line is a JSON ob (+6 more)

Community 16 - "Module Group 16"

Cohesion: 0.24 Nodes (9): CorrectionTrainer, Custom trainer — uses model's built-in loss directly., _strip_custom_fields(), Remove dataset fields that T5 doesn't accept., compute_loss(), Use model's built-in CE loss — avoids double-computing logits loss., Custom HuggingFace Trainer subclass. Uses the model's built-in cross-entropy los, prediction_step() (+1 more)

Community 17 - "Module Group 17"

Cohesion: 0.29 Nodes (5): RateLimitMiddleware, Simple in-memory rate limiting., RequestLoggingMiddleware, Logs all incoming requests with timing information., API middleware for request logging, rate limiting, and error handling.

Community 18 - "Module Group 18"

Cohesion: 0.29 Nodes (5): EarlyStoppingOnStyleDrift, Stops training if style similarity drops below threshold., StyleMetricsCallback, Logs style similarity metrics during evaluation., Training callbacks for monitoring and checkpointing. Integrates with Weights & B

Community 19 - "Module Group 19"

Cohesion: 0.33 Nodes (5): EmotionClassifier, Classifies emotional register of text using keyword-based analysis., classify(), Return emotion distribution over register categories. Returns a dict with keys:, Emotion/register classifier module. Classifies text emotional register (neutral,

Community 20 - "Module Group 20"

Cohesion: 0.5 Nodes (3): CorrectionRequest, CorrectionResponse, Pydantic schemas for API request/response validation.

Community 21 - "Infrastructure (Terraform)"

Cohesion: 0.5 Nodes (4): ALB + Auto Scaling Group, AWS Secrets Manager Integration, Terraform AWS Infrastructure, VPC Network Topology

Community 22 - "Utility Scripts"

Cohesion: 0.67 Nodes (1): Downloads all publicly available HuggingFace datasets automatically. Datasets re

Community 23 - "Module Group 23"

Cohesion: 0.67 Nodes (3): Cloudflare Edge Proxy, Lambda Origin Secret Rotator, X-Origin-Secret Middleware

Community 24 - "Security & Rate Limiting"

Cohesion: 1.0 Nodes (2): Attack Detection & IP Risk Management, Per-IP Rate Limiting

Community 26 - "WebSocket Codec"

Cohesion: 1.0 Nodes (1): HMAC-SHA256 Request Verification

Community 27 - "Module Group 27"

Cohesion: 1.0 Nodes (1): Proof-of-Work Challenge
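The report does not define how cohesion is computed. One common reading, assumed here, is undirected internal edge density: the fraction of possible node pairs inside a community that are actually connected. The sketch below is an illustration under that assumption, and the tool's real formula may differ; returning 1.0 for a single-node community is likewise an assumed convention.

```python
def edge_density(node_count: int, internal_edges: int) -> float:
    """Fraction of possible undirected node pairs that are joined by an edge."""
    if node_count < 2:
        # Assumed convention: a community with no possible pairs is fully cohesive.
        return 1.0
    possible_pairs = node_count * (node_count - 1) / 2
    return internal_edges / possible_pairs


# Two nodes joined by one edge are fully connected.
print(edge_density(2, 1))            # 1.0
# Four nodes connected as a tree (3 edges) fill half of the 6 possible pairs.
print(edge_density(4, 3))            # 0.5
# Three nodes in a chain (2 edges) fill two of the 3 possible pairs.
print(round(edge_density(3, 2), 2))  # 0.67
```

On this reading, large communities score low simply because the number of possible pairs grows quadratically with node count, so a value such as 0.04 for a 55-node community does not by itself indicate a poor cluster.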

Knowledge Gaps

  • 259 isolated node(s): `graphify_rebuild.py — One-shot NudR knowledge graph regeneration. Usage: py, Walk the project and return list of relevant files with metadata., Compare against manifest to find changed files., SHA-256 hash for cache keying., Extract AST nodes and edges from a single Python file.` (+254 more). These have ≤1 connection - possible missing edges or undocumented components.

  • Thin community Utility Scripts (3 nodes): download_all_huggingface_datasets.py, Downloads all publicly available HuggingFace datasets automatically. Datasets re, main(). Too small to be a meaningful cluster - may be noise or may need more connections extracted.
  • Thin community Security & Rate Limiting (2 nodes): Attack Detection & IP Risk Management, Per-IP Rate Limiting. Too small to be a meaningful cluster - may be noise or may need more connections extracted.
  • Thin community WebSocket Codec (1 node): HMAC-SHA256 Request Verification. Too small to be a meaningful cluster - may be noise or may need more connections extracted.
  • Thin community Module Group 27 (1 node): Proof-of-Work Challenge. Too small to be a meaningful cluster - may be noise or may need more connections extracted.
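The "≤1 connection" criterion can be applied to any node and edge list. A minimal sketch follows, assuming undirected edges with self-loops ignored; the sample data is hypothetical and only borrows a few labels from this report.

```python
from collections import defaultdict


def weakly_connected_nodes(nodes, edges, max_degree=1):
    """Return nodes with at most `max_degree` distinct neighbours, sorted by label."""
    neighbours = defaultdict(set)
    for source, target in edges:
        if source != target:  # a self-loop does not connect a node to anything else
            neighbours[source].add(target)
            neighbours[target].add(source)
    return sorted(n for n in nodes if len(neighbours.get(n, ())) <= max_degree)


# Hypothetical graph fragment, for illustration only.
sample_nodes = [
    "train()", "correct()", "run_inference()", "health()",
    "Proof-of-Work Challenge", "SHA-256 hash for cache keying.",
]
sample_edges = [
    ("train()", "correct()"),
    ("run_inference()", "correct()"),
    ("run_inference()", "train()"),
    ("health()", "correct()"),
]

gaps = weakly_connected_nodes(sample_nodes, sample_edges)
print(gaps)
# ['Proof-of-Work Challenge', 'SHA-256 hash for cache keying.', 'health()']
```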

Suggested Questions

Questions this graph is uniquely positioned to answer:

  • Why does parse() connect Token Management to Utility Scripts, Module Group 10? High betweenness centrality (0.125) - this node is a cross-community bridge.
  • Why does correct() connect Utility Scripts to Module Group 0, Utility Scripts, Module Group 2, Module Group 3? High betweenness centrality (0.092) - this node is a cross-community bridge.
  • Why does extract_ast_file() connect Module Group 10 to Token Management? High betweenness centrality (0.083) - this node is a cross-community bridge.
  • Are the 26 inferred relationships involving train() (e.g. with __init__() and __init__()) actually correct? train() has 26 INFERRED edges - model-reasoned connections that need verification.
  • Are the 26 inferred relationships involving __init__() (e.g. with train() and __init__()) actually correct? __init__() has 26 INFERRED edges - model-reasoned connections that need verification.
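Betweenness centrality measures how often a node lies on shortest paths between other nodes, which is why a high value marks a cross-community bridge. The sketch below implements Brandes' algorithm for an undirected, unweighted graph using only the standard library. Whether the report's values use the same normalisation is an assumption, and the two-cluster sample graph is hypothetical.

```python
from collections import defaultdict, deque


def betweenness_centrality(edges):
    """Normalised betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    nodes = list(adjacency)
    score = dict.fromkeys(nodes, 0.0)

    for source in nodes:
        # Breadth-first search from source, counting shortest paths (sigma).
        order = []
        predecessors = {n: [] for n in nodes}
        sigma = dict.fromkeys(nodes, 0)
        distance = dict.fromkeys(nodes, -1)
        sigma[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first visit
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v is on a shortest path to w
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to the source.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]

    n = len(nodes)
    # Every unordered pair was counted twice, so divide by (n - 1)(n - 2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {node: value * scale for node, value in score.items()}


# Hypothetical graph: two triangles that share the single node "parse()".
sample_edges = [
    ("a1", "a2"), ("a1", "parse()"), ("a2", "parse()"),
    ("parse()", "b1"), ("parse()", "b2"), ("b1", "b2"),
]
scores = betweenness_centrality(sample_edges)
print(round(scores["parse()"], 3))  # 0.667
```

In the sample, `parse()` sits on the only shortest path for 4 of the 6 node pairs that exclude it, giving 4/6 ≈ 0.667, while every other node scores 0.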