# Graph Report - /run/media/morpheuslord/Personal_Files/Projects/Rewriter (2026-05-03)
## Corpus Check
- 442 files · ~1,967,332 words
- Verdict: corpus is large enough that graph structure adds value.
## Summary
- 549 nodes · 873 edges · 27 communities detected
- Extraction: 76% EXTRACTED · 24% INFERRED · 0% AMBIGUOUS · INFERRED: 208 edges (avg confidence: 0.6)
- Token cost: 0 input · 0 output
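The extraction percentages above follow from the edge counts. A minimal sketch of the tally, assuming each edge carries a `provenance` tag and a `confidence` value (both field names are hypothetical, not taken from the tool's schema). The 665 EXTRACTED edges are derived as 873 − 208, which holds only if no edge is AMBIGUOUS, as the summary reports.

```python
from collections import Counter

def provenance_summary(edges):
    """Return ({tag: (count, rounded percent)}, mean confidence of INFERRED edges)."""
    counts = Counter(e["provenance"] for e in edges)
    total = sum(counts.values())
    shares = {tag: (n, round(100 * n / total)) for tag, n in counts.items()}
    inferred = [e["confidence"] for e in edges if e["provenance"] == "INFERRED"]
    avg_conf = sum(inferred) / len(inferred) if inferred else None
    return shares, avg_conf

# Synthetic edge list sized to match the totals in the summary (873 edges, 208 inferred).
# The per-edge confidence values are placeholders, not real data.
edges = (
    [{"provenance": "EXTRACTED", "confidence": 1.0}] * 665
    + [{"provenance": "INFERRED", "confidence": 0.6}] * 208
)
shares, avg_conf = provenance_summary(edges)
print(shares)
print(avg_conf)
```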
## Community Hubs (Navigation)
- [[_COMMUNITY_Module Group 0|Module Group 0]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 2|Module Group 2]]
- [[_COMMUNITY_Module Group 3|Module Group 3]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 5|Module Group 5]]
- [[_COMMUNITY_Token Management|Token Management]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Authentication|Authentication]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 10|Module Group 10]]
- [[_COMMUNITY_Feed Scoring & Pool|Feed Scoring & Pool]]
- [[_COMMUNITY_Module Group 12|Module Group 12]]
- [[_COMMUNITY_Token Management|Token Management]]
- [[_COMMUNITY_Module Group 14|Module Group 14]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 16|Module Group 16]]
- [[_COMMUNITY_Module Group 17|Module Group 17]]
- [[_COMMUNITY_Module Group 18|Module Group 18]]
- [[_COMMUNITY_Module Group 19|Module Group 19]]
- [[_COMMUNITY_Module Group 20|Module Group 20]]
- [[_COMMUNITY_Infrastructure (Terraform)|Infrastructure (Terraform)]]
- [[_COMMUNITY_Utility Scripts|Utility Scripts]]
- [[_COMMUNITY_Module Group 23|Module Group 23]]
- [[_COMMUNITY_Security & Rate Limiting|Security & Rate Limiting]]
- [[_COMMUNITY_WebSocket Codec|WebSocket Codec]]
- [[_COMMUNITY_Module Group 27|Module Group 27]]
## God Nodes (most connected - your core abstractions)
1. `train()` - 34 edges
2. `__init__()` - 28 edges
3. `__init__()` - 27 edges
4. `__init__()` - 27 edges
5. `__init__()` - 27 edges
6. `__init__()` - 27 edges
7. `__init__()` - 27 edges
8. `__init__()` - 27 edges
9. `correct()` - 16 edges
10. `__init__()` - 13 edges
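A ranking like the one above can be reproduced by counting edge endpoints. The sketch below is one way to do it, not the tool's implementation. It also shows why the repeated `__init__()` rows are hard to tell apart: ranking on bare labels merges or hides distinct nodes, while qualifying each label with its file path keeps them separate. The `path::label` identifier format is an assumption made for illustration; the file paths and call pairs come from the connections listed in this report.

```python
from collections import Counter

def rank_by_degree(edges, top=10):
    """Rank nodes by the number of edge endpoints that touch them."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(top)

# Edges keyed by a hypothetical "path::label" identifier.
qualified = [
    ("scripts/train.py::train()", "src/training/dataset.py::__init__()"),
    ("scripts/run_inference.py::run_inference()",
     "src/preprocessing/spell_corrector.py::correct()"),
    ("tests/test_preprocessing.py::test_spell_correction_empty()",
     "src/inference/corrector.py::correct()"),
]

# The same edges with the file path stripped, leaving only the bare label.
bare = [(s.split("::")[1], t.split("::")[1]) for s, t in qualified]

print(rank_by_degree(qualified))  # two separate correct() nodes
print(rank_by_degree(bare))       # both correct() nodes collapse into one entry
```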
## Surprising Connections (you probably didn't know these)
- `run_inference()` --calls--> `correct()` [INFERRED]
scripts/run_inference.py → src/preprocessing/spell_corrector.py
- `train()` --calls--> `__init__()` [INFERRED]
scripts/train.py → src/training/dataset.py
- `__init__()` --calls--> `__init__()` [INFERRED]
scripts/train.py → src/training/dataset.py
- `score()` --calls--> `forward()` [INFERRED]
src/training/human_pattern_extractor.py → scripts/train.py
- `test_spell_correction_empty()` --calls--> `correct()` [INFERRED]
tests/test_preprocessing.py → src/inference/corrector.py
## Hyperedges (group relationships)
- **WebSocket Channel System** — sem_unified_ws, sem_feed_ws, sem_chat_ws, sem_keysync_ws, sem_discovery_ws [EXTRACTED 1.00]
- **Security Defense Stack** — sem_hmac_verification, sem_origin_secret, sem_pow_challenge, sem_rate_limiting, sem_attack_detection [EXTRACTED 1.00]
- **Feed Recommendation Pipeline** — sem_feed_pool, sem_feed_filters, sem_feed_scoring, sem_feed_heatmap, sem_feed_reciprocal, sem_feed_gradient [EXTRACTED 1.00]
## Communities
### Community 0 - "Module Group 0"
Cohesion: 0.04
Nodes (55): EntitySpan, NERTagger, Tags named entities and produces protected spans., Named Entity Recognition tagger. Identifies entities (persons, locations, organi, get_protected_spans(), Return (start, end) char spans that must not be modified., tag(), Extract all named entities from text. (+47 more)
### Community 1 - "Utility Scripts"
Cohesion: 0.06
Nodes (38): Evaluation script. Runs all evaluation metrics on the test set. Run: python scri, evaluate(), Run evaluation on the specified data split., ERRANTEvaluator, Evaluates grammar correction quality using ERRANT annotations., ERRANT-based grammatical error evaluation. Uses the ERRANT toolkit for standardi, evaluate(), Compute ERRANT precision, recall, F0.5. (+30 more)
### Community 2 - "Module Group 2"
Cohesion: 0.07
Nodes (36): StyleFingerprinter, Extracts style fingerprint vectors from text samples., StyleProjectionMLP, Projects raw feature vector to 512-dim style embedding., _avg_dep_tree_depth(), Compute average dependency tree depth across all tokens., _avg_syllables_per_word(), Average syllables per word. (+28 more)
### Community 3 - "Module Group 3"
Cohesion: 0.06
Nodes (35): AWLLoader, Loads and manages Academic Word List data., _load_synonyms(), Load academic synonym mappings from JSON., _load_word_list(), Load a word list file into a set of lowercase words., all_words(), Return the full set of academic words. (+27 more)
### Community 4 - "Utility Scripts"
Cohesion: 0.31
Nodes (34): __init__(), CEOnlyLoss, Cross-entropy only loss — the only loss that provides gradient signal., __init__(), _auto_batch_size(), Pick optimal batch size based on model size and available resources., _setup_device(), Detect GPU and configure hybrid VRAM management. Returns (device, gpu_info) whe (+26 more)
### Community 5 - "Module Group 5"
Cohesion: 0.08
Nodes (29): DyslexiaSimulator, Generates synthetic dyslectic text from clean input for data augmentation., _double_letter(), Double a random interior letter., _omit_letter(), Remove a random interior letter., _reverse_letter(), Swap b/d, p/q style reversals. (+21 more)
### Community 6 - "Token Management"
Cohesion: 0.07
Nodes (28): Loads and wraps the base pretrained model. Supported architectures: - google/f, load_model_and_tokenizer(), Load a pretrained model with optional LoRA and quantization. Args: model_ke, apply_lora(), Apply LoRA adapters to a model and return the wrapped model., create_lora_config(), Create a LoRA configuration for the given task type., LoRA adapter configuration and management. Wraps PEFT LoRA utilities for applyin (+20 more)
### Community 7 - "Utility Scripts"
Cohesion: 0.08
Nodes (28): Pre-trains the HumanPatternClassifier on both Kaggle datasets. Run this BEFORE t, train_classifier(), Pre-train the human pattern classifier on Kaggle datasets., forward(), HumanPatternClassifier, Lightweight MLP trained to distinguish human from AI writing. Input: feature vec, HumanPatternFeatureExtractor, Extracts 17-dimensional feature vector encoding human vs AI writing patterns. O (+20 more)
### Community 8 - "Authentication"
Cohesion: 0.08
Nodes (27): AuthorshipVerifier, Verifies authorship consistency between input and output text., Authorship verification module. Uses a fine-tuned model to verify whether the co, verify(), Return probability that both texts were written by the same author. Uses senten, average_style_vectors(), Compute the mean style vector from a list of vectors., cosine_similarity() (+19 more)
### Community 9 - "Utility Scripts"
Cohesion: 0.08
Nodes (25): Interactive inference script. Run: python scripts/run_inference.py --config conf, run_inference(), Run inference on text input., correct_text(), Correct dyslectic text with style preservation and academic elevation., FastAPI server for the Dyslexia Academic Writing Corrector API. Provides RESTful, health(), Health check endpoint. (+17 more)
### Community 10 - "Module Group 10"
Cohesion: 0.1
Nodes (27): _get_call_name(), Extract callable name from ast.Call node., _get_name(), Extract name from various AST node types., _resolve_edges(), Post-process edges to resolve bare names to actual node IDs. The per-file AST e, build_semantic_nodes(), Build semantic nodes from documentation files. These capture high-level architec (+19 more)
### Community 11 - "Feed Scoring & Pool"
Cohesion: 0.08
Nodes (27): Chat WebSocket Channel, Discovery WebSocket Channel, E2EE X25519 Key Exchange, FastAPI Stateless Backend, Feed Hard Filters (12 Rules), 3-Tier Gradient Distribution, Preference Heatmap (Learned AI), Feed Pool Computation Pipeline (+19 more)
### Community 12 - "Module Group 12"
Cohesion: 0.12
Nodes (22): GLEU, (Note: This script computes sentence-level GLEU score.) This script calculates , get_gleu_stats(), calculate mean and confidence interval from all GLEU iterations, get_ngram_counts(), get ngrams of order n for a tokenized sentence, get_ngram_diff(), returns ngrams in a but not in b (+14 more)
### Community 13 - "Token Management"
Cohesion: 0.16
Nodes (17): clean_para(), convert_char_to_tok(), get_all_tok_starts_and_ends(), get_paras(), get_sents(), get_token_edits(), main(), noop_edit() (+9 more)
### Community 14 - "Module Group 14"
Cohesion: 0.13
Nodes (14): FormalityClassifier, Scores text formality on a 0-1 scale using rule-based heuristics., Formality classifier module. Classifies text on a 0-1 formality scale using ling, score(), Return formality score in [0, 1]. Higher = more formal. Scoring based on: - Con, RegisterFilterAdvanced, Advanced register filtering with nominalisation and hedging passes., add_hedging() (+6 more)
### Community 15 - "Utility Scripts"
Cohesion: 0.2
Nodes (14): apply_bea19_edits(), Apply BEA-2019 character-level edits to produce corrected text. edits_block for, create_splits(), Split train.jsonl into train and val sets., Converts all raw dataset formats into unified JSONL training format. Output sche, main(), process_bea19_json(), Process a BEA-2019 format JSON file (FCE or W&I+LOCNESS). Each line is a JSON ob (+6 more)
### Community 16 - "Module Group 16"
Cohesion: 0.24
Nodes (9): CorrectionTrainer, Custom trainer — uses model's built-in loss directly., _strip_custom_fields(), Remove dataset fields that T5 doesn't accept., compute_loss(), Use model's built-in CE loss — avoids double-computing logits loss., Custom HuggingFace Trainer subclass. Uses the model's built-in cross-entropy los, prediction_step() (+1 more)
### Community 17 - "Module Group 17"
Cohesion: 0.29
Nodes (5): RateLimitMiddleware, Simple in-memory rate limiting., RequestLoggingMiddleware, Logs all incoming requests with timing information., API middleware for request logging, rate limiting, and error handling.
### Community 18 - "Module Group 18"
Cohesion: 0.29
Nodes (5): EarlyStoppingOnStyleDrift, Stops training if style similarity drops below threshold., StyleMetricsCallback, Logs style similarity metrics during evaluation., Training callbacks for monitoring and checkpointing. Integrates with Weights & B
### Community 19 - "Module Group 19"
Cohesion: 0.33
Nodes (5): EmotionClassifier, Classifies emotional register of text using keyword-based analysis., classify(), Return emotion distribution over register categories. Returns a dict with keys:, Emotion/register classifier module. Classifies text emotional register (neutral,
### Community 20 - "Module Group 20"
Cohesion: 0.5
Nodes (3): CorrectionRequest, CorrectionResponse, Pydantic schemas for API request/response validation.
### Community 21 - "Infrastructure (Terraform)"
Cohesion: 0.5
Nodes (4): ALB + Auto Scaling Group, AWS Secrets Manager Integration, Terraform AWS Infrastructure, VPC Network Topology
### Community 22 - "Utility Scripts"
Cohesion: 0.67
Nodes (1): Downloads all publicly available HuggingFace datasets automatically. Datasets re
### Community 23 - "Module Group 23"
Cohesion: 0.67
Nodes (3): Cloudflare Edge Proxy, Lambda Origin Secret Rotator, X-Origin-Secret Middleware
### Community 24 - "Security & Rate Limiting"
Cohesion: 1.0
Nodes (2): Attack Detection & IP Risk Management, Per-IP Rate Limiting
### Community 26 - "WebSocket Codec"
Cohesion: 1.0
Nodes (1): HMAC-SHA256 Request Verification
### Community 27 - "Module Group 27"
Cohesion: 1.0
Nodes (1): Proof-of-Work Challenge
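The report does not say how the cohesion values are computed. One common reading, sketched here purely as an assumption, is internal edge density: the number of edges between members of a community divided by the number of possible member pairs. Returning 1.0 for a single-node community is a convention chosen to mirror the one-node entries above; whether the tool does the same is not confirmed by the report.

```python
def density(members, edges):
    """Internal edge density of a community on an undirected graph.

    members: iterable of node ids in the community
    edges:   iterable of (source, target) pairs for the whole graph
    """
    members = set(members)
    possible = len(members) * (len(members) - 1) // 2
    if possible == 0:
        return 1.0  # convention for communities with fewer than two nodes
    # Keep only edges with both ends inside the community; ignore self-loops
    # and treat (a, b) and (b, a) as the same undirected edge.
    internal = {frozenset(e) for e in edges if set(e) <= members and e[0] != e[1]}
    return len(internal) / possible

# Hypothetical three-node community with two internal edges and one edge leaving it.
score = density(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "x")])
print(round(score, 2))
```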
## Knowledge Gaps
- **259 isolated node(s):** `graphify_rebuild.py — One-shot NudR knowledge graph regeneration. Usage: py`, `Walk the project and return list of relevant files with metadata.`, `Compare against manifest to find changed files.`, `SHA-256 hash for cache keying.`, `Extract AST nodes and edges from a single Python file.` (+254 more)
These have ≤1 connection - possible missing edges or undocumented components.
- **Thin community `Utility Scripts`** (3 nodes): `download_all_huggingface_datasets.py`, `Downloads all publicly available HuggingFace datasets automatically. Datasets re`, `main()`
Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- **Thin community `Security & Rate Limiting`** (2 nodes): `Attack Detection & IP Risk Management`, `Per-IP Rate Limiting`
Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- **Thin community `WebSocket Codec`** (1 node): `HMAC-SHA256 Request Verification`
Too small to be a meaningful cluster - may be noise or needs more connections extracted.
- **Thin community `Module Group 27`** (1 node): `Proof-of-Work Challenge`
Too small to be a meaningful cluster - may be noise or needs more connections extracted.
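The "≤1 connection" criterion for isolated nodes can be checked directly against a node list and an edge list. A minimal sketch, assuming an undirected view of the graph; the node ids are placeholders rather than nodes from this project.

```python
from collections import Counter

def weakly_connected(nodes, edges, max_degree=1):
    """Return the nodes whose degree is at most max_degree, sorted by id."""
    degree = Counter({n: 0 for n in nodes})
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return sorted(n for n in nodes if degree[n] <= max_degree)

# Placeholder graph: "e" has no edges at all and "d" has exactly one.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
flagged = weakly_connected(nodes, edges)
print(flagged)
```

Nodes flagged this way are candidates for review, not confirmed gaps: a docstring node with a single edge to its function is expected to have degree 1.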
## Suggested Questions
_Questions this graph is uniquely positioned to answer:_
- **Why does `parse()` connect `Token Management` to `Utility Scripts`, `Module Group 10`?**
_High betweenness centrality (0.125) - this node is a cross-community bridge._
- **Why does `correct()` connect `Utility Scripts` to `Module Group 0`, `Utility Scripts`, `Module Group 2`, `Module Group 3`?**
_High betweenness centrality (0.092) - this node is a cross-community bridge._
- **Why does `extract_ast_file()` connect `Module Group 10` to `Token Management`?**
_High betweenness centrality (0.083) - this node is a cross-community bridge._
- **Are the 26 inferred relationships involving `train()` (e.g. with `__init__()` and `__init__()`) actually correct?**
_`train()` has 26 INFERRED edges - model-reasoned connections that need verification._
- **Are the 26 inferred relationships involving `__init__()` (e.g. with `train()` and `__init__()`) actually correct?**
_`__init__()` has 26 INFERRED edges - model-reasoned connections that need verification._
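The bridge questions above rest on betweenness centrality. The sketch below implements Brandes' algorithm for an unweighted, undirected graph in plain Python and lists which other communities a node touches. Normalising by the number of node pairs is an assumption; the report does not state which normalisation produced its scores. The graph and community labels are placeholders.

```python
from collections import deque

def betweenness(adj):
    """Normalised betweenness for an undirected graph {node: set_of_neighbours}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths and record predecessors.
        order, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting the farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    # Every unordered pair is seen from both ends, so dividing the raw total by
    # (n - 1)(n - 2) yields the fraction of pairs whose shortest paths use the node.
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: b * scale for v, b in score.items()}

def bridged_communities(node, adj, community):
    """Communities, other than the node's own, that contain one of its neighbours."""
    return sorted({community[w] for w in adj[node]} - {community[node]})

# Placeholder graph: "p" sits in community X and is the only link to community Y.
adj = {
    "a": {"b", "p"}, "b": {"a", "p"}, "p": {"a", "b", "c"},
    "c": {"p", "d"}, "d": {"c"},
}
community = {"a": "X", "b": "X", "p": "X", "c": "Y", "d": "Y"}
scores = betweenness(adj)
top = max(scores, key=scores.get)
print(top, bridged_communities(top, adj, community))
```

A node that scores high here and touches several communities is worth inspecting by hand, especially when its edges are INFERRED rather than EXTRACTED.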