	# Graph Report - /run/media/morpheuslord/Personal_Files/Projects/Rewriter (2026-05-03)

	## Corpus Check
	- 442 files · ~1,967,332 words
	- Verdict: corpus is large enough that graph structure adds value.

	## Summary
	- 549 nodes · 873 edges · 27 communities detected
	- Extraction: 76% EXTRACTED · 24% INFERRED · 0% AMBIGUOUS · INFERRED: 208 edges (avg confidence: 0.6)
	- Token cost: 0 input · 0 output
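
The extraction shares and the average confidence of inferred edges can be recomputed from the edge list. The sketch below is a minimal illustration, assuming each edge is a dict with a `provenance` tag (`EXTRACTED`, `INFERRED`, or `AMBIGUOUS`) and a numeric `confidence`; these field names and the sample data are hypothetical and not taken from the tool that produced this report.

```python
from collections import Counter


def summarize_provenance(edges):
    """Return per-tag edge shares and the mean confidence of INFERRED edges."""
    counts = Counter(edge["provenance"] for edge in edges)
    total = sum(counts.values())
    shares = {tag: count / total for tag, count in counts.items()}
    inferred = [e["confidence"] for e in edges if e["provenance"] == "INFERRED"]
    mean_confidence = sum(inferred) / len(inferred) if inferred else None
    return shares, mean_confidence


# Hypothetical sample: three extracted edges and one inferred edge.
sample_edges = [
    {"provenance": "EXTRACTED", "confidence": 1.0},
    {"provenance": "EXTRACTED", "confidence": 1.0},
    {"provenance": "EXTRACTED", "confidence": 1.0},
    {"provenance": "INFERRED", "confidence": 0.6},
]
shares, mean_confidence = summarize_provenance(sample_edges)
print(shares, mean_confidence)
```

Applied to the figures above, 208 inferred edges out of 873 is about 24%, which matches the reported split.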

	## Community Hubs (Navigation)
	- [[_COMMUNITY_Module Group 0\|Module Group 0]]
	- [[_COMMUNITY_Utility Scripts\|Utility Scripts]]
	- [[_COMMUNITY_Module Group 2\|Module Group 2]]
	- [[_COMMUNITY_Module Group 3\|Module Group 3]]
	- [[_COMMUNITY_Utility Scripts\|Utility Scripts]]
	- [[_COMMUNITY_Module Group 5\|Module Group 5]]
	- [[_COMMUNITY_Token Management\|Token Management]]
	- [[_COMMUNITY_Utility Scripts\|Utility Scripts]]
	- [[_COMMUNITY_Authentication\|Authentication]]
	- [[_COMMUNITY_Utility Scripts\|Utility Scripts]]
	- [[_COMMUNITY_Module Group 10\|Module Group 10]]
	- [[_COMMUNITY_Feed Scoring & Pool\|Feed Scoring & Pool]]
	- [[_COMMUNITY_Module Group 12\|Module Group 12]]
	- [[_COMMUNITY_Token Management\|Token Management]]
	- [[_COMMUNITY_Module Group 14\|Module Group 14]]
	- [[_COMMUNITY_Utility Scripts\|Utility Scripts]]
	- [[_COMMUNITY_Module Group 16\|Module Group 16]]
	- [[_COMMUNITY_Module Group 17\|Module Group 17]]
	- [[_COMMUNITY_Module Group 18\|Module Group 18]]
	- [[_COMMUNITY_Module Group 19\|Module Group 19]]
	- [[_COMMUNITY_Module Group 20\|Module Group 20]]
	- [[_COMMUNITY_Infrastructure (Terraform)\|Infrastructure (Terraform)]]
	- [[_COMMUNITY_Utility Scripts\|Utility Scripts]]
	- [[_COMMUNITY_Module Group 23\|Module Group 23]]
	- [[_COMMUNITY_Security & Rate Limiting\|Security & Rate Limiting]]
	- [[_COMMUNITY_WebSocket Codec\|WebSocket Codec]]
	- [[_COMMUNITY_Module Group 27\|Module Group 27]]

	## God Nodes (most connected - your core abstractions)
	1. `train()` - 34 edges
	2. `__init__()` - 28 edges
	3. `__init__()` - 27 edges
	4. `__init__()` - 27 edges
	5. `__init__()` - 27 edges
	6. `__init__()` - 27 edges
	7. `__init__()` - 27 edges
	8. `__init__()` - 27 edges
	9. `correct()` - 16 edges
	10. `__init__()` - 13 edges
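
The ranking above is by edge count. A minimal sketch of such a ranking follows, assuming edges are `(source, target)` pairs of unique node IDs. The `path::name` ID scheme and the sample edges are hypothetical; a qualified ID is one possible way to tell apart the several `__init__()` entries that share a label in this list.

```python
from collections import Counter


def rank_god_nodes(edges, top_n=10):
    """Return the top_n (node_id, degree) pairs, highest degree first."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Sort by degree descending, then by node ID for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical edges; IDs are qualified with a file path so that
# same-named functions such as __init__() stay distinguishable.
sample_edges = [
    ("scripts/train.py::train", "src/training/dataset.py::__init__"),
    ("scripts/train.py::train", "src/training/human_pattern_extractor.py::score"),
    ("scripts/run_inference.py::run_inference",
     "src/preprocessing/spell_corrector.py::correct"),
]
for node_id, edge_count in rank_god_nodes(sample_edges, top_n=3):
    print(f"{node_id} - {edge_count} edges")
```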

	## Surprising Connections (you probably didn't know these)
	- `run_inference()` --calls--> `correct()` [INFERRED]
	scripts/run_inference.py → src/preprocessing/spell_corrector.py
	- `train()` --calls--> `__init__()` [INFERRED]
	scripts/train.py → src/training/dataset.py
	- `__init__()` --calls--> `__init__()` [INFERRED]
	scripts/train.py → src/training/dataset.py
	- `score()` --calls--> `forward()` [INFERRED]
	src/training/human_pattern_extractor.py → scripts/train.py
	- `test_spell_correction_empty()` --calls--> `correct()` [INFERRED]
	tests/test_preprocessing.py → src/inference/corrector.py

	## Hyperedges (group relationships)
	- WebSocket Channel System — sem_unified_ws, sem_feed_ws, sem_chat_ws, sem_keysync_ws, sem_discovery_ws [EXTRACTED 1.00]
	- Security Defense Stack — sem_hmac_verification, sem_origin_secret, sem_pow_challenge, sem_rate_limiting, sem_attack_detection [EXTRACTED 1.00]
	- Feed Recommendation Pipeline — sem_feed_pool, sem_feed_filters, sem_feed_scoring, sem_feed_heatmap, sem_feed_reciprocal, sem_feed_gradient [EXTRACTED 1.00]

	## Communities

	### Community 0 - "Module Group 0"
	Cohesion: 0.04
	Nodes (55): EntitySpan, NERTagger, Tags named entities and produces protected spans., Named Entity Recognition tagger. Identifies entities (persons, locations, organi, get_protected_spans(), Return (start, end) char spans that must not be modified., tag(), Extract all named entities from text. (+47 more)

	### Community 1 - "Utility Scripts"
	Cohesion: 0.06
	Nodes (38): Evaluation script. Runs all evaluation metrics on the test set. Run: python scri, evaluate(), Run evaluation on the specified data split., ERRANTEvaluator, Evaluates grammar correction quality using ERRANT annotations., ERRANT-based grammatical error evaluation. Uses the ERRANT toolkit for standardi, evaluate(), Compute ERRANT precision, recall, F0.5. (+30 more)

	### Community 2 - "Module Group 2"
	Cohesion: 0.07
	Nodes (36): StyleFingerprinter, Extracts style fingerprint vectors from text samples., StyleProjectionMLP, Projects raw feature vector to 512-dim style embedding., _avg_dep_tree_depth(), Compute average dependency tree depth across all tokens., _avg_syllables_per_word(), Average syllables per word. (+28 more)

	### Community 3 - "Module Group 3"
	Cohesion: 0.06
	Nodes (35): AWLLoader, Loads and manages Academic Word List data., _load_synonyms(), Load academic synonym mappings from JSON., _load_word_list(), Load a word list file into a set of lowercase words., all_words(), Return the full set of academic words. (+27 more)

	### Community 4 - "Utility Scripts"
	Cohesion: 0.31
	Nodes (34): __init__(), CEOnlyLoss, Cross-entropy only loss — the only loss that provides gradient signal., __init__(), _auto_batch_size(), Pick optimal batch size based on model size and available resources., _setup_device(), Detect GPU and configure hybrid VRAM management. Returns (device, gpu_info) whe (+26 more)

	### Community 5 - "Module Group 5"
	Cohesion: 0.08
	Nodes (29): DyslexiaSimulator, Generates synthetic dyslectic text from clean input for data augmentation., _double_letter(), Double a random interior letter., _omit_letter(), Remove a random interior letter., _reverse_letter(), Swap b/d, p/q style reversals. (+21 more)

	### Community 6 - "Token Management"
	Cohesion: 0.07
	Nodes (28): Loads and wraps the base pretrained model. Supported architectures: - google/f, load_model_and_tokenizer(), Load a pretrained model with optional LoRA and quantization. Args: model_ke, apply_lora(), Apply LoRA adapters to a model and return the wrapped model., create_lora_config(), Create a LoRA configuration for the given task type., LoRA adapter configuration and management. Wraps PEFT LoRA utilities for applyin (+20 more)

	### Community 7 - "Utility Scripts"
	Cohesion: 0.08
	Nodes (28): Pre-trains the HumanPatternClassifier on both Kaggle datasets. Run this BEFORE t, train_classifier(), Pre-train the human pattern classifier on Kaggle datasets., forward(), HumanPatternClassifier, Lightweight MLP trained to distinguish human from AI writing. Input: feature vec, HumanPatternFeatureExtractor, Extracts 17-dimensional feature vector encoding human vs AI writing patterns. O (+20 more)

	### Community 8 - "Authentication"
	Cohesion: 0.08
	Nodes (27): AuthorshipVerifier, Verifies authorship consistency between input and output text., Authorship verification module. Uses a fine-tuned model to verify whether the co, verify(), Return probability that both texts were written by the same author. Uses senten, average_style_vectors(), Compute the mean style vector from a list of vectors., cosine_similarity() (+19 more)

	### Community 9 - "Utility Scripts"
	Cohesion: 0.08
	Nodes (25): Interactive inference script. Run: python scripts/run_inference.py --config conf, run_inference(), Run inference on text input., correct_text(), Correct dyslectic text with style preservation and academic elevation., FastAPI server for the Dyslexia Academic Writing Corrector API. Provides RESTful, health(), Health check endpoint. (+17 more)

	### Community 10 - "Module Group 10"
	Cohesion: 0.1
	Nodes (27): _get_call_name(), Extract callable name from ast.Call node., _get_name(), Extract name from various AST node types., _resolve_edges(), Post-process edges to resolve bare names to actual node IDs. The per-file AST e, build_semantic_nodes(), Build semantic nodes from documentation files. These capture high-level architec (+19 more)

	### Community 11 - "Feed Scoring & Pool"
	Cohesion: 0.08
	Nodes (27): Chat WebSocket Channel, Discovery WebSocket Channel, E2EE X25519 Key Exchange, FastAPI Stateless Backend, Feed Hard Filters (12 Rules), 3-Tier Gradient Distribution, Preference Heatmap (Learned AI), Feed Pool Computation Pipeline (+19 more)

	### Community 12 - "Module Group 12"
	Cohesion: 0.12
	Nodes (22): GLEU, (Note: This script computes sentence-level GLEU score.) This script calculates, get_gleu_stats(), calculate mean and confidence interval from all GLEU iterations, get_ngram_counts(), get ngrams of order n for a tokenized sentence, get_ngram_diff(), returns ngrams in a but not in b (+14 more)

	### Community 13 - "Token Management"
	Cohesion: 0.16
	Nodes (17): clean_para(), convert_char_to_tok(), get_all_tok_starts_and_ends(), get_paras(), get_sents(), get_token_edits(), main(), noop_edit() (+9 more)

	### Community 14 - "Module Group 14"
	Cohesion: 0.13
	Nodes (14): FormalityClassifier, Scores text formality on a 0-1 scale using rule-based heuristics., Formality classifier module. Classifies text on a 0-1 formality scale using ling, score(), Return formality score in [0, 1]. Higher = more formal. Scoring based on: - Con, RegisterFilterAdvanced, Advanced register filtering with nominalisation and hedging passes., add_hedging() (+6 more)

	### Community 15 - "Utility Scripts"
	Cohesion: 0.2
	Nodes (14): apply_bea19_edits(), Apply BEA-2019 character-level edits to produce corrected text. edits_block for, create_splits(), Split train.jsonl into train and val sets., Converts all raw dataset formats into unified JSONL training format. Output sche, main(), process_bea19_json(), Process a BEA-2019 format JSON file (FCE or W&I+LOCNESS). Each line is a JSON ob (+6 more)

	### Community 16 - "Module Group 16"
	Cohesion: 0.24
	Nodes (9): CorrectionTrainer, Custom trainer — uses model's built-in loss directly., _strip_custom_fields(), Remove dataset fields that T5 doesn't accept., compute_loss(), Use model's built-in CE loss — avoids double-computing logits loss., Custom HuggingFace Trainer subclass. Uses the model's built-in cross-entropy los, prediction_step() (+1 more)

	### Community 17 - "Module Group 17"
	Cohesion: 0.29
	Nodes (5): RateLimitMiddleware, Simple in-memory rate limiting., RequestLoggingMiddleware, Logs all incoming requests with timing information., API middleware for request logging, rate limiting, and error handling.

	### Community 18 - "Module Group 18"
	Cohesion: 0.29
	Nodes (5): EarlyStoppingOnStyleDrift, Stops training if style similarity drops below threshold., StyleMetricsCallback, Logs style similarity metrics during evaluation., Training callbacks for monitoring and checkpointing. Integrates with Weights & B

	### Community 19 - "Module Group 19"
	Cohesion: 0.33
	Nodes (5): EmotionClassifier, Classifies emotional register of text using keyword-based analysis., classify(), Return emotion distribution over register categories. Returns a dict with keys:, Emotion/register classifier module. Classifies text emotional register (neutral,

	### Community 20 - "Module Group 20"
	Cohesion: 0.5
	Nodes (3): CorrectionRequest, CorrectionResponse, Pydantic schemas for API request/response validation.

	### Community 21 - "Infrastructure (Terraform)"
	Cohesion: 0.5
	Nodes (4): ALB + Auto Scaling Group, AWS Secrets Manager Integration, Terraform AWS Infrastructure, VPC Network Topology

	### Community 22 - "Utility Scripts"
	Cohesion: 0.67
	Nodes (1): Downloads all publicly available HuggingFace datasets automatically. Datasets re

	### Community 23 - "Module Group 23"
	Cohesion: 0.67
	Nodes (3): Cloudflare Edge Proxy, Lambda Origin Secret Rotator, X-Origin-Secret Middleware

	### Community 24 - "Security & Rate Limiting"
	Cohesion: 1.0
	Nodes (2): Attack Detection & IP Risk Management, Per-IP Rate Limiting

	### Community 26 - "WebSocket Codec"
	Cohesion: 1.0
	Nodes (1): HMAC-SHA256 Request Verification

	### Community 27 - "Module Group 27"
	Cohesion: 1.0
	Nodes (1): Proof-of-Work Challenge

	## Knowledge Gaps
	- 259 isolated node(s): `graphify_rebuild.py — One-shot NudR knowledge graph regeneration. Usage: py`, `Walk the project and return list of relevant files with metadata.`, `Compare against manifest to find changed files.`, `SHA-256 hash for cache keying.`, `Extract AST nodes and edges from a single Python file.` (+254 more)
	These have ≤1 connection - possible missing edges or undocumented components.
	- Thin community `Utility Scripts` (3 nodes): `download_all_huggingface_datasets.py`, `Downloads all publicly available HuggingFace datasets automatically. Datasets re`, `main()`
	Too small to be a meaningful cluster - may be noise or needs more connections extracted.
	- Thin community `Security & Rate Limiting` (2 nodes): `Attack Detection & IP Risk Management`, `Per-IP Rate Limiting`
	Too small to be a meaningful cluster - may be noise or needs more connections extracted.
	- Thin community `WebSocket Codec` (1 node): `HMAC-SHA256 Request Verification`
	Too small to be a meaningful cluster - may be noise or needs more connections extracted.
	- Thin community `Module Group 27` (1 node): `Proof-of-Work Challenge`
	Too small to be a meaningful cluster - may be noise or needs more connections extracted.
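
Both checks in this section reduce to simple counts. The sketch below assumes nodes are plain IDs, edges are `(source, target)` pairs, and communities are a mapping from name to member list. The function names and the `max_size` cutoff are hypothetical; the report does not state the threshold it uses for a thin community.

```python
def find_weakly_connected(nodes, edges, max_degree=1):
    """Return nodes with at most max_degree connections, in sorted order."""
    degree = {node: 0 for node in nodes}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    return sorted(node for node, count in degree.items() if count <= max_degree)


def find_thin_communities(communities, max_size=3):
    """Return names of communities with at most max_size members (assumed cutoff)."""
    return sorted(name for name, members in communities.items()
                  if len(members) <= max_size)


# Hypothetical graph: a triangle plus one node with no edges.
sample_nodes = ["a", "b", "c", "d"]
sample_edges = [("a", "b"), ("a", "c"), ("b", "c")]
print(find_weakly_connected(sample_nodes, sample_edges))

# Community names and members copied from this report; the third entry
# is shortened to placeholder members for illustration.
sample_communities = {
    "Security & Rate Limiting": ["Attack Detection & IP Risk Management",
                                 "Per-IP Rate Limiting"],
    "Module Group 27": ["Proof-of-Work Challenge"],
    "Module Group 17": ["n1", "n2", "n3", "n4", "n5"],
}
print(find_thin_communities(sample_communities))
```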

	## Suggested Questions
	_Questions this graph is uniquely positioned to answer:_

	- Why does `parse()` connect `Token Management` to `Utility Scripts` and `Module Group 10`?
	_High betweenness centrality (0.125) - this node is a cross-community bridge._
	- Why does `correct()` connect `Utility Scripts` to `Module Group 0`, `Utility Scripts`, `Module Group 2`, and `Module Group 3`?
	_High betweenness centrality (0.092) - this node is a cross-community bridge._
	- Why does `extract_ast_file()` connect `Module Group 10` to `Token Management`?
	_High betweenness centrality (0.083) - this node is a cross-community bridge._
	- Are the 26 inferred relationships involving `train()` (e.g. with `__init__()` and `__init__()`) actually correct?
	_`train()` has 26 INFERRED edges - model-reasoned connections that need verification._
	- Are the 26 inferred relationships involving `__init__()` (e.g. with `train()` and `__init__()`) actually correct?
	_`__init__()` has 26 INFERRED edges - model-reasoned connections that need verification._
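
The bridge questions above rest on betweenness centrality. The report does not say how the scores were computed, so the following is only a sketch under stated assumptions: an unweighted, undirected graph given as an adjacency mapping, Brandes' algorithm, and normalization by `(n - 1)(n - 2)`. The sample graph and node names are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Normalized betweenness for an unweighted, undirected graph (Brandes)."""
    centrality = {node: 0.0 for node in adjacency}
    for start in adjacency:
        # Breadth-first search from start, counting shortest paths.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[start] = 1
        distance[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    path_count[neighbor] += path_count[node]
                    predecessors[neighbor].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != start:
                centrality[node] += dependency[node]
    n = len(adjacency)
    # Every undirected pair is visited from both ends, so dividing the
    # accumulated sum by (n - 1)(n - 2) yields a score in [0, 1].
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {node: value * scale for node, value in centrality.items()}


# Hypothetical graph: two small clusters joined only through "bridge".
sample_graph = {
    "a": ["b", "bridge"],
    "b": ["a", "bridge"],
    "bridge": ["a", "b", "c", "d"],
    "c": ["bridge", "d"],
    "d": ["bridge", "c"],
}
scores = betweenness_centrality(sample_graph)
print(max(scores, key=scores.get))
```

A node that sits on every shortest path between two clusters scores high, which is why such nodes are worth checking as cross-community bridges, especially when the edges that place them there are INFERRED.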