
Refactoring Guide: Large Files

Status: Active
Audience: Engineers planning refactoring work
Last Updated: 2026-02-12
Priority: P2 (Quality improvement, not blocking)


Overview

Several files in the codebase approach or exceed 300 lines. While not critical, breaking them into smaller modules improves:

  • Readability: Easier to understand each component
  • Testability: Smaller units = easier to test in isolation
  • Maintainability: Changes have smaller blast radius

Industry Standard: Files <300 lines, functions <50 lines.
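
Auditing against that limit can be automated. The sketch below is a hypothetical helper (the `src` root and 300-line threshold are assumptions to adjust for this repo) that lists oversized Python files, largest first:

```python
# Hypothetical audit helper; SRC_DIR and LIMIT are assumptions, not repo config.
from pathlib import Path

SRC_DIR = Path("src")
LIMIT = 300

def oversized_files(root: Path = SRC_DIR, limit: int = LIMIT):
    """Return (path, line_count) pairs for .py files exceeding the limit."""
    results = []
    for path in sorted(root.rglob("*.py")):
        count = len(path.read_text(encoding="utf-8").splitlines())
        if count > limit:
            results.append((path, count))
    # Largest files first, so the worst offenders top the refactoring backlog
    return sorted(results, key=lambda pair: -pair[1])

if __name__ == "__main__":
    for path, count in oversized_files():
        print(f"{count:5d}  {path}")
```

Running this periodically keeps the table above honest as the codebase grows.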


Files Requiring Refactoring

| File | Lines | Complexity | Priority | Suggested Split |
|------|-------|------------|----------|-----------------|
| src/ranking/features.py | 470 | High | P0 | Extract feature groups into separate modules |
| src/core/web_search.py | 427 | Medium | P1 | Separate API client from result parsing |
| src/vector_db.py | 368 | High | P0 | Split hybrid search from small-to-big retrieval |
| src/recall/sasrec_recall.py | 337 | Medium | P2 | Extract embedding logic |
| src/recall/embedding.py | 333 | Medium | P2 | Extract training vs inference |
| src/services/recommend_service.py | 313 | High | P1 | Extract diversity reranking orchestration |
| src/core/recommendation_orchestrator.py | 293 | Medium | P2 | Extract query preprocessing |
| src/core/metadata_store.py | 288 | Low | P3 | Already well-structured (mostly SQL) |

Refactoring Strategy: features.py (Example)

Current Structure (470 lines)

class FeatureEngineer:
    def __init__(self):
        ...  # 20 lines

    def _load_sasrec_features(self):
        ...  # 35 lines

    def _load_user_sequences(self):
        ...  # 20 lines

    def load_base_data(self):
        ...  # 30 lines

    def generate_features(self):
        # 180 lines (HUGE!)
        # - User stats (15 lines)
        # - Item stats (15 lines)
        # - SASRec features (40 lines)
        # - ItemCF features (30 lines)
        # - UserCF features (30 lines)
        # - Last-N similarity (50 lines)
        ...

    def extract_features(self):
        ...  # 120 lines

Proposed Refactoring

New Structure:

src/ranking/
β”œβ”€β”€ features.py              # Orchestrator (100 lines)
β”œβ”€β”€ feature_groups/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ user_features.py     # User stats extraction (40 lines)
β”‚   β”œβ”€β”€ item_features.py     # Item stats extraction (40 lines)
β”‚   β”œβ”€β”€ sasrec_features.py   # SASRec embedding features (60 lines)
β”‚   β”œβ”€β”€ cf_features.py       # ItemCF + UserCF features (70 lines)
β”‚   └── sequence_features.py # Last-N similarity (60 lines)

Refactored features.py (100 lines):

from pathlib import Path

from src.ranking.feature_groups.user_features import UserFeatureExtractor
from src.ranking.feature_groups.item_features import ItemFeatureExtractor
from src.ranking.feature_groups.sasrec_features import SASRecFeatureExtractor
from src.ranking.feature_groups.cf_features import CFFeatureExtractor
from src.ranking.feature_groups.sequence_features import SequenceFeatureExtractor

class FeatureEngineer:
    """Orchestrates feature extraction from multiple sources."""

    def __init__(self, data_dir='data/rec', model_dir='data/model/recall'):
        self.data_dir = Path(data_dir)
        self.model_dir = Path(model_dir)

        # Initialize sub-extractors
        self.user_extractor = UserFeatureExtractor(data_dir)
        self.item_extractor = ItemFeatureExtractor(data_dir)
        self.sasrec_extractor = SASRecFeatureExtractor(data_dir, model_dir)
        self.cf_extractor = CFFeatureExtractor(data_dir, model_dir)
        self.seq_extractor = SequenceFeatureExtractor(data_dir)

    def load_base_data(self):
        """Load all feature extractors."""
        self.user_extractor.load()
        self.item_extractor.load()
        self.sasrec_extractor.load()
        self.cf_extractor.load()
        self.seq_extractor.load()

    def generate_features(self, user_id, candidate_item, **overrides):
        """Generate feature vector by delegating to sub-extractors."""
        feats = {}

        # Delegate to each extractor
        feats.update(self.user_extractor.extract(user_id))
        feats.update(self.item_extractor.extract(candidate_item))
        feats.update(self.sasrec_extractor.extract(user_id, candidate_item, **overrides))
        feats.update(self.cf_extractor.extract(user_id, candidate_item))
        feats.update(self.seq_extractor.extract(user_id, candidate_item, **overrides))

        return feats

Benefits:

  • Each extractor is <70 lines (easy to understand)
  • Testable in isolation: test_sasrec_features.py
  • Clear separation of concerns
  • Easy to add new feature groups

Refactoring Checklist

Before Refactoring:

  • Read existing code thoroughly
  • Identify natural module boundaries (feature groups, API layers, etc.)
  • Check test coverage (write tests if missing)
  • Document current behavior (input/output examples)

During Refactoring:

  • Create new modules with single responsibility
  • Move code incrementally (one function at a time)
  • Run tests after each move
  • Update imports in dependent files
  • Add docstrings to new modules

After Refactoring:

  • Run full test suite
  • Check code coverage (should not decrease)
  • Update ARCHITECTURE.md if structure changed
  • Benchmark performance (ensure no regression)
  • Update this guide with lessons learned

Refactoring Patterns

Pattern 1: Extract Feature Groups

When: Large function with distinct sections (user features, item features, etc.)

How:

  1. Identify groups: user stats, item stats, embeddings, CF scores
  2. Create one class per group: UserFeatureExtractor, ItemFeatureExtractor
  3. Each class has load() and extract() methods
  4. Main class delegates to extractors

Example: features.py β†’ feature_groups/*
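
A minimal sketch of what one such extractor could look like, following the load()/extract() interface above. The feature names and in-memory data are illustrative stand-ins, not the repo's actual fields:

```python
# Hypothetical sketch of one feature-group extractor; in the real module,
# load() would read precomputed stats from data_dir instead of hardcoding.
from pathlib import Path

class UserFeatureExtractor:
    """Extracts per-user statistics (one concern, one small file)."""

    def __init__(self, data_dir="data/rec"):
        self.data_dir = Path(data_dir)
        self.user_stats = {}

    def load(self):
        # Illustrative in-memory data; real code loads from self.data_dir.
        self.user_stats = {
            "A123": {"user_review_count": 42, "user_avg_rating": 4.1},
        }

    def extract(self, user_id):
        """Return a dict of user features, with safe defaults for cold users."""
        return self.user_stats.get(
            user_id, {"user_review_count": 0, "user_avg_rating": 0.0}
        )
```

Because each extractor returns a plain dict, the orchestrator can merge groups with `feats.update(...)` without knowing their internals.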


Pattern 2: Split API Client from Parser

When: File handles both HTTP requests and response parsing

How:

  1. Create api_client.py: HTTP calls, retries, auth
  2. Create parsers.py: Parse JSON responses into domain objects
  3. Main file orchestrates: client.fetch() β†’ parser.parse()

Example: web_search.py β†’ google_books_client.py + result_parser.py
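
The split can be sketched as below. The class names and the response shape are simplified stand-ins (the real Google Books payload is nested differently), so treat this as the pattern, not the actual schema:

```python
# Sketch of the client/parser split; payload shape is a simplified stand-in.
from dataclasses import dataclass, field

@dataclass
class BookResult:
    title: str
    authors: list = field(default_factory=list)

class ResultParser:
    """Parses raw JSON dicts into domain objects (no I/O here)."""

    def parse(self, payload: dict) -> list:
        return [
            BookResult(
                title=item.get("title", ""),
                authors=item.get("authors", []),
            )
            for item in payload.get("items", [])
        ]

class BooksClient:
    """Owns HTTP calls, retries, auth (stubbed out in this sketch)."""

    def fetch(self, query: str) -> dict:
        # A real implementation would issue the HTTP request here.
        return {"items": [{"title": "Dune", "authors": ["Frank Herbert"]}]}

def search_books(query: str) -> list:
    # Orchestration layer: client.fetch() -> parser.parse()
    return ResultParser().parse(BooksClient().fetch(query))
```

With the parser free of I/O, it can be unit-tested against canned JSON fixtures without mocking HTTP.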


Pattern 3: Separate Training from Inference

When: File contains both model training code and inference code

How:

  1. Create trainer.py: Fit model, save weights
  2. Create predictor.py: Load weights, predict
  3. Shared utilities in model_utils.py

Example: sasrec_recall.py β†’ sasrec_trainer.py + sasrec_predictor.py
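
A toy version of the split, where the "model" is just a per-item mean rating so the file layout, not the math, is the point. Class and path names are illustrative:

```python
# Toy trainer/predictor split: Trainer persists weights, Predictor serves them.
import json
from pathlib import Path

class Trainer:
    """Fits the model and persists weights (the trainer.py concern)."""

    def fit_and_save(self, ratings, weights_path):
        # ratings: iterable of (item_id, score) pairs
        totals = {}
        for item, score in ratings:
            count, total = totals.get(item, (0, 0.0))
            totals[item] = (count + 1, total + score)
        weights = {item: total / count for item, (count, total) in totals.items()}
        Path(weights_path).write_text(json.dumps(weights))

class Predictor:
    """Loads saved weights and predicts (the predictor.py concern)."""

    def __init__(self, weights_path):
        self.weights = json.loads(Path(weights_path).read_text())

    def predict(self, item, default=0.0):
        return self.weights.get(item, default)
```

The payoff: inference services import only Predictor, so training-only dependencies (optimizers, data loaders) never enter the serving path.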


Pattern 4: Extract Helper Functions

When: Function >50 lines with reusable logic

How:

  1. Identify helper logic (e.g., "compute similarity", "filter results")
  2. Extract to _helper_function() or utils.py
  3. Main function becomes high-level orchestration

Example:

# Before (80 lines)
def recommend(self, user_id, k=50):
    # 1. Get history (15 lines)
    # 2. Query similar items (20 lines)
    # 3. Aggregate scores (25 lines)
    # 4. Filter & sort (20 lines)

# After (20 lines)
def recommend(self, user_id, k=50):
    history = self._get_user_history(user_id)
    similar_items = self._query_similar_items(history)
    scores = self._aggregate_scores(similar_items)
    return self._filter_and_sort(scores, k)

Example Refactoring: vector_db.py

Current (368 lines)

class VectorDB:
    def __init__(self):
        ...  # 20 lines

    def search(self):
        ...  # 30 lines

    def hybrid_search(self):
        ...  # 120 lines (complex RRF fusion logic)

    def small_to_big_search(self):
        ...  # 80 lines (hierarchical retrieval)

    def _rrf_fusion(self):
        ...  # 60 lines

    def _build_bm25_index(self):
        ...  # 40 lines

Proposed (3 files, each <150 lines)

src/vector_db.py (80 lines):

from src.retrieval.hybrid_retriever import HybridRetriever
from src.retrieval.hierarchical_retriever import HierarchicalRetriever

class VectorDB:
    """Facade for multiple retrieval strategies."""

    def __init__(self):
        self.chroma = Chroma(...)  # dense-only store
        self.hybrid = HybridRetriever()
        self.hierarchical = HierarchicalRetriever()

    def search(self, query, k=10):
        """Dense search only."""
        return self.chroma.search(query, k=k)

    def hybrid_search(self, query, k=10, alpha=0.5):
        """BM25 + Dense fusion."""
        return self.hybrid.search(query, k, alpha)

    def small_to_big_search(self, query, k=10):
        """Sentence → Book hierarchical retrieval."""
        return self.hierarchical.search(query, k)

src/retrieval/hybrid_retriever.py (150 lines):

class HybridRetriever:
    """Combines BM25 (sparse) + Dense (embeddings) via RRF."""

    def __init__(self):
        self.bm25 = self._build_bm25_index()
        self.dense_db = Chroma(...)

    def search(self, query, k, alpha):
        dense_results = self._dense_search(query, k)
        sparse_results = self._sparse_search(query, k)
        return self._rrf_fusion(dense_results, sparse_results, alpha)

    def _rrf_fusion(self, dense, sparse, alpha):
        ...  # 60 lines (RRF logic)

    def _build_bm25_index(self):
        ...  # 40 lines

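The `_rrf_fusion` step stubbed above can be sketched as standalone logic. This uses the standard Reciprocal Rank Fusion formula with an alpha weight between the two rankings; the `k=60` constant is a common default, not necessarily what this repo's 60-line version does:

```python
# Sketch of alpha-weighted Reciprocal Rank Fusion over two ranked ID lists.
def rrf_fusion(dense, sparse, alpha=0.5, k=60):
    """Fuse two ranked lists of doc IDs (best first) into one ranking.

    Each list contributes weight / (k + rank) per document; documents
    appearing in both lists accumulate score from both.
    """
    scores = {}
    for rank, doc_id in enumerate(dense, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank)
    for rank, doc_id in enumerate(sparse, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked well by both retrievers dominate, which is why RRF needs no score normalization between BM25 and cosine similarities.
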
src/retrieval/hierarchical_retriever.py (130 lines):

class HierarchicalRetriever:
    """Small-to-Big (sentence β†’ book) retrieval."""

    def search(self, query, k):
        # 1. Find matching sentences (child nodes)
        sentences = self.sentence_db.search(query, k=50)

        # 2. Map to parent books (ISBNs)
        book_isbns = self._map_to_parents(sentences)

        # 3. Return full book context
        return self._enrich_with_metadata(book_isbns, k)

Migration Strategy

Incremental Refactoring (Safe)

Week 1: Extract utilities

  • Move helper functions to utils.py
  • No API changes
  • Run tests

Week 2: Create new modules

  • Add feature_groups/ directory
  • Copy (don't move) code to new modules
  • Old code still works

Week 3: Update imports

  • Change features.py to use new modules
  • Mark old methods as @deprecated
  • Update tests

Week 4: Remove old code

  • Delete deprecated methods
  • Update ARCHITECTURE.md
  • Celebrate! πŸŽ‰

Testing Strategy

Before Refactoring: Capture Behavior

# tests/test_features_baseline.py
def test_feature_extraction_baseline():
    """Capture current behavior before refactoring."""
    fe = FeatureEngineer()
    fe.load_base_data()

    # Test on known user/item
    feats = fe.generate_features("A123", "0123456789")

    # Save as baseline
    import json
    with open("tests/baseline_features.json", "w") as f:
        json.dump(feats, f)

After Refactoring: Compare Output

# tests/test_features_refactored.py
def test_feature_extraction_matches_baseline():
    """Ensure refactored code produces identical output."""
    fe = FeatureEngineer()  # New refactored version
    fe.load_base_data()

    feats = fe.generate_features("A123", "0123456789")

    # Load baseline
    import json
    with open("tests/baseline_features.json") as f:
        baseline = json.load(f)

    # Compare
    for key in baseline:
        assert key in feats, f"Missing feature: {key}"
        assert abs(feats[key] - baseline[key]) < 1e-6, f"Feature {key} mismatch"

Performance Considerations

Refactoring Should Not Slow Things Down

Before Refactoring: Benchmark

python benchmarks/benchmark.py > baseline.txt

After Refactoring: Re-benchmark

python benchmarks/benchmark.py > refactored.txt
diff baseline.txt refactored.txt

Acceptable Changes:

  • Latency: Β±5% (small overhead from function calls OK)
  • Memory: Β±10% (object overhead from new classes OK)

Red Flags:

  • Latency increase >10% β†’ Investigate (unnecessary copies? Extra I/O?)
  • Memory increase >20% β†’ Investigate (duplicate data? Leaks?)
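
The contents of `benchmarks/benchmark.py` are not shown here, but a minimal shape for it might look like the following sketch: time the hot path with `perf_counter` and print rounded, diff-friendly numbers so `diff baseline.txt refactored.txt` stays readable:

```python
# Hypothetical benchmark harness shape; the timed function is a placeholder.
import statistics
import time

def benchmark(fn, *args, repeats=100):
    """Return (median_ms, p95_ms) latency for fn(*args) over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return statistics.median(samples), samples[int(0.95 * len(samples)) - 1]

if __name__ == "__main__":
    # Placeholder workload; swap in the real recommend/search call.
    median, p95 = benchmark(sorted, list(range(10_000)))
    print(f"sorted/10k  median={median:.2f}ms  p95={p95:.2f}ms")
```

Reporting median and p95 rather than a single run keeps the before/after comparison robust to scheduler noise.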

Common Pitfalls

Pitfall 1: Over-Refactoring

Problem: Creating too many tiny files (10-20 lines each)

Solution: Aim for 50-150 lines per file. Smaller isn't always better.


Pitfall 2: Breaking API Compatibility

Problem: Refactoring changes function signatures, breaking existing code

Solution: Use deprecation warnings

import warnings

def old_method(self, x):
    warnings.warn(
        "old_method is deprecated, use new_method",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this shim
    )
    return self.new_method(x)

Pitfall 3: Not Updating Tests

Problem: Tests still import old paths

Solution: Update imports incrementally

# Before
from src.ranking.features import FeatureEngineer

# After
from src.ranking.features import FeatureEngineer  # Facade
from src.ranking.feature_groups.sasrec_features import SASRecFeatureExtractor  # Specific

When NOT to Refactor

  • Deadline pressure: Refactoring during crunch time adds risk
  • No tests: Write tests first, then refactor
  • Unclear requirements: Understand the code before restructuring
  • "Just because": Only refactor if it improves readability/testability

Rule of Thumb: Refactor when you're touching the code for a feature/bug anyway. Don't refactor in isolation.


Success Metrics

How do you know refactoring succeeded?

  • Code is easier to understand (ask a teammate)
  • Tests pass (same behavior)
  • Performance unchanged (benchmark)
  • Test coverage increased (easier to test small modules)
  • Future changes are faster (measure time to implement next feature)

Related Documentation


Refactoring Backlog

Track refactoring work here:

| File | Priority | Assigned | Status | Target Date |
|------|----------|----------|--------|-------------|
| ranking/features.py | P0 | Unassigned | Not Started | TBD |
| core/web_search.py | P1 | Unassigned | Not Started | TBD |
| vector_db.py | P0 | Unassigned | Not Started | TBD |

Last Updated: 2026-02-12
Next Review: After v3.0 planning