
Hierarchical BERT Integration Checklist

Files That Need Changes

✅ Already Updated

  1. model.py - Added HierarchicalLegalBERT class

🔧 Files That Need Updates

High Priority (Training/Inference Core)

  1. trainer.py - Update to support hierarchical model

    • Line 14: Import statement
    • Line 169: Model initialization
    • Training loop compatibility
  2. train.py - Add hierarchical model option

    • Line 0: Add command-line argument for model type
  3. evaluate.py - Support hierarchical model evaluation

    • Line 45: Import statement
    • Line 47: Model initialization
  4. calibrate.py - Support hierarchical model calibration

    • Line 14: Import statement
    • Model usage throughout
  5. advanced_analysis.py - Support hierarchical analysis

    • Line 16: Import statement
    • Line 25: Model loading
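
For the train.py item above, the model-type switch could be exposed with the standard-library `argparse`. This is a minimal sketch, not the repository's actual CLI: the flag name `--model-type`, its choices `flat` and `hierarchical`, and the helper names are assumptions. Only `use_hierarchical_model` comes from this checklist.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a parser with a model-type switch (flag name is an assumption)."""
    parser = argparse.ArgumentParser(description="Train a Legal-BERT risk model")
    parser.add_argument(
        "--model-type",
        choices=["flat", "hierarchical"],
        default="flat",  # default keeps the existing FullyLearningBasedLegalBERT path
        help="Which model class to train",
    )
    return parser


def apply_model_type(args: argparse.Namespace, config) -> None:
    """Copy the CLI choice onto the config flag named in this checklist."""
    config.use_hierarchical_model = args.model_type == "hierarchical"
```

Keeping `flat` as the default means existing training commands behave as before when the flag is omitted.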

Medium Priority (Utilities)

  1. analyze_document.py - Add hierarchical document analysis

    • Line 274: Import statement
    • Add hierarchical inference option
  2. config.py - Add hierarchical model config options

    • Add use_hierarchical_model flag
    • Add hierarchical-specific parameters
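
The config.py additions above might look roughly like the dataclass below. Only `use_hierarchical_model` is named in this checklist. The class name, the two hierarchical-specific fields, and their default values are hypothetical placeholders that show where such parameters would live.

```python
from dataclasses import dataclass


@dataclass
class HierarchicalOptions:
    """Illustrative config fragment; only use_hierarchical_model is from the checklist."""

    use_hierarchical_model: bool = False  # False preserves the current default model
    # Hypothetical hierarchical-specific parameters (names and values are assumptions):
    max_clauses_per_document: int = 64
    document_encoder_layers: int = 2

    def __post_init__(self) -> None:
        # Fail early on values that cannot describe a usable document encoder.
        if self.max_clauses_per_document < 1:
            raise ValueError("max_clauses_per_document must be >= 1")
        if self.document_encoder_layers < 1:
            raise ValueError("document_encoder_layers must be >= 1")
```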

Low Priority (Testing)

  1. test_setup.py - Add hierarchical model tests
    • Line 107: Import statement
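
One possible shape for the test_setup.py addition is an interface check: the snippets in this checklist call `forward_single_clause`, `predict_document`, and `predict_risk_pattern` on the hierarchical model, so a test can assert that all three exist. The sketch below uses a stub class so it runs without model.py. The helper and test names are assumptions, and in the real test the stub would be replaced by `HierarchicalLegalBERT`.

```python
import unittest

# Method names as they appear in the snippets of this checklist.
REQUIRED_METHODS = ("forward_single_clause", "predict_document", "predict_risk_pattern")


def missing_methods(model_cls, required=REQUIRED_METHODS):
    """Return the required method names that model_cls does not define."""
    return [name for name in required if not callable(getattr(model_cls, name, None))]


class StubHierarchicalModel:
    """Stand-in so the sketch runs without model.py; swap in HierarchicalLegalBERT."""

    def forward_single_clause(self, input_ids, attention_mask): ...
    def predict_document(self, document_structure): ...
    def predict_risk_pattern(self, input_ids, attention_mask): ...


class TestHierarchicalInterface(unittest.TestCase):
    def test_exposes_expected_methods(self):
        self.assertEqual(missing_methods(StubHierarchicalModel), [])
```

The test case can be run with `python -m unittest` from the directory that contains the file.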

📝 Summary of Changes Needed

Import Changes:

# OLD:
from model import FullyLearningBasedLegalBERT

# NEW (support both):
from model import FullyLearningBasedLegalBERT, HierarchicalLegalBERT

Model Selection Logic:

if config.use_hierarchical_model:
    model = HierarchicalLegalBERT(config, num_discovered_risks)
else:
    model = FullyLearningBasedLegalBERT(config, num_discovered_risks)
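
Because the same selection is needed in trainer.py, evaluate.py, calibrate.py, and advanced_analysis.py, it may be worth centralizing it in one factory function. The sketch below assumes a hypothetical helper named `build_model` and uses empty stand-in classes so that it runs on its own. In the repository the two classes would be imported from model.py instead.

```python
from types import SimpleNamespace


class FullyLearningBasedLegalBERT:
    """Stand-in for the real class in model.py."""

    def __init__(self, config, num_discovered_risks):
        self.config = config
        self.num_discovered_risks = num_discovered_risks


class HierarchicalLegalBERT:
    """Stand-in for the real class in model.py."""

    def __init__(self, config, num_discovered_risks):
        self.config = config
        self.num_discovered_risks = num_discovered_risks


def build_model(config, num_discovered_risks):
    """Return the model selected by config.use_hierarchical_model."""
    # getattr with a False default keeps older configs (without the flag) working.
    if getattr(config, "use_hierarchical_model", False):
        return HierarchicalLegalBERT(config, num_discovered_risks)
    return FullyLearningBasedLegalBERT(config, num_discovered_risks)


# Usage: a config without the flag still yields the current model.
legacy_model = build_model(SimpleNamespace(), num_discovered_risks=7)
new_model = build_model(SimpleNamespace(use_hierarchical_model=True), num_discovered_risks=7)
```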

Forward Pass Changes:

# For single-clause training (hierarchical model)
if isinstance(model, HierarchicalLegalBERT):
    outputs = model.forward_single_clause(input_ids, attention_mask)
else:
    outputs = model(input_ids, attention_mask)
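
An alternative to repeating the `isinstance` check in every training and evaluation loop is a small helper that dispatches on the presence of `forward_single_clause`. This swaps the explicit type check for duck typing, which is a design choice and not something this checklist prescribes. The helper name `clause_forward` and the stub classes are assumptions made so the sketch runs without model.py.

```python
def clause_forward(model, input_ids, attention_mask):
    """Run a clause-level forward pass on either model type."""
    single_clause = getattr(model, "forward_single_clause", None)
    if callable(single_clause):
        # Hierarchical path: the model exposes a dedicated single-clause entry point.
        return single_clause(input_ids, attention_mask)
    # Flat path: call the model directly, as the existing code does.
    return model(input_ids, attention_mask)


class FlatStub:
    """Stand-in for FullyLearningBasedLegalBERT."""

    def __call__(self, input_ids, attention_mask):
        return {"path": "flat"}


class HierarchicalStub:
    """Stand-in for HierarchicalLegalBERT."""

    def forward_single_clause(self, input_ids, attention_mask):
        return {"path": "single_clause"}


flat_out = clause_forward(FlatStub(), [101, 102], [1, 1])
hier_out = clause_forward(HierarchicalStub(), [101, 102], [1, 1])
```

With a helper like this, callers need not import `HierarchicalLegalBERT` just to branch on it, at the cost of a less explicit type check.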

Inference Changes:

# For document-level inference (hierarchical model)
if isinstance(model, HierarchicalLegalBERT) and analyze_full_doc:
    results = model.predict_document(document_structure)
else:
    # Clause-by-clause inference
    results = model.predict_risk_pattern(input_ids, attention_mask)

Implementation Order

  1. ✅ config.py - Add configuration flags
  2. ✅ trainer.py - Update model initialization and training loop
  3. ✅ train.py - Add command-line args
  4. ✅ evaluate.py - Add hierarchical evaluation
  5. ✅ calibrate.py - Add hierarchical calibration
  6. ✅ advanced_analysis.py - Add hierarchical analysis
  7. ✅ analyze_document.py - Add hierarchical document analysis
  8. ✅ test_setup.py - Add tests

Backward Compatibility

All changes maintain backward compatibility:

  • Default behavior: Uses FullyLearningBasedLegalBERT (current model)
  • Optional: Use HierarchicalLegalBERT via config flag
  • Training: Both models train the same way (clause-level)
  • Inference: Hierarchical model offers enhanced document-level analysis