# Hierarchical BERT Integration Checklist
## Files That Need Changes
### ✅ Already Updated
1. **model.py** - Added `HierarchicalLegalBERT` class
### 🔧 Files That Need Updates
#### **High Priority** (Training/Inference Core)
1. **trainer.py** - Update to support hierarchical model
   - Line 14: Import statement
   - Line 169: Model initialization
   - Training loop compatibility
2. **train.py** - Add hierarchical model option
   - Line 0: Add command-line argument for model type
3. **evaluate.py** - Support hierarchical model evaluation
   - Line 45: Import statement
   - Line 47: Model initialization
4. **calibrate.py** - Support hierarchical model calibration
   - Line 14: Import statement
   - Model usage throughout
5. **advanced_analysis.py** - Support hierarchical analysis
   - Line 16: Import statement
   - Line 25: Model loading
#### **Medium Priority** (Utilities)
6. **analyze_document.py** - Add hierarchical document analysis
   - Line 274: Import statement
   - Add hierarchical inference option
7. **config.py** - Add hierarchical model config options
   - Add `use_hierarchical_model` flag
   - Add hierarchical-specific parameters
#### **Low Priority** (Testing)
8. **test_setup.py** - Add hierarchical model tests
   - Line 107: Import statement
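The `config.py` flag (item 7) and the `train.py` command-line argument (item 2) could fit together roughly as sketched here. Only `use_hierarchical_model` comes from this checklist; the class name `LegalBertConfig`, the `--model-type` option, its choices, and the two hierarchical parameters are hypothetical placeholders, not the project's actual names.

```python
import argparse
from dataclasses import dataclass


@dataclass
class LegalBertConfig:
    """Hypothetical stand-in for the project's config class."""

    # Defaults to False so existing runs keep using the current model.
    use_hierarchical_model: bool = False
    # Hypothetical hierarchical-specific parameters (names and values are assumptions).
    max_clauses_per_document: int = 64
    document_encoder_layers: int = 2


def parse_args(argv=None):
    """Parse the (assumed) model-type option for train.py."""
    parser = argparse.ArgumentParser(description="Train a legal risk model")
    parser.add_argument(
        "--model-type",
        choices=["flat", "hierarchical"],
        default="flat",
        help="'flat' keeps the existing model; 'hierarchical' enables the new one",
    )
    return parser.parse_args(argv)


def config_from_args(args):
    """Translate the CLI choice into the config flag."""
    return LegalBertConfig(use_hierarchical_model=(args.model_type == "hierarchical"))


# Passing an explicit argument list keeps the example independent of sys.argv.
config = config_from_args(parse_args(["--model-type", "hierarchical"]))
print(config.use_hierarchical_model)
```

Keeping the default at `flat` means a script invoked without the new option behaves exactly as before.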
### πŸ“ Summary of Changes Needed
**Import Changes:**
```python
# OLD:
from model import FullyLearningBasedLegalBERT
# NEW (support both):
from model import FullyLearningBasedLegalBERT, HierarchicalLegalBERT
```
**Model Selection Logic:**
```python
if config.use_hierarchical_model:
    model = HierarchicalLegalBERT(config, num_discovered_risks)
else:
    model = FullyLearningBasedLegalBERT(config, num_discovered_risks)
```
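Because the same selection appears in several files (`trainer.py`, `evaluate.py`, `calibrate.py`, and others), one option is to centralize it in a small factory. The sketch below is an assumption about how that might look: `build_model` is a hypothetical helper, and the two classes are stubs standing in for the real ones in `model.py` so the example runs on its own.

```python
from types import SimpleNamespace


class FullyLearningBasedLegalBERT:
    """Stub standing in for the real class in model.py."""

    def __init__(self, config, num_discovered_risks):
        self.config = config
        self.num_discovered_risks = num_discovered_risks


class HierarchicalLegalBERT:
    """Stub standing in for the real class in model.py."""

    def __init__(self, config, num_discovered_risks):
        self.config = config
        self.num_discovered_risks = num_discovered_risks


def build_model(config, num_discovered_risks):
    """Hypothetical factory: pick the model class from the config flag."""
    # getattr with a False default lets configs created before the flag
    # existed fall through to the current model.
    if getattr(config, "use_hierarchical_model", False):
        return HierarchicalLegalBERT(config, num_discovered_risks)
    return FullyLearningBasedLegalBERT(config, num_discovered_risks)


old_config = SimpleNamespace()  # no flag at all
new_config = SimpleNamespace(use_hierarchical_model=True)
print(type(build_model(old_config, 7)).__name__)
print(type(build_model(new_config, 7)).__name__)
```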
**Forward Pass Changes:**
```python
# For single-clause training (hierarchical model)
if isinstance(model, HierarchicalLegalBERT):
    outputs = model.forward_single_clause(input_ids, attention_mask)
else:
    outputs = model(input_ids, attention_mask)
```
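To keep the training loop free of repeated `isinstance` checks, the branch could be wrapped in one helper. This is a minimal sketch under assumptions: `clause_forward` is a hypothetical name, and the stub classes below only record which path was taken; the real models would return their actual outputs.

```python
class FullyLearningBasedLegalBERT:
    """Stub: the real model is called directly on a clause batch."""

    def __call__(self, input_ids, attention_mask):
        return {"path": "flat"}


class HierarchicalLegalBERT:
    """Stub: the real model exposes forward_single_clause for clause batches."""

    def forward_single_clause(self, input_ids, attention_mask):
        return {"path": "single_clause"}


def clause_forward(model, input_ids, attention_mask):
    """Hypothetical helper: run a clause-level forward pass on either model."""
    if isinstance(model, HierarchicalLegalBERT):
        return model.forward_single_clause(input_ids, attention_mask)
    return model(input_ids, attention_mask)


# Plain lists stand in for tensors in this sketch.
batch_ids, batch_mask = [[101, 2023, 102]], [[1, 1, 1]]
for m in (FullyLearningBasedLegalBERT(), HierarchicalLegalBERT()):
    print(clause_forward(m, batch_ids, batch_mask))
```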
**Inference Changes:**
```python
# For document-level inference (hierarchical model)
if isinstance(model, HierarchicalLegalBERT) and analyze_full_doc:
    results = model.predict_document(document_structure)
else:
    # Clause-by-clause inference
    results = model.predict_risk_pattern(input_ids, attention_mask)
```
```
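The inference branch takes different inputs on each path (a document structure versus a tokenized clause), so a wrapper may want to validate them. The sketch below assumes a hypothetical `run_inference` helper with keyword-only arguments; the stub classes and their return values are placeholders for the real models.

```python
class FullyLearningBasedLegalBERT:
    """Stub for the current clause-level model."""

    def predict_risk_pattern(self, input_ids, attention_mask):
        return {"path": "clause"}


class HierarchicalLegalBERT:
    """Stub for the hierarchical model, which supports both paths."""

    def predict_risk_pattern(self, input_ids, attention_mask):
        return {"path": "clause"}

    def predict_document(self, document_structure):
        return {"path": "document"}


def run_inference(model, *, input_ids=None, attention_mask=None,
                  document_structure=None, analyze_full_doc=False):
    """Hypothetical helper mirroring the branch shown above."""
    if isinstance(model, HierarchicalLegalBERT) and analyze_full_doc:
        if document_structure is None:
            raise ValueError("document-level analysis requires document_structure")
        return model.predict_document(document_structure)
    # Clause-by-clause inference; also the fallback for the non-hierarchical model.
    return model.predict_risk_pattern(input_ids, attention_mask)


print(run_inference(HierarchicalLegalBERT(),
                    document_structure=[["clause 1", "clause 2"]],
                    analyze_full_doc=True))
```

Note that requesting `analyze_full_doc=True` with the non-hierarchical model quietly falls back to clause-level inference, matching the original branch; a stricter wrapper might raise instead.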
---
## Implementation Order
1. ✅ **config.py** - Add configuration flags
2. ✅ **trainer.py** - Update model initialization and training loop
3. ✅ **train.py** - Add command-line args
4. ✅ **evaluate.py** - Add hierarchical evaluation
5. ✅ **calibrate.py** - Add hierarchical calibration
6. ✅ **advanced_analysis.py** - Add hierarchical analysis
7. ✅ **analyze_document.py** - Add hierarchical document analysis
8. ✅ **test_setup.py** - Add tests
---
## Backward Compatibility
All changes maintain backward compatibility:
- Default behavior: Uses `FullyLearningBasedLegalBERT` (current model)
- Optional: Use `HierarchicalLegalBERT` via config flag
- Training: Both models train the same way (clause-level)
- Inference: Hierarchical model offers enhanced document-level analysis