Risk-o-meter Framework - Implementation Summary
Status: Completed
Successfully implemented the Risk-o-meter framework (Chakrabarti et al., 2018) and integrated it into the comparison pipeline.
Paper Reference
Title: Risk-o-meter: Automated Risk Detection in Contracts
Authors: Chakrabarti, A., & Dholakia, K. (2018)
Key Achievement: 91% accuracy on termination clauses
Method: Paragraph vectors (Doc2Vec) + SVM classifiers
Implementation Details
Core Components
File: risk_o_meter.py (750+ lines)
1. Doc2Vec (Paragraph Vectors)
- Purpose: Learn distributed semantic representations of legal clauses
- Model: PV-DM (Distributed Memory) variant; PV-DBOW is the alternative algorithm, but DM is used here for better semantic capture
- Parameters:
  - Vector size: 100 dimensions (configurable)
  - Context window: 5 words
  - Epochs: 30 to 40 (configurable)
2. SVM Classifier
- Purpose: Multi-class risk categorization
- Kernel: RBF (default) or linear
- Features: Doc2Vec embeddings + optional TF-IDF augmentation
- Output: Predicted risk category plus a per-class probability distribution
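A sketch of the SVM stage under stated assumptions: random Gaussian vectors stand in for Doc2Vec embeddings, the clause texts and the two labels are hypothetical, and scikit-learn's `SVC` is assumed as the classifier (the summary does not name a library). The TF-IDF columns are appended with `numpy.hstack` to illustrate the optional augmentation.

```python
# Minimal sketch (assumption): RBF-kernel SVC over embeddings + optional TF-IDF features.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical clause texts and risk labels (20 per class).
texts = ["terminate upon notice"] * 20 + ["indemnify against claims"] * 20
labels = np.array(["termination"] * 20 + ["indemnity"] * 20)

# Stand-in for Doc2Vec vectors: two separated Gaussian blobs in 100 dimensions.
doc_vecs = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(20, 100)),
    rng.normal(loc=3.0, scale=1.0, size=(20, 100)),
])

# Optional augmentation: append dense TF-IDF features to the embeddings.
tfidf = TfidfVectorizer().fit(texts)
features = np.hstack([doc_vecs, tfidf.transform(texts).toarray()])

# probability=True makes scikit-learn fit Platt scaling so predict_proba is available.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(features, labels)

probs = clf.predict_proba(features[:2])  # one row per clause, one column per class
print(clf.classes_, probs.round(3))
```

In scikit-learn, `predict_proba` exists on `SVC` only when it is built with `probability=True`; the calibration uses internal cross-validation, so training is slower than with plain `predict`. Switching to `kernel="linear"` covers the linear option listed above.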
3. SVR Regressors (Extension)
- Purpose: Predict severity and importance scores
- Method: Support Vector Regression
- Output: Continuous scores (0-10 scale)
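Severity scoring (and, with a second model, importance scoring) could look like the following scikit-learn `SVR` sketch. The synthetic features and targets, `C=10.0`, `epsilon=0.5`, and the helper `predict_severity` are hypothetical; clipping to the 0 to 10 scale is also an assumption, because SVR predictions are not bounded to any range by themselves.

```python
# Minimal sketch (assumption): SVR predicting a 0-10 severity score from clause features.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 100))              # stand-in for Doc2Vec clause vectors
severity = np.clip(5 + 2 * X[:, 0], 0, 10)  # hypothetical severity labels on a 0-10 scale

reg = SVR(kernel="rbf", C=10.0, epsilon=0.5).fit(X, severity)

def predict_severity(features):
    """Hypothetical helper: predict severity and clip it to the 0-10 range."""
    return np.clip(reg.predict(features), 0.0, 10.0)

scores = predict_severity(X[:5])
print(scores.round(2))
```

An importance score would follow the same pattern with a separate `SVR` fit on importance labels.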
Usage
# Test Risk-o-meter standalone
python risk_o_meter.py
# Run full comparison (9 methods including Risk-o-meter)
python compare_risk_discovery.py --advanced
Now Available: 9 Methods in Total
- K-Means (baseline)
- LDA Topic Modeling
- Hierarchical Clustering
- DBSCAN
- NMF
- Spectral Clustering
- GMM
- Mini-Batch K-Means
- Risk-o-meter (NEW; supervised; paper-reported baseline: 91% accuracy on termination clauses)
Files Modified
- risk_o_meter.py (NEW, 750+ lines)
- compare_risk_discovery.py (updated for 9 methods)
- risk_discovery_alternatives.py (added Method 9)
- RISK_DISCOVERY_COMPREHENSIVE.md (added Risk-o-meter section)
- requirements.txt (added gensim>=4.3.0)
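Ready to Run
All code is implemented and ready for testing. Risk-o-meter serves as a baseline for comparison with the other 8 methods; note that the 91% accuracy figure is the result reported in the paper for termination clauses, not a measurement of this implementation.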