DisSim-FinBERT-FOMC
DisSim-FinBERT-FOMC is a discourse-guided sentiment pipeline for complex FOMC-style text. It first extracts level-0 nucleus clauses from a sentence and then runs sentiment classification on those nucleus clauses rather than on the full sentence.
Upstream DisSim (DiscourseSimplification) repository: https://github.com/Lambda-3/DiscourseSimplification
What This Repo Contains
- A Python implementation of the discourse extraction and aggregation pipeline.
- A CLI for JSON, JSONL, TXT, and flat CSV workflows.
- A mock sentiment backend for local smoke tests when the base model is unavailable.
Important Scope Note
This repository is a pipeline wrapper around ZiweiChen/FinBERT-FOMC. It is not a standalone checkpoint release, and no model weights are bundled in this repository.
If the Hugging Face model cannot be loaded, the package falls back to a lightweight mock backend unless you explicitly force the Hugging Face backend.
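The fallback rule above can be illustrated with a small sketch. Everything here is hypothetical and not the package's code: the names mock_sentiment and load_backend, the cue word lists, the lowercase label names, and the preference values "auto" and "hf" are assumptions. Only "mock" appears in this card. The loader argument stands in for whatever actually loads ZiweiChen/FinBERT-FOMC.

```python
from typing import Callable, Dict, Optional

# A scorer maps text to a label -> probability dictionary (assumed shape).
Scorer = Callable[[str], Dict[str, float]]

# Illustrative cue lists for a hypothetical mock scorer (not the package's lists).
POSITIVE_CUES = {"expand", "solid", "strong", "eased", "improved"}
NEGATIVE_CUES = {"elevated", "uncertainty", "risks", "weakened", "declined"}


def mock_sentiment(text: str) -> Dict[str, float]:
    """Score text by counting cue words; no model is needed."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE_CUES for w in words)
    neg = sum(w in NEGATIVE_CUES for w in words)
    if pos + neg == 0:
        return {"positive": 0.0, "negative": 0.0, "neutral": 1.0}
    total = pos + neg
    return {"positive": pos / total, "negative": neg / total, "neutral": 0.0}


def load_backend(preference: str = "auto",
                 loader: Optional[Callable[[], Scorer]] = None) -> Scorer:
    """Return a scorer, falling back to the mock unless the HF backend is forced."""
    if preference == "mock":
        return mock_sentiment
    try:
        if loader is None:
            raise RuntimeError("no Hugging Face loader supplied")
        return loader()
    except Exception:
        if preference == "hf":
            raise  # forced backend: surface the load error instead of hiding it
        return mock_sentiment
```

The design choice this sketch highlights is that a forced backend should fail loudly. A silent fallback would let mock scores pass for FinBERT-FOMC scores.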
Pipeline
Original sentence
-> normalization
-> clause segmentation
-> cue-based discourse relation classification
-> nucleus scoring
-> level-0 nucleus extraction
-> FinBERT-FOMC sentiment inference
-> sentence-level aggregation
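The steps from normalization to level-0 nucleus extraction can be sketched as follows. This is a deliberately simplified, hypothetical illustration and not the package's implementation. The cue tables, relation labels, and function names (normalize, segment_and_label, extract_level0_nuclei) are assumptions. Nucleus scoring is reduced to a binary nucleus flag. Clauses are split only on a leading subordinator followed by a comma and on the coordinators "but" and "and".

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Clause:
    text: str
    relation: str     # illustrative cue-based relation label
    is_nucleus: bool  # True if the clause carries the core message


# Hypothetical cue tables; the real pipeline's cue inventory may differ.
SUBORDINATE_CUES = {"although": "CONCESSION", "though": "CONCESSION",
                    "while": "CONTRAST", "because": "CAUSE", "if": "CONDITION"}
COORDINATE_CUES = {"but": "CONTRAST", "and": "LIST"}


def normalize(text: str) -> str:
    """Collapse runs of whitespace and drop trailing sentence punctuation."""
    return re.sub(r"\s+", " ", text).strip().rstrip(".!?")


def segment_and_label(text: str) -> List[Clause]:
    clauses: List[Clause] = []
    first_word = text.split(" ", 1)[0].lower()
    # A leading subordinator followed by a comma marks a satellite clause.
    if first_word in SUBORDINATE_CUES and "," in text:
        satellite, _, text = text.partition(",")
        satellite = satellite.split(" ", 1)[1] if " " in satellite else ""
        clauses.append(Clause(satellite.strip(), SUBORDINATE_CUES[first_word], False))
        text = text.strip()
    # Coordinators split the remainder into sibling nuclei (multinuclear).
    # This naive split also breaks noun phrases such as "prices and wages".
    parts = re.split(r",?\s+\b(but|and)\b\s+", text)
    pieces, cues = parts[0::2], parts[1::2]
    for i, piece in enumerate(pieces):
        if not cues:
            relation = "ROOT"
        else:
            # The first piece shares the relation of the cue that follows it.
            relation = COORDINATE_CUES[cues[max(i - 1, 0)]]
        clauses.append(Clause(piece.strip(), relation, True))
    return clauses


def extract_level0_nuclei(text: str) -> List[str]:
    """Keep only top-level nucleus clauses; satellites are discarded."""
    return [c.text for c in segment_and_label(normalize(text)) if c.is_nucleus]
```

On the Quick Start sentence, this sketch drops the concessive "Although ..." clause as a satellite and keeps the two coordinated main clauses as level-0 nuclei. The naive "and" split also shows why the pipeline is described as a heuristic rather than a full RST parser.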
Installation
```bash
pip install -e .
```
Quick Start
```python
from dissim_finbert_fomc import DisSimFinBERTFOMC

text = (
    "Although inflation has eased somewhat, it remains elevated "
    "and the Committee remains highly attentive to inflation risks."
)

# Offline smoke-test configuration: mock sentiment backend, heuristic flat export.
pipe = DisSimFinBERTFOMC(
    local_files_only=True,
    backend_preference="mock",
    flat_backend_preference="heuristic",
)

result = pipe.predict(text)
print(result["level0_nuclei"])                        # extracted level-0 nucleus clauses
print(result["final_label"], result["final_score"])  # aggregated sentence-level result
```
CLI
```bash
# Single sentence with the mock backend
python -m dissim_finbert_fomc.cli --text "Economic activity has continued to expand at a solid pace, but uncertainty about the outlook has increased." --backend mock --local-files-only

# JSON file in, JSON results out
python -m dissim_finbert_fomc.cli --input examples/test_sentences.json --output results.json --output-format json --backend mock --local-files-only

# JSON file in, flat CSV out, using the heuristic flat backend
python -m dissim_finbert_fomc.cli --input examples/test_sentences.json --output output_flat.csv --output-format flat-csv --backend mock --flat-backend heuristic --local-files-only
```
Supported input formats: .json, .jsonl, .txt
Supported output formats: json, jsonl, flat-csv
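The card lists the input formats but not their record layout. The following hypothetical reader shows one way the three formats could be normalized into a list of sentences. The record schema is an assumption: JSON holds a list of strings or objects with a "text" field, JSONL holds one such record per line, and TXT holds one sentence per line. The function name read_inputs is also illustrative.

```python
import json
from pathlib import Path
from typing import List, Union

SUPPORTED_INPUTS = {".json", ".jsonl", ".txt"}


def _to_text(item: Union[str, dict]) -> str:
    # Assumption: records are plain strings or objects with a "text" field.
    if isinstance(item, str):
        return item
    return str(item["text"])


def read_inputs(path: Union[str, Path]) -> List[str]:
    """Read sentences from a .json, .jsonl, or .txt file (assumed schema)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_INPUTS:
        raise ValueError(f"unsupported input format: {suffix!r}")
    raw = path.read_text(encoding="utf-8")
    if suffix == ".json":
        data = json.loads(raw)
        items = data if isinstance(data, list) else [data]
        return [_to_text(x) for x in items]
    if suffix == ".jsonl":
        # Skip blank lines; each non-blank line is one JSON record.
        return [_to_text(json.loads(line)) for line in raw.splitlines() if line.strip()]
    # .txt: one sentence per non-blank line.
    return [line.strip() for line in raw.splitlines() if line.strip()]
```

Rejecting unknown suffixes before reading the file keeps the error message about the format, rather than a file-not-found error.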
Output Shape
predict() returns a dictionary containing:
- normalized input text
- clause-level metadata
- extracted level-0 nuclei (level0_nuclei)
- unit-level sentiment predictions
- aggregated sentence-level label and score (final_label, final_score)
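This card does not state how unit-level predictions are combined into the final label and score. One plausible rule is shown here as a hypothetical sketch, not the package's method. It averages each nucleus's label distribution and takes the argmax. The function name aggregate_units, the label names, and the "neutral" default for an empty input are assumptions.

```python
from typing import Dict, List, Tuple


def aggregate_units(unit_probs: List[Dict[str, float]]) -> Tuple[str, float]:
    """Average per-nucleus label distributions and return (label, score)."""
    if not unit_probs:
        return "neutral", 0.0  # hypothetical default when no nucleus was found
    labels = sorted({label for probs in unit_probs for label in probs})
    mean = {
        label: sum(p.get(label, 0.0) for p in unit_probs) / len(unit_probs)
        for label in labels
    }
    # max() keeps the first label in sorted order on ties.
    best = max(labels, key=lambda label: mean[label])
    return best, mean[best]
```

For example, two nuclei with negative probabilities of 0.7 and 0.5 (and lower positive and neutral mass) aggregate to ("negative", about 0.6). A mean treats every nucleus equally. Weighting by a nucleus score would be a natural alternative.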
Evaluation Status
This card intentionally publishes no benchmark tables or charts, because this repository does not yet contain versioned evaluation artifacts. Before publishing model-quality claims here, add the exact evaluation dataset, script, and result files.
Limitations
- This is a discourse-aware heuristic pipeline, not a full RST parser.
- The default Java flat-export path requires a local checkout of the upstream DiscourseSimplification project and a local Java and Maven installation.
- The sentiment path requires access to the upstream ZiweiChen/FinBERT-FOMC model unless you use the mock backend.
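Because the Java flat-export path depends on external tools, a preflight check can report which prerequisites are missing before you choose a flat backend. This is a hypothetical helper and not part of the package. The function names and the checkout_dir parameter are assumptions. It relies on the DiscourseSimplification checkout being a Maven project with a pom.xml at its root.

```python
import shutil
from pathlib import Path
from typing import Dict, Union


def check_java_flat_export(checkout_dir: Union[str, Path]) -> Dict[str, bool]:
    """Report which prerequisites of the Java flat-export path are present."""
    checkout = Path(checkout_dir)
    return {
        "java": shutil.which("java") is not None,  # Java runtime on PATH
        "mvn": shutil.which("mvn") is not None,    # Maven on PATH
        # A Maven project checkout should contain a pom.xml at its root.
        "checkout": (checkout / "pom.xml").is_file(),
    }


def can_use_java_flat_export(checkout_dir: Union[str, Path]) -> bool:
    """True only if every prerequisite was found."""
    return all(check_java_flat_export(checkout_dir).values())
```

If the check fails, the heuristic flat backend (flat_backend_preference="heuristic" in Python, --flat-backend heuristic on the CLI) avoids the Java dependency.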
Citation
Kim, W., Niklaus, C., Lee, C. L., & Handschuh, S. (2025). DisSim-FinBERT: Text Simplification for Core Message Extraction in Complex Financial Texts. arXiv:2501.04959.
Model tree for Wonseong/dissim-finbert-fomc
Base model
ZiweiChen/FinBERT-FOMC