DisSim-FinBERT-FOMC

DisSim-FinBERT-FOMC is a discourse-guided sentiment pipeline for complex FOMC-style text. It first extracts the level-0 nucleus clauses of a sentence. It then classifies the sentiment of those nuclei rather than of the full sentence. DisSim (DiscourseSimplification) on GitHub: https://github.com/Lambda-3/DiscourseSimplification

What This Repo Contains

  • A Python implementation of the discourse extraction and aggregation pipeline.
  • A CLI for JSON, JSONL, TXT, and flat CSV workflows.
  • A mock sentiment backend for local smoke tests when the base model is unavailable.

Important Scope Note

This repository is a pipeline wrapper around ZiweiChen/FinBERT-FOMC. It does not bundle any model weights and is not a standalone checkpoint release.

If the Hugging Face model cannot be loaded, the package falls back to a lightweight mock backend unless you explicitly force the Hugging Face backend.
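The fallback behaviour above can be sketched as follows. This is a minimal illustration, not the package's actual code.

The following names are hypothetical:

  • the functions `load_backend`, `hf_backend` and `mock_backend`
  • the `"auto"` / `"huggingface"` / `"mock"` preference strings
  • the keyword rules inside the mock

The Hugging Face path uses the standard `transformers` loaders `AutoTokenizer.from_pretrained` and `AutoModelForSequenceClassification.from_pretrained` with `local_files_only`. It assumes the checkpoint exposes its label names through `config.id2label`.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Dict[str, float]  # label -> probability
Backend = Callable[[List[str]], List[Prediction]]


def mock_backend(texts: List[str]) -> List[Prediction]:
    """Hypothetical keyword mock for smoke tests; labels and rules are assumptions."""
    out: List[Prediction] = []
    for text in texts:
        lowered = text.lower()
        if any(w in lowered for w in ("elevated", "risk", "uncertain")):
            out.append({"negative": 0.6, "neutral": 0.3, "positive": 0.1})
        elif any(w in lowered for w in ("solid", "strong", "eased")):
            out.append({"positive": 0.6, "neutral": 0.3, "negative": 0.1})
        else:
            out.append({"neutral": 0.8, "positive": 0.1, "negative": 0.1})
    return out


def hf_backend(model_name: str, local_files_only: bool) -> Backend:
    # Imports are local so a missing torch/transformers install triggers the fallback.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=local_files_only)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, local_files_only=local_files_only
    )
    model.eval()
    id2label = model.config.id2label

    def predict(texts: List[str]) -> List[Prediction]:
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            probs = torch.softmax(model(**enc).logits, dim=-1)
        return [{id2label[i]: float(row[i]) for i in range(row.shape[0])} for row in probs]

    return predict


def load_backend(
    preference: str = "auto",
    model_name: str = "ZiweiChen/FinBERT-FOMC",
    local_files_only: bool = True,
) -> Tuple[str, Backend]:
    if preference == "mock":
        return "mock", mock_backend
    try:
        return "huggingface", hf_backend(model_name, local_files_only)
    except Exception:
        # Forcing the Hugging Face backend surfaces the loading error instead of hiding it.
        if preference == "huggingface":
            raise
        return "mock", mock_backend
```

The design choice is that only an explicit `"huggingface"` preference turns a loading failure into an error. Every other preference degrades silently to the mock.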

Pipeline

Original sentence
  -> normalization
  -> clause segmentation
  -> cue-based discourse relation classification
  -> nucleus scoring
  -> level-0 nucleus extraction
  -> FinBERT-FOMC sentiment inference
  -> sentence-level aggregation
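The extraction stages above can be illustrated with a toy cue-based segmenter. This is a hypothetical sketch. It is neither the repository's implementation nor DisSim's rule set.

The following parts are assumptions:

  • the cue lexicon
  • the split regex
  • the binary nucleus score
  • the function names

The sketch treats clauses introduced by subordinating cues such as "although" as satellites. It treats clauses joined by "but" or "and" as coordinate nuclei. It then keeps the nuclei that clear a score threshold.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical cue lexicon: cue -> (relation, is the clause it introduces a nucleus?)
CUES = {
    "although": ("CONCESSION", False),
    "though": ("CONCESSION", False),
    "because": ("CAUSE", False),
    "but": ("CONTRAST", True),
    "yet": ("CONTRAST", True),
    "and": ("LIST", True),
}
# Split at commas, and before selected cues that can open a clause without a comma.
SPLIT_RE = re.compile(r",\s*|\s+(?=(?:but|and|yet|because|although)\b)")


@dataclass
class Clause:
    text: str
    relation: str
    nucleus_score: float


def normalize(sentence: str) -> str:
    # Collapse whitespace and drop sentence-final punctuation.
    return re.sub(r"\s+", " ", sentence).strip().rstrip(".;")


def segment(sentence: str) -> List[str]:
    return [part.strip() for part in SPLIT_RE.split(sentence) if part.strip()]


def classify(segment_text: str) -> Clause:
    first, _, rest = segment_text.partition(" ")
    cue = first.lower()
    if cue in CUES and rest:
        relation, is_nucleus = CUES[cue]
        # Stand-in nucleus score: 1.0 for nuclei, 0.0 for satellites; cue word removed.
        return Clause(rest, relation, 1.0 if is_nucleus else 0.0)
    # No leading cue: treat the segment as a main clause (nucleus).
    return Clause(segment_text, "NONE", 1.0)


def extract_level0_nuclei(sentence: str, threshold: float = 0.5) -> List[str]:
    text = normalize(sentence)
    clauses = [classify(s) for s in segment(text)]
    nuclei = [c.text for c in clauses if c.nucleus_score >= threshold]
    # Fall back to the whole sentence if every segment looked like a satellite.
    return nuclei or [text]
```

On the Quick Start sentence, this sketch keeps "it remains elevated" and "the Committee remains highly attentive to inflation risks". It drops the concessive "Although ..." clause. A fixed cue list like this one will also mis-split noun-phrase coordination such as "inflation and employment".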

Installation

pip install -e .

Quick Start

from dissim_finbert_fomc import DisSimFinBERTFOMC

text = (
    "Although inflation has eased somewhat, it remains elevated "
    "and the Committee remains highly attentive to inflation risks."
)

pipe = DisSimFinBERTFOMC(
    local_files_only=True,
    backend_preference="mock",
    flat_backend_preference="heuristic",
)
result = pipe.predict(text)
print(result["level0_nuclei"])
print(result["final_label"], result["final_score"])

CLI

python -m dissim_finbert_fomc.cli --text "Economic activity has continued to expand at a solid pace, but uncertainty about the outlook has increased." --backend mock --local-files-only
python -m dissim_finbert_fomc.cli --input examples/test_sentences.json --output results.json --output-format json --backend mock --local-files-only
python -m dissim_finbert_fomc.cli --input examples/test_sentences.json --output output_flat.csv --output-format flat-csv --backend mock --flat-backend heuristic --local-files-only

Supported input formats:

  • .json
  • .jsonl
  • .txt
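A reader for these three formats might look like the sketch below. It rests on assumed record shapes, because the exact schema of `examples/test_sentences.json` is not described here. The assumptions are:

  • `.json` holds a list of strings or of objects with a `"text"` field
  • `.jsonl` holds one such record per line
  • `.txt` holds one sentence per line

The function name `read_sentences` is hypothetical.

```python
import json
from pathlib import Path
from typing import List, Union


def _to_text(item: Union[str, dict]) -> str:
    # Assumed record shape: a bare string or an object with a "text" field.
    return item if isinstance(item, str) else str(item["text"])


def read_sentences(path: Union[str, Path]) -> List[str]:
    path = Path(path)
    suffix = path.suffix.lower()
    raw = path.read_text(encoding="utf-8")
    if suffix == ".json":
        return [_to_text(item) for item in json.loads(raw)]
    if suffix == ".jsonl":
        return [_to_text(json.loads(line)) for line in raw.splitlines() if line.strip()]
    if suffix == ".txt":
        return [line.strip() for line in raw.splitlines() if line.strip()]
    raise ValueError(f"unsupported input format: {suffix}")
```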

Supported output formats:

  • json
  • jsonl
  • flat-csv
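The three output formats could be written along these lines. The `json` and `jsonl` branches are standard. The `flat-csv` column layout (one row per sentence/nucleus pair) is a hypothetical illustration, not the repository's actual flat schema. The function name `write_results` is also hypothetical. The record keys are those used in the Quick Start (`level0_nuclei`, `final_label`, `final_score`).

```python
import csv
import json
from typing import Dict, List


def write_results(results: List[Dict], path: str, output_format: str) -> None:
    if output_format == "json":
        with open(path, "w", encoding="utf-8") as f:
            json.dump(results, f, ensure_ascii=False, indent=2)
    elif output_format == "jsonl":
        with open(path, "w", encoding="utf-8") as f:
            for record in results:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
    elif output_format == "flat-csv":
        # Hypothetical flat layout: one row per (sentence, level-0 nucleus) pair.
        with open(path, "w", encoding="utf-8", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["sentence_id", "nucleus_id", "nucleus", "final_label", "final_score"])
            for i, record in enumerate(results):
                for j, nucleus in enumerate(record["level0_nuclei"]):
                    writer.writerow([i, j, nucleus, record["final_label"], record["final_score"]])
    else:
        raise ValueError(f"unsupported output format: {output_format}")
```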

Output Shape

predict() returns:

  • normalized input text
  • clause-level metadata
  • extracted level-0 nuclei
  • unit-level sentiment predictions
  • aggregated sentence-level label and score
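One plausible way to turn unit-level predictions into a sentence-level label and score is to average the per-nucleus probability distributions and take the argmax. This is an assumption: the card does not state which aggregation rule the package uses. The function name and the optional per-nucleus weights are likewise hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def aggregate(
    unit_probs: List[Dict[str, float]], weights: Optional[List[float]] = None
) -> Tuple[str, float]:
    """Weighted mean of per-nucleus label distributions, then argmax."""
    if not unit_probs:
        raise ValueError("need at least one unit-level prediction")
    weights = weights or [1.0] * len(unit_probs)
    total = sum(weights)
    labels = sorted({label for probs in unit_probs for label in probs})
    mean = {
        label: sum(w * probs.get(label, 0.0) for w, probs in zip(weights, unit_probs)) / total
        for label in labels
    }
    best = max(labels, key=mean.__getitem__)
    return best, mean[best]
```

Averaging lets one strongly polarized nucleus outweigh a neutral one. It also keeps the final score on the same probability scale as the unit-level predictions.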

Evaluation Status

This card intentionally does not publish benchmark tables or charts, because no versioned evaluation artifacts are included in this repository. Before publishing model-quality claims here, add the exact evaluation dataset, script, and result files.
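A small, dependency-free scoring script along these lines would make any published numbers reproducible. It is a sketch under assumptions. The function names, the metric choice (accuracy and macro-F1) and the report fields are illustrative. No such script currently exists in this repository.

```python
import hashlib
import json
from collections import Counter
from pathlib import Path
from typing import Dict, List


def evaluate(gold: List[str], pred: List[str]) -> Dict[str, float]:
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and aligned")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    # Per-label F1 = 2TP / (2TP + FP + FN); macro-F1 is the unweighted mean.
    f1s = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1s.append(2 * tp[label] / denom if denom else 0.0)
    return {
        "n": len(gold),
        "accuracy": sum(tp.values()) / len(gold),
        "macro_f1": sum(f1s) / len(labels),
    }


def write_report(metrics: dict, dataset_path: str, out_path: str) -> None:
    # Store a hash of the exact dataset file so the numbers are tied to it.
    digest = hashlib.sha256(Path(dataset_path).read_bytes()).hexdigest()
    report = {"dataset": str(dataset_path), "dataset_sha256": digest, **metrics}
    Path(out_path).write_text(json.dumps(report, indent=2), encoding="utf-8")
```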

Limitations

  • This is a discourse-aware heuristic pipeline, not a full RST parser.
  • The default Java-based flat export requires an upstream DiscourseSimplification checkout and a local Java and Maven installation.
  • The sentiment path depends on access to the upstream ZiweiChen/FinBERT-FOMC model unless you use the mock backend.

Citation

Kim, W., Niklaus, C., Lee, C. L., & Handschuh, S. (2025). DisSim-FinBERT: Text Simplification for Core Message Extraction in Complex Financial Texts. arXiv:2501.04959.
